1 Introduction

The nonlinear complementarity problem, denoted by \(\operatorname{NCP}(F)\), is to find a vector \(x\in\mathbb{R}^{n}\) such that

$$x\geq0,\qquad F(x)\geq0,\qquad x^{\top} F(x)=0, $$

where F is a mapping from \(\mathbb{R}^{n}\) into itself. The set of solutions to this problem is denoted by \(\operatorname{SOL}(F)\). Throughout this paper, we assume \(\operatorname{SOL}(F)\ne\emptyset\).

NCPs have various important applications in economics and engineering, such as Nash equilibrium problems, traffic equilibrium problems, contact mechanics problems, and option pricing. NCPs have been studied extensively; see [1–3] and the references therein. Numerical methods for solving NCPs, such as filter methods, continuation methods, nonsmooth Newton methods, smoothing Newton methods, the Levenberg-Marquardt method, projection methods, descent methods, and interior-point methods, have been investigated at length in the literature. However, sparse solutions of NCPs seem to have received little attention, even though computing them is important in many real applications, for example in bimatrix games [1] and portfolio selection [4]. For more details, see [5].

In this paper, we try to compute a sparse solution of the \(\operatorname{NCP}(F)\), which has the smallest number of nonzero entries. To be specific, we seek a vector \(x\in\mathbb{R}^{n}\) by solving the \(\ell_{0}\)-norm minimization problem

$$\begin{aligned} \min &\|x\|_{0} \\ \mbox{s.t. } &x\geq0, F(x)\geq0, x^{\top} F(x)=0, \end{aligned}$$
(1)

where \(\|x\|_{0}\) stands for the number of nonzero components of x. A solution of (1) is called a sparse solution of \(\operatorname{NCP}(F)\).

The minimization problem (1) is in fact a sparse optimization problem [6–9] with equilibrium constraints. From the viewpoint of the objective function, it is an \(\ell_{0}\)-norm minimization problem, which is combinatorial and generally NP-hard. From the viewpoint of the constraints, it is a mathematical program with equilibrium constraints (MPEC) [10–13]; solutions are hard to obtain because of the equilibrium constraints, even for a continuous objective function.

To overcome the difficulty caused by the \(\ell_{0}\) norm, many researchers have suggested relaxing it to the \(\ell_{1}\) norm; see [8, 9, 14–17]. Motivated by this outstanding work, we apply \(\ell_{1}\)-norm minimization to find sparse solutions of NCPs, obtaining the following approximation of (1):

$$\begin{aligned} \min\limits _{x\in\mathbb{R}^{n}}& \|x\|_{1} \\ \mbox{s.t.} &x\geq0, F(x)\geq0, x^{\top} F(x)=0, \end{aligned}$$
(2)

where \(\|x\|_{1}=\sum_{i=1}^{n} |x_{i}|\).

To overcome the difficulty caused by the complementarity constraint, we use the C-function \(\mathbf{F}_{\min}(x)\) to penalize its violation. The C-function \(\mathbf{F}_{\min}\) associated with the ‘min’ function is given by

$$\begin{aligned} \mathbf{F}_{\min}(x)=x-\Pi_{\mathbb{R}_{+}^{n}}\bigl(x- F(x) \bigr)\triangleq x-\bigl[x-F(x)\bigr]_{+}, \end{aligned}$$
(3)

where F is a mapping from \(\mathbb{R}^{n}\) into itself, and \(\Pi_{\mathbb{R}_{+}^{n}}\) is the Euclidean metric projector onto the nonnegative orthant.
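Componentwise, \(\mathbf{F}_{\min}(x)=\min\{x, F(x)\}\). For concreteness, the following minimal NumPy sketch (our illustration, not code from the paper; F is assumed to be supplied as a callable) evaluates this residual; by the equivalence recalled next, it vanishes exactly at solutions of \(\operatorname{NCP}(F)\).

```python
import numpy as np

def f_min(x, F):
    """Natural residual (3): F_min(x) = x - [x - F(x)]_+, where [.]_+
    projects onto the nonnegative orthant; this equals the componentwise
    minimum min(x, F(x))."""
    return x - np.maximum(x - F(x), 0.0)
```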

It is well known [2] that solving \(\operatorname{NCP}(F)\) is equivalent to solving the fixed point equation \(\mathbf{F}_{\min}(x)=0\), that is,

$$\begin{aligned} x \in \operatorname{SOL}(F) \quad\Leftrightarrow\quad x= \bigl[x-F(x)\bigr]_{+}, \end{aligned}$$
(4)

where \([\cdot]_{+}\) is the Euclidean metric projector onto the nonnegative orthant.

Combining (2) and (4), by introducing a new variable \(z\in\mathbb{R}^{n}\), we obtain the following regularized minimization problem:

$$\begin{aligned} \min\limits _{x,z\in\mathbb{R}^{n}}&f_{\lambda}(x,z):=\|x-z\|^{2} +\lambda\|x \|_{1} \\ \mbox{s.t.} &z=\bigl[x- F(x)\bigr]_{+}, \end{aligned}$$
(5)

where \(\lambda>0\) is a regularization parameter and \(\|\cdot\|\) denotes the Euclidean norm. We call (5) the \(\ell_{1}\) regularized projection minimization problem.

This paper is organized as follows. In Section 2, we approximate (1) by the \(\ell_{1}\) regularized projection minimization problem (5), and we show theoretically that (5) is a good approximation. In Section 3, we propose an extragradient thresholding algorithm (ETA) for (5) and analyze its convergence. Numerical results in Section 4 show that (5) is promising for providing sparse solutions of co-coercive NCPs.

2 The \(\ell_{1}\) regularized approximation

In this section, we study the relation between the solutions of model (5) and those of model (2).

Theorem 2.1

For any fixed \(\lambda>0\), the solution set of (5) is nonempty and bounded. Let \(\{\lambda_{k}\}\) be any positive sequence converging to 0, and for each k let \((\widehat{x}_{\lambda_{k}},\widehat{z}_{\lambda_{k}})\) be a solution of (5) with \(\lambda=\lambda_{k}\). If \(\operatorname{SOL}(F)\ne\emptyset\), then \(\{(\widehat {x}_{\lambda_{k}},\widehat{z}_{\lambda_{k}})\}\) has at least one accumulation point, and any accumulation point \(x^{*}\) of \(\{\widehat {x}_{\lambda_{k}}\}\) is a solution of (2).

Proof

For any fixed \(\lambda>0\), it is easy to show the coercivity of \(f_{\lambda}(x,z)\) in (5), namely

$$\begin{aligned} f_{\lambda} (x,z)\to+\infty \quad\mbox{as } \bigl\| (x,z)\bigr\| \to \infty. \end{aligned}$$
(6)

We also note that for any \(x\in\mathbb{R}^{n}\) and \(z\in\mathbb{R}^{n}\), \(f_{\lambda}(x,z)\geq0\). This together with (6) implies the level set

$$\begin{aligned} L= \bigl\{ (x,z)\in\mathbb{R}^{n}\times\mathbb{R}^{n} \mid f_{\lambda}(x,z)\le f_{\lambda}(x_{0},z_{0}) \mbox{ and } z=\bigl[x- F(x)\bigr]_{+} \bigr\} \end{aligned}$$

is nonempty and compact, where \(x_{0}\in\mathbb{R}^{n}\) and \(z_{0}=[x_{0}- F(x_{0})]_{+}\) are given points. Since \(f_{\lambda}(x,z)\) is continuous on the compact set L, it attains its minimum there, and hence the solution set of (5) is nonempty and bounded.

Now we show the second part of the theorem. Let \(\widehat{x}\in\operatorname{SOL}(F)\) and \(\widehat{z}= [\widehat{x}- F(\widehat{x})]_{+}\). From (4), we have \(\widehat{x}=\widehat{z}\). Since \((\widehat{x}_{\lambda_{k}}, \widehat{z}_{\lambda_{k}})\) is a solution of (5) with \(\lambda=\lambda_{k}\), where \(\widehat{z}_{\lambda _{k}}= [\widehat{x}_{\lambda_{k}}-F(\widehat{x}_{\lambda_{k}})]_{+}\), it follows that

$$\begin{aligned} \max \bigl\{ \|\widehat{x}_{\lambda_{k}}-\widehat{z}_{\lambda_{k}} \|^{2}, \lambda_{k} \|\widehat{x}_{\lambda_{k}} \|_{1} \bigr\} \le& \|\widehat{x}_{\lambda_{k}}-\widehat{z}_{\lambda_{k}} \|^{2}+ \lambda _{k} \|\widehat{x}_{\lambda_{k}} \|_{1} \\ \le& \|\widehat{x}-\widehat{z}\|^{2}+ \lambda_{k} \|\widehat{x}\|_{1} \\ =& \lambda_{k} \|\widehat{x}\|_{1}. \end{aligned}$$
(7)

From the above inequality, we derive that, for any \(\lambda_{k}>0\),

$$\begin{aligned} \|\widehat{x}_{\lambda_{k}}\|_{1} \le\|\widehat{x} \|_{1}. \end{aligned}$$
(8)

Hence the sequence \(\{\widehat{x}_{\lambda_{k}}\}\) is bounded and has at least one cluster point. The sequence \(\{\widehat{z}_{\lambda_{k}}\}\) is also bounded, because \(\|\widehat{x}_{\lambda_{k}}-\widehat{z}_{\lambda_{k}}\|^{2}\leq \lambda_{k}\|\widehat{x}\|_{1}\) and \(\lambda_{k}\to0\).

Let \(x^{*}\) and \(z^{*} \) be any cluster points of \(\{\widehat{x}_{\lambda _{k}}\}\) and \(\{\widehat{z}_{\lambda_{k}}\}\), respectively. Then there exists a subsequence of \(\{\lambda_{k}\}\), say \(\{\lambda _{k_{j}}\}\), such that

$$\begin{aligned} \lim_{k_{j}\to\infty} \widehat{x}_{\lambda_{k_{j}}}=x^{*} \quad\mbox{and}\quad \lim_{k_{j}\to\infty} \widehat{z}_{\lambda_{k_{j}}}=z^{*}. \end{aligned}$$

We can obtain \(z^{*}=[x^{*}- F(x^{*})]_{+}\) by letting \(k_{j}\) tend to ∞ in \(\widehat{z}_{\lambda_{k_{j}}}= [\widehat{x}_{\lambda_{k_{j}}}-F(\widehat{x}_{\lambda_{k_{j}}})]_{+}\). Letting \(\lambda_{k_{j}}\) tend to 0 in

$$\begin{aligned} \|\widehat{x}_{\lambda_{k_{j}}}-\widehat{z}_{\lambda_{k_{j}}} \|^{2} \le \lambda_{k_{j}} \|\widehat{x}\|_{1} \end{aligned}$$

yields \(x^{*}=z^{*}\). Consequently, \(x^{*}=[ x^{*}- F(x^{*})]_{+}\), which shows \(x^{*}\in \operatorname{SOL}(F)\). Letting \(k_{j}\) tend to ∞ in (8), namely \(\|\widehat{x}_{\lambda_{k_{j}}}\|_{1}\le\|\widehat{x}\|_{1}\), we get \(\|x^{*}\|_{1} \le\|\widehat{x}\|_{1}\). Then, by the arbitrariness of \(\widehat{x}\in\operatorname{SOL}(F)\), \(x^{*}\) is a solution of problem (2). This completes the proof. □

3 Algorithm and convergence

In this section, we propose the extragradient thresholding algorithm (ETA) for the \(\ell_{1}\) regularized projection minimization problem (5) and give a convergence analysis of ETA.

First we recall some basic monotonicity concepts for operators and some properties of the projection operator. Let \(P_{K}(\cdot)\) denote the projection operator from \(\mathbb{R}^{n}\) onto K, a nonempty closed convex subset of \(\mathbb{R}^{n}\). From the definition of the projection operator, it follows that

$$\begin{aligned} \bigl\langle y-P_{K}(x), P_{K}(x)-x \bigr\rangle \geq0, \quad \forall y\in K, x\in\mathbb{R}^{n}. \end{aligned}$$
(9)

Consequently, we have

$$\begin{aligned} & \bigl\langle P_{K}(x)-P_{K}(y), x-y \bigr\rangle \geq\bigl\| P_{K}(x)-P_{K}(y)\bigr\| ^{2}, \quad \forall x, y\in\mathbb{R}^{n}, \end{aligned}$$
(10)
$$\begin{aligned} &\bigl\| P_{K}(x)-P_{K}(y)\bigr\| \leq \|x-y\|,\quad \forall x, y\in\mathbb{R}^{n}, \end{aligned}$$
(11)
$$\begin{aligned} &\bigl\| P_{K}(x)-y\bigr\| ^{2}\leq\|x-y\|^{2}- \bigl\| P_{K}(x)-x\bigr\| ^{2},\quad \forall y\in K, x\in \mathbb{R}^{n}. \end{aligned}$$
(12)

Lemma 3.1

[18]

Define the residual function

$$\begin{aligned} e(x,\alpha)=x-P_{K}\bigl[x-\alpha F(x)\bigr], \quad \alpha\geq0. \end{aligned}$$

The following statements are valid:

(a) for all \(\alpha>0\), \(F(x)^{\top}e(x,\alpha)\geq\frac {\|e(x,\alpha)\|^{2}}{\alpha}\);

(b) for any \(\alpha>0\), \(\frac{\|e(x,\alpha)\|}{\alpha}\) is non-increasing in α;

(c) for any \(\alpha\geq0\), \(\|e(x,\alpha)\|\) is non-decreasing in α.
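As a quick numerical illustration of statement (a), the following sketch (our own; the matrix M, vector q, and the point \(x\in K=\mathbb{R}^{n}_{+}\) are random assumptions, not data from the paper) checks the inequality for an affine monotone F:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
M = G @ G.T                                   # symmetric positive semidefinite
q = rng.standard_normal(n)
F = lambda v: M @ v + q

def residual(x, alpha):
    """e(x, alpha) = x - P_K[x - alpha F(x)] with K = R^n_+."""
    return x - np.maximum(x - alpha * F(x), 0.0)

x = np.abs(rng.standard_normal(n))            # a point of K = R^n_+
for alpha in (0.1, 1.0, 10.0):
    e = residual(x, alpha)
    assert F(x) @ e >= np.linalg.norm(e) ** 2 / alpha - 1e-9
```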

In this paper, we suppose the mapping \(F:\mathbb{R}^{n}\rightarrow\mathbb {R}^{n}\) is co-coercive on a subset K of \(\mathbb{R}^{n}\). That is, there exists a constant \(c>0\) such that

$$\bigl\langle F(x)-F(y), x-y \bigr\rangle \geq c\bigl\| F(x)-F(y)\bigr\| ^{2}, \quad \forall x, y \in K. $$

It is clear that the co-coercive mapping is monotone, namely,

$$\begin{aligned} \bigl\langle F(x)-F(y), x-y \bigr\rangle \geq0,\quad \forall x, y \in K, \end{aligned}$$

but not necessarily strongly monotone, i.e., there need not exist a constant \(c>0\) such that

$$\begin{aligned} \bigl\langle F(x)-F(y), x-y \bigr\rangle \geq c\|x-y\| ^{2}, \quad \forall x, y \in K. \end{aligned}$$

Remark 3.1

Every affine monotone function that is also symmetric is co-coercive (on \(\mathbb{R}^{n}\)). The Euclidean projector \(P_{K}\) and \(I-P_{K}\) are both co-coercive functions [2, 19].

Lemma 3.2

Suppose that \(F(\cdot)\) is co-coercive on K with modulus \(c>0\). Then, for any given positive real number α with \(c>\alpha/2\), the operator \(I-\alpha F\) is nonexpansive, that is, for any \(x, y \in K\),

$$\bigl\| (I-\alpha F) (x)-(I-\alpha F) (y)\bigr\| \leq\|x-y\|. $$

Proof

For any \(x, y \in K\), when \(c>\alpha/2\), using the co-coercivity of F, it follows that

$$\begin{aligned} &\bigl\| (I-\alpha F) (x)-(I-\alpha F) (y)\bigr\| ^{2} \\ &\quad= \bigl\| (x-y)-\alpha\bigl[F(x)-F(y)\bigr]\bigr\| ^{2} \\ &\quad=\|x-y\|^{2}-2\alpha \bigl\langle x-y, F(x)-F(y) \bigr\rangle + \alpha ^{2}\bigl\| F(x)-F(y)\bigr\| ^{2} \\ &\quad\leq \|x-y\|^{2}-\alpha(2c-\alpha) \bigl\| F(x)-F(y)\bigr\| ^{2} \\ &\quad\leq\|x-y\|^{2}, \end{aligned}$$

which shows \(I-\alpha F\) is nonexpansive. □

For given \(z^{k}\in\mathbb{R}^{n}_{+}\) and \(\lambda_{k}>0\), we consider the unconstrained minimization subproblem:

$$\begin{aligned} \min_{x\in\mathbb{R}^{n}} f_{\lambda_{k}} \bigl(x,z^{k}\bigr):=\bigl\| x-z^{k}\bigr\| ^{2} + \lambda_{k}\|x\|_{1}. \end{aligned}$$
(13)

Evidently, the minimizer \(x^{s}\) of (13) must satisfy the optimality condition

$$\begin{aligned} x^{s}=S_{\lambda_{k}}\bigl(z^{k}\bigr), \end{aligned}$$
(14)

where the shrinkage operator \(S_{\lambda}\) is defined by

$$\begin{aligned} \bigl(S_{\lambda}(z)\bigr)_{i}= \left \{ \begin{array}{@{}l@{\quad}l} z_{i}-\frac{\lambda}{2}, & z_{i}\geq\frac{\lambda}{2},\\ 0, & 0\leq z_{i}< \frac{\lambda}{2}. \end{array} \right . \end{aligned}$$
(15)

Evidently, the shrinkage operator \(S_{\lambda}\) acts componentwise, i.e., \((S_{\lambda}(z))_{i}=S_{\lambda}(z_{i})\). Moreover, it is nonexpansive, i.e., \(\|S_{\lambda}(x)-S_{\lambda}(y)\|\leq\|x-y\|\) for any \(x, y\in \mathbb{R}_{+}^{n}\); see [20]. Hence the solution \(x\in\mathbb{R}^{n}\) of the subproblem (13) can be expressed analytically by (14).
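In NumPy, the shrinkage operator (15) admits a one-line sketch (assuming, as in the text, a nonnegative argument z):

```python
import numpy as np

def shrink(z, lam):
    """Shrinkage operator (15) for z >= 0: (S_lam(z))_i = max(z_i - lam/2, 0),
    i.e., soft thresholding restricted to the nonnegative orthant."""
    return np.maximum(z - lam / 2.0, 0.0)
```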

Based on this solution representation, we construct the following extragradient thresholding algorithm (ETA) to solve the \(\ell_{1}\) regularized projection minimization problem (5).

  • Input: c, the co-coercivity modulus of F.

  • Step 0: Choose \(0\ne z^{0}\in\mathbb{R}_{+}^{n}\), \(\lambda_{0},\beta>0\), \(\tau,\gamma,\mu\in(0,1)\), \(\beta\gamma<2c\), \(\epsilon>0\) and integers \(n_{\max}>K_{0}>0\). Set \(k=0\).

  • Step 1: Compute

    $$\begin{aligned}[b] &x^{k}=S_{\lambda_{k}} \bigl(z^{k} \bigr), \\ &y^{k}= \bigl[x^{k}-\alpha_{k}F \bigl(x^{k} \bigr) \bigr]_{+}, \end{aligned} $$

    where \(\alpha_{k}=\beta\gamma^{m_{k}}\) with \(m_{k}\) being the smallest nonnegative integer satisfying

    $$ \bigl\| F \bigl(x^{k} \bigr)-F \bigl(y^{k} \bigr) \bigr\| \leq\mu\frac{\|x^{k}-y^{k}\|}{\alpha_{k}}. $$
    (16)
  • Step 2: If \(\|x^{k}-z^{k}\|\le\epsilon\) or the number of iterations is greater than \(n_{\max}\), then return \(z^{k}\), \(x^{k}\), \(y^{k}\) and stop. Otherwise, compute

    $$\begin{aligned} z^{k+1}= \bigl[x^{k}-\alpha_{k}F \bigl(y^{k} \bigr) \bigr]_{+} \end{aligned}$$

    and update \(\lambda_{k+1}\) by

    $$\begin{aligned} \lambda_{k+1}=\left \{ \begin{array}{l@{\quad}l} \tau\lambda_{k}, & \mbox{if } k+1 \mbox{ is a multiple of } K_{0}, \\ \lambda_{k}, & \mbox{otherwise}, \end{array} \right . \end{aligned}$$

    and set \(k=k+1\), then go to Step 1.
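Putting Steps 0-2 together, the following compact NumPy sketch of ETA may help the reader. It is our illustration under the stated assumptions, with \(K=\mathbb{R}^{n}_{+}\) so the projection is a componentwise maximum with 0 and with the fixed choices \(\beta=2c\), \(\mu=1/c\) of Section 4; it is not the authors' MATLAB code.

```python
import numpy as np

def eta(F, n, c, z0=None, lam0=0.2, tau=0.75, gamma=0.1,
        eps=1e-6, n_max=2000, K0=5):
    """Extragradient thresholding algorithm (ETA) for model (5).
    F: callable mapping R^n -> R^n; c: co-coercivity modulus of F."""
    beta, mu = 2.0 * c, 1.0 / c
    z = np.ones(n) if z0 is None else z0.astype(float).copy()
    lam = lam0
    x = y = z
    for k in range(n_max):
        x = np.maximum(z - lam / 2.0, 0.0)        # x^k = S_{lam_k}(z^k), cf. (14)-(15)
        alpha = beta                              # search alpha_k = beta * gamma^{m_k}
        y = np.maximum(x - alpha * F(x), 0.0)
        while np.linalg.norm(F(x) - F(y)) > mu * np.linalg.norm(x - y) / alpha:
            alpha *= gamma                        # decrease until (16) holds
            y = np.maximum(x - alpha * F(x), 0.0)
        if np.linalg.norm(x - z) <= eps:          # stopping test of Step 2
            break
        z = np.maximum(x - alpha * F(y), 0.0)     # extragradient update z^{k+1}
        if (k + 1) % K0 == 0:
            lam *= tau                            # thresholding-parameter decay
    return z, x, y
```

The returned triple matches the output \((z^{k}, x^{k}, y^{k})\) of Step 2, with \(x^{k}\) the candidate sparse solution.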

Before analyzing the convergence of ETA, we first present a key lemma regarding co-coercive mappings.

Lemma 3.3

Suppose that mapping F is co-coercive and \(\operatorname{SOL}(F)\neq\emptyset \). If \(x^{k}\) generated by ETA is not a solution of \(\operatorname{NCP}(F)\), then for any \(\widehat{x}\in\operatorname{SOL}(F)\), we have

$$\begin{aligned} \bigl\langle F\bigl(y^{k}\bigr), x^{k}- \widehat{x}\bigr\rangle \geq\bigl\langle F\bigl(y^{k}\bigr), x^{k}-y^{k}\bigr\rangle \geq(1-\mu)\frac{\|x^{k}-y^{k}\| ^{2}}{\beta}. \end{aligned}$$
(17)

Proof

For any \(\widehat{x}\in\operatorname{SOL}(F)\), we have \(F(\widehat{x})^{\top }\widehat{x}=0\). Since \(y^{k}\in\mathbb{R}^{n}_{+}\), it follows that \(\langle F(\widehat{x}), y^{k}-\widehat{x}\rangle\geq0\). It is clear that the co-coercive mapping is pseudo-monotone, that is,

$$\begin{aligned} \bigl\langle x-y, F(y) \bigr\rangle \geq0\quad\Rightarrow\quad \bigl\langle x-y, F(x) \bigr\rangle \geq0, \quad \forall x, y\in K \mbox{ and } x\neq y. \end{aligned}$$

By the definition of pseudo-monotonicity, it follows that \(\langle F(y^{k}), y^{k}-\widehat{x}\rangle\geq0\). Hence,

$$\begin{aligned} \bigl\langle F\bigl(y^{k}\bigr), x^{k}-\widehat{x}\bigr\rangle =&\bigl\langle F\bigl(y^{k}\bigr), x^{k}-y^{k}+y^{k}- \widehat{x}\bigr\rangle \\ \geq&\bigl\langle F\bigl(y^{k}\bigr), x^{k}-y^{k} \bigr\rangle \\ =&\bigl\langle F\bigl(x^{k}\bigr), x^{k}-y^{k} \bigr\rangle -\bigl\langle F\bigl(x^{k}\bigr)-F\bigl(y^{k} \bigr), x^{k}-y^{k}\bigr\rangle \\ \geq& \frac{1}{\alpha_{k}}\bigl\| x^{k}-y^{k}\bigr\| ^{2}- \frac{\mu}{\alpha_{k}}\bigl\| x^{k}-y^{k}\bigr\| ^{2} \\ \geq& \frac{1-\mu}{\beta}\bigl\| x^{k}-y^{k}\bigr\| ^{2}, \end{aligned}$$

where the next-to-last inequality follows from Lemma 3.1(a) and (16), and the last one from \(\alpha_{k}=\beta\gamma^{m_{k}}\leq\beta\). □

We now begin to analyze the convergence of the proposed ETA.

Theorem 3.1

Suppose that the mapping F is co-coercive with modulus \(c>\beta\gamma /2\) and \(\operatorname{SOL}(F)\neq\emptyset\). Let \(\{(z^{k}, x^{k}, y^{k})\}\) and \(\{\lambda_{k}\}\) be the sequences generated by ETA. Then:

(i) the sequences \(\{z^{k}\}\), \(\{x^{k}\}\), and \(\{y^{k}\}\) are all bounded;

(ii) any cluster point of the sequence \(\{x^{k}\}\) is a solution of \(\operatorname{NCP}(F)\).

Proof

(i) Let \(\widehat{x}\in\operatorname{SOL}(F)\). By the definition (15) of the operator \(S_{\lambda}\), each component satisfies \(|(S_{\lambda_{k}}(z))_{i}-z_{i}|\leq\lambda_{k}/2\), and hence

$$\begin{aligned} \bigl\| x^{k}-\widehat{x}\bigr\| =\bigl\| S_{\lambda_{k}}\bigl(z^{k}\bigr)-\widehat{x}\bigr\| \leq\bigl\| z^{k}-\widehat {x}\bigr\| + \sqrt{n}\lambda_{k}/{2}\leq\bigl\| z^{k}-\widehat{x}\bigr\| +\sqrt{n} \lambda_{0}/{2}. \end{aligned}$$
(18)

In view of \(\widehat{x}\in\operatorname{SOL}(F)\), we have \(\widehat{x}=[\widehat {x}-\alpha_{k}F(\widehat{x})]_{+}\). Since \(c>\beta\gamma/2>\alpha_{k}/2\), by Lemma 3.2, we see that \(I-\alpha_{k} F\) is nonexpansive. Together with the nonexpansive property of the projection operator, it follows that

$$\begin{aligned} \bigl\| y^{k}-\widehat{x}\bigr\| =&\bigl\| \bigl[x^{k}- \alpha_{k}F\bigl(x^{k}\bigr)\bigr]_{+} -\widehat{x}\bigr\| \\ =&\bigl\| \bigl[x^{k}-\alpha_{k}F\bigl(x^{k}\bigr) \bigr]_{+}-\bigl[\widehat{x}-\alpha_{k}F(\widehat {x})\bigr]_{+}\bigr\| \\ \leq&\bigl\| (I-\alpha_{k} F) \bigl(x^{k}-\widehat{x}\bigr)\bigr\| \\ \leq& \bigl\| x^{k}-\widehat{x}\bigr\| \\ \leq&\bigl\| z^{k}-\widehat{x}\bigr\| +\sqrt{n}\lambda_{k}/{2} \\ \leq&\bigl\| z^{k}-\widehat{x}\bigr\| +\sqrt{n}\lambda_{0}/{2}. \end{aligned}$$
(19)

From (12) and (17), we obtain

$$\begin{aligned} \bigl\| z^{k+1}-\widehat{x}\bigr\| ^{2} =&\bigl\| \bigl[x^{k}-\alpha_{k}F\bigl(y^{k}\bigr)\bigr]_{+}- \widehat{x}\bigr\| ^{2} \\ \leq&\bigl\| x^{k}-\alpha_{k}F\bigl(y^{k}\bigr)- \widehat{x}\bigr\| ^{2}-\bigl\| z^{k+1}-x^{k}+ \alpha_{k} F\bigl(y^{k}\bigr)\bigr\| ^{2} \\ =& \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-2\alpha_{k} \bigl\langle F\bigl(y^{k}\bigr), z^{k+1}-\widehat {x} \bigr\rangle - \bigl\| z^{k+1}-x^{k}\bigr\| ^{2} \\ \leq& \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-2\alpha_{k} \bigl\langle F\bigl(y^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle -\bigl\| z^{k+1}-x^{k}\bigr\| ^{2} \\ =& \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl\| z^{k+1}-y^{k} \bigr\| ^{2}-\|y^{k}-x^{k}\|^{2} \\ &{}+2\bigl\langle x^{k}-y^{k}-\alpha_{k} F \bigl(y^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle . \end{aligned}$$
(20)

By \(y^{k}=[x^{k}-\alpha_{k}F(x^{k})]_{+}\) and (9), it follows that

$$\begin{aligned} &2 \bigl\langle x^{k}-y^{k}- \alpha_{k} F\bigl(y^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle \\ &\quad\leq2 \bigl\langle x^{k}-y^{k}-\alpha_{k} F \bigl(y^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle +2 \bigl\langle y^{k}-x^{k}+\alpha_{k} F \bigl(x^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle \\ &\quad= 2\alpha_{k} \bigl\langle F\bigl(x^{k}\bigr)-F \bigl(y^{k}\bigr), z^{k+1}-y^{k} \bigr\rangle \\ &\quad\leq \alpha_{k}^{2}\bigl\| F\bigl(x^{k}\bigr)-F \bigl(y^{k}\bigr)\bigr\| ^{2}+\bigl\| z^{k+1}-y^{k} \bigr\| ^{2}. \end{aligned}$$
(21)

Substituting (21) into (20) and invoking (16), we deduce

$$\begin{aligned} &\bigl\| z^{k+1}-\widehat{x}\bigr\| ^{2} \\ &\quad\leq \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl\| y^{k}-x^{k} \bigr\| ^{2}+\alpha_{k}^{2}\bigl\| F\bigl(x^{k} \bigr)-F\bigl(y^{k}\bigr)\bigr\| ^{2} \\ &\quad\leq \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl\| y^{k}-x^{k} \bigr\| ^{2}+\mu^{2}\bigl\| x^{k}-y^{k} \bigr\| ^{2} \\ &\quad= \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl(1-\mu^{2} \bigr)\bigl\| y^{k}-x^{k}\bigr\| ^{2} \\ &\quad\leq\bigl\| x^{k}-\widehat{x}\bigr\| ^{2}. \end{aligned}$$
(22)

Hence, by definition of \(\lambda_{k}\), it follows that

$$\begin{aligned} \bigl\| z^{k+1}-\widehat{x}\bigr\| \leq& \bigl\| x^{k}- \widehat{x}\bigr\| \leq \bigl\| z^{k}-\widehat{x}\bigr\| +\frac{\sqrt{n}}{2}\lambda_{k} \leq \bigl\| x^{k-1}-\widehat{x}\bigr\| +\frac{\sqrt{n}}{2}\lambda_{k} \\ \leq& \bigl\| z^{k-1}-\widehat{x}\bigr\| +\frac{\sqrt{n}}{2}(\lambda_{k}+ \lambda _{k-1})\leq \cdots \\ \leq& \bigl\| z^{0}-\widehat{x}\bigr\| +\frac{\sqrt{n}}{2}\sum _{i=0}^{k}\lambda _{i} \\ \leq& \bigl\| z^{0}-\widehat{x}\bigr\| +\frac{\sqrt{n}}{2}\frac{\lambda _{0}K_{0}}{1-\tau}:=C, \end{aligned}$$
(23)

which shows \(\{z^{k}\}\) is bounded. Together with (18) and (19), we see that \(\{x^{k}\}\) and \(\{y^{k}\}\) are both bounded.

(ii) Now we prove \(\lim_{k\rightarrow\infty}\|x^{k}-y^{k}\|=0\). By (22) and (18), it follows that

$$\begin{aligned} \bigl(1-\mu^{2}\bigr)\bigl\| y^{k}-x^{k} \bigr\| ^{2} \leq&\bigl\| x^{k}-\widehat{x}\bigr\| ^{2}- \bigl\| z^{k+1}-\widehat {x}\bigr\| ^{2} \\ \leq&\bigl\| x^{k}-\widehat{x}\bigr\| ^{2}- \bigl(\bigl\| x^{k+1}- \widehat{x}\bigr\| -\sqrt {n}\lambda_{k+1}/{2} \bigr)^{2} \\ =&\bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl\| x^{k+1}-\widehat{x} \bigr\| ^{2}+\sqrt{n}\lambda _{k+1}\bigl\| x^{k+1}-\widehat{x}\bigr\| -n \lambda_{k+1}^{2}/4 \\ \leq& \bigl\| x^{k}-\widehat{x}\bigr\| ^{2}-\bigl\| x^{k+1}- \widehat{x}\bigr\| ^{2}+\sqrt{n}\lambda _{k+1}\bigl\| x^{k+1}- \widehat{x}\bigr\| , \end{aligned}$$

which leads to the following inequality:

$$\begin{aligned} \bigl(1-\mu^{2}\bigr)\sum_{k=0}^{\infty} \bigl\| y^{k}-x^{k}\bigr\| ^{2} \leq& \sum _{k=0}^{\infty} \bigl(\bigl\| x^{k}-\widehat{x} \bigr\| ^{2}-\bigl\| x^{k+1}-\widehat{x}\bigr\| ^{2}+\sqrt{n} \lambda_{k+1}\bigl\| x^{k+1}-\widehat{x}\bigr\| \bigr) \\ \leq& \bigl\| x^{0}-\widehat{x}\bigr\| ^{2}+\sqrt{n} \sum _{k=0}^{\infty}\lambda _{k+1}\bigl\| x^{k+1}- \widehat{x}\bigr\| \\ \leq& \bigl\| x^{0}-\widehat{x}\bigr\| ^{2}+\sqrt{n}C \sum _{k=0}^{\infty}\lambda _{k+1} \\ =& \bigl\| x^{0}-\widehat{x}\bigr\| ^{2}+\sqrt{n}C \frac{\lambda_{0}K_{0}}{1-\tau}< + \infty, \end{aligned}$$

where the third inequality follows from (23); thus \(\lim_{k\rightarrow\infty}\|x^{k}-y^{k}\|=0\).

Since \(\{x^{k}\}\) is bounded, \(\{x^{k}\}\) has at least one cluster point. Let \(x^{*}\) be a cluster point of \(\{x^{k}\}\) and let a subsequence \(\{x^{k_{i}}\}\) converge to \(x^{*}\). Next we show \(x^{*}\in\operatorname{SOL}(F)\).

If there is a positive lower bound \(\alpha_{\min}\) such that \(\alpha _{k_{i}}\geq\alpha_{\min} >0\), then from Lemma 3.1(b) and (c) we get

$$\begin{aligned} \min\{1, \alpha\}\bigl\| e(x,1)\bigr\| \leq\bigl\| e(x,\alpha)\bigr\| \leq\max\{1,\alpha \}\bigl\| e(x,1)\bigr\| , \end{aligned}$$
(24)

where \(e(x,\alpha)=x-[x-\alpha F(x)]_{+}\). Together with the continuity of \(e(\cdot,\alpha)\) in x and \(\lim_{k\rightarrow\infty}\|x^{k}-y^{k}\|=0\), this yields

$$ \begin{aligned}[b] \bigl\| e\bigl(x^{*}, 1\bigr)\bigr\| &=\lim_{k_{i}\rightarrow\infty}\bigl\| e\bigl(x^{k_{i}}, 1\bigr)\bigr\| \leq\lim_{k_{i}\rightarrow\infty} \frac{\|e(x^{k_{i}}, \alpha_{k_{i}})\|}{\min\{1, \alpha_{k_{i}}\}}\\ &\leq \lim_{k_{i}\rightarrow\infty}\frac{\|e(x^{k_{i}}, \alpha _{k_{i}})\|}{\min\{1, \alpha_{\min}\}} =\lim_{k_{i}\rightarrow\infty} \frac{\|x^{k_{i}}-y^{k_{i}}\|}{\min\{ 1, \alpha_{\min}\}}=0. \end{aligned} $$
(25)

If \(\lim_{k_{i}\rightarrow\infty}\alpha_{k_{i}}=0\), then for sufficiently large \(k_{i}\) the trial stepsize \(\frac{1}{\gamma}\alpha_{k_{i}}\) violates (16), so by Lemma 3.1(b) we get

$$\begin{aligned} \bigl\| e\bigl(x^{k_{i}}, 1\bigr)\bigr\| \leq\frac{\|e(x^{k_{i}}, \frac{1}{\gamma}\alpha_{k_{i}})\| }{\frac{1}{\gamma}\alpha_{k_{i}}}< \frac{1}{\mu}\bigl\| F \bigl(x^{k_{i}}\bigr)-F\bigl(\overline {y}^{k_{i}}\bigr)\bigr\| , \end{aligned}$$
(26)

where \(\overline{y}^{k_{i}}=[x^{k_{i}}-\frac{1}{\gamma}\alpha_{k_{i}}F(x^{k_{i}})]_{+}\). Taking the limit in the above inequality, we have

$$\begin{aligned} \bigl\| e\bigl(x^{*}, 1\bigr)\bigr\| =\lim_{k_{i}\rightarrow\infty}\bigl\| e\bigl(x^{k_{i}}, 1 \bigr)\bigr\| \leq \lim_{k_{i}\rightarrow\infty}\frac{1}{\mu}\bigl\| F \bigl(x^{k_{i}}\bigr)-F\bigl(\overline {y}^{k_{i}}\bigr)\bigr\| =0. \end{aligned}$$
(27)

This means \(x^{*}=[x^{*} -F(x^{*})]_{+}\), and hence \(x^{*}\in\operatorname{SOL}(F)\). The proof is complete. □

4 Numerical experiments

In this section, we present some numerical experiments to demonstrate the effectiveness of the ETA algorithm. All the numerical experiments were performed on a laptop (2.50 GHz CPU, 6.00 GB RAM) using MATLAB R2013a.

We run the ETA algorithm on three examples. Each example is run 100 times for different dimensions, and the average results are recorded. In each experiment, we set \(z^{0}=e\), \(\beta =2c\), \(\gamma=0.1\), \(\mu=1/c\), \(n_{\max}=2{,}000\).

4.1 Test for LCPs with Z-matrix [5]

This test involves Z-matrices, which enjoy an important property: the LCP has a unique sparse solution when M is a suitable Z-matrix [5]. Let us consider \(\operatorname{LCP}(q,M)\) where

$$M=I_{n}-\frac{1}{n}ee^{\top} = \begin{pmatrix} 1-\frac{1}{n} &-\frac{1}{n} &\cdots&-\frac{1}{n} \\ -\frac{1}{n} &1-\frac{1}{n} &\cdots&-\frac{1}{n} \\ \vdots &\vdots&\ddots&\vdots\\ -\frac{1}{n} &-\frac{1}{n} &\cdots&1-\frac{1}{n} \end{pmatrix} \quad \mbox{and}\quad q= \begin{pmatrix} \frac{1}{n}-1\\ \frac{1}{n}\\ \vdots\\ \frac{1}{n} \end{pmatrix}. $$

Here \(I_{n}\) is the identity matrix of order n and \(e=(1,1,\ldots ,1)^{\top}\in\mathbb{R}^{n}\). Such matrices M are widely used in statistics. It is clear that M is a positive semidefinite Z-matrix. For any scalar \(a\ge0\), the vector \(x=a e+e_{1}\) is a solution of \(\operatorname{LCP}(q,M)\), since it satisfies

$$x\ge0, \qquad Mx+q=Me_{1}+q=0, \qquad x^{\top} (Mx+q)=0. $$

Among all the solutions, the vector \(\hat{x}=e_{1}=(1,0,\ldots,0)^{\top}\) is the unique sparse solution.

We choose \(z^{0}=e\), \(c=1\), \(\lambda_{0}=0.2\), \(\beta=2c\), \(\tau=0.75\), \(\gamma =0.1\), \(\mu=1/c\), \(\epsilon=10^{-6}\), \(n_{\max}=2{,}000\), \(K_{0}=5\). We use the recovery error \(\|x-\hat{x}\|\) to evaluate the algorithm; in addition, the average CPU time (in seconds), the average number of iterations, and the residual \(\|x-z\|\) are taken into account when judging the performance of the method.
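Under these parameter choices, a possible driver for this test, reusing the eta() sketch from Section 3 (the variable names are ours and purely illustrative), reads:

```python
import numpy as np

n = 1000
M = np.eye(n) - np.ones((n, n)) / n     # M = I_n - (1/n) e e^T
q = np.full(n, 1.0 / n)
q[0] = 1.0 / n - 1.0                    # q = (1/n - 1, 1/n, ..., 1/n)^T
x_hat = np.zeros(n)
x_hat[0] = 1.0                          # unique sparse solution e_1

z, x, y = eta(lambda v: M @ v + q, n, c=1.0, lam0=0.2, tau=0.75,
              eps=1e-6, K0=5)
print(np.count_nonzero(x), np.linalg.norm(x - x_hat))  # sparsity, recovery error
```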

As indicated in Table 1, the ETA algorithm behaves very robustly: the average number of iterations is identically 205 across all dimensions, and the recovery error \(\|x-\hat{x}\|\) and the residual \(\|x-z\|\) are of similar magnitude. In addition, the sparsity \(\|x\|_{0}\) of the recovered solution x equals 1 in all cases, which means the recovery is successful. Most importantly, the ETA algorithm is exceptionally fast: only 36.70 seconds are needed for the problem of dimension \(n=10{,}000\).

Table 1 ETA’s computational results on LCPs with Z-matrices.

To further illustrate the effectiveness of the proposed ETA algorithm, we compare it with another method for these LCPs. In [5], the authors established an \(l_{p}\) (\(0< p<1\)) regularized minimization model:

$$\begin{aligned} \min_{x\in R^{n}} f(x):= \frac {1}{2}\bigl\| \Phi_{FB}(x)\bigr\| ^{2}+\lambda\|x\|_{p}^{p} \end{aligned}$$
(28)

where \(\Phi_{FB}\) denotes the Fischer-Burmeister C-function, and designed a sequential smoothing gradient (SSG) method to solve the \(l_{p}\) regularized model and obtain a sparse solution of \(\operatorname{LCP}(q,M)\). The results are displayed in Table 2.

Table 2 SSG’s computational results on LCPs with Z-matrices.

In Table 2, ‘- -’ indicates that the method failed. Although the sparsity \(\|x\|_{0}\) of the recovered solution is again 1 in all cases and the recovery errors \(\|x-\hat{x}\|\) are quite small, the average CPU time grows dramatically with the dimension n. This shows that the SSG method is suitable only for small problems and becomes unappealing when n is relatively large. Compared with SSG, the ETA algorithm is clearly superior in CPU time and in the size of problems it can solve.

4.2 Test for LCPs with positive semidefinite matrices

In this subsection, we test ETA on randomly generated LCPs with positive semidefinite matrices. First, we describe how the LCPs and their solutions are constructed. A matrix \(Z\in\mathbb{R}^{n\times r} \) (\(r< n\)) is generated from the standard normal distribution, and we set \(M=ZZ^{\top}\). The sparse vector \(\hat{x}\) is produced by randomly choosing the \(s=0.01 * n\) nonzero components, whose values are also drawn from a standard normal distribution. Once M and \(\hat{x}\) have been generated, a vector \(q\in\mathbb {R}^{n}\) is constructed such that \(\hat{x}\) solves \(\operatorname{LCP}(q,M)\). Then \(\hat{x}\) can be regarded as a sparse solution of the \(\operatorname{LCP}(q,M)\). Namely,

$$\hat{x}\geq0,\qquad M\hat{x}+q\geq0, \qquad\hat{x}^{\top}(M\hat{x}+q)=0, \quad \mbox{and}\quad \|\hat{x}\|_{0}=0.01*n. $$

To be more specific, if \(\hat{x}_{i}>0\) we choose \(q_{i}=-(M\hat{x})_{i}\); if \(\hat{x}_{i}=0\) we choose \(q_{i}=|(M\hat{x})_{i}|-(M\hat{x})_{i}\). We feed M and q to the ETA algorithm and take \(z^{0}=e\), \(c=\max(\operatorname{svd}(M))\), \(\lambda_{0}=0.02\), \(\beta=2c\), \(\tau=0.75\), \(\gamma=0.1\), \(\mu =1/c\), \(\epsilon=10^{-10}\), \(n_{\max}=2{,}000\), \(K_{0}=\max(2, \mathrm{floor} (10{,}000/n))\). ETA then outputs a solution x. As before, the recovery error \(\|x-\hat{x}\|\), the average CPU time (in seconds), the average number of iterations, and the residual \(\|x-z\|\) are used to evaluate the algorithm.
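The generation procedure and a call to the eta() sketch can be outlined as follows (illustrative only; we keep the nonzero entries of \(\hat{x}\) nonnegative so that \(\hat{x}\) is feasible):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 1000, 100
Zm = rng.standard_normal((n, r))
M = Zm @ Zm.T                                 # positive semidefinite M = Z Z^T
x_hat = np.zeros(n)
idx = rng.choice(n, max(1, int(0.01 * n)), replace=False)
x_hat[idx] = np.abs(rng.standard_normal(idx.size))
w = M @ x_hat
q = np.where(x_hat > 0, -w, np.abs(w) - w)    # q_i chosen by the rule above
c = np.linalg.norm(M, 2)                      # c = max(svd(M)) as in the text
z, x, y = eta(lambda v: M @ v + q, n, c, lam0=0.02, eps=1e-10,
              K0=max(2, 10000 // n))
```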

As Table 3 shows, the ETA algorithm performs quite efficiently. Furthermore, the sparsity \(\|x\|_{0}\) of the recovered solution x is in all cases equal to the sparsity \(\|\hat{x}\|_{0}\), which means the recovery is exact. Likewise, the ETA algorithm is exceptionally fast in this example: only 46.67 seconds are needed to find the sparse solution of the LCP of dimension \(n=7{,}000\).

Table 3 Results on randomly created LCPs with positive semidefinite matrices.

4.3 Test for co-coercive nonlinear complementarity problem

We now consider a co-coercive nonlinear complementarity problem (NCP) with

$$\begin{aligned} F(x)=D(x)+Mx+q, \end{aligned}$$
(29)

where \(D(x)\) and \(Mx+q\) are the nonlinear part and the linear part of \(F(x)\), respectively. We form \(F(x)\) as in [21, 22]. The matrix \(M=A^{\top}A+B\), where A is an \(n\times n\) matrix whose entries are randomly generated in the interval \((-5, 5)\), and the skew-symmetric matrix B is generated in the same way. The components of the nonlinear part \(D(x)\) are

$$D_{j}(x)=a_{j}*\arctan(x_{j}) $$

where \(a_{j}\) is a random variable in \((-1,0)\). The subsequent procedure for generating the sparse vector \(\hat{x}\) and the vector \(q\in\mathbb{R}^{n}\) such that

$$\hat{x}\geq0,\qquad F(\hat{x})\geq0, \qquad\hat{x}^{\top}F(\hat{x})=0, \quad\mbox{and}\quad \|\hat{x}\|_{0}=0.01*n $$

is similar to that of Section 4.2. We feed M and q to the ETA algorithm and take \(z^{0}=e\), \(c=150*\log(n)\), \(\lambda_{0}=0.2\), \(\beta=2c\), \(\tau =0.75\), \(\gamma=0.1\), \(\mu=1/c\), \(\epsilon=10^{-6}\), \(n_{\max}=2{,}000\), \(K_{0}=\max(2, \mathrm{floor} (10{,}000/n))\), and \(a=-\operatorname{rand}(n,1)\). ETA then outputs a solution x. As before, the average number of iterations, the average residual \(\|x-z\|\), the average sparsity \(\|x\|_{0}\) of x, and the average CPU time (in seconds) are used to evaluate the algorithm.
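A sketch of this construction (illustrative; here the skew-symmetric B is obtained by antisymmetrizing a random matrix, one of several possible choices consistent with the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.uniform(-5, 5, (n, n))
B = rng.uniform(-5, 5, (n, n))
B = (B - B.T) / 2                              # skew-symmetric part
M = A.T @ A + B
a = -rng.random(n)                             # a_j in (-1, 0), i.e. a = -rand(n,1)
x_hat = np.zeros(n)
idx = rng.choice(n, max(1, int(0.01 * n)), replace=False)
x_hat[idx] = np.abs(rng.standard_normal(idx.size))
w = a * np.arctan(x_hat) + M @ x_hat           # F(x_hat) without q
q = np.where(x_hat > 0, -w, np.abs(w) - w)     # makes x_hat a solution
F = lambda v: a * np.arctan(v) + M @ v + q     # F(x) = D(x) + M x + q, cf. (29)
z, x, y = eta(F, n, c=150 * np.log(n), lam0=0.2, eps=1e-6,
              K0=max(2, 10000 // n))
```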

Table 4 shows that the ETA algorithm also performs quite efficiently on such nonlinear complementarity problems. The sparsity \(\|x\|_{0}\) of the recovered solution x is in all cases equal to the sparsity \(\|\hat{x}\|_{0}\), that is, the recovery is exact. Strikingly, the ETA algorithm is exceptionally fast in this example as well: only 144.99 seconds are needed to solve the sparse NCP of dimension \(n=10{,}000\).

Table 4 Results on co-coercive nonlinear complementarity problems.

5 Conclusions

In this paper, we concentrate on finding sparse solutions of co-coercive nonlinear complementarity problems (NCPs). An \(\ell_{1}\) regularized projection minimization model is proposed as a relaxation, and an extragradient thresholding algorithm (ETA) is designed for this regularized model. Furthermore, we analyze the convergence of the algorithm and show that any cluster point of the sequence generated by ETA is a solution of the NCP. Preliminary numerical results indicate that the \(\ell_{1}\) regularized model and the ETA are promising for finding sparse solutions of NCPs.