1 Introduction

We study connections between solvability of nonlinear cone-complementarity problems [1] and testing copositivity of operators with respect to cones \({{\mathcal {K}}}\). Cone-complementarity problems play a significant role in a multitude of applications spanning physics, mechanics, economics, game theory, robotics, optimization, and neural networks [2,3,4,5,6,7,8,9,10,11]. In this paper all cones are assumed to be closed convex sets. Danninger [12] used the concept of \(\mathcal {K}\)-copositivity, and we follow this terminology. Some existence theorems for complementarity problems can be traced back to optimizing a quadratic function on the intersection of the sphere and a cone, see [13,14,15]. This can also be connected to the \(\mathcal {K}\)-copositivity of operators, where \(\mathcal {K}\) is a cone. An operator A is said to be \(\mathcal {K}\)-copositive if the quadratic function associated with it is nonnegative on \(\mathcal {K}\). Determining whether an operator A is copositive (with respect to the nonnegative orthant) is known to be co-NP-complete, see [16]. Testing \(\mathcal {K}\)-copositivity with respect to a cone \(\mathcal {K}\) has also been analysed, albeit using different terminology. For example, Eichfelder and Jahn [17] used set-semidefiniteness instead of cone-copositivity. They showed that testing \(\mathcal {K}\)-semidefiniteness with respect to a polyhedral cone can be reduced to copositivity with respect to the nonnegative orthant. Loewy and Schneider [18] characterized copositivity of a matrix A with respect to the Lorentz cone. As quadratic programming problems can be related to complementarity via the KKT conditions, Gowda [19] used the notion of copositivity to investigate complementarity problems. It should be mentioned that algorithms for testing copositivity of matrices are based on simplicial decomposition [20, 21], polynomial programming [22] and finite branching for non-convex quadratic programming [23]. Dickinson [24] presented a simple methodology for testing copositivity based on the linear algebra literature. It turned out that recognizing copositivity of matrices can be reduced to mixed-integer linear programming [25] or to linear complementarity problems, see [26].

The minimization of a smooth function on a subset of the sphere or, more generally, on a smooth manifold, holds significant practical value. Papers addressing this problem, including those cited in [27, 28], explore various approaches to tackle this challenge. One of the first algorithms that comes to mind for solving this problem is the gradient projection method, known for its effectiveness in minimizing smooth functions on a subset of the Euclidean space. Balashov et al. [29] proposed a different (non-intrinsic) version of the gradient projection method to solve this problem and obtained results that guarantee convergence of the method under some minimal natural assumptions. Balashov and Kamalov [30] considered the problem of minimizing a function with a Lipschitz continuous gradient on a proximally smooth subset that is a smooth manifold without boundary. They proposed a gradient projection method with Armijo’s step size and proved its linear convergence. Bergmann and Herzog [31] studied nonlinear optimization problems on a smooth manifold and proposed an intrinsic gradient projection method to solve a specific instance of a constrained optimization problem on the sphere. However, they did not study the convergence properties of this method. Our aim is to present convergence results as well.

In this paper we propose two intrinsic versions of a gradient projection method, namely, with constant and Armijo step sizes, to solve constrained problems on the sphere in a real finite dimensional vector space with a positive definite inner product. We work in general Euclidean vector spaces in order to consider a unified description of different cones. It is noteworthy that the gradient projection method can also be used to test copositivity of operators with respect to cones \(\mathcal {K} \subseteq \mathbb {V}\) and thus solvability of nonlinear \(\mathcal {K}\)-complementarity problems, too. It should be noted that if the algorithm returns a negative value, then the operator is not copositive with respect to the cone. However, the algorithm cannot certify that the operator is copositive with respect to the cone. To handle this problem, one can apply several techniques, such as using different initializations and performing multiple runs of the optimization algorithm, to increase the likelihood that the gradient descent method finds a global minimizer. In our numerical results we applied this technique and ran the algorithm several times with different starting points. The convergence analysis of the gradient projection method on the sphere is also presented. To the best of our knowledge, this is the first method which can be used to test copositivity of operators with respect to cones \(\mathcal {K}\).

The paper is organized as follows. In Sect. 2 the connection between \(\mathcal {K}\)-complementarity problems and testing \(\mathcal {K}\)-copositivity is presented. Note that we work in Euclidean vector spaces in order to consider a unified description of different cones, including the case when \(\mathcal {K}\) is the cone of positive semidefinite matrices. In Sect. 3 we show different examples for \(\mathcal {K}\)-copositivity. In particular, we deal with the case when the cone is the nonnegative orthant. We also give a characterization of copositivity with respect to the Lorentz cone. Section 4 contains the new gradient projection method on the sphere. In Sect. 4.1 we present the basic results related to this topic that will be used later in the analysis of the method. Section 4.2 deals with the projection onto a closed spherically convex set. Section 4.2.1 gives several properties of the projection onto a closed spherically convex set. In Sect. 4.3 the gradient projection method on the sphere for solving a general constrained optimization problem is analyzed. Section 5 contains a new variant of the method and several special cases for the nonnegative orthant, the Lorentz cone and the cone of positive semidefinite matrices, respectively. In Sect. 5.1 several numerical results are presented to show how the gradient projection method on the sphere can be applied to detect \(\mathcal {K}\)-copositivity of operators. In Sect. 6 some concluding remarks and future research plans are enumerated.

1.1 Notations

Let \(\mathbb {V}\) be an n-dimensional real vector space together with the positive definite inner product \(\langle \cdot , \cdot \rangle : \mathbb {V} \times \mathbb {V} \rightarrow \mathbb {R}\). The norm \(\Vert \cdot \Vert \) generated by the inner product is defined by

$$\begin{aligned} \Vert x\Vert {:=}\sqrt{\langle x,x \rangle }. \end{aligned}$$
(1)

Consider n unit vectors \(e^1, \dots , e^n\) that form an orthonormal system of vectors in the sense that \(\langle e^i, e^j \rangle = \delta _i^j\), where \(\delta _i^j\) is the Kronecker symbol. Then, \(e^1, \ldots , e^n\) form a basis of the vector space \(\mathbb {V}\). If we want to emphasise that \(\mathbb {V}\) is n-dimensional, then we write \(\mathbb {V}^{n}\) instead of \(\mathbb {V}\). If \(x \in \mathbb {V}\), then \(x = x_1e^1 + \ldots + x_n e^n\) can be characterized by the coordinates \(x_1, \ldots , x_n\) of x with respect to the given system. We can also write \(x=(x_1, \ldots , x_n)\). Then, \(e^i = (0, \ldots , 0, 1, 0, \ldots , 0)\), with 1 in the i-th position. Let \(x, y \in \mathbb {V}\), \(x=(x_1,\ldots , x_n)\) and \(y=(y_1,\ldots ,y_n)\). Then, the inner product of x and y is \(\langle x, y \rangle = \sum _{i=1}^n x_iy_i\). Although the notion of cones can be introduced for more general sets, in this paper we use it only for closed convex sets. A closed convex set \(\mathcal {K} \subseteq \mathbb {V}\) is called a cone if and only if for any \(x\in \mathcal {K}\) and any \(\lambda > 0\), it holds that \(\lambda x \in \mathcal {K}\). A cone \(\mathcal {K}\) is called pointed if \(\mathcal {K} \cap -\mathcal {K} =\{0\}\). The dual of a cone \(\mathcal {K}\) is the cone defined by

$$\begin{aligned} \mathcal {K}^* {:=} \{y \in \mathbb {V}:\langle x, y\rangle \ge 0, \ \forall x\in \mathcal {K}\}. \end{aligned}$$

Denote the nonnegative orthant by \(\mathbb {R}^{n}_+{:=}\{x\in \mathbb {R}^{n}: x\ge 0\}\). For any \(n>1\), the Lorentz cone in the Euclidean space \({\mathbb {R}}^{n+1}=\mathbb {R}^n\times {\mathbb {R}}\) is defined as:

$$\begin{aligned} \mathcal {L}^n{:=}\left\{ (x,t)^{\top }\in \mathbb {R}^{n}\times \mathbb {R}:\Vert x\Vert \le t\right\} . \end{aligned}$$

By denoting \(J{:=}\text {diag}(-1,-1,\ldots ,-1,1)\) the \((n+1)\times (n+1)\) diagonal matrix with these entries, the Lorentz cone can also be written as

$$\begin{aligned} \mathcal {L}^n=\left\{ x\in \mathbb {R}^{n+1}:x^{\top }Jx\ge 0 \text { and }x_{n+1}\ge 0 \right\} . \end{aligned}$$

Denote the set of \(n\times n\) symmetric matrices by \(\mathcal {S}^{n}\) and the cone of positive semidefinite matrices by \( \mathcal {S}^n_+\), where

$$\begin{aligned} \mathcal {S}^n_+=\{p \in \mathcal {S}^n:~ p \succeq 0 \} \end{aligned}$$

and \(p\succeq 0\) is the standard notation for the positive semidefiniteness of the matrix p. For \(\alpha \in \mathbb {R}\) denote \(\alpha ^+={\text {max}}(\alpha ,0)\) and for a vector \(z\in \mathbb {R}^{n}\) denote \(z^+{:=}(z_1^+,\dots ,z_{n}^+)\).

2 Complementarity problems

In this section we present a relationship between complementarity problems and an optimization problem constrained to a suitable subset of the sphere. For that, let us first recall some concepts.

The inversion of a mapping \(F:\mathbb {V} \rightarrow \mathbb {V}\) is the mapping \(\mathcal {I}(F):\mathbb {V} \rightarrow \mathbb {V}\) such that

$$\begin{aligned} \mathcal {I}(F)(x){:=} {\left\{ \begin{array}{ll} \displaystyle \Vert x\Vert ^2F\left( \frac{x}{\Vert x\Vert ^2}\right) &{} \text { if }x\ne 0, \\ 0 &{} \text { if }x=0. \end{array}\right. } \end{aligned}$$

The following theorem has been introduced by Isac and Németh in [14].

Theorem 1

Let \(\mathcal {K}\subset \mathbb {V}\) be a cone and \(F: \mathbb {V}\rightarrow \mathbb {V}\) be a mapping. Consider the following complementarity problem

$$\begin{aligned} CP(F,\mathcal {K})= {\left\{ \begin{array}{ll} \text {Find } x^*\in \mathcal {K} \text { such that }\\ F(x^*)\in \mathcal {K}^* \text { and } \langle x^*,F(x^*) \rangle = 0. \end{array}\right. } \end{aligned}$$

If

$$\begin{aligned} \liminf _{x\rightarrow 0}{\frac{\langle F(x)-F(0), x\rangle }{\Vert x\Vert ^2}}>0, \end{aligned}$$

then \(CP(F,\mathcal {K}) \) has a solution.

Whenever the mapping \(\mathcal {I}(F)\) is differentiable at 0, by using [14, Theorem 4.6], we have

$$\begin{aligned} \liminf _{x\rightarrow 0}{\frac{\langle F(x)-F(0), x\rangle }{\Vert x\Vert ^2}}= {\text {min}}_{\Vert u\Vert =1,u \in \mathcal {K}} \langle d\mathcal {I}(F)(0) u, u \rangle , \end{aligned}$$
(2)

where \(d\mathcal {I}(F)(0)\) denotes the differential of the inversion of the mapping, \(\mathcal {I}(F)\), at 0. As a consequence, the constrained optimization problem in (2) provides a sufficient condition, stated in the next corollary, for the complementarity problem \(CP(F,\mathcal {K})\) to have a solution.

Corollary 1

Let \(\mathcal {K}\subset \mathbb {V}\) be a cone and \(F: \mathbb {V}\rightarrow \mathbb {V}\) a mapping such that \(\mathcal {I}(F)\) is differentiable at 0. If

$$\begin{aligned} {\text {min}}_{\Vert u\Vert =1,u \in \mathcal {K}} \langle d\mathcal {I}(F)(0) u, u \rangle > 0, \end{aligned}$$
(3)

then the complementarity problem \(CP(F,\mathcal {K})\) has a solution.

The following theorem provides a class of mappings F such that the associated inversion of mapping \(\mathcal {I}(F)\) is differentiable at 0.

Theorem 2

Let \(r\in \mathbb {N}\) and, for each \(i\in \{1, 2, \ldots , r\}\), let \(P_i,Q_i:\,\mathbb {V} \rightarrow \mathbb {R}\) be polynomial functions of degree \(k_i\) and \(m_i\), respectively, with \(m_i+1\ge k_i\). In addition, assume that \(Q_i(x)\ne 0\), for all \(x\in \mathbb {V}\), and

$$\begin{aligned} Q_i\left( \frac{x}{\Vert x\Vert ^2}\right) \ne 0, \qquad \forall x\in \mathbb {V}, ~ x\ne 0. \end{aligned}$$

Let \(L: \mathbb {V} \rightarrow \mathbb {V}\) be a linear mapping and \(q \in \mathbb {V}\). Consider the mapping \(F: \mathbb {V}\rightarrow \mathbb {V}\) defined by

$$\begin{aligned} F(x)=\sum _{i=1}^r \frac{P_i(x)}{Q_i(x)} e^i +Lx + q. \end{aligned}$$
(4)

Then, the mappings F and \(\mathcal {I}(F)\) are differentiable. Furthermore, we have \(d\mathcal {I}(F)(0)=L.\)

Proof

Since L is linear, \(Q_i(x)\ne 0\) for all \(x\in \mathbb {V}\), and \(P_i,Q_i\) are differentiable, it follows that F is differentiable. Next we prove the differentiability of \(\mathcal {I}(F)\). For similar reasons as before, \(\mathcal {I}(F)\) is differentiable at any \(x\in \mathbb {V}\setminus \{0\}\). Since \(\mathbb {V}\ni x\mapsto \mathcal {I}(L+q)(x)=L(x)+q\Vert x\Vert ^2\) is differentiable, it is enough to check the differentiability of \(\mathcal {I}(P_i/Q_i)\) at 0 for an arbitrary i. Fix such an i and denote \(g{:=}\mathcal {I}(P_i/Q_i)\). Let \(x\ne 0\). After some algebra, by using the homogeneity of the involved functions, we get

$$\begin{aligned} g(x)=\frac{\displaystyle \sum _{j=0}^{k_i}\Vert x\Vert ^{2-2j}P_{ij}(x)}{\displaystyle \sum _{j=0}^{m_i}\Vert x\Vert ^{-2j}Q_{ij}(x)} =\frac{\displaystyle \sum _{j=0}^{k_i}\Vert x\Vert ^{2m_i-2j+2}P_{ij}(x)}{\displaystyle \sum _{j=0}^{m_i}\Vert x\Vert ^{2m_i-2j}Q_{ij}(x)}, \end{aligned}$$
(5)

where \(P_{ij}\) and \(Q_{ij}\) are the homogeneous terms of degree j in \(P_i\) and \(Q_i\), respectively. We have \(g(0)=0\). Indeed, let us first consider the case \(m_i+1=k_i\). Then, we have \(P_{ik_i}(0)=0\), because \(P_{ik_i}\) is homogeneous of degree \(k_i>0\), and the powers of \(\Vert x\Vert \) in the remaining terms of the numerator of (5) are positive. Hence, \(g(0)=0\). If \(m_i+1>k_i\), then \(g(0)=0\), because the powers of \(\Vert x\Vert \) in all terms of the numerator of (5) are positive. In order to show that g is differentiable at 0, it is enough to prove that the directional derivative

$$\begin{aligned} \frac{\partial g}{\partial h}(0)=\lim _{t\rightarrow 0}\frac{g(th)-g(0)}{t}=\lim _{t\rightarrow 0}\frac{g(th)}{t} \end{aligned}$$

exists and it is linear with respect to \(h\in \mathbb {V}\). Since \(Q_{im_i}(0)\ne 0\), it follows that \(Q_{im_i}(v)\ne 0\) if v is sufficiently close to the origin. For such a v, by using (5) and again the homogeneity of the involved functions, we obtain

$$\begin{aligned} \displaystyle \frac{g(tv)}{t}=\frac{\displaystyle \sum _{j=0}^{k_i}t^{m_i-j+2}\Vert v\Vert ^{m_i-j+2}P_{ij}(v)}{\displaystyle \sum _{j=0}^{m_i} t^{m_i-j}\Vert v\Vert ^{m_i-j}Q_{ij}(v)}, \end{aligned}$$

which, after some algebraic manipulations, implies that

$$\begin{aligned} \frac{g(tv)}{t} = \displaystyle \frac{t\left( \Vert v\Vert ^{m_i+1-k_i}P_{ik_i}(v)t^{m_i+1-k_i}+\dots +\Vert v\Vert ^{m_i+1}P_{i0}(v)t^{m_i+1}\right) }{Q_{im_i}(v)+\Vert v\Vert Q_{i,m_i-1}(v)t+\dots +\Vert v\Vert ^{m_i}Q_{i0}(v)t^{m_i}}. \end{aligned}$$

Since the numerator of the right-hand side of the second equality above is t multiplied by a polynomial in t (because the powers of t inside the bracket are nonnegative), it follows that \(\left( \partial g/\partial v\right) (0)=0\) if v is close enough to the origin. Hence, by using the positive homogeneity of the directional derivative, we obtain

$$\begin{aligned}\mathbb {V}\ni h\mapsto \frac{\partial g}{\partial h}(0)=0, \end{aligned}$$

which is linear. \(\square \)

From Theorem 2 and Corollary 1 we obtain the following result.

Corollary 2

Let \(\mathcal {K}\subset \mathbb {V}\) be a cone. If F is given as in (4), then the complementarity problem \(CP(F,\mathcal {K})\) has a solution if \({\text {min}}_{\Vert u\Vert =1,u\in \mathcal {K}}\langle Lu,u \rangle >0\).

Please note that if all polynomial functions \(P_i\) in Corollary 2 reduce to the null function, then \(CP(F,\mathcal {K})\) becomes a linear complementarity problem. Hence, Corollary 2 extends a well known result from linear complementarity which says that any positive definite matrix is a Q-matrix (see [32]).

By using [33] we obtain the following example, which gives a relationship between a cone-constrained optimization problem and a complementarity problem.

Example 1

Let \(\mathcal {K}\subset \mathbb {V}\) be a cone. Let \(\varphi :\mathbb {V} \rightarrow \mathbb {R}\) be a differentiable function with its gradient denoted by \(D\varphi \). Then, the KKT conditions of the constrained optimization problem

$$\begin{aligned} {\text {min}}_{x\in \mathcal {K}} \varphi (x) \end{aligned}$$
(6)

can be written as follows

$$\begin{aligned} D\varphi (x)= y, ~ x\in \mathcal {K},~y\in \mathcal {K}^* ,~\langle x,y\rangle =0. \end{aligned}$$
(7)

Hence, the KKT conditions of the optimization problem (6) lead to the complementarity problem \(CP(F,\mathcal {K})\). In particular, consider the function \(\varphi : \mathbb {R}^4 \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \varphi (x)=\frac{x_1^2+x_3}{x_2^4+x_4^4+1}+x_1^2+x_3^2+3x_1x_3+2x_2x_4+5x_1+3x_3+4x_4, \end{aligned}$$

where \(x{:=}(x_1,x_2,x_3,x_4)\). Define the mapping \(F: \mathbb {R}^4 \rightarrow \mathbb {R}^4\) by \(F(x){:=}D \varphi (x)\), where \(D \varphi \) is the gradient of \(\varphi \). The mapping F is given by

$$\begin{aligned} F(x)=\sum _{i=1}^4\frac{P_i(x)}{Q_i(x)} e^i +Lx + q, \end{aligned}$$
(8)

where \(P_1(x){:=}2x_1\), \(Q_1(x){:=}x_2^4+x_4^4+1\), \( P_2(x){:=}-4x_2^3(x_1^2+x_3)\), \(Q_2(x){:=}(x_2^4+x_4^4+1)^2\), \(P_3(x){:=}1\), \(Q_3(x){:=}x_2^4+x_4^4+1\), \(P_4(x){:=}-4x_4^3(x_1^2+x_3)\), \(Q_4(x){:=}(x_2^4+x_4^4+1)^2\) and

$$\begin{aligned} L{:=}\begin{pmatrix} 2 &{} 0 &{} 3 &{} 0 \\ 0 &{} 0 &{} 0 &{} 2 \\ 3 &{} 0 &{} 2 &{} 0 \\ 0 &{} 2 &{} 0 &{} 0 \end{pmatrix}, \quad \quad q{:=}\begin{pmatrix} 5 \\ 0 \\ 3 \\ 4 \end{pmatrix}. \end{aligned}$$
(9)

Therefore, due to Corollary 2, the solvability of (7) with F given by (8) is related to the strict copositivity of the matrix L given by (9).
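As a sanity check of the decomposition (8)–(9), the gradient of \(\varphi \) can be computed symbolically and compared with \(Lx+q\); what remains should be exactly the fractional terms \(P_i/Q_i\). A minimal SymPy sketch (the variable and function names are ours):

```python
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4', real=True)
x = sp.Matrix([x1, x2, x3, x4])

phi = (x1**2 + x3) / (x2**4 + x4**4 + 1) \
    + x1**2 + x3**2 + 3*x1*x3 + 2*x2*x4 + 5*x1 + 3*x3 + 4*x4

# F = D phi, the Euclidean gradient of phi, i.e. the mapping F of (8)
F = sp.Matrix([sp.diff(phi, xi) for xi in (x1, x2, x3, x4)])

L = sp.Matrix([[2, 0, 3, 0],
               [0, 0, 0, 2],
               [3, 0, 2, 0],
               [0, 2, 0, 0]])
q = sp.Matrix([5, 0, 3, 4])

# The difference should reduce to the fractional terms P_i/Q_i of (8)
print((F - L*x - q).applyfunc(sp.simplify))
```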

It should be mentioned that complementarity problems with functions given in (4) can be considered as extensions of linear complementarity problems [32]. The KKT optimality conditions of quadratic optimization problems lead to linear complementarity problems. If we consider extensions of quadratic optimization problems which optimize the sum of special fractional polynomial functions and quadratic functions, then by writing the optimality conditions, we obtain complementarity problems with functions of type (4).

3 Copositivity with respect to cones

In this section we present a relationship between the general concept of copositivity of linear operators and an optimization problem constrained to a suitable subset of the sphere. We will start by recalling the concept of \(\mathcal {K}\)-copositivity of an operator with respect to a cone \(\mathcal {K}\subset \mathbb {V}\).

Definition 1

Let \(\mathbb {V}\) be a finite dimensional real vector space together with the positive definite inner product \(\langle \cdot , \cdot \rangle : \mathbb {V} \times \mathbb {V} \rightarrow \mathbb {R}\) and \(\mathcal {K} \subseteq \mathbb {V}\) be a cone. The operator \(A: \mathbb {V} \rightarrow \mathbb {V}\) is said to be \(\mathcal {K}\)-copositive if \(\langle Ax,x \rangle \ge 0\), for all \(x \in \mathcal {K}\). We say that the operator A is \(\mathcal {K}\)-strictly copositive if \(\langle Ax,x \rangle > 0\), for all \(x \in \text {int} \; \mathcal {K}\).

When \(\mathbb {V}\) is n-dimensional and the cone \(\mathcal {K}\) is the nonnegative orthant \(\mathbb {V}_{+}=\{x\in \mathbb {V}:x_1\ge 0,\dots ,x_n\ge 0\}\), the classical terminology refers to \(\mathbb {V}_{+}\)-copositive operators simply as copositive, omitting the reference to the cone \(\mathbb {V}_{+}\). Definition 1 has also appeared under other names, for example \(\mathcal {K}\)-semidefinite or copositive with respect to the set \(\mathcal {K}\), see for example [17]. Testing copositivity of matrices plays a key role in combinatorial and non-convex quadratic optimization. However, testing copositivity of a given matrix turned out to be a co-NP-complete problem. Several conditions for copositivity have been introduced, see [34, 35]. Some of the proposed conditions use properties of principal submatrices and are difficult to use for optimization purposes. In the literature we can find many algorithms for testing copositivity of matrices, see [20, 21, 23,24,25,26]. In particular, in [36] the authors used the projection onto the intersection of a cone and a sphere, which is a non-convex set, to test the classical copositivity of matrices, i.e., they considered the non-convex problem QP in (10) with \(\mathcal {K}=\mathbb {R}^n_{+}\) and adapted several classical algorithms to the non-convex constraint in order to solve it. It is worth noting that, due to [17, Corollary 2.21], testing \(\mathcal {K}\)-copositivity of matrices with respect to polyhedral cones can be reduced to testing classical copositivity.

The next lemma is an immediate consequence of Definition 1, and its proof will be omitted.

Lemma 1

Let \(A: \mathbb {V} \rightarrow \mathbb {V}\) be an operator, \(\mathcal {K}\) be a cone and \(\bar{x}\) be a (global) minimal solution of the non-convex constrained quadratic optimization problem

$$\begin{aligned} \text {QP:} \quad {\text {min}}\ f(x)&{:=}\frac{1}{2} \langle Ax, x \rangle \\ \langle x, x \rangle&= 1, \\ x&\in \mathcal {K}. \end{aligned}$$
(10)

Then, the following statements hold:

  1.

    A is \(\mathcal {K}\)-copositive if and only if \(f(\bar{x}) \ge 0\),

  2.

    A is \(\mathcal {K}\)-strictly copositive if and only if \(f(\bar{x}) > 0\),

  3.

    A is not \(\mathcal {K}\)-copositive if and only if there exists a feasible x with \(f(x) < 0\),

  4.

    A is not \(\mathcal {K}\)-strictly copositive if there exists a feasible x with \(f(x) = 0\).
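Item 3 of Lemma 1 already yields a cheap one-sided numerical test: any feasible point of (10) with a negative objective value certifies that A is not \(\mathcal {K}\)-copositive, while failing to find one proves nothing. A minimal NumPy sketch for \(\mathcal {K}=\mathbb {R}^n_+\) (sample size, seed and the example matrix are arbitrary choices of ours):

```python
import numpy as np

def non_copositivity_certificate(A, n_samples=100_000, seed=0):
    """Random search for x in R^n_+ with ||x|| = 1 and x^T A x < 0 (Lemma 1, item 3).
    Returns such an x if one is found, otherwise None (inconclusive)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = rng.random(A.shape[0])     # nonnegative entries
        x /= np.linalg.norm(x)         # feasible point of problem (10)
        if x @ A @ x < 0.0:
            return x
    return None

A = np.array([[1.0, -2.0],
              [-2.0, 1.0]])            # not copositive: x = (1,1)/sqrt(2) gives -1
print(non_copositivity_certificate(A))
```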

The next corollary relates copositivity to cone-complementarity and it follows from Corollary 2.

Corollary 3

Let F be defined by formula (4), \(\mathcal {K}\) be a cone and \(I: \mathbb {V} \rightarrow \mathbb {V}\) be the identity map. If there exists an \(\alpha >0\) such that \(L-\alpha I\) is \(\mathcal {K}\)-copositive, then the complementarity problem \(CP(F,\mathcal {K})\) has a solution.

According to Corollary 1, Corollary 3 and Lemma 1, obtaining existence results for the complementarity problem \(CP(F,\mathcal {K})\) with F defined by formula (4) reduces to finding a global solution of problem (10), that is, to testing \(\mathcal {K}\)-copositivity. Besides the nonnegative orthant, Corollary 3 also motivates considering copositivity with respect to other cones, such as the Lorentz cone \(\mathcal {L}^n\) and the positive semidefinite cone. The following proposition has appeared in [18].

Proposition 3

The matrix \({A}\in \mathcal {S}^n\) is copositive with respect to \(\mathcal {L}^n\) if and only if there exists a \(\mu \in \mathbb {R}_+\) such that the matrix \({A}-\mu J\) is positive semidefinite.
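Proposition 3 suggests a straightforward numerical check: scan over \(\mu \ge 0\) and test whether \(A-\mu J\) is positive semidefinite. A hedged NumPy sketch (the grid, tolerance and function name are arbitrary choices of ours; since only finitely many values of \(\mu \) are inspected, a negative outcome of the scan is not a proof of non-copositivity):

```python
import numpy as np

def lorentz_copositive_scan(A, mu_grid=np.linspace(0.0, 100.0, 2001), tol=1e-10):
    """Look for mu >= 0 with A - mu*J positive semidefinite (Proposition 3).
    Returns the first such mu found on the grid, or None."""
    n = A.shape[0]
    J = np.diag([-1.0] * (n - 1) + [1.0])          # J = diag(-1,...,-1,1)
    for mu in mu_grid:
        if np.linalg.eigvalsh(A - mu * J).min() >= -tol:
            return mu
    return None

print(lorentz_copositive_scan(np.eye(3)))          # mu = 0 already works for the identity
```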

Next we state the homogeneous S-Lemma, which together with Proposition 3 provides a sufficient condition for a matrix to be \(\mathcal {L}^n\)-copositive; its proof can be found in [37, Theorem 2.2].

Lemma 2

Suppose that there exists \(x^*\) such that \((x^*)^{\top }Ax^*>0\). If \(x^{\top }Ax\ge 0\) implies \(x^{\top }Bx\ge 0\) for any \(x\in \mathbb {R}^n\), then there exists a \(\mu \in \mathbb {R}_+\) such that \(B-\mu A\) is positive semidefinite.

Theorem 4

Let F be defined by formula (4) and consider the Lorentz cone \(\mathcal {L}^n\). Suppose that there exist \(\lambda ,\mu >0\) such that \(L-\lambda I-\mu J\) is positive semidefinite. Then, the complementarity problem \(CP(F,\mathcal {L}^n)\) has a solution.

Proof

By using that \(L-\lambda I-\mu J\) is positive semidefinite and Proposition 3, it follows that \(L-\lambda I\) is copositive with respect to \(\mathcal {L}^n\). Since

$$\begin{aligned} \liminf _{x\rightarrow 0}{\frac{\langle F(x)-F(0), x\rangle }{\Vert x\Vert ^2}}={\text {min}}_{\Vert u\Vert =1,u\in \mathcal {L}^n}\langle L u,u \rangle \ge \lambda >0, \end{aligned}$$

by using Theorem 1, we conclude that the complementarity problem \(CP(F,\mathcal {L}^n)\) has a solution. \(\square \)

Several papers appeared regarding copositivity with respect to a cone \(\mathcal {K}\), see [12, 17, 19]. To the best of our knowledge, there is no characterization for copositivity of operators with respect to the cone of positive semidefinite matrices \( \mathcal {S}^n_+\). It should be mentioned that \(\mathcal {S}^2_+\) is isomorphic to \(\mathcal {L}^3\), hence the characterization of copositivity with respect to the positive semidefinite cone in the case \(n=2\) can be given by using the previous results.

In the following section we will introduce the gradient projection method on the sphere to solve the problem (10), which can also be used to test copositivity of operators with respect to different cones. In Sect. 5.1 we will also give examples where we analyse the copositivity of operators with respect to the positive semidefinite cone.

4 Gradient projection method on the sphere

The aim of this section is to introduce the gradient projection method to solve constrained problems on the sphere in Euclidean vector spaces. Let us be more precise by stating the problem we are going to address. Let \(\mathbb {V}^{n+1}\) be an \((n+1)\)-dimensional Euclidean vector space with the inner product \( {\langle } \cdot , \cdot {\rangle }\) and norm \(\Vert \cdot \Vert \). Denote the n-dimensional sphere in the Euclidean vector space by

$$\begin{aligned} \mathbb {S}^{n}{:=}\left\{ p \in \mathbb {V}^{n+1} :\, \Vert p\Vert =1\right\} . \end{aligned}$$
(11)

It should be mentioned that we work in general Euclidean vector spaces in order to consider a unified description of different cones, including the case when \(\mathcal {K}\) is the cone of positive semidefinite matrices. Throughout this section we consider the following constrained optimization problem

$$\begin{aligned} \;\; {\text {min}}\{ f(p) \,: \, p\in \mathcal {C} \}, \end{aligned}$$
(12)

where \( f: \mathbb {S}^n \rightarrow \mathbb {R}\) is differentiable and \(\mathcal {C}\subseteq \mathbb {S}^n\) is closed and spherically convex (see Definition 2). Since problem (2) is of type (12), the gradient projection method can also be used to test copositivity of operators with respect to cones \(\mathcal {K}\subseteq \mathbb {V}^{n+1}\). As a consequence, it follows from Corollary 1 that it can also be used to analyze solvability of complementarity problems. To deal with problem (12), we firstly present some basic results about the sphere (11), after that we show how to intrinsically project onto \(\mathcal {C}\subseteq \mathbb {S}^n\) and present some properties of the projection. Finally, the gradient projection method to solve problem (12) is introduced and the convergence analysis is presented.

4.1 Basic results

In this section we recall notations, definitions and basic geometric properties of the sphere in Euclidean vector spaces; for more details see, for example, [38,39,40,41].

The tangent hyperplane at a point \(p\in \mathbb {S}^{n}\) is denoted by

$$\begin{aligned} T_{p}\mathbb {S}^n{:=}\left\{ v\in \mathbb {V}^{n+1}\, :\, \langle p, v \rangle =0 \right\} , \end{aligned}$$
(13)

and the corresponding projection mapping onto it, denoted by \({{\,\textrm{Proj}\,}}_p:\mathbb {V}^{n+1} \rightarrow T_p\mathbb {S}^n\), is given by

$$\begin{aligned} {{\,\textrm{Proj}\,}}_p x{:=}x- \langle p, x \rangle p, \end{aligned}$$
(14)

The intrinsic distance on the sphere between two arbitrary points \(p, q \in \mathbb {S}^n\) is defined by

$$\begin{aligned} d(p, q){:=}\arccos \langle p , q\rangle . \end{aligned}$$
(15)

A geodesic segment on the sphere joining two points \(p, q\in \mathbb {S}^n\) is obtained by the intersection of a plane through these points and the origin of \(\mathbb {V}^{n+1}\) with \( \mathbb {S}^n\). The arc length of a geodesic segment \(\omega \) is denoted by \(\ell (\omega )\). If a geodesic segment \(\omega : [a, b]\rightarrow \mathbb {S}^n\) satisfies \(\ell (\omega )=\arccos \langle \omega (a), \omega (b)\rangle \), then this geodesic segment is said to be minimal.

Definition 2

A set \(\mathcal {C}\subseteq \mathbb {S}^n\) is called spherically convex if for any two points belonging to \(\mathcal {C}\), all minimal geodesic segments joining them are contained in \(\mathcal {C}\).

Below, we present some examples; for more details, please refer to [42].

Example 2

The sets \(\mathcal {C}=\mathbb {R}^{n+1}_+\cap \mathbb {S}^n\), \(\mathcal {C}={\mathcal {L}^n} \cap \mathbb {S}^n\) and \(\mathcal {C}=\{p\in \mathcal {S}^{n}_+:\Vert p\Vert =1\}\) are spherically convex.

Denoting by \(\omega _{p,v}\), the geodesic defined by its initial position p with velocity v at p, the exponential mapping \(\text{ exp}_{p}:T_{p}\mathbb {S}^n \rightarrow \mathbb {S}^n \) is given by \(\text{ exp}_{p}v\,{:=}\, \omega _{p,v}(1)\). Hence, we have

$$\begin{aligned} \text{ exp}_{p}v{:=} {\left\{ \begin{array}{ll} \displaystyle \cos (\Vert v\Vert ) \,p+ \sin (\Vert v\Vert )\, \frac{v}{\Vert v\Vert }, \quad &{} v\in T_p\mathbb {S}^n/\{0\},\\ p, \quad &{} v=0. \end{array}\right. } \end{aligned}$$
(16)

As a consequence, for all \(t\in \mathbb {R}\) we obtain that \(\omega _{p,v}(t)=\text{ exp}_{p}tv\). In addition,

$$\begin{aligned} \text{ exp}_{p}tv{:=} {\left\{ \begin{array}{ll} \displaystyle \cos (t\Vert v\Vert ) \,p+ \sin (t\Vert v\Vert )\, \frac{v}{\Vert v\Vert }, \quad &{} v\in T_p\mathbb {S}^n/\{0\},\\ p, \quad &{} v=0. \end{array}\right. } \end{aligned}$$
(17)

The inverse of the exponential mapping, denoted by \(\text{ exp}_{p}^{-1}:\mathbb {S}^n \rightarrow T_{p}\mathbb {S}^n \), is given by

$$\begin{aligned} \text{ exp}_{p}^{-1}q{:=} {\left\{ \begin{array}{ll} \displaystyle \frac{d(p, q)}{\sqrt{1-\langle p, q\rangle ^2}} {{\,\textrm{Proj}\,}}_p q, \quad &{} q\notin \{p, -p\},\\ 0, \quad &{} q=p. \end{array}\right. } \end{aligned}$$
(18)

By using (15) and (18), some calculations show that

$$\begin{aligned} d(p, q)=\Vert \text{ exp}_{q}^{-1}p\Vert , \qquad p, q \in \mathbb {S}^n. \end{aligned}$$
(19)

From now on \(\Omega \subseteq \mathbb {S}^n\) denotes an open set and \(f: \Omega \rightarrow \mathbb {R}\) is a differentiable function. The gradient on the sphere of f at \(p\in \Omega \) is defined by

$$\begin{aligned} {{\,\textrm{grad}\,}}f(p)= {{\,\textrm{Proj}\,}}_p Df(p). \end{aligned}$$
(20)

where \(Df(p) \in \mathbb {V}^{n+1}\) denotes the usual gradient (Euclidean gradient) of f at a point \(p\in \Omega \). For f twice differentiable, the Hessian on the sphere at \(p\in \Omega \) is an operator \( {{\,\textrm{Hess}\,}}f(p):T_p\mathbb {S}^n \rightarrow T_p\mathbb {S}^n\) defined by

$$\begin{aligned} {{\,\textrm{Hess}\,}}f(p)u{:=} {{\,\textrm{Proj}\,}}_p\left( D^2f(p)u-\langle Df(p), p\rangle u\right) , \end{aligned}$$
(21)

where \( D^2f(p):\mathbb {V}^{n+1}\rightarrow \mathbb {V}^{n+1}\) is the usual Hessian operator (Euclidean Hessian) of f at p. For the next example, let us firstly recall the operator norm of the Hessian operator:

$$\begin{aligned} \Vert {{\,\textrm{Hess}\,}}f(p)\Vert {:=} \sup _{\Vert u\Vert =1} \mid \langle {{\,\textrm{Hess}\,}}f(p)u,u\rangle \mid = \sup _{\Vert u\Vert =1} \Vert {{\,\textrm{Hess}\,}}f(p)u\Vert . \end{aligned}$$
(22)

Example 3

In the special case of \(f(p){:=}\langle Ap,p\rangle ,\) where \(A:\mathbb {V}^{n+1}\rightarrow \mathbb {V}^{n+1}\) is a linear operator, by using (14) and (20) we have \(\langle {{\,\textrm{grad}\,}}f(p), u\rangle = \langle Ap, u\rangle + \langle p, Au\rangle \), for all \(u\in T_p\mathbb {S}^n\). We also obtain from (14) and (21) that \(\left\langle {{\,\textrm{Hess}\,}}f(p)u,u\right\rangle =2\left\langle Au,u\right\rangle -2\langle Ap, p\rangle \left\langle u,u\right\rangle \), for all \(u\in T_p\mathbb {S}^n\). Letting \(\lambda _{{\text {max}}}(A){:=}{\text {max}}_{\Vert u\Vert =1} \langle Au,u\rangle \) and \(\lambda _{{\text {min}}}(A){:=}{\text {min}}_{\Vert u\Vert =1} \langle Au,u\rangle \), we conclude that \(\Vert {{\,\textrm{grad}\,}}f(p)\Vert \le 2(\lambda _{{\text {max}}}(A)- \lambda _{{\text {min}}}(A))\) and \(\Vert {{\,\textrm{Hess}\,}}f(p)\Vert \le 2(\lambda _{{\text {max}}}(A)- \lambda _{{\text {min}}}(A))\).
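These formulas and bounds are easy to check numerically. A minimal NumPy sketch, identifying \(\mathbb {V}^{n+1}\) with \(\mathbb {R}^{n+1}\) and assuming A symmetric (the sampling scheme and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n + 1, n + 1))
A = (A + A.T) / 2.0                          # a symmetric operator on R^{n+1}

def proj(p, x):
    """Projection (14) onto the tangent space T_p S^n."""
    return x - np.dot(p, x) * p

def grad_f(p):
    """Gradient on the sphere (20) of f(p) = <Ap, p>, using Df(p) = 2Ap."""
    return proj(p, 2.0 * A @ p)

def hess_f(p, u):
    """Hessian on the sphere (21) applied to u, using D^2 f(p) = 2A."""
    return proj(p, 2.0 * A @ u - np.dot(2.0 * A @ p, p) * u)

lam = np.linalg.eigvalsh(A)
bound = 2.0 * (lam.max() - lam.min())
for _ in range(1000):
    p = rng.standard_normal(n + 1)
    p /= np.linalg.norm(p)
    u = proj(p, rng.standard_normal(n + 1))
    u /= np.linalg.norm(u)
    assert np.linalg.norm(grad_f(p)) <= bound + 1e-12
    assert abs(np.dot(hess_f(p, u), u)) <= bound + 1e-12
```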

For each \(p, q\in {{{\mathbb {S}}}^n}\) with \(q\ne -p\) we denote by \([0,1]\ni t\mapsto \omega _{pq}(t){:=}\text{ exp}_{p}t(\text{ exp}_{p}^{-1}q)\) the geodesic segment joining p and q. The parallel transport from p to q along the geodesic segment \(\omega _{pq}\), which is denoted by \(P_{ pq} :T _{p} {{{\mathbb {S}}}^n} \rightarrow T _ {q}{{{\mathbb {S}}}^n}\), is given by

$$\begin{aligned} P_{pq} (v){:=}v-\frac{1}{1+\langle p, q\rangle }\langle q, v\rangle (p+q). \end{aligned}$$
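For later use, the formulas (14), (15), (17), (18) and the parallel transport above translate directly into code. A minimal NumPy sketch (the function names are ours):

```python
import numpy as np

def proj_tangent(p, x):
    """Projection onto the tangent space T_p S^n, formula (14)."""
    return x - np.dot(p, x) * p

def dist(p, q):
    """Intrinsic distance (15); the clip only guards against rounding."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def exp_map(p, v):
    """Exponential map (16)/(17)."""
    nv = np.linalg.norm(v)
    return p if nv == 0.0 else np.cos(nv) * p + np.sin(nv) * v / nv

def log_map(p, q):
    """Inverse exponential map (18); assumes q != -p."""
    w = proj_tangent(p, q)
    nw = np.linalg.norm(w)              # equals sqrt(1 - <p,q>^2) for unit p, q
    return np.zeros_like(p) if nw == 0.0 else dist(p, q) * w / nw

def transport(p, q, v):
    """Parallel transport P_{pq} of v in T_p S^n to T_q S^n (formula above)."""
    return v - np.dot(q, v) / (1.0 + np.dot(p, q)) * (p + q)
```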

Definition 3

Let \(\mathcal {C}\subset \mathbb {S}^n\) be a spherically convex set. The gradient vector field of f is said to be Lipschitz continuous on \(\mathcal {C}\) with constant \(L\ge 0\) if \(\left\| P_{pq} {{\,\textrm{grad}\,}}f(p)- {{\,\textrm{grad}\,}}f(q)\right\| \le L d(p,q)\), for any \(p,q\in \mathcal {C}\).

The proof of the next lemma is similar to that of [38, Proposition 10.43] and will be omitted.

Lemma 3

The gradient vector field of f is Lipschitz continuous with constant \(L\ge 0\) on \(\mathcal {C}\) if and only if \(\Vert \text{ Hess }\,f(p)\Vert \le L\), for all \(p\in \mathcal {C}\).

As an application of Lemma 3 and Example 3 we obtain the following lemma.

Lemma 4

Let \(f:\Omega \rightarrow \mathbb {R}\) be given by \(f(p)=\langle Ap, p\rangle \) and \(\mathcal {C}\subseteq \Omega \) be a spherically convex set. Then, the gradient vector field of f is Lipschitz continuous with constant \(L=2 (\lambda _{max}(A)-\lambda _{min}(A))\) on \(\mathcal {C}\).

Proof

It follows from Example 3 that \(\Vert {{\,\textrm{Hess}\,}}f(p)\Vert \le 2(\lambda _{max}(A)- \lambda _{min}(A))\). Therefore, by combining the last inequality with (22) and Lemma 3, we obtain the desired result. \(\square \)

The proof of the next lemma is a straightforward application of [38, Corollary 10.54].

Lemma 5

Assume that \({{\,\textrm{grad}\,}}f\) is Lipschitz continuous on a convex set \(\mathcal {C}\subseteq \Omega \) with constant \(L\ge 0\). Then, there holds

$$\begin{aligned} f( q) \le f(p) + \langle {{\,\textrm{grad}\,}}f(p), \text{ exp}_{p}^{-1}q\rangle +\frac{L}{2}d^2(p,q), \qquad \forall p, q\in \mathcal {C}. \end{aligned}$$

In the next lemma we recall the well-known law of cosines for triangles on the sphere. Since its proof is a straightforward application of (18), for the sake of completeness we include it here.

Lemma 6

Let \({{\hat{q}}}, {{\tilde{q}}}, {{\bar{q}}} \in \mathbb {S}^n\) such that \({{\tilde{q}}}, {{\bar{q}}} \notin \{{{\hat{q}}}, -{{\hat{q}}}\}\) and let \(\theta _{{\hat{q}}}\) be the angle between the vectors \( \text{ exp}_{{{\hat{q}}}}^{-1}{{\tilde{q}}}\) and \(\text{ exp}_{{\hat{q}}}^{-1}{{\bar{q}}}\). Then, there holds

$$\begin{aligned} \cos d({{\tilde{q}}}, {{\bar{q}}})=\cos d({{\hat{q}}}, {{\bar{q}}}) \cos d({{\hat{q}}}, {{\tilde{q}}}) + \sin d({{\hat{q}}}, {{\bar{q}}}) \sin d({{\hat{q}}}, {{\tilde{q}}}) \cos \theta _{{\hat{q}}}. \end{aligned}$$

Proof

First we apply (19) to obtain that \(\langle \text{ exp}_{{{\hat{q}}}}^{-1}{{\tilde{q}}}, \text{ exp}_{{\hat{q}}}^{-1}{{\bar{q}}}\rangle = d({{\hat{q}}}, {{\tilde{q}}}) d({{\hat{q}}}, {{\bar{q}}})\cos \theta _{{\hat{q}}}\). Now, by using (18), we obtain that

$$\begin{aligned} \left\langle \text{ exp}_{{\hat{q}}}^{-1}{{\tilde{q}}}, \text{ exp}_{{\hat{q}}}^{-1}{{\bar{q}}}\right\rangle = \displaystyle \frac{d({{\hat{q}}}, {{\tilde{q}}})}{\sqrt{1-\langle {{\hat{q}}}, {{\tilde{q}}}\rangle ^2}} \displaystyle \frac{d({{\hat{q}}}, {{\bar{q}}})}{\sqrt{1-\langle {{\hat{q}}}, {{\bar{q}}}\rangle ^2}} \left\langle {{\,\textrm{Proj}\,}}_{{\hat{q}}} {{\tilde{q}}}, {{\,\textrm{Proj}\,}}_{{\hat{q}}}{{\bar{q}}}\right\rangle . \end{aligned}$$

By combining the two previous equalities, after some algebraic manipulations, we obtain that

$$\begin{aligned} \cos \theta _{{\hat{q}}} =\displaystyle \frac{1}{\sqrt{1-\langle {{\hat{q}}}, {{\tilde{q}}}\rangle ^2}} \displaystyle \frac{1}{\sqrt{1-\langle {{\hat{q}}}, {{\bar{q}}}\rangle ^2}} \left( \langle {{\tilde{q}}}, {{\bar{q}}}\rangle -\langle {{\hat{q}}}, {{\bar{q}}}\rangle \langle {{\tilde{q}}}, {{\hat{q}}}\rangle \right) . \end{aligned}$$
(23)

Hence, bearing in mind that \(\langle {{\tilde{q}}}, {{\bar{q}}}\rangle =\cos d({{\tilde{q}}}, {{\bar{q}}})\), \(\langle {{\hat{q}}}, {{\bar{q}}}\rangle =\cos d({{\hat{q}}}, {{\bar{q}}})\) and \(\langle {{\tilde{q}}},{{\hat{q}}}\rangle =\cos d({{\tilde{q}}},{{\hat{q}}})\), we obtain the desired equality. \(\square \)

4.2 Projection onto a closed spherically convex set

In this section we study some concepts related to the projection onto a closed spherically convex set. We begin by recalling some concepts on the projection onto a closed convex set, for more details see [42, 43]. Since Definition 2 implies that \(\mathbb {S}^n\) is closed spherically convex, for convenience, from now on we assume that all closed spherically convex sets are nonempty proper subsets of the sphere. For each set \( \mathcal {C} \subseteq \mathbb {S}^n\), let

$$\begin{aligned} \mathcal {K}_\mathcal {C}{:=}\left\{ tp \, :\, p\in \mathcal {C}, \; t\in [0, +\infty ) \right\} . \end{aligned}$$
(24)

Proposition 5

The closed set \(\mathcal {C}\subset \mathbb {S}^n\) is spherically convex if and only if \(\mathcal {K}_\mathcal {C}\subseteq {{\mathbb {V}}}^{n+1}\) is a pointed cone.

Let \(\mathcal {C}\subseteq \mathbb {S}^n \) be a closed spherically convex set. The projection onto \(\mathcal {C}\subseteq \mathbb {S}^n \) is defined by

$$\begin{aligned} \mathcal {P}_\mathcal {C}(p){:=}\left\{ {\bar{p}}\in \mathcal {C}:~ d(p,{\bar{p}})\le d(p,q), \ \forall \, q\in \mathcal {C} \right\} =\left\{ {\bar{p}}\in \mathcal {C}:~ \langle p, q \rangle \le \langle p, {\bar{p}} \rangle , \ \forall \, q\in \mathcal {C} \right\} . \end{aligned}$$
(25)

In the following proposition we present the main property of the projection onto a closed spherically convex set \(\mathcal {C}\subseteq \mathbb {S}^n\).

Proposition 6

Let \(p\in \mathbb {S}^n\) and \( {\bar{p}}\in \mathcal {C}\) such that \( \langle p, {\bar{p}} \rangle >0\). Then, \({\bar{p}}\in \mathcal {P}_\mathcal {C}(p)\) if and only if \(\left\langle {{\,\textrm{Proj}\,}}_{{\bar{p}}}p, \ {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle \le 0\), for all \(q\in \mathcal {C}.\) In addition, \( \mathcal {P}_\mathcal {C}(p)\) is a singleton.

Remark 1

The condition \(\left\langle {{\,\textrm{Proj}\,}}_{{\bar{p}}}p, \ {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle \le 0\) is equivalent to \(\langle \text{ exp}_{{\bar{p}}}^{-1}p, \text{ exp}_{{\bar{p}}}^{-1}q\rangle \le 0\).

The following proposition implies that in order to project a point onto a closed spherically convex set, it is sufficient to project the point onto the cone spanned by this set and normalize the result; its proof can be found in [43].

Proposition 7

Let \(\mathcal {C}\subseteq \mathbb {S}^n\) be a nonempty, closed, spherically convex proper subset of the sphere. If \(p\in \mathbb {S}^n\) with \(P_{\mathcal {K}_\mathcal {C}}(p)\ne 0\), then

$$\begin{aligned} \mathcal {P}_\mathcal {C}(p)=\frac{P_{\mathcal {K}_\mathcal {C}}(p)}{\Vert P_{\mathcal {K}_\mathcal {C}}(p)\Vert }, \end{aligned}$$

where \( P_{\mathcal {K}_\mathcal {C}}(p)\) denotes the usual orthogonal projection onto the cone \( \mathcal {K}_\mathcal {C}\).

Example 4

Let \( {P}_{\mathbb {R}^{n+1}_+}(z)\) be the usual orthogonal Euclidean projection onto the pointed cone \(\mathbb {R}^{n+1}_+\). It is well-known that \(z^+={P}_{\mathbb {R}^{n+1}_+}(z)\). Consider the closed spherically convex set \(\mathcal {C}=\{p\in \mathbb {S}^n:~ p\in \mathbb {R}^{n+1}_+\} \). The cone spanned by \(\mathcal {C}\) is \( \mathcal {K}_\mathcal {C}=\mathbb {R}^{n+1}_+\). Thus, it follows from Proposition 7 that, for all points \(p\in \mathbb {S}^n\) with \(p^+\ne 0\), we have

$$\begin{aligned} {\mathcal {P}}_{{\mathcal {C}}}(p)=\frac{p^+}{\Vert p^+\Vert }. \end{aligned}$$

Example 5

Let \(\mathcal {L}^n\) be the Lorentz cone. Then,

$$\begin{aligned} {P}_{\mathcal {L}^n}(x,t)= {\left\{ \begin{array}{ll} \displaystyle \frac{1}{2} \Big (\left[ (t+ \Vert x\Vert )^+ -(t- \Vert x\Vert )^+\right] \displaystyle \frac{x}{\Vert x\Vert }, \ (t- \Vert x\Vert )^+ + (t+ \Vert x\Vert )^+\Big ), &{} x \ne 0, \\ \\ \displaystyle \ \left( 0,\, t^+\right) , &{} x= 0. \end{array}\right. } \end{aligned}$$

see for example [44, Proposition 3.3], or alternatively we have

$$\begin{aligned} {P}_{\mathcal {L}^n}\left( x,t\right) = {\left\{ \begin{array}{ll} (x,t), &{} \quad t \ge \Vert x\Vert , \\ \frac{1}{2}\left( 1+\frac{1}{\Vert x\Vert }\right) \left( x,\Vert x\Vert \right) , &{} \quad -\Vert x\Vert< t < \Vert x\Vert . \end{array}\right. } \end{aligned}$$
(26)

for more details, see [45, Theorem 3.3.6, pp. 40]. Let \(\mathcal {C}=\{p\in \mathbb {S}^n:~ p\in \mathcal {L}^n\} \) be a closed spherically convex set. Then, \( \mathcal {K}_\mathcal {C}=\mathcal {L}^n\). Thus, it follows from Proposition 7 that, for all points \(p=(x, t)\in \mathbb {S}^n\) with \(t>0\), we have

$$\begin{aligned} \mathcal {P}_\mathcal {C}(p)=\frac{{P}_{\mathcal {L}^n}(p)}{\Vert {P}_{\mathcal {L}^n}(p)\Vert }. \end{aligned}$$
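The two formulas for \(P_{\mathcal {L}^n}\) above and Proposition 7 combine into a short routine. A hedged NumPy sketch (names are ours; the second function assumes \(P_{\mathcal {L}^n}(p)\ne 0\)):

```python
import numpy as np

def proj_lorentz(x, t):
    """Euclidean projection of (x, t) onto the Lorentz cone L^n, cf. (26)."""
    nx = np.linalg.norm(x)
    if t >= nx:                          # already in the cone
        return x, t
    if t <= -nx:                         # projects to the origin
        return np.zeros_like(x), 0.0
    c = 0.5 * (1.0 + t / nx)
    return c * x, c * nx

def proj_sphere_lorentz(p):
    """Spherical projection onto C = L^n ∩ S^n via Proposition 7."""
    x, t = proj_lorentz(p[:-1], p[-1])
    v = np.append(x, t)
    return v / np.linalg.norm(v)
```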

Example 6

Let \(\mathcal {S}^n\) be the vector space of symmetric matrices over the real numbers \(\mathbb {R}\), and \(\mathcal {S}^n_+=\{X \in \mathcal {S}^n:~ X \succeq 0 \}\) be the cone of positive semidefinite matrices. The inner product of two matrices \(X, Y \in \mathcal {S}^n\) is defined by \(\langle X, Y\rangle =\text{ tr }(YX)\), where \(\text{ tr }\) denotes the trace. Let \(X\in \mathcal {S}^n\) and \(\{v^1, v^2,\dots ,v^n\}\) be an orthonormal system of eigenvectors of the matrix X corresponding to the eigenvalues \(\lambda _1, \lambda _2, \ldots , \lambda _n\), respectively. Thus, by using the spectral decomposition of X, we have

$$\begin{aligned} X=\sum _{i=1}^n\lambda _i v^i(v^i)^{\top }. \end{aligned}$$

Consider the closed spherically convex set \(\mathcal {C}=\{X\in \mathbb {S}^n:~ X\in \mathcal {S}^n_+\} \). The cone spanned by \(\mathcal {C}\) is \( \mathcal {K}_\mathcal {C}=\mathcal {S}^n_+\). Then, the projection of \(X\in \mathcal {S}^n\) onto \(\mathcal {S}^n_+\) is given by

$$\begin{aligned} {P}_{\mathcal {S}^n_+}(X)= {\sum _{i=1}^{n} \lambda _i^+ v_i v_i^T}, \end{aligned}$$

where \(\lambda _i^+={\text {max}}\{\lambda _i,0\}\). Thus, it follows from Proposition 7 that, for all matrices \(X\in \mathcal {S}^n\) with \({P}_{\mathcal {S}^n_+}(X)\ne 0\), we have

$$\begin{aligned} \mathcal {P}_\mathcal {C}(X)=\frac{ {P}_{\mathcal {S}^n_+}(X)}{\Vert {P}_{\mathcal {S}^n_+}(X)\Vert }. \end{aligned}$$
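Analogously, the spectral formula for \(P_{\mathcal {S}^n_+}\) and Proposition 7 give the spherical projection in this example. A minimal NumPy sketch (names are ours; the second function assumes \(P_{\mathcal {S}^n_+}(X)\ne 0\)):

```python
import numpy as np

def proj_psd(X):
    """Euclidean projection onto S^n_+ via the spectral decomposition of X."""
    lam, V = np.linalg.eigh(X)                 # X symmetric
    return (V * np.maximum(lam, 0.0)) @ V.T    # sum_i lambda_i^+ v_i v_i^T

def proj_sphere_psd(X):
    """Spherical projection onto C = S^n_+ ∩ {||X||_F = 1} via Proposition 7."""
    Y = proj_psd(X)
    return Y / np.linalg.norm(Y)               # Frobenius norm, induced by <X,Y> = tr(YX)
```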

4.2.1 Properties of the projection onto a closed spherically convex set

In this section we present some new properties of the projection onto a closed spherically convex set which will be useful in the analysis of the gradient projection method. Consider a nonempty closed spherically convex set \(\mathcal {C}\subseteq \mathbb {S}^n\) with \(\mathcal {C} \ne \mathbb {S}^n\).

Lemma 7

Let \(p, q\in \mathbb {S}^n\) and \({\bar{\theta }} >0\) such that \({\bar{\theta }} <\pi /2\). Assume that \(p \in \mathcal {C}\) and \(d(p,q)\le {\bar{\theta }}\). Then, the following inequality holds

$$\begin{aligned} \cos ({\bar{\theta }}) d^2(p, \mathcal {P}_\mathcal {C}(q))\le \left\langle \text{ exp}_{p}^{-1}q, \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q) \right\rangle . \end{aligned}$$
(27)

Proof

By applying Lemma 6 with \({{\hat{q}}}=p\), \({{\tilde{q}}}=q\) and \({{\bar{q}}}= \mathcal {P}_\mathcal {C}(q)\), we conclude that

$$\begin{aligned} \cos d(q, \mathcal {P}_\mathcal {C}(q))=\cos d(p, \mathcal {P}_\mathcal {C}(q)) \cos d(p, q) + \sin d(p, \mathcal {P}_\mathcal {C}(q)) \sin d(p, q) \cos \theta _{p}. \end{aligned}$$
(28)

Now, by using Lemma 6 with \({{\hat{q}}}= \mathcal {P}_\mathcal {C}(q)\), \({{\tilde{q}}}=q\) and \({{\bar{q}}}= p\), we obtain that

$$\begin{aligned} \cos d(q, p)=\cos d(\mathcal {P}_\mathcal {C}(q), p ) \cos d(\mathcal {P}_\mathcal {C}(q), q) + \sin d(\mathcal {P}_\mathcal {C}(q), p) \sin d(\mathcal {P}_\mathcal {C}(q), q) \cos \theta _{\mathcal {P}_\mathcal {C}(q)}. \end{aligned}$$

It follows from Proposition 6 that \(\cos \theta _{\mathcal {P}_\mathcal {C}(q)}\le 0\). Thus, due to \(d(\mathcal {P}_\mathcal {C}(q), p) \le \pi \) and \(d(\mathcal {P}_\mathcal {C}(q), q) \le \pi \), the last equality becomes

$$\begin{aligned} \cos d(q, p)\le \cos d(\mathcal {P}_\mathcal {C}(q), p ) \cos d(\mathcal {P}_\mathcal {C}(q), q). \end{aligned}$$
(29)

By combining inequalities (28) and (29), after some algebraic manipulations, we conclude that

$$\begin{aligned} \left( \cos d(q, \mathcal {P}_\mathcal {C}(q))+ \cos d(q, p)\right) \left( 1- \cos d(\mathcal {P}_\mathcal {C}(q), p )\right) \le \sin d(p, \mathcal {P}_\mathcal {C}(q)) \sin d(p, q) \cos \theta _{p}. \end{aligned}$$
(30)

Since \(\langle \text{ exp}_{p}^{-1}q, \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q)\rangle = d(p,q) d(p, \mathcal {P}_\mathcal {C}(q))\cos \theta _{p}\), the last inequality is equivalent to

$$\begin{aligned} d^2(p, \mathcal {P}_\mathcal {C}(q)) \frac{d(q, p)}{ \sin d(p, q)}\left( \cos d(q, \mathcal {P}_\mathcal {C}(q))+ \cos d(q, p)\right) \frac{1- \cos d(\mathcal {P}_\mathcal {C}(q), p )}{d(p, \mathcal {P}_\mathcal {C}(q))\sin d(p, \mathcal {P}_\mathcal {C}(q))} \le \langle \text{ exp}_{p}^{-1}q, \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q)\rangle . \end{aligned}$$

Due to \(d(q, \mathcal {P}_\mathcal {C}(q))\le d(q, p)\), we have \(\cos d(q, p)\le \cos d(q, \mathcal {P}_\mathcal {C}(q))\). Thus, the last inequality implies

$$\begin{aligned} d^2(p, \mathcal {P}_\mathcal {C}(q)) \frac{2d(q, p)\cos d(q, p)}{ \sin d(p, q)}\frac{1- \cos d(\mathcal {P}_\mathcal {C}(q), p )}{d(p, \mathcal {P}_\mathcal {C}(q))\sin d(p, \mathcal {P}_\mathcal {C}(q))} \le \langle \text{ exp}_{p}^{-1}q, \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q)\rangle . \end{aligned}$$
(31)

By using that \((0, \pi /2)\ni x\mapsto {x}/{\sin (x)}>1\), the function \((0, \pi /2)\ni x\mapsto \cos (x)\) is decreasing, and \(0\le d(p,q)\le {\bar{\theta }}<\pi /2\), we obtain

$$\begin{aligned} \cos ({\bar{\theta }}) \le \cos d(q, p) \le \frac{d(q, p)\cos d(q, p)}{ \sin d(p, q)}. \end{aligned}$$
(32)

On the other hand, the function \((0, \pi /2)\ni x\mapsto {(1-\cos (x))}/{(x\sin (x))}>1/2\). Thus, by considering \(d(q, \mathcal {P}_\mathcal {C}(q))\le d(q, p)\) and \(d(p,q)\le {\bar{\theta }}<\pi /2\), we have

$$\begin{aligned} \frac{1}{2}\le \frac{1- \cos d(\mathcal {P}_\mathcal {C}(q), p )}{d(p, \mathcal {P}_\mathcal {C}(q))\sin d(p, \mathcal {P}_\mathcal {C}(q))}. \end{aligned}$$
(33)

Therefore, by combining (31) with (32) and (33), inequality (27) follows. \(\square \)

To simplify the notations we take \(\theta >0\) such that

$$\begin{aligned} {\bar{\theta }}{:=}\arccos (\theta ) <\frac{\pi }{2}. \end{aligned}$$
(34)

Consider a function f as defined in (12).

Lemma 8

Let \(p \in \mathcal {C}\) be such that \({{\,\textrm{grad}\,}}f(p)\ne 0\) and the constants \({\theta }\) and \({\bar{\theta }}\) satisfying (34). Assume that \(\alpha \in {{\mathbb {R}}}\) satisfies \(0<\alpha \le {\bar{\theta }}/{\Vert {{\,\textrm{grad}\,}}f(p)\Vert }\). Then, we have

$$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f(p), \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}\left( \exp _{p}\left( -\alpha {{\,\textrm{grad}\,}}f(p)\right) \right) \right\rangle \le - \frac{\theta }{\alpha }d^2\left( p,\mathcal {P}_\mathcal {C}\left( \exp _{p}\left( -\alpha {{\,\textrm{grad}\,}}f(p)\right) \right) \right) . \end{aligned}$$
(35)

Proof

Let \(p \in \mathcal {C}\) and \(\alpha >0\). To simplify the notations we set

$$\begin{aligned} v{:=}{{\,\textrm{grad}\,}}f(p), \quad \qquad q(\alpha ){:=}\text{ exp}_{p}(-\alpha {{\,\textrm{grad}\,}}f(p)). \end{aligned}$$
(36)

Since \({{\,\textrm{grad}\,}}f(p)\ne 0\), we have \(\mathcal {P}_\mathcal {C}(q(\alpha ))\ne p\). By using (36), \({\bar{\theta }} <\pi /2\) and \(\alpha \le {\bar{\theta }}/{\Vert {{\,\textrm{grad}\,}}f(p)\Vert }\) we conclude that \(d(p, q(\alpha ))=\alpha \Vert {{\,\textrm{grad}\,}}f(p)\Vert \le {\bar{\theta }} <\pi /2\). Thus, we have \(-\alpha v= \text{ exp}_{p}^{-1}q(\alpha )\). By applying Lemma 7 with \(q= q(\alpha )\), we obtain that

$$\begin{aligned} \theta d^2(p, \mathcal {P}_\mathcal {C}(q(\alpha )))\le \left\langle \text{ exp}_{p}^{-1}q(\alpha ), \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q(\alpha )) \right\rangle . \end{aligned}$$

Thus, we have \(\langle v, \text{ exp}_{p}^{-1}\mathcal {P}_\mathcal {C}(q(\alpha ))\rangle \le -\frac{\theta }{\alpha } d^2(p, \mathcal {P}_{\mathcal {C}}(q(\alpha )))\), which, by using (36), implies (35). \(\square \)

Let \({\mathcal {C}} \subseteq \Omega \) be a closed spherically convex set. If \({\bar{p}}\in \mathcal {C}\) is a solution of the problem (12) then

$$\begin{aligned} \left\langle Df ({\bar{p}}), {{\,\textrm{Proj}\,}}_{{\bar{p}}} p \right\rangle =\left\langle {{\,\textrm{grad}\,}}f ({\bar{p}}), {{\,\textrm{Proj}\,}}_{{\bar{p}}} p \right\rangle =\left\langle {{\,\textrm{grad}\,}}f ({\bar{p}}), p \right\rangle \ge 0, \quad \forall ~p\in \mathcal {C}. \end{aligned}$$
(37)

Any point satisfying (37) is called a stationary point for problem (12).

In the following corollary we present two important properties of the projection, which are related to the stationary points of problem (12).

Corollary 4

Let \({{\bar{p}}} \in \mathcal {C}\) be such that \({{\,\textrm{grad}\,}}f({{\bar{p}}})\ne 0\) and the constants \({\theta }\) and \({\bar{\theta }}\) satisfying (34). Assume that \(\alpha \in {{\mathbb {R}}}\) satisfies \(0<\alpha \le {\bar{\theta }}/{\Vert {{\,\textrm{grad}\,}}f({{\bar{p}}})\Vert }\). Then, there hold:

  (i)

    The point \({{\bar{p}}}\) is stationary for problem (12) if and only if \({{\bar{p}}}=\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \).

  (ii)

    If \({{\bar{p}}}\) is a nonstationary point for problem (12), then

    $$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f({{\bar{p}}}), \text{ exp}_{{\bar{p}}}^{-1}\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \right\rangle < 0. \end{aligned}$$
    (38)

    Equivalently, if there exists \({\bar{\alpha }} \in {{\mathbb {R}}}\) such that \(0<{\bar{\alpha }} \le {\bar{\theta }}/{\Vert {{\,\textrm{grad}\,}}f({{\bar{p}}})\Vert }\) and

    $$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f({{\bar{p}}}), \text{ exp}_{{\bar{p}}}^{-1}\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -{\bar{\alpha }}{{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \right\rangle \ge 0, \end{aligned}$$
    (39)

    then \({{\bar{p}}}\) is stationary for problem (12).

Proof

To prove item (i), we first assume that \({\bar{p}}\in \mathcal {C}\) is a stationary point for problem (12). Assume also by contradiction that

$$\begin{aligned} {{\bar{p}}}\ne \mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) . \end{aligned}$$
(40)

It follows from (37) that

$$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f ({\bar{p}}), {{\,\textrm{Proj}\,}}_{{\bar{p}}} p \right\rangle \ge 0, \quad \forall ~p\in \mathcal {C}. \end{aligned}$$
(41)

By using (18) and \({{\,\textrm{Proj}\,}}_{{\bar{p}}} p= p- \langle {{\bar{p}}}, p \rangle {{\bar{p}}}\), the combination of (40) and (41) implies that

$$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f ({{\bar{p}}}), \text{ exp}_{{\bar{p}}}^{-1}\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \right\rangle \ge 0. \end{aligned}$$
(42)

By using Lemma 8, we conclude that \(d({{\bar{p}}},\mathcal {P}_\mathcal {C}(\exp _{{\bar{p}}}(-\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}}))))\le 0 \), which contradicts (40). Therefore, \({{\bar{p}}}=\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \). Now, assume that \({{\bar{p}}}=\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \). By using Proposition 6 together with Remark 1, we obtain that

$$\begin{aligned} \langle \text{ exp}_{{\bar{p}}}^{-1}\exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) , \text{ exp}_{{\bar{p}}}^{-1}p\rangle \le 0, \qquad \forall ~p\in \mathcal {C}. \end{aligned}$$

or equivalently, \(\langle \alpha {{\,\textrm{grad}\,}}f({{\bar{p}}}), \text{ exp}_{{\bar{p}}}^{-1}p\rangle \ge 0\), for all \(p\in \mathcal {C}\). Thus, considering \(\alpha >0\), by using (18), the last inequality implies that

$$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f ({\bar{p}}), {{\,\textrm{Proj}\,}}_{{\bar{p}}} p \right\rangle \ge 0, \quad \forall ~p\in \mathcal {C}. \end{aligned}$$

Therefore, the point \({{\bar{p}}}\) is stationary for problem (12) and (i) is proved. We proceed to prove item (ii). Take a nonstationary point \({{\bar{p}}}\) for problem (12). Thus, item (i) implies that

$$\begin{aligned} {{\bar{p}}}\ne \mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) . \end{aligned}$$

Thus, by applying Lemma 8, we conclude that

$$\begin{aligned} 0 < \frac{\theta }{\alpha }d^2\left( {{\bar{p}}},\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \right) \le - \left\langle {{\,\textrm{grad}\,}}f({{\bar{p}}}), \text{ exp}_{{{\bar{p}}}}^{-1}\mathcal {P}_\mathcal {C}\left( \exp _{{{\bar{p}}}}\left( -\alpha {{\,\textrm{grad}\,}}f({{\bar{p}}})\right) \right) \right\rangle , \end{aligned}$$

which implies (38), and therefore the first sentence of item (ii) is proved. Finally, note that the second statement of item (ii) is the contrapositive of the first sentence. \(\square \)

4.3 Gradient projection method on the sphere

In this section we present the gradient projection method to solve the constrained optimization problem (12). For that, let \(\mathcal {C}^*\ne \varnothing \) be the solution set of the problem (12) and \(f^*{:=} \inf _{p\in \mathcal {C}}f(p)\) be the optimum value of f. From now on, we assume that

(H1):

\({{\,\textrm{grad}\,}}f\) is Lipschitz continuous on \(\mathcal {C}\subseteq \mathbb {S}^n \) with constant \(L\ge 0\).

To proceed, we need a constant \(\zeta \in {{\mathbb {R}}}\) such that

$$\begin{aligned} {\text {max}}_{p\in \mathcal {C}}\Vert {{\,\textrm{grad}\,}}f(p)\Vert \le \zeta <+\infty . \end{aligned}$$
(43)

Next, we present an example of a function f and an upper bound for its gradient \({{\,\textrm{grad}\,}}f\).

Example 7

Let \(f:\mathbb {S}^n \rightarrow \mathbb {R}\) be given by \(f(p)=p^{\top }Ap\). Thus, it follows from Example 3 that \(\Vert {{\,\textrm{grad}\,}}f(p)\Vert \le 2(\lambda _{{\text {max}}}(A)-\lambda _{{\text {min}}}(A))\). In this case, we can take \(\zeta =2(\lambda _{{\text {max}}}(A)-\lambda _{{\text {min}}}(A))\).

The conceptual version of the gradient projection method to solve problem (12) is given in Algorithm 1.

[Algorithm 1: Gradient projection method on the sphere with constant step size; the step size \(\alpha \) satisfies (44) and the iterate is computed by (45).]

Remark 2

Since the sphere \(\mathbb {S}^n\) is compact and f is a differentiable function, there exists a minimizer \({{\bar{q}}}\in \mathbb {S}^n\) of f, and \({{\,\textrm{grad}\,}}f({{\bar{q}}})=0\). Consequently, if \({{\,\textrm{grad}\,}}f\) is Lipschitz continuous on \(\mathbb {S}^n\), then it follows from Definition 3 that \(\Vert {{\,\textrm{grad}\,}}f(p)\Vert \le Ld(p,{{\bar{q}}})\), for all \(p\in \mathbb {S}^n\). Hence, \(\Vert {{\,\textrm{grad}\,}}f(p)\Vert \le \pi L\), for all \(p\in \mathbb {S}^n\), and \(\zeta \) in (43) can be taken as \(\zeta =\pi L\). In order to obtain the largest interval for the step-size \(\alpha \) in (44), by considering (34), we must take \(0<\theta <1\) such that \(\theta ={\bar{\theta }}\), which implies that \({\bar{\theta }}>0.7\). Therefore, it follows from (44) that we can take \(0<\alpha < 0.7/{(\pi L)}\). As we will see in Sect. 5, for the special case \(f(p)=p^{\top }Ap\), this interval can be taken larger.
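Putting the pieces together, the iterate \(p_{k+1}=\mathcal {P}_\mathcal {C}\left( \exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) \right) \) of (45) can be sketched in a few lines for \(f(p)=\langle Ap,p\rangle \) and \(\mathcal {C}=\mathbb {R}^{n+1}_+\cap \mathbb {S}^n\). This is only an illustrative NumPy sketch of Algorithm 1 (the concrete step size, stopping rule, test matrix and the multi-start loop, i.e. the heuristic mentioned in the introduction, are choices of ours):

```python
import numpy as np

def exp_map(p, v):
    """Exponential map (17) on the sphere."""
    nv = np.linalg.norm(v)
    return p if nv == 0.0 else np.cos(nv) * p + np.sin(nv) * v / nv

def grad_f(A, p):
    """Gradient on the sphere of f(p) = <Ap, p> (Example 3), A symmetric."""
    g = 2.0 * A @ p
    return g - np.dot(p, g) * p

def proj_C(p):
    """Spherical projection onto C = R^{n+1}_+ ∩ S^n (Example 4), assuming p^+ != 0."""
    q = np.maximum(p, 0.0)
    return q / np.linalg.norm(q)

def gradient_projection_sphere(A, p0, alpha, max_iter=10_000, tol=1e-12):
    """Constant step-size gradient projection on the sphere (iteration (45), sketched)."""
    p = proj_C(p0)
    for _ in range(max_iter):
        p_new = proj_C(exp_map(p, -alpha * grad_f(A, p)))
        if np.linalg.norm(p_new - p) < tol:      # simple stopping rule
            break
        p = p_new
    return p, float(p @ A @ p)

# Heuristic copositivity test on R^3_+ with several random starting points.
A = np.array([[1.0, -1.2, 0.3],
              [-1.2, 1.0, 0.1],
              [0.3, 0.1, 1.0]])
lam = np.linalg.eigvalsh(A)
alpha = 0.7 / (np.pi * 2.0 * (lam.max() - lam.min()))   # conservative choice, cf. Remark 2
rng = np.random.default_rng(0)
best = min(gradient_projection_sphere(A, rng.random(3), alpha)[1] for _ in range(10))
print(best)   # a negative value certifies that A is not copositive
```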

In the next proposition we prove that Algorithm 1 is well defined.

Proposition 8

Algorithm 1 is well defined and generates a sequence \((p_k)_{k\in {\mathbb {N}}}\subseteq \mathcal {C}\).

Proof

Considering \(p_0\in \mathcal {C}\), without loss of generality we can assume that \(p_k\in \mathcal {C}\). Since \(\zeta \) satisfies (43) and \(\alpha \) satisfies (44), we have \(d(p_k,\exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) )= \alpha \Vert {{\,\textrm{grad}\,}}f(p_k)\Vert < {\bar{\theta }}\). Thus, considering \({\bar{\theta }} <\pi /2\), we conclude that \(d(p_k,\exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) ) <\pi /2\). Hence, due to \(p_k\in \mathcal {C}\), it follows from (25) that

$$\begin{aligned} d\big ( \mathcal {P}_\mathcal {C}(\exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) ),\exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) \big ) <\frac{\pi }{2}. \end{aligned}$$

Thus, Proposition 6 implies that \( \mathcal {P}_\mathcal {C}(\exp _{p_k}\left( -\alpha {{\,\textrm{grad}\,}}f(p_k)\right) )\) is a singleton. Therefore, Algorithm 1 is well defined and the point \(p_{k+1}\) belongs to the set \(\mathcal {C}\). \(\square \)

Next, we present an inequality which plays an important role in the analysis of the sequence \((p_k)_{k\in {{\mathbb {N}}}}\) generated by Algorithm 1.

Lemma 9

The following inequality holds

$$\begin{aligned} f(p_{k+1}) \le f(p_{k}) - \left( \frac{2\theta -\alpha L}{2\alpha }\right) d^2(p_{k}, p_{k+1}), \qquad k=0, 1, \ldots . \end{aligned}$$
(46)

In particular, the sequence \((f(p_k))_{k\in {{\mathbb {N}}}}\) is non-increasing and converges.

Proof

Since \(p_0\in \mathcal {C}\), it follows from Proposition 8 that \((p_k)_{k\in {{\mathbb {N}}}}\subseteq \mathcal {C}\). By applying Lemma 5 with \(\mathcal {D}=\mathbb {S}^n\), \(p=p_{k}\) and \(q=p_{k+1}\), we have

$$\begin{aligned} f(p_{k+1}) \le f(p_{k}) + \langle {{\,\textrm{grad}\,}}f(p_{k}), \text{ exp}_{p_k}^{-1}p_{k+1}\rangle +\frac{L}{2}d^2(p_{k}, p_{k+1}). \end{aligned}$$

Thus, by applying Lemma 8 with \(p=p_{k}\) and by taking into account (45), we conclude that

$$\begin{aligned} f(p_{k+1})\le & {} f(p_{k}) - \frac{\theta }{\alpha }d^2(p_{k}, p_{k+1}) +\frac{L}{2}d^2(p_{k}, p_{k+1})\nonumber \\= & {} f(p_{k}) - \left( \frac{2\theta -\alpha L}{2\alpha }\right) d^2(p_{k}, p_{k+1}), \end{aligned}$$
(47)

which implies that (46) holds. It follows from (44) that \((2\theta -\alpha L)/(2\alpha )>0\), and by using (46), we conclude that \((f(p_k))_{k\in {{\mathbb {N}}}}\) is non-increasing. Moreover, since \(-\infty <f^*\) and \((f(p_k))_{k\in {{\mathbb {N}}}}\) is non-increasing, it follows that it converges. \(\square \)

In the following we prove that any cluster point of \((p_k)_{k\in {{\mathbb {N}}}}\) is a stationary point of problem (12).

Theorem 9

If \({{\bar{p}}}\in \mathcal {C}\) is a cluster point of the sequence \((p_k)_{k\in {{\mathbb {N}}}}\), then \({\bar{p}}\) is a stationary point for problem (12).

Proof

If \({{\,\textrm{grad}\,}}f({\bar{p}})=0\), then (37) implies that \({\bar{p}}\in \mathcal {C}\) is a stationary point for problem (12). Now, assume that \({{\,\textrm{grad}\,}}f({\bar{p}})\ne 0\). Lemma 9 implies that \((f(p_k))_{k\in {{\mathbb {N}}}}\) is non-increasing, which together with \(-\infty <f^*\) yields that it converges. By using (46), we have

$$\begin{aligned} d^2(p_{k}, p_{k+1}) \le \frac{2\alpha }{2\theta -\alpha L} \left( f(p_{k}) - f(p_{k+1}) \right) , \qquad k = 0,1, \ldots . \end{aligned}$$
(48)

Thus, we obtain that \(\lim _{k\rightarrow +\infty }d(p_{k}, p_{k+1})=0\). Let \((p_{k_j})_{j\in {{\mathbb {N}}}}\) be a subsequence of \((p_k)_{k\in {{\mathbb {N}}}}\) such that \(\lim _{j\rightarrow +\infty }p_{k_j}={{\bar{p}}}\). Because \(\lim _{j\rightarrow +\infty }d(p_{k_j+1}, p_{k_j})=0\), we have \(\lim _{j\rightarrow +\infty }p_{k_j+1}={{\bar{p}}}\). On the other hand, considering \(p_{k_j+1} = \mathcal {P}_\mathcal {C}(\exp _{p_{k_j}}\left( -\alpha {{\,\textrm{grad}\,}}f(p_{k_j})\right) )\), for \(j=0, 1, \ldots \), it follows from item (ii) of Proposition 6 that

$$\begin{aligned} \left\langle {{\,\textrm{Proj}\,}}_{p_{k_j+1}}\left( \exp _{p_{k_j}}(-\alpha {{\,\textrm{grad}\,}}f(p_{k_j}))\right) , {{\,\textrm{Proj}\,}}_{p_{k_j+1}}q \right\rangle \le 0, \quad \forall ~q\in \mathcal {C}. \end{aligned}$$

Thus, by taking limit and by using that \( {{\,\textrm{grad}\,}}f\) is continuous, we obtain that

$$\begin{aligned} \left\langle {{\,\textrm{Proj}\,}}_{{\bar{p}}}\left( \exp _{{\bar{p}}}(-\alpha {{\,\textrm{grad}\,}}f({\bar{p}}))\right) , {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle \le 0, \quad \forall ~q\in \mathcal {C}, \end{aligned}$$

which is equivalent to \(\left\langle \exp _{{\bar{p}}}(-\alpha {{\,\textrm{grad}\,}}f({\bar{p}})), {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle \le 0\), for all \(q\in \mathcal {C}\). Hence, by setting \(v=-\alpha {{\,\textrm{grad}\,}}f({\bar{p}})\) and by using (16), we have

$$\begin{aligned} 0\ge \left\langle \cos (\Vert v\Vert ){{\bar{p}}}+ \sin (\Vert v\Vert ) \frac{v}{\Vert v\Vert }, {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle = \frac{\sin (\Vert v\Vert )}{\Vert v\Vert }\left\langle v, {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle , \quad \forall ~q\in \mathcal {C}. \end{aligned}$$
(49)

By combining equations (43), (44), and (34), we deduce the following inequality

$$\begin{aligned} \Vert v\Vert =\alpha \Vert {{\,\textrm{grad}\,}}f({\bar{p}})\Vert \le \alpha \zeta \le {\bar{\theta }} <\frac{\pi }{2}. \end{aligned}$$

Consequently, since \({{\,\textrm{grad}\,}}f({\bar{p}})\ne 0\) and hence \(0<\Vert v\Vert <\pi /2\), we have \(\sin (\Vert v\Vert )> 0\), which combined with (49) leads to \(0\ge \langle v, {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \rangle \), for all \(q\in \mathcal {C}\). Thus, due to \(v=-\alpha {{\,\textrm{grad}\,}}f({\bar{p}})\) and \(\alpha >0\), we conclude that

$$\begin{aligned} \left\langle {{\,\textrm{grad}\,}}f({\bar{p}}), {{\,\textrm{Proj}\,}}_{{\bar{p}}}q \right\rangle \ge 0, \quad \forall ~q\in \mathcal {C}, \end{aligned}$$

which by using (37) implies that \({\bar{p}} \in \mathcal {C}\) is a stationary point for problem (12). \(\square \)

Item (i) of Corollary 4 implies that if \({p_k}=\mathcal {P}_\mathcal {C}(\exp _{p_k}(-\alpha {{\,\textrm{grad}\,}}f({p_k})))\), then \({p_k}\) is a stationary point for problem (12). Additionally, (45) implies that

$$\begin{aligned} d({p_k}, p_{k+1})= d\left( {p_k},\mathcal {P}_\mathcal {C}(\exp _{p_k}(-\alpha {{\,\textrm{grad}\,}}f({p_k})))\right) . \end{aligned}$$

Consequently, the quantity \(d({p_k}, p_{k+1})\) can be seen as a measure of the stationarity of \({p_k}\). The next theorem presents an iteration-complexity bound for this measure. To simplify its statement we define

$$\begin{aligned} \eta {:=} (2\theta -\alpha L)/(2\alpha ) > 0. \end{aligned}$$
(50)

Theorem 10

For all \(N\in \mathbb {N}\) there holds

$$\begin{aligned} {\text {min}}\left\{ d(p_{k}, p_{k+1}):~k=0, 1, \ldots , N \right\} \le \sqrt{\frac{f(p_{0}) - f^*}{\eta }} \frac{1}{\sqrt{N+1}}. \end{aligned}$$

Proof

By using again (46) and (50), we have \(d^2(p_{k}, p_{k+1})\le \left( f(p_{k}) -f(p_{k+1}) \right) /\eta \), for all \(k=0, 1, \ldots \). Since \(f^* \le f(p_k)\) for all k, the last inequality implies

$$\begin{aligned} \sum _{k=0}^{N} d^2(p_{k+1}, p_{k})\le \frac{1}{\eta }\sum _{k=0}^{N} \left( f(p_{k}) - f(p_{k+1})\right) \le \frac{1}{\eta }(f(p_{0}) - f^*) . \end{aligned}$$
(51)

Therefore, \((N+1) {\text {min}}\left\{ d^2(p_{k}, p_{k+1}):~k=0,1, \ldots , N \right\} \le (f(p_{0}) - f^*)/\eta \), which is equivalent to the desired inequality. \(\square \)

It should be mentioned that Algorithm 1 requires the Lipschitz constant of \({{\,\textrm{grad}\,}}f\) and an upper bound for \(\Vert {{\,\textrm{grad}\,}}f\Vert \) on \(\mathcal {C} \subseteq \mathbb {S}^n\). However, these constants are not always known or easily computable, for instance in large-scale problems or when we deal with the positive semidefinite cone. Therefore, we state another variant of Algorithm 1 with a backtracking stepsize rule that approximates the Lipschitz constant and can also be shown to accumulate at stationary points. The conceptual version of the gradient projection method on the sphere with a backtracking stepsize rule to solve problem (12) is as follows:

[Algorithm 2: gradient projection method on the sphere with a backtracking stepsize rule.]
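Algorithm 2 is likewise shown only as a figure. The sketch below illustrates one possible Armijo-type backtracking step of the kind described above; the constants `alpha0`, `beta` and `sigma`, and the sufficient-decrease test modelled on inequality (46), are illustrative assumptions rather than the exact rule of Algorithm 2. It reuses `exp_map`, `geodesic_dist` and `proj_C` from the sketch after Algorithm 1.

```python
def backtracking_step(p, f, grad_f, proj_C, alpha0=1.0, beta=0.5, sigma=1e-4, alpha_min=1e-12):
    """One gradient projection step with a backtracking stepsize.
    The trial stepsize is shrunk by the factor beta until a sufficient-decrease
    condition modelled on (46) holds, so no Lipschitz constant is needed."""
    g = grad_f(p)
    alpha = alpha0
    while alpha > alpha_min:
        p_trial = proj_C(exp_map(p, -alpha * g))
        dist = geodesic_dist(p, p_trial)
        if f(p_trial) <= f(p) - (sigma / alpha) * dist ** 2:
            return p_trial, alpha
        alpha *= beta
    return p, alpha
```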

5 Special case

In this section we specialize Algorithm 1 to the following constrained optimization problem

$$\begin{aligned} \;\; {\text {min}}\{f(p){:=}\langle Ap, p\rangle \,: \, p\in \mathcal {C} \}, \end{aligned}$$
(54)

where \(A:\mathbb {V}^{n+1}\rightarrow \mathbb {V}^{n+1}\) is a linear operator with \(\lambda _{{\text {max}}}(A)\ne \lambda _{{\text {min}}}(A)\) and \(\mathcal {C}\subseteq \mathbb {S}^n\). For that, we first note that by Lemma 4 we have \(L=2 (\lambda _{{\text {max}}}(A)-\lambda _{{\text {min}}}(A))\). Moreover, by using (43) and Example 7, we can take \(\zeta =2 \lambda _{{\text {max}}}(A)\). Consequently, using that \(0.7<\arccos (0.7)\), we can also take \(\alpha \in {{\mathbb {R}}}\) in Algorithm 1 satisfying

$$\begin{aligned} 0<\alpha < \frac{0.35}{\lambda _{{\text {max}}}(A)}. \end{aligned}$$
(55)

It is worth recalling that for projecting onto \(\mathcal {C}\) we use Proposition 7, i.e., if \(P_{\mathcal {K}_\mathcal {C}}(p)\ne 0\), then

$$\begin{aligned} \mathcal {P}_\mathcal {C}(p)=\frac{P_{\mathcal {K}_\mathcal {C}}(p)}{\Vert P_{\mathcal {K}_\mathcal {C}}(p)\Vert }, \end{aligned}$$

where \( P_{\mathcal {K}_\mathcal {C}}(p)\) denotes the usual orthogonal projection onto the cone \( \mathcal {K}_\mathcal {C}\). Therefore, in this specific case Algorithm 1 can be stated as follows:

[Algorithm 3: gradient projection method for problem (54), with stepsize satisfying (55) and update rule (56).]
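Specializing the earlier sketch to \(f(p)=\langle Ap,p\rangle \) gives the following illustrative Python version of the special-case method: the stepsize follows (55), the update follows the first equality in (56), and the Riemannian gradient \(2(Ap-\langle Ap,p\rangle p)\) is the tangential component of the Euclidean gradient \(2Ap\). The argument `proj_cone` (the orthogonal projection onto \(\mathcal {K}_\mathcal {C}\)) is an assumption to be supplied per cone, as in the examples that follow.

```python
import numpy as np
# reuses exp_map and geodesic_dist from the sketch after Algorithm 1

def solve_quadratic_on_cone_sphere(A, proj_cone, p0, tol=1e-8, max_iter=10_000):
    """Gradient projection for f(p) = <Ap, p> on C = K ∩ S^n (special case (54))."""
    lam_max = np.max(np.linalg.eigvalsh(A))     # assumes lambda_max(A) > 0, as in (55)
    alpha = 0.3 / lam_max                       # any value in (0, 0.35 / lambda_max(A))
    p = p0 / np.linalg.norm(p0)
    for _ in range(max_iter):
        grad = 2.0 * (A @ p - (p @ A @ p) * p)  # Riemannian gradient of <Ap, p>
        w = proj_cone(exp_map(p, -alpha * grad))
        if np.linalg.norm(w) == 0.0:            # Proposition 7 requires P_K(q) != 0
            break
        p_next = w / np.linalg.norm(w)          # spherical projection, first equality in (56)
        if geodesic_dist(p, p_next) <= tol:
            p = p_next
            break
        p = p_next
    return p, float(p @ A @ p)
```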

Remark 3

In practical applications we use an upper bound for the value \(\lambda _{{\text {max}}}(A)-\lambda _{{\text {min}}}(A)\).

In the following examples we write out the first equality in (56) explicitly for the cases where \(\mathcal {C}\subseteq \mathbb {S}^n\) is the intersection of the sphere with the nonnegative orthant, the Lorentz cone, and the positive semidefinite cone, respectively.

Example 8

For \(\mathcal {C}=\mathbb {R}^{n+1}_+\cap \mathbb {S}^n\) we have \(\mathcal {K}_\mathcal {C}= \mathbb {R}^{n+1}_+\). Thus, it follows from Example 4 that the first equality in (56) becomes

$$\begin{aligned} p_{k+1}=\frac{q_k^+}{\Vert q_k^+\Vert }. \end{aligned}$$

Example 9

For \(\mathcal {C}={\mathcal {L}^n} \cap \mathbb {S}^n\), we obtain that \(\mathcal {K}_\mathcal {C}={\mathcal {L}^n}\). Thus, letting \(q_k{:=}(x_k, t_k)\in \mathbb {R}^n \times \mathbb {R}\), by using Example 5, the first equality in (56) can be written as follows

$$\begin{aligned} p_{k+1}= {\left\{ \begin{array}{ll} (x_k,t_k), &{} \quad t_k \ge \Vert x_k\Vert , \\ \frac{1}{\sqrt{2}\Vert x_k\Vert }\left( x_k,\Vert x_k\Vert \right) , &{} \quad -\Vert x_k\Vert< t_k < \Vert x_k\Vert . \end{array}\right. } \end{aligned}$$
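A hedged sketch of the corresponding cone projection: the first two branches below reproduce the two cases displayed above after normalization, while the third branch (\(t_k\le -\Vert x_k\Vert \), where the projection is the origin) is excluded in Example 9 because Proposition 7 requires \(P_{\mathcal {K}_\mathcal {C}}(q_k)\ne 0\).

```python
import numpy as np

def proj_lorentz(q):
    """Orthogonal projection onto the Lorentz cone {(x, t) : ||x|| <= t}, cf. Example 5."""
    x, t = q[:-1], q[-1]
    nx = np.linalg.norm(x)
    if t >= nx:                                  # already in the cone
        return q
    if t <= -nx:                                 # in the polar cone: projection is the origin
        return np.zeros_like(q)
    return ((nx + t) / (2.0 * nx)) * np.append(x, nx)

# Normalizing proj_lorentz(q_k) recovers (1 / (sqrt(2) * ||x_k||)) * (x_k, ||x_k||)
# whenever -||x_k|| < t_k < ||x_k||.
```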

Example 10

For \(\mathcal {C}=\mathcal {S}^n_+ \cap \mathbb {S}^n\) we have \(\mathcal {K}_\mathcal {C}=\mathcal {S}^n_+\). Let \(q_k\in \mathcal {S}^{n}\) and \(\{v^{1k}, v^{2k},\dots ,v^{nk}\}\) be an orthonormal system of eigenvectors of the matrix \(q_{k}\) corresponding to the eigenvalues \(\lambda _{1k}, \lambda _{2k}, \ldots , \lambda _{nk}\), respectively. Thus, by using the spectral decomposition of \(q_{k}\), we have

$$\begin{aligned} q_k=\sum _{i=1}^n\lambda _{ik} v^{ik}(v^{ik})^{\top }. \end{aligned}$$

Thus, Example 6 implies that the first equality in (56) is given by

$$\begin{aligned} p_{k+1}= \frac{\sum _{i=1}^{n} (\lambda _{ik})^+ v^{ik} (v^{ik})^{\top }}{\Vert \sum _{i=1}^{n} (\lambda _{ik})^+ v^{ik} (v^{ik})^{\top } \Vert }. \end{aligned}$$
(57)
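Numerically, the projection in (57) amounts to a symmetric eigendecomposition in which the negative eigenvalues are set to zero, followed by normalization in the Frobenius norm; the following short sketch (with the hypothetical name `proj_psd`) illustrates this.

```python
import numpy as np

def proj_psd(q):
    """Orthogonal projection onto the positive semidefinite cone, cf. Example 6:
    keep the nonnegative eigenvalues in the spectral decomposition of q."""
    lam, V = np.linalg.eigh(q)                 # q is assumed symmetric
    return (V * np.maximum(lam, 0.0)) @ V.T

# Update (57):  w = proj_psd(q_k);  p_next = w / np.linalg.norm(w)   (Frobenius norm)
```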

In the following section we present numerical results, where we show how our algorithm can be used to test copositivity of operators with respect to the nonnegative orthant, to the Lorentz cone and to the positive semidefinite cone.

5.1 Numerical experiments

We will use the algorithm presented in the previous section to test copositivity of operators with respect to different cones. We implemented the algorithms in Matlab 2018b. If the output of the algorithm is negative, then we know that the operator is not copositive with respect to the cone \(\mathcal {K}\). On the other hand, if we run the algorithm several times with different starting points and the output is nonnegative each time, then we can only conjecture that the operator is copositive with respect to \(\mathcal {K}\).

5.1.1 Nonnegative orthant and Lorentz cone

First, we consider the cases when the cone is the nonnegative orthant and the Lorentz cone, respectively. For this, we use the function \(f(p)=p^{\top }Ap\) and consider the optimization problem given in (12).

Example 11

Consider the following matrix which is not copositive, see [26]:

$$\begin{aligned} A_1= \begin{pmatrix} 1 &{}-0.72&{} -0.59&{} 1 \\ -0.72 &{} 1 &{}-0.6 &{}-0.46\\ -0.59&{} -0.6 &{}1 &{}-0.6\\ 1 &{}-0.46 &{}-0.6 &{}1 \end{pmatrix}. \end{aligned}$$
(58)

First, we consider the matrix \(A_1\) given in (58) and take \(\mathcal {K}\) to be the nonnegative orthant in problem (12). Running Algorithm 3 with starting point \(p_0=[0.5 \; 0.5 \; 0.5 \; 0.5]\), we obtain \(f^*=-0.2756<0\), confirming the known result that the matrix \(A_1\) is not copositive. Now we consider problem (12) with the Lorentz cone. In this case, running Algorithm 3 with \(p_0=[\frac{\sqrt{3}}{6} \; \frac{\sqrt{3}}{6} \; \frac{\sqrt{3}}{6} \; \frac{3}{4}]\) yields \(f^*=-0.0545<0\). Thus, \(A_1\) is not copositive with respect to the Lorentz cone either. This lack of copositivity can also be verified by using Proposition 3. Indeed, suppose that \(A_1\) is copositive with respect to the Lorentz cone. Then we can find a \(\mu \in \mathbb {R}_+\) such that the matrix \(A_1-\mu J\) is positive semidefinite. In particular, one of its principal minors satisfies

$$\begin{aligned} \begin{vmatrix} 1+\mu&1\\ 1&1-\mu \end{vmatrix}\ge 0, \end{aligned}$$

which is equivalent to \(-\mu ^2\ge 0\), hence we conclude that \(\mu =0\). Thus, if \(A_1\) is copositive with respect to the Lorentz cone, then \(A_1\) is positive semidefinite. But the matrix \(A_1\) has one negative eigenvalue, which implies that it is not positive semidefinite. Thus, we obtain as before that \(A_1\) is not copositive with respect to the Lorentz cone.
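To make the calling pattern concrete, the following hypothetical driver combines the sketches above for the matrix \(A_1\) and the nonnegative orthant. The value \(f^*=-0.2756\) reported above comes from the authors' Matlab runs, so the snippet is only meant to illustrate how such a test would be set up, not to reproduce those figures exactly.

```python
import numpy as np

A1 = np.array([[ 1.00, -0.72, -0.59,  1.00],
               [-0.72,  1.00, -0.60, -0.46],
               [-0.59, -0.60,  1.00, -0.60],
               [ 1.00, -0.46, -0.60,  1.00]])

p0 = np.array([0.5, 0.5, 0.5, 0.5])
p_star, f_star = solve_quadratic_on_cone_sphere(A1, proj_orthant, p0)
if f_star < 0:
    # a negative optimal value certifies that A1 is not copositive w.r.t. the orthant
    print("A1 is not copositive; witness p =", p_star, "with p^T A1 p =", f_star)
```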

Example 12

We take the well-known Horn matrix:

$$\begin{aligned} A_2= \begin{pmatrix} 1&{} -1 &{}1&{} 1 &{} -1\\ -1&{} 1&{} -1&{} 1 &{} 1\\ 1&{}-1&{} 1 &{}-1 &{} 1\\ 1 &{}1&{} -1 &{} 1 &{} -1 \\ -1 &{} 1 &{} 1 &{} -1 &{} 1 \end{pmatrix}. \end{aligned}$$
(59)

Applying Algorithm 3 to the Horn matrix given in (59) in the case when the cone is the nonnegative orthant, we obtain \(f^*=1>0\). Hence, it might be strictly copositive with respect to the nonnegative orthant. On the other hand, if we consider the Lorentz cone and the starting point \(p_0=[0 \; 0\; 0\; 0\; 1]\), we obtain \(f^*=-1.2018<0\). Therefore, we can conclude that it is not copositive with respect to the Lorentz cone, which can also be verified by using once again Proposition 3. Indeed, if \(A_2\) were copositive with respect to the Lorentz cone, then there would exist a \(\mu \in \mathbb {R}_+\) such that

$$\begin{aligned} A_2-\mu J= \begin{pmatrix} 1-\mu &{} -1 &{}1&{} 1 &{} -1\\ -1&{} 1-\mu &{} -1&{} 1 &{} 1\\ 1&{}-1&{} 1-\mu &{}-1 &{} 1\\ 1 &{}1&{} -1 &{} 1-\mu &{} -1 \\ -1 &{} 1 &{} 1 &{} -1 &{} 1+\mu \end{pmatrix} \end{aligned}$$

is positive semidefinite. In that case all principal minors of \(A_2-\mu J\) must be nonnegative. Thus, we have

$$\begin{aligned} \begin{vmatrix} 1+\mu&-1\\ -1&1-\mu \end{vmatrix}\ge 0, \end{aligned}$$

which is equivalent to \(-\mu ^2\ge 0\), hence we have \(\mu =0\). Thus, \(A_2\) itself would have to be positive semidefinite; since the Horn matrix \(A_2\) has two negative eigenvalues, we conclude once again that it is not copositive with respect to the Lorentz cone.

Example 13

Consider the Hoffman-Pereira matrix:

$$\begin{aligned} A_3= \begin{pmatrix} 1&{} -1 &{}1&{} 0 &{} 0 &{} 1 &{} -1 \\ -1 &{} 1 &{}-1&{} 1&{} 0&{} 0&{} 1\\ 1&{} -1&{} 1&{} -1&{} 1&{} 0&{} 0\\ 0&{} 1 &{} -1 &{} 1&{} -1 &{}1 &{} 0\\ 0 &{} 0 &{} 1 &{} -1 &{}1&{} -1 &{}1 \\ 1&{} 0&{} 0&{} 1&{} -1&{} 1&{} -1\\ -1 &{}1 &{}0&{} 0&{} 1 &{}-1&{} 1 \end{pmatrix}, \end{aligned}$$
(60)

which is a copositive matrix, see [26]. If we run Algorithm 3 with different starting points in the nonnegative orthant case, we obtain \(f^* =1 >0\). Hence, we might conclude that this matrix is strictly copositive with respect to the nonnegative orthant. However, if we consider the Lorentz cone with \(p_0=[0\; 0 \; 0 \; 0 \; 0 \;0 \;1]\), we obtain \(f^*= -0.6519<0\); hence this matrix is not copositive with respect to the Lorentz cone. This can also be deduced from Proposition 3, since the matrix \(A_3-\mu J\) has the same principal minor

$$\begin{aligned} \begin{vmatrix} 1+\mu&-1\\ -1&1-\mu \end{vmatrix} \end{aligned}$$

as \(A_2-\mu J\). Hence \(\mu =0\), so \(A_3\) itself would have to be positive semidefinite; however, \(A_3\) has four negative eigenvalues, so it is not. Thus, we get again that \(A_3\) is not copositive with respect to the Lorentz cone.

5.1.2 Results in case of generated matrices

In [26, 46] the authors considered a set of matrices related to the maximum clique problem from the DIMACS collection [47], for which the real copositivity status is known by construction; see [26] for justification. The matrices can be accessed at [48].

We applied Algorithm 3 to test the copositivity of most matrices given in [48] with respect to the nonnegative orthant. In these cases we ran the algorithm with 1000 randomly generated starting points. The algorithm correctly detected the copositivity status of the tested matrices. The following table reports the average number of iterations in some of the cases; in the remaining cases the behaviour of the algorithm is similar.

 

Matrix                       Order   Strictly copositive   Not copositive   Average nr. of iterations
Hamming4-4-not-COP           16                            Yes              6.28
Johnson6-2-4-not-COP         15                            Yes              31.69
Johnson6-4-4-not-COP         15                            Yes              31.92
Keller2-not-COP              16                            Yes              13.54
sanchis22-Not-COP            22                            Yes              141.68
Hamming4-4-in-Interior       16      Yes                                    77.62
Johnson6-2-4-in-Interior     15      Yes                                    56.48
Johnson6-4-4-in-Interior     15      Yes                                    56.66
Keller2-in-Interior          16      Yes                                    66.73
sanchis22-in-Interior        22      Yes                                    140.5

5.1.3 Positive semidefinite cone

Let \(\mathcal {S}^{n}\) be the vector space of the \(n\times n\) symmetric matrices, \(\Vert \cdot \Vert \) be the Frobenius norm in \(\mathcal {S}^{n}\) and \(\mathbb {S}^m=\{p\in \mathcal {S}^{n}:~ \Vert p\Vert ^2=1\}\) be the sphere, where \(m=n(n+1)/2-1\). Consider the following nonlinear programming problem

$$\begin{aligned} {\text {min}}_{p\in \mathcal {C} } f(p){:=}\textrm{tr}(pAp), \end{aligned}$$
(61)

where \(A:\mathcal {S}^{n} \rightarrow \mathcal {S}^{n}\) is a linear operator and \(\mathcal {C}=\mathcal {S}^{n}_{+}\cap \mathbb {S}^m\) is the intersection of the sphere with the positive semidefinite cone.

In the next examples we present the results obtained by applying Algorithm 2 to two particular instances of problem (61).

Example 14

Take \(a\in \mathcal {S}^{n}\) and consider the linear operator \(A:\mathcal {S}^{n} \rightarrow \mathcal {S}^{n}\) defined by \(Ap=apa\). In this case, \(A^*=A\). Thus, by using (20), it follows that

$$\begin{aligned} {{\,\textrm{grad}\,}}f(p) = 2Ap - \textrm{tr}((2Ap)p)p. \end{aligned}$$
(62)

If we consider the matrix \(a=A_1\) given in (58) and the positive semidefinite cone, then by applying Algorithm 2 with different starting points we obtain \(f^*\simeq 3 \cdot 10^{-6}\). Hence, the operator \(Ap=apa\) might be copositive with respect to the positive semidefinite cone.
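For the matrix-variable problem (61) the iterates live on the Frobenius sphere of \(\mathcal {S}^{n}\). The short sketch below, again with illustrative names, evaluates the operator \(Ap=apa\) of this example together with \(f(p)=\textrm{tr}(pAp)\) and the gradient formula (62), which is all that a gradient projection step needs besides the projection of Example 10.

```python
import numpy as np

def make_operator(a):
    """Linear operator A(p) = a p a on symmetric matrices (here A* = A)."""
    return lambda p: a @ p @ a

def f_and_grad(A_op, p):
    """f(p) = tr(p A(p)) and the Riemannian gradient (62) on the Frobenius sphere."""
    Ap = A_op(p)
    f_val = np.trace(p @ Ap)
    grad = 2.0 * Ap - np.trace(2.0 * Ap @ p) * p
    return f_val, grad
```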

Example 15

Take \(a\in \mathcal {S}^{n}\) and consider the linear operator \(A:\mathcal {S}^{n} \rightarrow \mathcal {S}^{n}\) defined by \(Ap=pa+ap\). We know that \(A^*=A\) and we also have \({{\,\textrm{grad}\,}}f(p) = 2Ap - \textrm{tr}((2Ap)p)p.\) Considering \(a=A_2\), the matrix given in (59), by applying Algorithm 2 we obtain that \(f^*=-2.4721<0\). Hence, we conclude that the operator \(Ap=pa+ap\) with the Horn matrix is not copositive with respect to the positive semidefinite cone.

6 Conclusions

We proposed a gradient projection algorithm to solve constrained optimization problems on the sphere in finite dimensional vector spaces and presented its convergence analysis. We studied the existence of solutions of a certain type of nonlinear cone-complementarity problem by reducing it to optimizing a quadratic function on the intersection of the sphere and the corresponding cone. The latter problem was in turn reduced to testing the cone-copositivity of the considered linear operator via the introduced algorithm. Furthermore, we provided several computational results, including a numerical study of the cone-copositivity of operators.