Conic formulation of QPCCs applied to truly sparse QPs

We study (nonconvex) quadratic optimization problems with complementarity constraints (QPCCs), establishing an exact completely positive reformulation under apparently new, mild conditions involving only the constraints, not the objective. Moreover, we also give conditions for strong conic duality between the obtained completely positive problem and its dual. Our approach is based on purely continuous models which avoid any branching or use of large constants in implementation. An application to pursuing interpretable sparse solutions of quadratic optimization problems is shown to satisfy our setting, and we thereby link quadratic problems with an exact sparsity term ‖x‖_0 to copositive optimization. The covered problem class includes, for instance, sparse least-squares regression under linear constraints. Numerical comparisons between our method and other approximations are reported from the perspective of the objective function value.


Motivation
Many real-world applications can be modelled by (nonconvex) quadratic optimization problems with complementarity constraints (QPCCs) [35]. Due to the nonconvexity of the objective function and the complementarity constraints, QPCCs form an NP-hard class of problems. Denoting by ℝ^n the n-dimensional Euclidean space, by ℝ^n_+ = {x ∈ ℝ^n : x_i ≥ 0 for all i} its positive orthant, and by ℝ^{m×n} the set of all real m × n matrices, we study in this paper the following problem, where Q ∈ ℝ^{n×n}, c ∈ ℝ^n, x ∈ ℝ^n, A ∈ ℝ^{m×n}, b ∈ ℝ^m and F ∈ ℝ^{n×n}_+:

min { x⊤Qx + 2c⊤x : Ax = b, x ∈ ℝ^n_+, x⊤Fx = 0 } , (1)

where the sign ⊤ denotes transposition. Since all F_ij ≥ 0, observe that F_ii > 0 would entail x_i² = 0 for all feasible x, so that this variable can be dropped. Hence we may and do assume without loss of generality that all F_ii = 0; likewise we may and do assume symmetry: Q⊤ = Q and F⊤ = F as well. Finally, we assume F ≠ O.
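For intuition, the feasibility conditions of (1) can be checked mechanically. The following sketch (a toy instance of our own, not taken from the paper) verifies x ≥ 0, Ax = b and the complementarity x⊤Fx = 0:

```python
import numpy as np

def qpcc_feasible(x, A, b, F, tol=1e-9):
    """Check feasibility for the QPCC (1): x >= 0, A x = b, x'Fx = 0.

    Since F is entrywise nonnegative with zero diagonal, x'Fx = 0 is
    equivalent to the complementarities x_i * x_j = 0 whenever F_ij > 0.
    """
    x = np.asarray(x, dtype=float)
    return (
        np.all(x >= -tol)
        and np.allclose(A @ x, b, atol=tol)
        and abs(x @ F @ x) <= tol
    )

# Toy instance: x1 and x2 must not both be positive (F_12 = F_21 = 1).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
F = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

print(qpcc_feasible([0.5, 0.0, 0.5], A, b, F))  # True
print(qpcc_feasible([0.5, 0.5, 0.0], A, b, F))  # False: x1 * x2 > 0
```
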
Problem (1) actually covers a wide range of relevant problems, including, for instance, all mixed-binary quadratic optimization problems (MBQPs), which are NP-hard as well [1,20]:

min { x⊤Qx + 2c⊤x : Ax = b, x ∈ ℝ^n_+, x_i ∈ {0, 1} for all i ∈ I } ,

with I a suitable subset of all coordinate indices.
One possible approach to tackle problem (1) is conic relaxation. In particular, [15] proposed a conic relaxation model which links QPCCs to conic linear optimization problems over the so-called completely positive cone, and which is exact (i.e., a conic reformulation) under some conditions. However, these conditions are not necessarily satisfied in many real-world applications, including quadratic optimization problems with a true sparsity term ‖·‖_0 (see below). The purpose of this paper is to generalize the results of [15] such that the exact relaxation technique is applicable, for instance, to all sparse QPs.
Sparsity of a vector y ∈ ℝ^n usually means that the zero-(pseudo-)norm which counts the number of non-zero entries y_i, i.e.,

‖y‖_0 := #{ i ∈ [1 ∶ n] : y_i ≠ 0 } ,

is small. Moreover, a vector y is said to be k-sparse if ‖y‖_0 = k. We also recall that for q > 0, the q-(pseudo-)norms are defined as

‖y‖_q := ( Σ_{i=1}^n |y_i|^q )^{1/q} .
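The pseudo-norms just defined are straightforward to compute; a small numpy illustration (helper names are our own):

```python
import numpy as np

def norm0(y, tol=0.0):
    """Zero-(pseudo-)norm: number of (numerically) nonzero entries of y."""
    return int(np.sum(np.abs(np.asarray(y, dtype=float)) > tol))

def norm_q(y, q):
    """q-(pseudo-)norm (sum |y_i|^q)^(1/q) for q > 0; a true norm only for q >= 1."""
    y = np.abs(np.asarray(y, dtype=float))
    return float(np.sum(y ** q) ** (1.0 / q))

y = [3.0, 0.0, -4.0, 0.0]
print(norm0(y))        # 2, i.e. y is 2-sparse
print(norm_q(y, 1))    # 7.0 (l1 norm, a common convex surrogate of ||.||_0)
print(norm_q(y, 2))    # 5.0 (Euclidean norm)
```

Note that ‖y‖_q^q → ‖y‖_0 as q → 0⁺, which is why small q values are popular as nonconvex surrogates.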


Literature account (sparse...) and our contribution
Recently, there have been several attempts to solve (1) by a conic approach. For instance, [21] employs a doubly nonnegative relaxation and applies a branch-and-bound procedure combined with an augmented Lagrangian approach, assuming convexity of the objective. In a different setting, [39] treats (1) under the assumption that the feasible set is bounded. The same assumption is needed in [30], where the authors solve a conic relaxation with a proximal ADMM variant. In this study, however, we present methods for nonconvex objectives on unbounded feasible sets. The literature on sparse optimization and modelling is vast. The majority of applicable formulations replace the discontinuous (thus nonconvex, nonsmooth and non-Lipschitz) term ‖·‖_0 by some approximation. To pursue sparsity, these approximations are either employed in constraints or else in an additive regularization term in the objective, involving a trade-off parameter. Such approximate terms include, but are not limited to, q-pseudonorms ‖·‖_q (0 < q < 1), p-norms ‖·‖_p (p ≥ 1), or combinations of them like ‖·‖_1 − ‖·‖_2. A subjective sparse selection of related references is [4,6,17,19,27,32,34,37,38].
Among the few papers involving the 'true' sparsity term ‖·‖_0, the one likely closest to our proposal is [7]. There the authors use a mixed-binary conic formulation and treat the resulting nonlinear semidefinite optimization problem with a cutting-plane method involving several big-M constants; see also [18] for potential limitations of such mixed-integer approaches. Interestingly enough, most of the papers using purely continuous reformulations of problems involving ‖·‖_0 in an explicit way, either in the objective or in the constraints, focus on local solutions via NLP solvers, with considerable success, demonstrated by convergence results and impressive empirical findings; see, e.g., [14,22,25,36].
However, questions of global optimality or bounds within this purely continuous model framework seem not to have been addressed before, to the best of our knowledge. The present study tries to narrow this research gap by copositive optimization, yielding rigorous bounds of high quality, albeit at the expense of increased computational cost. It is our hope that in the near future, this cost can be considerably reduced by tapping recent advances in tractable approximations of copositive optimization problems. This paper offers:
• a conic relaxation of (1) which is exact under conditions that significantly generalize the classical approach (Burer's key condition);
• exactness conditions involving only the constraints, avoiding any assumptions on the objective; rather, we show that they yield some implications for the objective which may be hard to test directly;
• some conditions ensuring strong conic duality;
• an approach avoiding discrete variables, branching, or big constants, by focussing on a purely continuous (in fact, quadratic with linear constraints and one complementarity constraint) exact model of sparse optimization including the 'true' sparsity term ‖·‖_0; this is important for the efficient performance of implementations, see [5] for a related observation with linear objective functions;
• some experimental evidence on the quality comparison between direct use of ‖·‖_0 and the most common approximations.
Let us sketch this in a bit more detail: instead of tackling (1) directly, we adopt copositive relaxation techniques [10,15] to convexify the targeted problem and end up with a linear conic problem. For surveys on copositive optimization readers are referred to [8,11,13,16,23,24,29]. Note that we propose a refined copositive formulation compared to the general approach via Burer's result [15]. The latter would result in additional constraints potentially detrimental to algorithmic performance. As already observed by [3], this relaxation cannot always be tight for problems of the form (1). Here we propose two widely applicable sufficient conditions on (1), ensuring the exactness of the relaxation. To demonstrate the procedure, an application to pursuing sparse solutions for quadratic problems is presented and applied to sparse linear regression, both with and without additional linear (in)equality constraints.
Note the contrast to previous exactness conditions for conic QPCC reformulations: those proposed in [9] for general quadratically constrained quadratic problems (QCQP) need a Slater-type condition on the linear constraints which typically is violated by sparse feasible solutions, so they cannot be applied directly. Those in [3] involve a copositivity condition on the objective which may be hard to test, in particular since the involved cone may be non-convex. On the other hand, [3] shows how to reformulate a general cone-constrained QCQP into another cone-constrained QPCC. As it turns out, even if the original cone constraints are linear, the reformulated cone constraints suggested by [3] involve second-order cones, which make copositivity tests even harder. It should be stressed that one of the merits of [3] lies in clarification of the underlying general structure. However, for practical purposes it seems justified to treat the particular case of QPCC among the generic QCQPs separately, exploiting the special circumstances.
Contrasting most studies in the literature, we directly use the discontinuous (thus nonconvex, nonsmooth and non-Lipschitz) term ‖·‖_0 to pursue sparsity, rather than common approximations. While several conditions are proposed in the literature under which some approximations of ‖·‖_0 indeed yield sparsity, up to now it remained largely unclear how good all the above-mentioned approximations are in general, compared to ‖·‖_0, in terms of their ability to gain sparsity without losing accuracy on the original problem. Here we justify our approach with the help of a small experimental study, where we compare the optimal values between our method and several recently popular approximate methods on a test set of randomly generated problems.

Paper organization and notation
The paper is arranged as follows. In Sect. 2, as a significant extension of Burer's results, we propose two sufficient conditions ensuring the exactness of the copositive relaxation for QPCCs. As shown in [25] and detailed in Sect. 3.1, the problem of finding sparse solutions to quadratic optimization problems turns out to be a QPCC satisfying our proposed conditions, and therefore an exact copositive relaxation is obtained. The same holds true for convex objectives under hard sparsity constraints, as observed already in [14]; see Sect. 3.2. Both regimes (significantly different, as noted, e.g., by [36]) are met in the case of sparse linear regression treated in Sect. 3.3. In Sect. 4, we compare our method with several approximate methods nowadays popular for sparse quadratic problems, from the perspective of the optimal value. Throughout the paper, by a bold lower case letter, e.g. x, we denote a vector, and by a bold upper case letter, e.g. Q, a matrix; o and O stand for zero vectors and matrices of suitable size, respectively. For any integers a, b we denote by [a ∶ b] the set of all integers k satisfying a ≤ k ≤ b. For x ∈ ℝ^n, the n-dimensional Euclidean space, and any nonempty I ⊂ [1 ∶ n], we denote by x_I = [x_i]_{i∈I} the restricted subvector. By ⟨X, Y⟩ = Σ_{i,j=1}^n x_ij y_ij we denote the inner product between symmetric matrices. We also use the sets S^n (symmetric n × n matrices), P^n (positive-semidefinite matrices in S^n), N^n (entrywise nonnegative matrices in S^n), COP^n (copositive matrices in S^n) and CP^n (completely positive matrices in S^n). For a given matrix Q ∈ S^n, q_i⊤ presents the i-th row of Q and q_ij stands for the ij-th entry. In particular, by e_i ∈ ℝ^n we mean the i-th column of the n × n identity matrix, and by e = Σ_{i=1}^n e_i the all-ones vector in ℝ^n, so that Δ^n = {x ∈ ℝ^n_+ : e⊤x = 1}.

Conic formulation: Burer's key condition
As announced, we focus on QPCCs of the form

min { x⊤Qx + 2c⊤x : Ax = b, x ∈ ℝ^n_+, x⊤Fx = 0 } , (P)

where Q ∈ S^n, F ∈ N^n ⧵ {O} with zero diagonal, {c, a_i} ⊂ ℝ^n and b = [b_1, …, b_m]⊤ ∈ ℝ^m. We collect A⊤ = [a_1, …, a_m] to get an m × n matrix A and represent the linear constraints in the form Ax = b. (P) is an all-quadratic problem of special structure; remark that the case where {Q, F} ⊂ P^n reduces to a traditional convex problem which can be solved in polynomial time to arbitrary accuracy. In all other cases, this problem class includes NP-hard nonconvex subclasses. Problems of the form (P) have been addressed in the seminal paper by Burer [15], who related (P) to the following conic optimization problem:

min { ⟨Q, X⟩ + 2c⊤x : a_i⊤x = b_i, a_i⊤Xa_i = b_i², i ∈ [1 ∶ m], ⟨F, X⟩ = 0, Y := [1, x⊤; x, X] ∈ CP^{n+1} } . (C)

The following horizon cone of L := {x ∈ ℝ^n_+ : Ax = b} plays a crucial role in this analysis:

L∞ := {d ∈ ℝ^n_+ : Ad = o} .

The set of indices belonging to variables which remain bounded over L,

B := {j ∈ [1 ∶ n] : sup {x_j : x ∈ L} < ∞} ,

is related to L∞ by the implication d ∈ L∞ ⟹ d_B = o. We need one more index set, namely the projection of the support of F,

S := {j ∈ [1 ∶ n] : F_ij > 0 for some i} . (2)

We will view the sparsity pattern of F as an undirected graph G := (V, E) on the vertex set V = [1 ∶ n] with edge set E = {{i, j} : F_ij > 0}. Since by assumption F ∈ N^n, we have x⊤Fx = 0 if and only if x_i x_j = 0 for all {i, j} ∈ E, so G comprises all information on F relevant to (P). Thus we can always replace F by F′ := Σ_{{i,j}∈E} (e_i e_j⊤ + e_j e_i⊤) without changing the problem.
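The reduction of F to its sparsity-pattern graph G can be illustrated in a few lines; the helper names below are ours, and the toy matrix is not from the paper:

```python
import numpy as np

def edges_of(F, tol=0.0):
    """Edge set E of the sparsity-pattern graph G of a symmetric F in N^n
    with zero diagonal: {i, j} is an edge iff F_ij > 0."""
    n = F.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if F[i, j] > tol}

def canonical_F(E, n):
    """Symmetric 0/1 matrix with the same pattern: x'Fx = 0 iff x_i x_j = 0
    on all edges, so F may be replaced by this matrix without changing (P)."""
    G = np.zeros((n, n))
    for i, j in E:
        G[i, j] = G[j, i] = 1.0
    return G

F = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
E = edges_of(F)
print(sorted(E))                    # [(0, 1), (1, 2)]
Fc = canonical_F(E, 3)
x = np.array([1.0, 0.0, 3.0])       # complementary on both edges
print(x @ F @ x == 0.0, x @ Fc @ x == 0.0)   # True True
```
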
With these notational preliminaries, we easily can formulate the so-called key condition put forward by Burer [15] in the analysis of QPCCs:

(K) S ⊆ B; in other words, any variable x_j actually involved in the complementarity constraints must remain bounded over the linear portion L of the feasible set. Under (K), Burer proved that the conic relaxation (C) of (P) is exact.

Relaxing the key condition: vertex covers
Unfortunately, condition (K) is not necessarily met in some relevant applications, and in other applications with a convex objective, the key condition is superfluous (even if the problem is not convex, as may happen for Q ∉ P^n). In this section, we propose less restrictive conditions ensuring exactness which can be successfully applied to several optimization problems, including sparse least-squares regression with soft or hard sparsity constraints, a hot topic in data science.
Given a graph G = (V, E), recall that a subset C ⊆ V is said to be a vertex cover for G if every edge {i, j} ∈ E satisfies {i, j} ∩ C ≠ ∅ (a vertex cover need not be independent: more general vertex covers C may also have insider edges, (C × C) ∩ E ≠ ∅). Obviously, V itself is a vertex cover, but it may be too large for our purposes, as we aim to replace Burer's S with a (typically smaller) vertex cover C for G.
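For small instances, all inclusion-minimal vertex covers can be enumerated by brute force. The following sketch (our own, exponential-time, intended only to illustrate the definition) does exactly that:

```python
from itertools import combinations

def is_vertex_cover(C, E):
    """C covers G = (V, E) iff every edge has at least one endpoint in C."""
    return all(i in C or j in C for i, j in E)

def minimal_vertex_covers(n, E):
    """All inclusion-minimal vertex covers of G (fine only for tiny n)."""
    covers = [set(C) for k in range(n + 1)
              for C in combinations(range(n), k) if is_vertex_cover(C, E)]
    return [C for C in covers if not any(D < C for D in covers)]

# Path graph 0-1-2: the minimal covers are {1} and {0, 2}.
E = [(0, 1), (1, 2)]
print(minimal_vertex_covers(3, E))   # [{1}, {0, 2}]
```
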
The relaxed key condition reads as follows, using notation (2): there is a vertex cover C ⊆ B for G and a (P)-feasible x̂ with x̂_C = o; or equivalently, there is a vertex cover C for G and some (P)-feasible x̂ with x̂_C = o such that for all j ∈ C there exists an M_j > 0 with x_j ≤ M_j for all x ∈ L. Obviously, the conditions in (R) are met more easily if C is smaller, so the system ℭ(F) in which we collect all minimal vertex covers for G is of particular relevance. But to keep our approach more transparent, we will focus on one vertex cover C first (which is sufficient for the application to sparse regression) and later on extend the conditions for future applications to more general QPCCs. First let us note that indeed (R) relaxes (K):

(R) There is a vertex cover C ⊆ B for G and a point x̂ ∈ L with x̂_C = o.

Proof
We show that C := {j ∈ S : x̂_j = 0} and a (P)-feasible x̂ ∈ L satisfy (R) under condition (K). Indeed, by definition x̂_C = o and C ⊆ S ⊆ B. Furthermore, x̂⊤Fx̂ = 0 implies x̂_i x̂_j = 0 for all {i, j} ∈ E, so that {i, j} ∩ C ≠ ∅, hence C is a vertex cover for G. ◻
In case of a convex objective, condition (K) and even condition (R) are unnecessary. Actually, we only need the conclusion that Q is copositive on the cone L∞ ∩ F, the intersection of the horizon cone of L with the complementarity set F := {x ∈ ℝ^n_+ : x⊤Fx = 0}. Condition (R) guarantees that L∞ ∩ F = L∞ and the mentioned copositivity property of Q, which is also implied by convexity of the objective. The essential arguments are summarized in the following.
Lemma 2 Suppose that (P) is bounded (i.e., z*_P is finite) and that (R) holds. Then L∞ ⊆ F, and for any d ∈ L∞ we have d⊤Qd ≥ 0. The latter property also holds if Q ∈ COP^n (e.g., because Q is positive-semidefinite).
Proof If condition (R) holds for some vertex cover C ⊆ V of G, pick an arbitrary d ∈ L∞; since C ⊆ B, we get d_C = o from d_B = o, so for every edge {i, j} ∈ E the intersection {i, j} ∩ C ≠ ∅ yields d_i d_j = 0, i.e., d ∈ F. Moreover, x̂ + td is (P)-feasible for all t ≥ 0 because x̂_C = d_C = o, so finiteness of z*_P forces d⊤Qd ≥ 0. ◻

Note: if d⊤Qd = 0, we could infer, by the same reasoning, that d⊤∇f(x̂) = 2d⊤(Qx̂ + c) ≥ 0, i.e., that d could not be a strict descent direction for f at x̂. But we will not explore this first-order information further here. Now we formulate a variant applicable to other QPCCs:

Lemma 3 Suppose that z*_P is finite, that every minimal vertex cover C ∈ ℭ(F) satisfies C ⊆ B, and that the set {x ∈ L : x_C = o} is nonempty for some C ∈ ℭ(F). Then the conclusion of Lemma 2 holds.

Proof Pick x̂ ∈ L with x̂_C = o for such a C, and set C̄ := {j ∈ [1 ∶ n] : x̂_j = 0}, which obviously is a vertex cover for G. Clearly there is a C′ ∈ ℭ(F) with C′ ⊆ C̄, so x̂_{C′} = o. Then the result follows by arguments similar to those employed in the previous proof. ◻

The following examples shall demonstrate the wider applicability of our proposed conditions. But for the readers' convenience, let us start with a counterexample already presented in [3] which shows that some conditions are needed to avoid a positive, or even infinite, conic relaxation gap:

Example 1
As argued in [3, Ex. 1], the relaxation (C) is unbounded below while the optimal value of (P) is zero. As can be seen easily, conditions (K) and (R) are violated, and so is the condition of Lemma 3: all x ∈ L must have

Example 3
As in (4). Therefore the system satisfies the condition of Lemma 3.

Example 4
As in (4). On the other hand, for the class of minimal vertex covers of (6), the cone L∞ ∩ F is nontrivial. However, the set {x ∈ L : x_C = o} is empty now, because x_C = o implies a contradiction, looking at the sum of the first two constraints and the third one, in conjunction with the sign constraints on x_1 and x_2. Therefore, the condition of Lemma 3 is violated.
While (K) implies (R), it does not imply the condition of Lemma 3; we give the following example:
As a final note, it is easy to construct examples for which both (K) and the condition of Lemma 3 are satisfied [3]. Now we are in a position to state and prove the following result which generalizes the one in [15]. Although the main arguments are well known by now, we repeat them here for the readers' convenience. Note that even without any assumptions, the first part of the following proof also shows that (P) is feasible if (C) is feasible.
Theorem 4 Suppose that Q is (L∞ ∩ F)-copositive in the sense of Lemmas 2 and 3. Then the QPCC problem (P) is equivalent to the conic problem (C).
Proof It is easy to see that any (P)-feasible x ∈ ℝ^n_+ gives rise to a (C)-feasible point Y = yy⊤ ∈ CP^{n+1} of rank one with y⊤ = [1, x⊤] and the same objective value f(x) = ⟨Q, xx⊤⟩ + 2c⊤x = g(Y). Therefore (C) is a conic relaxation of (P) and z*_C ≤ z*_P. We proceed to show equivalence by establishing the reverse inequality. To this end, take a (C)-feasible and optimal Y (or approximately optimal, in case of non-attainability for either of the problems (P) or (C); we will address primal attainability later). Then consider the following decomposition for this completely positive matrix:

Y = Σ_{k∈K} [ζ_k, x_k⊤]⊤ [ζ_k, x_k⊤] with ζ_k ≥ 0 and x_k ∈ ℝ^n_+, (8)

and split K = K_+ ∪ K_0 with K_+ := {k : ζ_k > 0} and K_0 := {k : ζ_k = 0}. The following equalities hold directly by the decomposition (8) and the constraints in (C):

Σ_{k∈K} ζ_k² = 1, (9)
Σ_{k∈K} ζ_k a_i⊤x_k = b_i for all i ∈ [1 ∶ m], (10)
Σ_{k∈K} (a_i⊤x_k)² = b_i² for all i ∈ [1 ∶ m]. (11)

Combining the equalities (9), (10) and (11), we have for any fixed i ∈ [1 ∶ m],

( Σ_{k∈K} ζ_k a_i⊤x_k )² = ( Σ_{k∈K} ζ_k² ) ( Σ_{k∈K} (a_i⊤x_k)² ).

By the equality case of the Cauchy–Schwarz inequality, there exists λ_i ∈ ℝ such that

a_i⊤x_k = λ_i ζ_k for all k ∈ K. (12)

Multiplying by ζ_k and summing relations (12) across k, by (9) and (10) we obtain λ_i = b_i. Moreover, by (C)-feasibility of Y we have

Σ_{k∈K} x_k⊤Fx_k = ⟨F, X⟩ = 0, hence x_k⊤Fx_k = 0 for all k ∈ K.

Therefore, by taking x̄_k := (1/ζ_k) x_k if ζ_k > 0, we get x̄_k ∈ L ∩ F for all k ∈ K_+, while for k ∈ K_0 we obtain a_i⊤x_k = 0 for all i, i.e., x_k ∈ L∞ ∩ F and therefore x_k⊤Qx_k ≥ 0 for all k ∈ K_0. We obtain that all x̄_k, k ∈ K_+, are (P)-feasible, so that

g(Y) = Σ_{k∈K_+} ζ_k² f(x̄_k) + Σ_{k∈K_0} x_k⊤Qx_k ≥ Σ_{k∈K_+} ζ_k² z*_P ≥ z*_P,

where the last inequality follows from (9), which establishes the desired inequality z*_P ≤ z*_C, and moreover that any x̄_k realizing the minimum provides an (approximate) optimal solution to (P). ◻ Note: since the feasible set of (P) is a finite union of polyhedra (if not empty), the Frank–Wolfe theorem guarantees attainability of z*_P if this value is finite. Slater's condition (strict feasibility of the conic dual below) would ensure attainability of z*_C, but the above equivalence ensures this automatically: indeed, Y* = y*(y*)⊤ solves (C) if (y*)⊤ = [1, (x*)⊤] and x* solves (P).
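The first half of the proof, lifting a (P)-feasible x to a rank-one (C)-feasible matrix, can be verified numerically; the toy data below are our own, not from the paper:

```python
import numpy as np

def lift(x):
    """Rank-one completely positive lifting Y = y y' with y = [1, x']'."""
    y = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return np.outer(y, y)

# Toy instance (ours): one linear constraint, one complementarity pair.
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0])
F = np.zeros((3, 3)); F[0, 1] = F[1, 0] = 1.0
x = np.array([0.5, 0.0, 0.5])       # (P)-feasible: x >= 0, Ax = b, x'Fx = 0

Y = lift(x)
X = Y[1:, 1:]                       # the X-block of Y equals x x'
# The inherited (C)-constraints: a_i'x = b_i, a_i'X a_i = b_i^2, <F, X> = 0.
print(np.allclose(A @ x, b), np.allclose(np.diag(A @ X @ A.T), b ** 2))
print(np.isclose(np.sum(F * X), 0.0))
```
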

A simplification of (C) and its dual
We show a possible simplification of (C), initially proposed in [15], reducing the cone in the feasible set from CP^{n+1} to CP^n, which works under the assumption that a_i ∈ ℝ^n_+ and b_i > 0 for some i. From the decomposition (8) and as in the proof of Theorem 4 we have ζ_k = a_i⊤x_k / b_i for all k ∈ K,

which allows us to rewrite (C) as a conic problem (CP) over CP^n. Introducing multipliers u_i for the linear constraints and h ∈ ℝ for the constraint ⟨F, X⟩ = 0, the dual problem (CD) of (CP) can be obtained. While we have, by the preceding arguments under the assumption of (L∞ ∩ F)-copositivity of Q and z*_P ∈ ℝ, z*_P = z*_C as the optimal value for (CP), we denote by z*_D the optimal value for (CD).
We will now establish sufficient conditions for strict feasibility of (CD), which means that Slater's condition holds for (CD). The condition seems quite natural as it is a sharpening of the conclusion of Lemma 2. It is met if Q is positive-definite, and for general Q it holds by default if L is bounded (implying L∞ = {o}):

Theorem 5 Suppose that z*_P ∈ ℝ and that Q is strictly L∞-copositive: d⊤Qd > 0 for all d ∈ L∞ ⧵ {o}. Then z*_C = z*_D and the optimal value z*_C is attained.

Proof First observe that, putting N := A⊤A, we have d⊤Nd = (Ad)⊤(Ad) = 0 for all d ∈ L∞, hence d⊤(Q + σN)d = d⊤Qd on L∞ for any σ ≥ 0. Now, under the assumptions, there is a constant ε > 0 such that d⊤Qd ≥ ε for all d ∈ L∞ ∩ Δ^n. By continuity, there is a δ > 0 such that also in a small relative neighbourhood U of L∞ ∩ Δ^n we have x⊤Qx ≥ δ > 0. The continuous function ‖Ax‖² takes its positive minimum μ > 0 on the compact set Δ^n ⧵ U, which is disjoint from L∞; furthermore, we have that ω := min_{x∈Δ^n} x⊤Qx ∈ ℝ. Choosing σ := max{0, (δ − ω)/μ} gives therefore a strictly copositive dual slack matrix with quadratic form x⊤Qx + σ‖Ax‖², taking values not smaller than δ over U but also, by choice of σ, not smaller than δ over Δ^n ⧵ U. Hence Slater's condition holds for (CD) and the claim follows. ◻

QPCC and conic formulations for quadratic problems under sparsity constraints
In this section, we discuss two models for pursuing sparsity in quadratic optimization problems with the direct use of ‖·‖_0, put forward by [14,25]. Both are based upon the following idea: negating the binary variables |sign(y_i)| ∈ {0, 1}, we arrive again at binary variables u_i := 1 − |sign(y_i)| ∈ {0, 1} which satisfy u_i |y_i| = 0 for all i. As no negative variables appear in this complementarity relation, we can proceed towards a QPCC with a single homogeneous bilinear equality constraint. An additional interesting feature of this approach is that binarity of u_i is not needed explicitly as a constraint; the upper bounds u_i ≤ 1 are already sufficient, as will be argued below. As already noted in the introduction, a naive approach via Burer's result [15] would result in additional constraints in the conic formulation which may be detrimental to algorithmic performance. Anyhow, some of them may be added in implementations for specially structured problems.

No sign constraints on the variables
We will first treat sparse quadratic problems under linear equality constraints, but no sign constraints on the variables. At first sight, this may seem a restriction of models, but actually any sign constraint on some variables would reduce
dimensionality rather than increase complexity, as will become clear in the next subsection. So consider the following soft sparsity model (P_0^S): minimize f(y) + ρ‖y‖_0 subject to a_i⊤y = b_i, i ∈ [1 ∶ m], with f(y) = y⊤Q_0y + 2c_0⊤y, where ρ > 0 is a regularization parameter balancing accuracy against sparsity: the larger ρ, the sparser the solution obtained from (P_0^S) will be. By introducing new variables u ∈ ℝ^n_+ and decomposing y = v − w, where {v, w} ⊂ ℝ^n_+, problem (P_0^S) can be exactly expressed by a quadratic problem:

min { (v − w)⊤Q_0(v − w) + 2c_0⊤(v − w) + ρ(n − e⊤u) : a_i⊤(v − w) = b_i, i ∈ [1 ∶ m], u ≤ e, u⊤(v + w) = 0, {u, v, w} ⊂ ℝ^n_+ } . (P^S′)

Because of the nonnegativity of (u, v, w), the homogeneous bilinear equality constraint u⊤(v + w) = 0 is equivalent to the equality system u_i v_i = u_i w_i = 0 for all i ∈ [1 ∶ n].

Theorem 6 Problems (P_0^S) and (P^S′) are equivalent.
Proof This was proved in [25], but we provide a short argument for the readers' convenience. For any t ∈ ℝ, let t_+ := max{t, 0} and t_− := t_+ − t; then t_± ≥ 0 and |t| = t_+ + t_− as well as t = t_+ − t_−. If y is feasible to (P_0^S), then putting v_i = (y_i)_+ and w_i = (y_i)_− as well as u_i = 1 − |sign y_i| ∈ {0, 1}, one gets a feasible solution to (P^S′) with the same objective value, so opt(P^S′) ≤ opt(P_0^S). Conversely, for (u, v, w) ∈ F′ feasible to (P^S′), put y := v − w ∈ ℝ^n and consider v′_i := (y_i)_+, w′_i := (y_i)_− and u′_i := sign u_i. Then also (u′, v′, w′) ∈ F′ is (P^S′)-feasible with lower or equal objective value due to e⊤u ≤ e⊤u′ and v′ − w′ = v − w, and again, this value coincides with the objective value of y in (P_0^S). Thus opt(P_0^S) ≤ opt(P^S′). ◻ Given (u, v, w) ∈ F′, we introduce slack variables s := e − u ∈ ℝ^n_+ (recall u ≤ e) and stack all variables to x := [v⊤, w⊤, u⊤, s⊤]⊤ ∈ ℝ^{4n}_+. Then problem (P^S′) can be rewritten, dropping the additive constant. The first 2n variables (v, w) are not necessarily bounded but nevertheless are involved in the complementarity constraint u⊤(v + w) = 0, so the classical key condition (K) fails. Fortunately, by construction we find a vertex cover C := [2n + 1 ∶ 3n] of G such that x_j = u_{j−2n} ≤ 1 for all j ∈ C. Moreover, for any feasible solution ŷ ∈ ℝ^n to the original QP (i.e., such that a_i⊤ŷ = b_i for all i), the point x̂ := [ŷ_+⊤, ŷ_−⊤, o⊤, e⊤]⊤ satisfies x̂_C = o, so that condition (R) is met if the original QP is feasible. Lemma 2 can be applied, and yields L∞ ⊆ F for this class of problems. Note that [3, Thm. 4] requires a priori knowledge of copositivity of the Hessian of the objective while our approach yields it automatically from properties of the constraints alone, together with the fact that the optimal objective value is finite.
Therefore the conic relaxation is exact whenever the QP is feasible and bounded, which automatically would be the case for positive-semidefinite Q_0, given that the linear system a_i⊤y = b_i, i ∈ [1 ∶ m], has a solution. Furthermore, the slackness constraints u + s = e provide some rows a_i ∈ ℝ^{4n}_+ with b_i = 1 > 0, so that also the reduction step as in Sect. 2.3 can be performed.
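The lifting y ↦ (v, w, u, s) underlying (P^S′) is easy to carry out explicitly. The following sketch (helper name is our own) checks the decomposition, the complementarity, and the identity ‖y‖_0 = n − e⊤u:

```python
import numpy as np

def lift_soft(y):
    """Lift y in R^n to (v, w, u, s), all in R^n_+:
    v = y_+, w = y_-, u_i = 1 - |sign y_i|, s = e - u."""
    y = np.asarray(y, dtype=float)
    v, w = np.maximum(y, 0.0), np.maximum(-y, 0.0)
    u = 1.0 - np.abs(np.sign(y))
    s = 1.0 - u
    return v, w, u, s

y = np.array([2.0, 0.0, -3.0, 0.0])
v, w, u, s = lift_soft(y)
print(np.allclose(v - w, y))                # True: decomposition y = v - w
print(float(u @ (v + w)))                   # 0.0: the complementarity constraint
n = y.size
print(n - u.sum() == np.count_nonzero(y))   # True: ||y||_0 = n - e'u
```
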
Unfortunately, the constructed Q is never strictly L∞-copositive even if Q_0 is positive-definite, hence we cannot establish strong duality along the general lines of Sect. 2.3. Note that in the present case, by construction, L∞ includes all vectors of the form [d⊤, d⊤, o⊤, o⊤]⊤ ∈ ℝ^{4n}_+ with d ∈ ℝ^n_+, for which any dual slack matrix gives a zero quadratic form.
However, even if Q_0 is merely positive-semidefinite, adding an ℓ_2 term to the objective provides a regularization which ensures strong conic duality. See Sect. 3.3 for more details and Sect. 5 for the effect of this regularization.

Sparsity in QPs with sign constraints on variables
Additional linear inequality constraints can also be incorporated by introducing additional slack variables. We leave the obvious adaptations to the interested reader and concentrate, for ease of presentation, on the following model: minimize f(y) + ρ‖y‖_0 subject to a_i⊤y = b_i, i ∈ [1 ∶ m], and y ∈ ℝ^n_+. Here we need not split y = v − w into two nonnegative vectors any more, but we keep u and the slack variables s as in the previous subsection. So now x⊤ = [y⊤, u⊤, s⊤] with x ∈ ℝ^{3n}_+, and we end up with a QPCC formulation very similar to (P^S), but with reduced dimensionality 3n and smaller primitive data. The linear constraints now read, putting b_i = 1 for all i ∈ [m + 1 ∶ m + n] while keeping b_i unchanged for all i ∈ [1 ∶ m]: a_i⊤y = b_i for i ∈ [1 ∶ m], and u_i + s_i = 1 for i ∈ [1 ∶ n]. Furthermore, we use the edge set E = {{i, i + n} : i ∈ [1 ∶ n]} to express the bilinear constraint x⊤Fx = 0 modelling u_i y_i = 0. Here C = [n + 1 ∶ 2n] is a vertex cover for G. Again, similar to the case with no sign constraints, Lemma 2 can be applied and yields L∞ ⊆ F also for this problem class. Indeed, for any feasible ŷ, here x̂ := [ŷ⊤, o⊤, e⊤]⊤ ∈ L satisfies x̂_C = o.
The two resulting, already simplified conic reformulation problems read exactly as in (CP) and (CD) for their duals, with n replaced by 3n and m replaced by m + n.
Apart from employing a conic representation with reduced dimensionality, we also can infer strong duality for the conic primal-dual pair by imposing conditions which in a realistic setting are met more frequently, compared to the setting with no sign constraints on the primitive variables. Indeed, as explained in the remark after the following proof, if the primitive polyhedron is bounded, then the strict copositivity condition holds by default.

Theorem 7 In the model with sign constraints, suppose that z*_P ∈ ℝ and that Q_0 is strictly L∞-copositive with respect to the primitive data, i.e., d⊤Q_0d > 0 for all d ≠ o in the horizon cone of M = {y ∈ ℝ^n_+ : Ay = b}. Then
• any feasible problem instance is bounded below;
• the conic representation is exact;
• the primal-dual conic pair has zero duality gap.

Proof Any d ∈ L∞ ⧵ {o} of the lifted problem has the form [d_y⊤, o⊤, o⊤]⊤ with d_y ∈ L∞ ⧵ {o} of the primitive data, and then d⊤Qd = d_y⊤Q_0d_y > 0 by assumption. The claims now follow by application of Theorem 5. ◻ Note that the assumption that Q_0 be strictly L∞-copositive is trivially satisfied if the primitive feasible polyhedron M = {y ∈ ℝ^n_+ : Ay = b} is bounded, because then L∞ = {o}.

Hard sparsity constraints
Sometimes we want to force the solution of a quadratic problem to be k-sparse, where the integer k > 0 is a hard sparsity parameter. To this end, we consider another sparsity model for constrained quadratic problems: instead of using the soft parameter ρ to pursue sparsity, we employ ‖·‖_0 directly in a hard constraint ‖y‖_0 ≤ k. In analogy to the formulation of the soft sparsity problem (P_0^S), we can also exactly express the hard sparsity problem (P_0^H) by a quadratic problem (P^H′).

Theorem 8 Problems (P_0^H) and (P^H′) are equivalent.

Proof The proof was presented in [14] and is very similar to that of Theorem 6. ◻

Similarly, with the help of slack variables s := e − u ∈ ℝ^n_+ and t := e⊤u + k − n ≥ 0, and after stacking all variables to x := [v⊤, w⊤, u⊤, s⊤, t]⊤ ∈ ℝ^{4n+1}_+, the problem (P^H′) can be rewritten with the linear constraints, putting b_i = 1 for all i ∈ [m + 1 ∶ m + n] and b_i = k − n for i = m + n + 1. Similarly to the soft sparsity case, the model with sign constraints on the variables can also be treated. Again, we need not split y = v − w into two nonnegative vectors any more, but we keep u and the slack variables as in the previous subsection; so here x⊤ = [y⊤, u⊤, s⊤, t]. The QPCC formulation is similar to (P^H), using the same edge set and adding a linear constraint with b_{m+n+1} = k − n while keeping b_i unchanged for all i ∈ [1 ∶ m + n]. The conic reformulation problems read exactly as in (CP) and (CD) for their duals, with n replaced by 3n + 1 and m replaced by m + n + 1.
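The role of the slack t in the hard-sparsity model can be checked directly; a minimal sketch (helper name is our own):

```python
import numpy as np

def hard_sparsity_slack(y, k):
    """For the hard model: u_i = 1 - |sign y_i|; the slack
    t = e'u + k - n is nonnegative exactly when ||y||_0 <= k."""
    y = np.asarray(y, dtype=float)
    n = y.size
    u = 1.0 - np.abs(np.sign(y))
    return float(u.sum() + k - n)

y = np.array([1.0, 0.0, 0.0, -2.0])   # ||y||_0 = 2
print(hard_sparsity_slack(y, 2))      # 0.0  (constraint tight)
print(hard_sparsity_slack(y, 3))      # 1.0
print(hard_sparsity_slack(y, 1))      # -1.0 (infeasible: y is not 1-sparse)
```
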

Theorem 9 In the hard sparsity model, suppose that z*_P ∈ ℝ and that Q_0 is strictly L∞-copositive with respect to the primitive data. Then
• any feasible problem instance is bounded below;
• the conic representation is exact;
• the primal-dual conic pair has zero duality gap.

Proof Employ the same arguments as for Theorem 7. ◻ Again, the assumption that Q_0 be strictly L∞-copositive is trivially satisfied if the primitive feasible polyhedron M = {y ∈ ℝ^n_+ : Ay = b} is bounded, because then L∞ = {o}. It is also satisfied if Q_0 is positive-definite, as in the following section, which presents the application that motivated this study.

Sparse linear regression: conic reformulation
In a linear least-squares/regression context, Q_0 = A_0⊤A_0 with A_0 an m × n matrix with m < n, so Q_0 is always positive-semidefinite but will have some zero eigenvalues. Hence strict copositivity of Q_0 cannot be ensured by spectral properties of A_0 alone; one possibility is the condition that A_0 contains some strictly positive rows, which renders Q_0 = P + N with P ∈ P^n and N ∈ N^n having strictly positive entries. This is enough to ensure strict ℝ^n_+-copositivity of Q_0, which establishes all favourable properties of our conic reformulation by application of Theorems 7 and 9. Another class of instances is that with bounded primitive feasible polyhedron M = {y ∈ ℝ^n_+ : Ay = b}, as explained in the note after these results. In absence of sign constraints, another option, already hinted at in Sect. 3.1.1, consists in adding a regularization term ε‖y‖² to the objective.

Proposition 10 Consider a regularized sparse linear regression problem with a small ε > 0 where, in the notation of (P_0^S) or (P_0^H), the objective contains the regularization term ε‖y‖².

Then
• any feasible problem instance is bounded below;
• the conic representation is exact;
• the primal-dual conic pair has zero duality gap.
Proof By reformulation as before, we only need to add ε(‖v‖² + ‖w‖²) to the original objective of (P_0^S) or (P_0^H) instead of ε‖y‖² = ε‖v − w‖²; indeed, in the optimum we have ‖v‖² + ‖w‖² = ‖v − w‖², as optimality already implies v_i = (y_i)_+ and w_i = (y_i)_− and therefore v⊤w = 0, as argued before. Thus we can regularize by a positive-definite term in (v, w) which renders Q strictly L∞-copositive, where L∞ is given as in (14) (replace 4n with 4n + 1 when dealing with hard sparsity constraints). The assertions then follow by application of Theorems 4 and 5. ◻ Thus, for a large class of sparse regression problems, be they constrained or not, our theory applies in full strength. Apart from that, even if the last line of arguments cannot be carried through for all instances with no sign constraints on the variables, we can still use weak conic duality to obtain strong dual bounds, as our experiments show.
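The identity used in the proof, that the split regularizer matches the ridge term once v⊤w = 0, is a one-line numerical check:

```python
import numpy as np

# At optimality v = y_+ and w = y_-, hence v'w = 0 and the split
# regularizer eps*(||v||^2 + ||w||^2) equals the ridge term eps*||v - w||^2.
y = np.array([1.5, 0.0, -2.0])
v, w = np.maximum(y, 0.0), np.maximum(-y, 0.0)
print(float(v @ w))                                     # 0.0
print(np.isclose(v @ v + w @ w, (v - w) @ (v - w)))     # True
```
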


An iterative algorithm for the dual problem (CD)
For convenience of exposition, we write (CD) in the following form:

max { b^⊤y : g(y) ∈ COP^n, y ∈ ℝ^m } (15)

where g : ℝ^m → S^n is an affine function of y and b ∈ ℝ^m. By convexity of COP^n, the set {y : g(y) ∈ COP^n} is obviously convex too. Therefore it is possible to solve the convex problem (15) via a directional search scheme, provided a feasible initial guess is available. Recall that any feasible solution to (15) already provides a rigorous lower bound b^⊤y for (P). We now introduce a derivative-free algorithm for (15), where a subscript i denotes the i-th entry of a vector and a superscript k stands for the iteration counter.
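To make the scheme concrete, here is a minimal sketch (our own toy illustration, not the paper's Algorithm 1) of such a derivative-free directional search in Python for a one-dimensional instance: maximize y subject to g(y) ∈ COP², writing the dual feasible set as {y : g(y) ∈ COP^n}. For 2×2 symmetric matrices, the exact copositivity characterization (S copositive iff s₁₁ ≥ 0, s₂₂ ≥ 0 and s₁₂ + √(s₁₁s₂₂) ≥ 0) stands in for the MILP oracle:

```python
def copositive2(S):
    """Exact copositivity test for a symmetric 2x2 matrix:
    S is copositive iff s11 >= 0, s22 >= 0 and s12 + sqrt(s11*s22) >= 0."""
    s11, s12, s22 = S[0][0], S[0][1], S[1][1]
    if s11 < 0 or s22 < 0:
        return False
    return s12 >= 0 or s12 * s12 <= s11 * s22

def directional_search(g, y0, step=1.0, tol=1e-9):
    """Maximize y over {y : g(y) copositive} by a derivative-free line
    search: accept improving steps while feasible, halve the step otherwise."""
    y = y0
    while step > tol:
        if copositive2(g(y + step)):
            y += step          # feasible improving step: accept
        else:
            step *= 0.5        # infeasible: shrink the step size
    return y

# hypothetical affine map g(y); g(y) stays copositive exactly for y <= sqrt(2)
g = lambda y: [[1.0, -y], [-y, 2.0]]
y_best = directional_search(g, 0.0)   # approaches sqrt(2)
```

The same bisection-style acceptance logic extends to m > 1 by cycling through coordinate (or randomly drawn) directions, with the MILP test of the next paragraph as the feasibility oracle.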
As for the copositivity detection involved in Algorithm 1, we adopt the method of [2], which amounts to solving the following MILP, where s_i is the i-th column of the matrix S to be tested and the parameter m_i is typically chosen as m_i = 1 + Σ_{j≠i} max{s_ij, 0}, as in [2].
Theorem 11 [2] Let S ∈ S^n. Then S ∈ COP^n if and only if the optimal value of (16) is zero.
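While the MILP test of Theorem 11 is exact, a cheap randomized necessary-condition check can serve as a pre-filter before invoking the MILP: sample points x on the standard simplex and evaluate x^⊤Sx; one negative value certifies non-copositivity (the converse does not hold). This pre-filter is our own illustrative suggestion, not part of the method in [2]; a minimal sketch:

```python
import random

def quad_form(S, x):
    n = len(x)
    return sum(S[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def maybe_copositive(S, samples=2000, seed=0):
    """Randomized NECESSARY condition for copositivity: returns False if some
    sampled simplex point x gives x'Sx < 0; True only means 'no violation
    found', not a proof of copositivity (which requires the exact MILP)."""
    rng = random.Random(seed)
    n = len(S)
    # simplex vertices first: catches negative diagonal entries immediately
    for i in range(n):
        if S[i][i] < 0:
            return False
    for _ in range(samples):
        # exponential spacings, normalized -> uniform point on the simplex
        e = [rng.expovariate(1.0) for _ in range(n)]
        s = sum(e)
        x = [ei / s for ei in e]
        if quad_form(S, x) < 0:
            return False
    return True
```

Matrices rejected here can skip the MILP entirely; only the "no violation found" cases need the exact test.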

Numerical experiments
In this section, we conduct numerical experiments on instances from a publicly available online repository and on randomly generated problem sets, adopting the proposed copositive relaxation technique and solving it by Algorithm 1. Moreover, we compare our results with several existing primal approaches in the context of soft and hard regression problems. All computations are implemented in MATLAB R2021a on a laptop with a 3.3 GHz AMD Ryzen 9 processor and 16 GB RAM.

Results on some QPCC instances from MacMPEC
As a first proof of concept for our approach, we take several QPCC instances from the MacMPEC repository [31], namely the larger ones originating from [26], which involve linear constraints arising from network conditions. We bring these instances into the form (P) and report their conic dual bounds in Table 1; the results suggest that our approach provides almost tight bounds for all instances, given that the optimal value is zero, see [31]. Note that the objective functions in these instances are all convex, so that the second implication of Lemma 2 applies and checking (R) is not necessary in this context. The slight increase in the absolute value of the dual bounds with increasing problem size may be attributed to the numerical tolerance involved in Algorithm 1. The reported computational times show a moderate increase with problem size. For these relatively large instances, they fall into the typical range for conic optimization problems and should be compared to training times in machine learning, including the effort of tuning hyperparameters. Note that, most likely, off-the-shelf alternative copositive approaches [15] would not be solvable at all due to memory problems.

Results on constructed instances
In this subsection we turn to sparse least-squares regression models, which may take one of the following forms and which are special cases of (P_S^0), (P_S^+), (P_H^0) and (P_H^+), respectively; we refer to them as soft sparse regression problems and hard sparse regression problems. This class of problems covers many applications, especially in portfolio selection [6, 32] and compressed sensing [4]. It is clear that the Hessians of the quadratic parts in these regression problems are all positive semidefinite. Alternatively, one could here employ [3, Thm. 4] directly for this application, or else the branch-and-bound approach in [21].
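As a concrete illustration of the two model classes, the following sketch evaluates the soft objective ‖A₀x − b₀‖² + ρ‖x‖₀ and checks feasibility for the hard constraint ‖x‖₀ ≤ k. The function names, the sparsity weight ρ, and the tolerance for counting nonzeros are our own assumptions, not notation taken from the paper:

```python
def l0_norm(x, tol=1e-10):
    """Number of (numerically) nonzero entries, i.e. ||x||_0."""
    return sum(1 for xi in x if abs(xi) > tol)

def soft_objective(A0, b0, x, rho):
    """Soft sparse regression objective ||A0 x - b0||^2 + rho * ||x||_0."""
    residual = [sum(aij * xj for aij, xj in zip(row, x)) - bi
                for row, bi in zip(A0, b0)]
    return sum(ri * ri for ri in residual) + rho * l0_norm(x)

def hard_feasible(x, k):
    """Feasibility of the hard sparsity constraint ||x||_0 <= k."""
    return l0_norm(x) <= k
```

Note that the discontinuity of ‖·‖₀ is exactly what the complementarity reformulation of the paper removes, at the price of the conic constraint.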
As discussed in Sect. 3.3, we employ the regularization term ε Σ_i (v_i² + w_i²) to ensure strong conic duality (cf. Proposition 10), and by R_s, R_sp, R_h, R_hp we denote the objective functions of the regularized problems for (17)–(20), respectively; e.g., for the regularized problem corresponding to (17) we have Q₀ = A₀^⊤A₀ and c₀ = −A₀^⊤b₀. In order to reduce the effect of the added regularization term, we adopt ε = 10⁻⁵ throughout this section.
Following Theorem 4, we then reformulate the above QPCCs as conic problems of the form (CP) and obtain their conic dual problems of the form (CD). Consequently, on the dual side, by applying the iterative algorithm to the copositive optimization problems (CD), we obtain conic dual bounds for all sparse regression problems (17)–(20).
• Choice 1: First, we define the set Ā := {a_i − c ∈ ℝ^n : i ∈ [1:d], a_i ≠ c}, where a_i^⊤ is the i-th row of the matrix A₀ and the shift vector c can be chosen as any element of a suitable set. Indeed, if x ≠ o lies in the null space of A₀, then there exists j ∈ [1:m] such that (a_j − c)^⊤x ≠ 0. Following this strategy, we can generate many constraints B₀x*₀ = d₀ by simply changing the parameter c, and A₀^⊤A₀ is always strictly {x ∈ ℝ^n_+ : B₀x = o}-copositive.
• Choice 2: We choose B₀ with at least one strictly positive row, which means that the set {x ∈ ℝ^n_+ : B₀x = d₀} is bounded and its recession cone shrinks to {o}. By default, the chosen B₀ then satisfies the condition that A₀^⊤A₀ is strictly {x ∈ ℝ^n_+ : B₀x = o}-copositive.

Following the construction, we then choose a suitable parameter such that the constructed x*₀ is an optimal solution to problem (17) or (18). In this way, we obtain a group of constrained soft sparse regression problems with known optimal solutions x*₀ and known optimal values. After standardizing the constructed problems (subtracting a constant such that the optimal value is 0), we denote by 𝔖_{d,m,n} (𝔖̄_{d,m,n}) the problem sets containing the generated instances with randomly generated A₀ ∈ ℝ^{d×n} and constructed B₀ ∈ ℝ^{m×n} (randomly generated B₀ ∈ ℝ^{m×n}).
For hard sparse regression, an instance is determined by the parameters A₀, b₀, B₀, d₀ and k. With the same procedure as adopted in the soft case, we generate A₀, b₀, B₀, d₀ and a k₀-sparse vector x*₀. By taking k = k₀, a hard constrained sparse regression problem with known optimal solution x*₀ and known optimal value is generated. Similarly, we denote by ℌ_{d,m,n} (H_{d,m,n}) the problem sets containing the generated instances with randomly generated A₀ ∈ ℝ^{d×n} and constructed B₀ ∈ ℝ^{m×n} (randomly generated B₀ ∈ ℝ^{m×n}).

Numerical results
We test our approach on a group of generated problem sets, each containing 100 instances, and record the dual bounds in Table 2. Recall that the optimal value of each constructed instance is 0. In Table 2, the label "ε-regularized" indicates whether or not the regularization term is applied to the problem set. Here and in the sequel, an entry ≈ 0 signifies a number of magnitude 10⁻⁴ or smaller.
Overall, Table 2 shows that our approach provides acceptable copositive dual bounds in terms of the average bound, ranging from −0.015 to −0.15. In some instances the best reported bound exceeds the optimal value, which may be due to a systematic error in the copositivity detection, where we solve a MILP subproblem: in very rare cases, some non-copositive matrices are deemed copositive, which enlarges the feasible set of the dual problem. Recall that the Hessian of the instances in the problem sets 𝔖_{d,m,n} or ℌ_{d,m,n} is strictly L_∞-copositive by construction, while the Hessian of the instances in the problem sets 𝔖̄_{d,m,n} or H_{d,m,n} may not be, so that strong conic duality may fail for problems from 𝔖̄_{d,m,n} or H_{d,m,n}. However, Table 2 shows that, irrespective of whether the ε-regularization strategy is applied or not, the average bound for problems from 𝔖̄_{d,m,n} or H_{d,m,n} is still relatively tight in a computational sense, even in the absence of a strong duality guarantee. Moreover, we note that our method performs better on hard regression instances, providing conic dual bounds with absolute gap less than 0.006. For soft regression problems, across all problem sizes, our strategy gives dual bounds with absolute gaps of the order 10⁻². While the bounds generated by our approach are quite tight, the computational effort can be massive, in line with similar conic optimization models yielding results of comparable quality. When n = 16, all instances from the test sets 𝔖_{d,m,n} (𝔖̄_{d,m,n}) and ℌ_{d,m,n} (H_{d,m,n}) can be solved within 5 minutes. As the problem dimension increases to n = 32, the computational effort grows dramatically, with 12% of the instances needing longer than 10 minutes; in the worst case, the runtime is 20 minutes.
For n = 64, 73% of the instances are solved within 30 minutes, while the worst case takes roughly 110 minutes.

Comparisons for soft sparse regression
In this subsection, we compare our method with four popular approximate approaches to soft sparse regression problems, from the perspective of the optimal value.
For convenience of exposition, we denote these alternatives by F_i, i ∈ [1:3]. However, in general, the sparse solution obtained through these approximate sparsity terms may differ from the one obtained via the original term ‖·‖₀; the exactness of these approximations holds only under certain conditions. For example, it was proved in [17] that the sparse solution can be recovered exactly via the ‖·‖₁ regularization term if the so-called restricted isometry property (RIP) holds. In particular, the nonconvex terms ‖·‖_q with 0 < q < 1 can be regarded as a continuation strategy approximating ‖·‖₀ as q ↘ 0, where a RIP is also needed for exactness. Similarly, a RIP-type sufficient condition is given in [37] guaranteeing that, with the regularization term ‖·‖₁ − ‖·‖₂, a sparse vector can be recovered exactly. We illustrate this discrepancy in Fig. 1. In order to quantify the gap, we construct the following soft sparse regression example. With a randomly generated matrix A₀ ∈ ℝ^{3×12} and a properly chosen sparse vector x*₀ ∈ ℝ¹², we let b₀ := A₀x*₀ ∈ ℝ³ and choose ρ = 1. By construction, x*₀ is the optimal solution to the resulting problem. Replacing the term ‖x‖₀ by the approximate sparsity regularization terms, we arrive at the models F_i. By a heuristic algorithm [33], we obtain optimal solutions x*_i for F_i(x), i ∈ [1:3], i.e. F_i(x*_i) = min_{x∈ℝ¹²} F_i(x). As presented in Fig. 1, these solutions are distinct from the constructed true sparse solution x*₀.
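For reference, the surrogate sparsity measures compared here (‖·‖₁, ‖·‖_q^q with 0 < q < 1, and ‖·‖₁ − ‖·‖₂) can be evaluated directly; a quick sketch, with q = 1/2 as in the experiments below (function names are ours):

```python
def l0(x, tol=1e-10):
    return sum(1 for xi in x if abs(xi) > tol)

def l1(x):
    return sum(abs(xi) for xi in x)

def l2(x):
    return sum(xi * xi for xi in x) ** 0.5

def lq_q(x, q=0.5):
    """||x||_q^q for 0 < q < 1, a nonconvex surrogate tightening
    toward ||x||_0 as q decreases to 0."""
    return sum(abs(xi) ** q for xi in x)

def l1_minus_l2(x):
    """The difference-of-convex surrogate ||x||_1 - ||x||_2."""
    return l1(x) - l2(x)

x = [3.0, 0.0, -4.0]
# l0(x) = 2, l1(x) = 7, l2(x) = 5, l1_minus_l2(x) = 2
```

Note how the surrogates scale with the magnitudes of the nonzero entries, whereas ‖·‖₀ counts them; this scale dependence is precisely the source of the discrepancy quantified in Table 3.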
In Table 3, the number in the i-th row and j-th column stands for the value of the objective F_i at the point x*_j; we highlight the best value of F_i among all these points in bold. Looking along each row, the solution x*_i rather than the original sparse solution x*₀ minimizes the objective F_i(x), which means that the approximate models are not equivalent to the original sparse problem. In particular, under the original sparsity measure ‖·‖₀ (the last row), there are considerable gaps between the solutions obtained from the approximate models and the original one.
To further study this gap, we take the problem sets 𝔖_{d,m,n} as defined previously. Moreover, for the sake of convenient comparison, we set B₀ = O and d₀ = o, so that the problem set 𝔖_{d,m,n} turns into 𝔖_{d,0,n}, consisting of unconstrained soft sparse regression problems of the form (21) with known optimal solutions; we abbreviate it by 𝔖_{d,n}. On the other hand, we solve the approximate sparsity models (22) by algorithms frequently used in compressed sensing: when the regularization term is ‖·‖₁, we adopt the well-known fast iterative shrinkage-thresholding algorithm (FISTA), see [4]; when the regularization term is ‖·‖_{1/2}, we adopt the half thresholding algorithm (HTA) from [38], a proximal algorithm for composite functions with nonsmooth and nonconvex components; and when the regularization term is ‖·‖₁ − ‖·‖₂, we adopt the fast forward-backward splitting algorithm (FBS) from [34].
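To illustrate the first of these baselines, here is a compact pure-Python FISTA sketch for min_x ½‖Ax − b‖² + λ‖x‖₁; as a simplification of the usual spectral-norm choice, it uses the squared Frobenius norm of A as a (conservative) Lipschitz bound for the gradient step:

```python
import math

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold(z, t):
    return [math.copysign(max(abs(zi) - t, 0.0), zi) for zi in z]

def fista(A, b, lam, iters=500):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    At = transpose(A)
    # conservative Lipschitz bound: ||A||_F^2 >= ||A^T A||_2
    L = sum(a * a for row in A for a in row) or 1.0
    n = len(A[0])
    x = [0.0] * n
    z = x[:]                 # extrapolated point
    t = 1.0
    for _ in range(iters):
        r = [azi - bi for azi, bi in zip(matvec(A, z), b)]
        grad = matvec(At, r)                       # gradient of the smooth part
        x_new = soft_threshold([zi - gi / L for zi, gi in zip(z, grad)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# with A = I the exact minimizer is the soft-thresholding of b: [0.7, 0.0]
x_hat = fista([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.1], lam=0.3)
```

Production implementations would of course estimate the Lipschitz constant more sharply (e.g. by power iteration) and add a stopping criterion, but the proximal-gradient structure is the same.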
After running these algorithms for (22) on three problem sets, each containing 100 instances, we obtain solutions and report the results in Table 4. The numbers in Table 4 stand for the values of F₀ at the points which are optimal solutions of the corresponding approximate sparsity models (22) solved by the algorithms introduced above. The number in the column labelled "best" stands for the best case among all instances within a problem set, while "average" is the average performance. Recall that the optimal values of the instances we constructed are 0. From Table 4, we can see that there is a large gap (exceeding 6) between the true optimal values and the results obtained by the approximate models on the constructed instances, where the measurement matrix A₀ may not satisfy a RIP. However, turning to Table 5, the conic dual bounds we obtain on the same problem sets are still tight, with absolute gaps smaller than 0.23.

Comparisons for hard sparse regression
In this subsection, we compare our method with four alternatives for hard sparse regression problems of the form (19) or (20), from the perspective of the optimal value.

As problem sets we take ℌ_{d,m,n} as defined previously, each containing 100 randomly generated instances. The surrogates for the sparsity constraint ‖x‖₀ ≤ k are:
• Top-(k,1) norm: ‖x‖_{k,1} denotes the ℓ₁-norm of the subvector composed of the top-k elements of x in absolute value;
• Top-(k,2) norm: ‖x‖_{k,2} denotes the ℓ₂-norm of the subvector composed of the top-k elements of x in absolute value;
• Big-M: here M is a sufficiently large constant;
• Big-M variant.
By replacing the sparsity constraint with these alternatives, we arrive at four problems. The first two we solve by PDCA [27], while for the others we employ the MIP solver Gurobi. We record the corresponding optimal values in Table 6 and compare them with the dual bounds generated by our method in Table 7. Table 6 evidences that, across all problem sets, the Big-M method provides the best optimal values while the Big-M variant performs worst; a possible reason might be the intractability of the involved complementarity constraint for Gurobi. However, the constant M involved in the constraint is not easy to estimate if the optimal solutions are unknown. In contrast, our method improves on the other approaches without requiring any a priori knowledge.
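The two top-k surrogates can be evaluated directly; note that ‖x‖₀ ≤ k holds exactly when ‖x‖₁ − ‖x‖_{k,1} = 0 (and likewise when ‖x‖₂ = ‖x‖_{k,2}), which is the identity the DC penalty formulations exploit. A minimal sketch (function names are ours):

```python
def topk_abs(x, k):
    """The k largest entries of x in absolute value."""
    return sorted((abs(xi) for xi in x), reverse=True)[:k]

def topk1(x, k):
    """Top-(k,1) norm ||x||_{k,1}: l1-norm of the top-k subvector."""
    return sum(topk_abs(x, k))

def topk2(x, k):
    """Top-(k,2) norm ||x||_{k,2}: l2-norm of the top-k subvector."""
    return sum(v * v for v in topk_abs(x, k)) ** 0.5

def is_k_sparse(x, k, tol=1e-10):
    """||x||_0 <= k  iff  ||x||_1 == ||x||_{k,1}."""
    return sum(abs(xi) for xi in x) - topk1(x, k) <= tol
```

Since ‖·‖₁ and ‖·‖_{k,1} are both convex, the penalty ‖x‖₁ − ‖x‖_{k,1} is a difference of convex functions, which is what makes it amenable to PDCA.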

Conclusions and outlook
In this paper, we propose an exact completely positive reformulation of QPCCs under mild conditions on the constraints, which extend previously known ones. An application to pursuing sparse solutions of (nonconvex) quadratic optimization problems is shown to satisfy the proposed conditions, and therefore an equivalent completely positive optimization problem is obtained. Moreover, the conditions for strong conic duality are also studied. To solve the conic optimization problem from the dual side, we introduce a direct search method with embedded MILP-based copositivity detection which does not need any large constants. Numerical experiments on sparse regression problems are reported, allowing a comparison of our exact model with popular sparsity approximations from the perspective of objective function values. However, the computational effort needed by our algorithm is relatively large because of the difficulty of the involved copositivity detection for large problem sizes. A possible remedy would be to employ shortcut strategies in the copositivity checks which systematically exploit favourable instances by preprocessing [12].
In this paper, we propose an exact completely copositive reformulation of QPCCs under mild conditions on the constraints, which extend previously known ones. An application on pursuing sparse solutions of (nonconvex) quadratic optimization is shown to satisfy our proposed conditions and therefore an equivalent completely positive optimization problem is obtained. Moreover, the condition for strong conic duality is also studied. To solve the conic optimization problem from the dual side, we introduce a direct search method with embedded MILP-based  copositivity detection which does not need any large constants. Numerical experiments on sparse regression problems are reported which allow comparing our exact model with popular sparsity approximations, from the perspective of objective function values. However, the computational effort needed in our algorithm is relatively large because of the difficulty on the involved copositivity detection procedure for large problem sizes. A possible solution for it would be employing shortcut strategies in the copositivity checks which systematically exploit favourable instances by preprocessing [12].