ON THE CONVEX HULL OF CONVEX QUADRATIC OPTIMIZATION PROBLEMS WITH INDICATORS

Abstract. We consider the convex quadratic optimization problem with indicator variables and arbitrary constraints on the indicators. We show that a convex hull description of the associated mixed-integer set in an extended space with a quadratic number of additional variables consists of a single positive semidefinite constraint (explicitly stated) and linear constraints. In particular, convexification of this class of problems reduces to describing a polyhedral set in an extended formulation. While the vertex representation of this polyhedral set is exponential and an explicit linear inequality description may not be readily available in general, we derive a compact mixed-integer linear formulation whose solutions coincide with the vertices of the polyhedral set. We also give descriptions in the original space of variables: we provide a description based on an infinite number of conic-quadratic inequalities, which are "finitely generated." In particular, it is possible to characterize whether a given inequality is necessary to describe the convex hull. The new theory presented here unifies several previously established results, and paves the way toward utilizing polyhedral methods to analyze the convex hull of mixed-integer nonlinear sets.


Introduction
Given a symmetric positive semidefinite matrix Q ∈ R^{n×n}, vectors a, b ∈ R^n and a set Z ⊆ {0,1}^n, consider the mixed-integer quadratic optimization (MIQO) problem with indicator variables

(MIQO)    min_{x,z,t}  a^⊤x + b^⊤z + (1/2)t    (1a)
          s.t.  x^⊤Qx ≤ t    (1b)
                x • (e − z) = 0    (1c)
                z ∈ Z, x ∈ R^n, t ∈ R,    (1d)

and the associated mixed-integer nonlinear set

X = { (x, z, t) ∈ R^n × Z × R : x^⊤Qx ≤ t, x • (e − z) = 0 },

where e denotes a vector of ones, and x • (e − z) is the Hadamard product of the vectors x and e − z. There has recently been an increasing interest in problem (1) due to its statistical applications: the nonlinear term (1b) is used to model a quadratic loss function, as in regression, while Z represents logical conditions on the support of the variables x. For example, given a model matrix F ∈ R^{m×n} and responses y ∈ R^m, problem (1) is equivalent to the best subset selection problem with a given cardinality r [11,17]:

min_x { ‖y − Fx‖²₂ : ‖x‖₀ ≤ r },

obtained by setting Q = 2F^⊤F, a = −2F^⊤y, b = 0 and Z = {z ∈ {0,1}^n : e^⊤z ≤ r}. Other constraints defining Z that have been considered in statistical learning applications include multicollinearity [11], cycle prevention [29,31], and hierarchy [13]. Set X arises as a substructure in many other applications, including portfolio optimization [14], optimal control [22], image segmentation [27], and signal denoising [10]. A critical step toward solving MIQO effectively is to convexify the set X. Indeed, the mixed-integer optimization problem (1) is equivalent to the convex optimization problem

min_{x,z,t} { a^⊤x + b^⊤z + (1/2)t : (x, z, t) ∈ cl conv(X) },    (2)

where conv(X) denotes the convex hull of X and cl conv(X) is the closure of conv(X). However, problem MIQO is NP-hard even if Z = {0,1}^n [16]. Thus, a simple description of cl conv(X) is, in general, not possible unless NP = co-NP.
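As a quick numerical sanity check of the best subset reduction above, the least-squares loss agrees with the MIQO objective up to the constant ‖y‖², once Q = 2F^⊤F and a = −2F^⊤y. The numpy sketch below uses illustrative sizes and random data (not from the paper) to verify the identity at an arbitrary point x, ignoring the indicator constraints.

```python
import numpy as np

# Sketch: ||y - Fx||^2 matches the MIQO objective a^T x + (1/2) t with
# t = x^T Q x, for Q = 2 F^T F and a = -2 F^T y, up to the constant ||y||^2.
# F, y, x and the sizes below are illustrative, not from the paper.
rng = np.random.default_rng(0)
m, n = 10, 5
F = rng.standard_normal((m, n))
y = rng.standard_normal(m)

Q = 2 * F.T @ F          # positive semidefinite Hessian of the loss
a = -2 * F.T @ y         # linear term of the loss

x = rng.standard_normal(n)
lhs = a @ x + 0.5 * x @ Q @ x                 # MIQO objective (with b = 0)
rhs = np.linalg.norm(y - F @ x) ** 2 - np.linalg.norm(y) ** 2
assert np.isclose(lhs, rhs)
print("least-squares loss matches the MIQO objective up to a constant")
```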
In practice, one aims to obtain a good convex relaxation of X, which can then be used either as a standalone method (as is pervasively done in the machine learning literature), to obtain high quality solutions via rounding, or in a branch-and-bound framework. Nonetheless, it is unclear how to determine whether a given relaxation is good or not. In mixed-integer linear optimization, it is well-understood that facet-defining inequalities give strong relaxations. However, in MIQO (and, more generally, in mixed-integer nonlinear optimization problems), cl conv(X) is not a polyhedron, and there is no consensus on how to design good convex relaxations, or even on what a good relaxation should be.
An important class of convex relaxations of X that has received attention in the literature is obtained by decomposing the matrix Q = Σ_{i=1}^ℓ Γ_i + R, where the matrices Γ_i ⪰ 0, i = 1, …, ℓ, are assumed to be "simple" and R ⪰ 0. Then

t ≥ x^⊤Qx  ⟺  t ≥ Σ_{i=1}^ℓ τ_i + x^⊤Rx and τ_i ≥ x^⊤Γ_i x, ∀i ∈ {1, …, ℓ},    (3)

and each constraint τ_i ≥ x^⊤Γ_i x is replaced with a system of inequalities describing the convex hull of the associated "simple" mixed-integer set. This idea was originally used in [20], where ℓ = n, (Γ_i)_{ii} = d_i > 0 and (Γ_i)_{jk} = 0 otherwise, and the constraints τ_i ≥ d_i x_i² are strengthened using the perspective relaxation [2,19,23], i.e., reformulated as z_iτ_i ≥ d_i x_i². Similar relaxations based on separable quadratic terms were considered in [18,35]. A generalization of the above approach is the rank-one decomposition, which lets Γ_i = h_i h_i^⊤ be a rank-one matrix [7,8,33,34]; in this case, letting S_i = {j ∈ [n] : (h_i)_j ≠ 0} denote the support of h_i, the constraints (Σ_{j∈S_i} z_j)τ_i ≥ (h_i^⊤x)² can be added to the formulation. Alternative generalizations of the perspective relaxation that have been considered in the literature include exploiting substructures based on matrices Γ_i whose non-zero entries form 2 × 2 blocks [5,6,9,21,25,28] or a tridiagonal matrix [30].
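To make the perspective strengthening concrete, the following small sketch (plain Python with numpy; names and numbers are illustrative, not from the paper) compares, for a single separable term, the bound implied by the strengthened constraint z_iτ_i ≥ d_i x_i² against the unstrengthened bound τ_i ≥ d_i x_i² at fractional values of z_i.

```python
import numpy as np

# Illustrative sketch of the perspective strengthening for one separable term
# tau >= d*x^2: at fractional z, the perspective bound d*x^2/z dominates the
# naive bound d*x^2 (they coincide at z = 1).
def naive_bound(d, x):
    return d * x ** 2

def perspective_bound(d, x, z):
    # closure convention: bound is 0 when x = 0 and z = 0, +inf if x != 0, z = 0
    if z == 0:
        return 0.0 if x == 0 else np.inf
    return d * x ** 2 / z

for z in np.linspace(0.1, 1.0, 10):
    assert perspective_bound(2.0, 1.5, z) >= naive_bound(2.0, 1.5) - 1e-12
print("perspective bound dominates the naive bound for z in (0, 1]")
```

At z = 1/2, for instance, the perspective bound is exactly twice the naive bound, which is the gap branch-and-bound would otherwise have to close by enumeration.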
Convexifications based on decomposition (3) have proven to be strong computationally, and are attractive from a theoretical perspective. The fact that a given formulation is ideal for the substructure τ_i ≥ x^⊤Γ_i x lends some theoretical weight to the strength of the convexification. However, approaches based on decomposition (3) have fundamental limitations as well. First, they require computing the convex hull description of a nonlinear mixed-integer set to establish (theoretically) the strength of the relaxation, a highly non-trivial task that restricts the classes of matrices Γ_i that can be used. Second, even if the ideal formulation for the substructure τ_i ≥ x^⊤Γ_i x is available, the convexification based on such a decomposition can still be a poor relaxation of X, and there is currently no approach to establish the strength of the relaxation without numerical computations. Third, it is unclear whether the structure of the relaxations induced by (3) matches the structure of cl conv(X), or whether they are overly simple or complex.

Contributions and outline.
In this paper, we close the aforementioned gaps in the literature by characterizing the structure of cl conv(X). First, in §2, we review relevant background for the paper. In §3, we show that cl conv(X) can be described in a compact extended formulation with O(n²) additional variables, linear constraints, and a single positive semidefiniteness constraint. In particular, convexification of X in this extended formulation reduces to describing a base polytope. We use the vertex description of this base polytope, which is exponential in general. However, we show that the set of vertices can be represented as the feasible points of a compact mixed-integer linear formulation (§5). In §4, we characterize cl conv(X) in the original space of variables. While the resulting description has an infinite number of conic quadratic constraints, we show that cl conv(X) is finitely generated, and thus we establish which inequalities are necessary to describe cl conv(X), in precisely the same manner that facet-defining inequalities are required to describe a polyhedron. We also establish a relationship between cl conv(X) and the relaxations obtained from decompositions (3). In §5, we present a mixed-integer linear formulation of the MIQO problem using the theoretical results of §3. Finally, in §6, we conclude the paper with a few remarks.
We point out that, using standard disjunctive programming techniques [15], it is possible to obtain a conic quadratic extended formulation of (1), although such a representation typically requires adding O(|Z|n) variables and O(|Z|) nonlinear constraints. Since |Z| is often exponential in n, these formulations are in general impractical, and therefore their use has been restricted to small instances with n ≤ 2 [5,6,21,23,25] or to problems with special structures that admit a compact representation [24]. We argue that the convexifications in this paper are significantly more tractable: regardless of Z, we require only O(n²) variables instead of O(|Z|n), and only one nonlinear conic constraint instead of O(|Z|). The major complexity of the proposed formulations is the exponential number of linear inequalities, which can be generated, as needed, using mature mixed-integer linear optimization techniques.

Notation and Preliminaries
In this section, we first review the relevant background and introduce the notation used in the paper.

Definition 1 ([32]). Given a matrix W ∈ R^{p×q}, its pseudoinverse W^† ∈ R^{q×p} is the unique matrix satisfying the four properties

W W^† W = W,   W^† W W^† = W^†,   (W W^†)^⊤ = W W^†,   (W^† W)^⊤ = W^† W.

We recall the generalized Schur complement, relating pseudoinverses and positive semidefinite matrices.

Lemma 1. Let W = [ W_11 W_12 ; W_21 W_22 ] be a symmetric matrix with W_11 ∈ R^{p×p}. Then W ⪰ 0 if and only if W_11 ⪰ 0, W_11 W_11^† W_12 = W_12, and W_22 − W_21 W_11^† W_12 ⪰ 0.

Note that if W_11 ≻ 0, then the second condition of Lemma 1 is automatically satisfied. Otherwise, this condition is equivalent to the system of equalities W_11 U = W_12 having a solution U ∈ R^{p×q}.
Let [n] = {1, …, n}. Throughout, given a matrix W ∈ R^{n×n}, we let tr(W) = Σ_{i=1}^n W_ii denote its trace, and let W^{−1} denote its inverse, if it exists. ‖W‖₂ and ‖W‖_∞ denote the Frobenius norm and the maximum absolute value of the entries of W, respectively, and λ_max(W) denotes the maximum eigenvalue of W. We let col(W) denote the column space of the matrix W. Given a matrix W ∈ R^{n×n} and S ⊆ [n], let W_S ∈ R^{S×S} be the submatrix of W induced by S, and let Ŵ_S ∈ R^{n×n} be the n × n matrix obtained from W_S by filling the missing entries with zeros; i.e., matrices subscripted by S without a "hat" refer to the lower-dimensional submatrices. For any two sets S, T ⊂ [n], let W_{S,T} denote the submatrix of W with rows in S and columns in T. Note that if W ≻ 0, then it can easily be verified from Definition 1 that the submatrix of Ŵ_S^† indexed by S coincides with W_S^{−1}, and Ŵ_S^† is zero elsewhere; in this case, we abuse notation and write Ŵ_S^{−1} instead of Ŵ_S^†. Given S ⊆ [n], let ê_S ∈ {0,1}^n be the indicator vector of S. We define π_S as the projection onto the subspace indexed by S and π_S^{−1}(x) as the preimage of x under π_S.
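The pseudoinverse facts above are easy to check numerically. The numpy sketch below (illustrative sizes and random data) verifies the four Moore-Penrose properties from Definition 1 and the claim that, for W ≻ 0, the pseudoinverse of the zero-padded submatrix Ŵ_S agrees with W_S^{−1} on S and vanishes elsewhere.

```python
import numpy as np

# Numerical check of two facts used in the preliminaries (illustrative sizes):
# (i) np.linalg.pinv satisfies the four Moore-Penrose properties;
# (ii) for W > 0 (positive definite) and S a subset of [n], the pseudoinverse
#      of the zero-padded submatrix hat(W)_S equals inv(W_S) on S, 0 elsewhere.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
W = A @ A.T + 4 * np.eye(4)          # W positive definite
Wd = np.linalg.pinv(W)

assert np.allclose(W @ Wd @ W, W)            # W W† W = W
assert np.allclose(Wd @ W @ Wd, Wd)          # W† W W† = W†
assert np.allclose((W @ Wd).T, W @ Wd)       # (W W†)^T = W W†
assert np.allclose((Wd @ W).T, Wd @ W)       # (W† W)^T = W† W

S = [0, 2]
W_hat_S = np.zeros_like(W)
W_hat_S[np.ix_(S, S)] = W[np.ix_(S, S)]      # zero-padded submatrix
pinv_hat = np.linalg.pinv(W_hat_S)
assert np.allclose(pinv_hat[np.ix_(S, S)], np.linalg.inv(W[np.ix_(S, S)]))
mask = np.ones_like(W, dtype=bool)
mask[np.ix_(S, S)] = False
assert np.allclose(pinv_hat[mask], 0.0)
print("Moore-Penrose properties and the hat-submatrix identity verified")
```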

Convexification in an extended space
In this section, we describe cl conv(X) in an extended space. In §3.1, we provide a "canonical" representation of cl conv(X) under the assumption that Q ≻ 0. In §3.2, we provide alternative representations of cl conv(X), which can handle non-invertible matrices Q and may also lead to sparser formulations.
Given Q ≻ 0, define the polytope

P = conv( { (ê_S, Q̂_S^{−1}) : ê_S ∈ Z } ) ⊆ R^n × R^{n×n}.

Proposition 1 below shows how to construct mixed-integer conic formulations of MIQO using the polytope P.
Proposition 1. If Q ≻ 0, then the mixed-integer optimization model

min_{x,z,t,W}  a^⊤x + b^⊤z + (1/2)t    (4a)
s.t.  [ W x ; x^⊤ t ] ⪰ 0    (4b)
      (z, W) ∈ P    (4c)
      z ∈ {0,1}^n    (4d)

is a valid formulation of problem (1).
Proof.Consider a point (x, z, t, W ) satisfying constraints (4b), (4c) with z = êS for some êS ∈ Z. Constraint (4c) is satisfied if and only if , we find from Lemma 1 that constraint (4b) is satisfied if and only if: • W 0, which is automatically satisfied.
Note that the condition WW^†x = x is used to enforce the complementarity constraints. We point out that a similar idea was recently used in the context of low-rank optimization [12]. Now consider the convex relaxation of (4), obtained by dropping the integrality constraints z ∈ {0,1}^n:

min_{x,z,t,W}  a^⊤x + b^⊤z + (1/2)t  s.t. (4b), (4c).    (5)

Theorem 1. Let Q be a positive definite matrix. Then

cl conv(X) = { (x, z, t) : ∃W ∈ R^{n×n} such that (4b) and (4c) hold }.

Consequently, problem (5) has an optimal solution integral in z.
Projecting out the variable t, we find that problem (5) reduces to

min_{x,z,W}  a^⊤x + b^⊤z + (1/2)x^⊤W^†x    (6a)
s.t.  WW^†x = x,  (z, W) ∈ P.    (6b)

Note that this formulation uses the pseudoinverse of a matrix of variables.
Observe that we omit the constraint W ⪰ 0: since every extreme point (z, W) of P satisfies W ⪰ 0, membership (z, W) ∈ P already implies W ⪰ 0.
We argue that for any fixed (z, W ) ∈ P , setting x = −W a is optimal for (6).Using equality (6b), we replace the term a ⊤ x in the objective with a ⊤ W W † x.Since the problem is convex in x, from KKT conditions we find that any point x satisfying is optimal.In particular, setting x = −W a, we find that (7b) is satisfied with λ = 0, and (7a) is satisfied since Since the objective − 1 2 aa ⊤ , W +b ⊤ z is linear in (z, W ) and P is a polytope, there exists an optimal solution (z * , W * ) that is an extreme point of P , and in particular there exists êS ∈ Z such that z * = êS and Remark 1.The convexification for the case where Q is tridiagonal [30] is precisely in the form given in Theorem 1, where the polyhedron P is described with a compact extended formulation.
3.1.1. Bivariate quadratic functions. Consider the set

X_2×2 = { (x, z, t) ∈ R² × {0,1}² × R : d₁x₁² + d₂x₂² − 2x₁x₂ ≤ t, x • (e − z) = 0 },

where d₁d₂ > 1, and let Δ = d₁d₂ − 1. Set X_2×2 corresponds (after scaling) to a generic strictly convex quadratic function of two variables; conic quadratic disjunctive programming representations of cl conv(X_2×2) have been used in the literature [5], explicit representations of cl conv(X_2×2 ∩ {(x, z, t) : x ≥ 0}) in the original space of variables have been given [9,25], and descriptions of the rank-one case d₁d₂ = 1 were given in [7]. A description of cl conv(X_2×2 ∩ {(x, z, t) : ℓ ≤ x ≤ u}) in a conic quadratic extended formulation is given in [21] using disjunctive programming. This formulation can easily be adapted to the case with no bounds (considered here), and requires three additional variables and three conic quadratic constraints to use with solvers. We now give a more compact representation of cl conv(X_2×2) with free variables, illustrating Theorem 1 by computing an extended formulation.

Proposition 2. Polyhedron P corresponding to X_2×2 is described by the bound constraints 0 ≤ z ≤ 1, the matrix representation

W = [ (z₁ + w)/d₁  w ; w  (z₂ + w)/d₂ ],  w = W₁₂,    (9)

and the inequalities

Δw ≤ z₁,  Δw ≤ z₂,  Δw ≥ z₁ + z₂ − 1,  w ≥ 0.    (10)

Proof. Polyhedron P is the convex hull of the four points given in Table 1.
Table 1. Extreme points of P corresponding to set X_2×2 (here Δ = d₁d₂ − 1).

  z = (0, 0):  W = [ 0 0 ; 0 0 ]
  z = (1, 0):  W = [ 1/d₁ 0 ; 0 0 ]
  z = (0, 1):  W = [ 0 0 ; 0 1/d₂ ]
  z = (1, 1):  W = (1/Δ)·[ d₂ 1 ; 1 d₁ ]
One verifies from Table 1 that the equalities in (9) are valid at all four extreme points. Letting w = W₁₂ and using (9) to project out the variables W₁₁ and W₂₂, we find that it suffices to describe the projection of P onto the (z, w) space. Also note that w = (1/Δ)·min{z₁, z₂} at the extreme points, and the convex hull of {(z, w) ∈ {0,1}² × R : Δw = min{z₁, z₂}} is described by the inequalities (10). Then, (9) and (10) describe the polyhedron P. □
Remark 2. Since P is not full-dimensional, we require only one additional variable w (instead of three) for a conic representation of cl conv(X_2×2): the constraints 0 ≤ z ≤ 1, (10), and the conic constraint obtained by substituting the matrix representation (9) into (4b).

Remark 3. The matrix representation (9) suggests an interesting connection between cl conv(X_2×2) and McCormick envelopes. Indeed, from Table 1, we see that Δ·W₁₂ = z₁z₂ at all extreme points. Moreover, the usual McCormick envelopes of the bilinear term z₁z₂, given by max{0, z₁ + z₂ − 1} ≤ z₁z₂ ≤ min{z₁, z₂}, are sufficient to characterize the convex hull.
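The McCormick connection in Remark 3 rests on an elementary fact that is worth seeing explicitly: at binary points, min{z₁, z₂}, the product z₁z₂, and the lower envelope max{0, z₁ + z₂ − 1} all coincide, while at fractional points the two envelopes separate. A minimal illustrative check:

```python
import numpy as np
from itertools import product

# At binary points, min{z1, z2} = z1*z2 = max{0, z1 + z2 - 1}, so the
# McCormick envelopes of z1*z2 are tight exactly where the vertices of P live.
for z1, z2 in product([0, 1], repeat=2):
    assert min(z1, z2) == z1 * z2 == max(0, z1 + z2 - 1)

# At fractional z the two envelopes differ, which is where the linear
# inequalities (10) do real work in the relaxation:
z1, z2 = 0.6, 0.5
assert max(0.0, z1 + z2 - 1) < min(z1, z2)
print("McCormick envelopes of z1*z2 are tight at binary points")
```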
3.1.2. Quadratic functions with "choose-one" constraints. Given Q ≻ 0, consider the set

X_C1 = { (x, z, t) ∈ R^n × {0,1}^n × R : x^⊤Qx ≤ t, x • (e − z) = 0, e^⊤z ≤ 1 }.

Set X_C1 arises, for example, in regression problems with multicollinearity constraints [11]: given a set J of features that are collinear, constraints Σ_{i∈J} z_i ≤ 1 are used to ensure that at most one such feature is chosen. The closure of the convex hull of X_C1 is [see, e.g., 21, 33]

cl conv(X_C1) = { (x, z, t) : t ≥ Σ_{i=1}^n Q_ii x_i²/z_i, e^⊤z ≤ 1, z ≥ 0 },

with the convention x_i²/z_i = 0 when x_i = z_i = 0. We now give an alternative derivation of this result using our technique. Polyhedron P is the convex hull of n + 1 points: the point (0, 0) and the points (ê_{i}, Q̂_{i}^{−1}), i ∈ [n], where Q̂_{i}^{−1} has the single nonzero entry 1/Q_ii in position (i, i). It can easily be seen that P is described by the constraints

Q_ii W_ii = z_i (i ∈ [n]),  W_ij = 0 (i ≠ j),  e^⊤z ≤ 1,  z ≥ 0,

which, substituted into (4b) and expanded via Lemma 1, yields precisely the description above.

3.2. Representations based on a factorization of Q. Suppose now that Q = FF^⊤ for some F ∈ R^{n×k}. Matrix F may be immediately available when formulating the problem, or may be obtained through a Cholesky decomposition or eigendecomposition of Q. Such a factorization is often employed by solvers, since it results in simpler (separable) nonlinear terms, and in many situations matrix F is sparse as well. In this section, we discuss representations of cl conv(X) amenable to such factorizations of Q. While the proofs of the propositions of this section are similar to those in §3.1, additional care is required to handle unbounded problems (1) arising from a rank-deficient Q.
Given F ∈ R^{n×k}, define F_S ∈ R^{S×k} as the submatrix of F corresponding to the rows indexed by S, and let F̂_S ∈ R^{n×k} be the matrix obtained by filling the missing entries with zeros. Define the polytope P_F ⊆ R^{n+k²} as

P_F = conv( { (ê_S, F̂_S^†F̂_S) : ê_S ∈ Z } ).

Remark 4. Matrix F̂_S^†F̂_S is an orthogonal projection matrix (symmetric and idempotent), and in particular (F̂_S^†F̂_S)^† = F̂_S^†F̂_S. These properties can easily be verified from Definition 1. Since all eigenvalues of an orthogonal projection matrix are either 0 or 1, it also follows that F̂_S^†F̂_S ⪰ 0.
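The projection properties in Remark 4 can be confirmed numerically. The sketch below (illustrative sizes and random data) builds F̂_S for one subset S and checks symmetry, idempotency, the {0,1} spectrum, and the self-pseudoinverse property.

```python
import numpy as np

# Check of Remark 4 on random data: P_S = pinv(hat(F)_S) @ hat(F)_S is an
# orthogonal projection (symmetric, idempotent, eigenvalues in {0,1}), hence
# its own pseudoinverse. Sizes and the subset S are illustrative.
rng = np.random.default_rng(3)
n, k = 5, 3
F = rng.standard_normal((n, k))

S = [0, 2, 4]
F_hat_S = np.zeros_like(F)
F_hat_S[S, :] = F[S, :]                  # keep rows in S, zero the rest

P_S = np.linalg.pinv(F_hat_S) @ F_hat_S  # k x k projection onto row space of F_S
assert np.allclose(P_S, P_S.T)           # symmetric
assert np.allclose(P_S @ P_S, P_S)       # idempotent
eig = np.linalg.eigvalsh(P_S)
assert np.all((np.abs(eig) < 1e-9) | (np.abs(eig - 1) < 1e-9))
assert np.allclose(np.linalg.pinv(P_S), P_S)  # (P_S)† = P_S
print("F_S† F_S is an orthogonal projection")
```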
Proposition 3. The mixed-integer optimization model

min_{x,z,t,W}  a^⊤x + b^⊤z + (1/2)t    (11a)
s.t.  [ W  F^⊤x ; x^⊤F  t ] ⪰ 0    (11b)
      (z, W) ∈ P_F    (11c)
      x • (e − z) = 0,  z ∈ {0,1}^n    (11d)
      x ∈ R^n,  0 ≤ z ≤ e,  t ∈ R    (11e)

is a valid formulation of problem (1).
Proof. Consider a point (x, z, t) ∈ X with z = ê_S for some ê_S ∈ Z. Constraint (11d) is trivially satisfied. Constraint (11c) is satisfied if and only if W = F̂_S^†F̂_S. Note that in any feasible solution, x_i = 0 whenever i ∉ S, and in particular F^⊤x = F_S^⊤x_S. From Lemma 1, we find that constraint (11b) is satisfied if and only if (recall the properties in Remark 4):
• W ⪰ 0, which is automatically satisfied;
• WW^†F^⊤x = F^⊤x; since WW^† = F̂_S^†F̂_S is the orthogonal projection onto the row space of F_S and F^⊤x = F_S^⊤x_S lies in this row space, this condition is satisfied as well;
• t ≥ (F^⊤x)^⊤W^†(F^⊤x) = x_S^⊤F_S F_S^⊤x_S, which is precisely the nonlinear constraint defining set X and is thus satisfied. □
While the proofs of Propositions 1 and 3 are similar in spirit, we highlight a critical difference. In the proof of Proposition 1, under the assumption Q ≻ 0, the constraints WW^†x = x enforce the complementarity constraints x • (e − z) = 0, and therefore such constraints are excluded from (4). In contrast, in the proof of Proposition 3, with Q potentially of low rank, the constraints WW^†F^⊤x = F^⊤x alone are not sufficient to enforce x • (e − z) = 0; therefore, they are included in (11) and are used to prove the validity of the mixed-integer formulation. Indeed, if there exist ê_S ∈ Z and x̄ ∈ R^n such that x̄_S ≠ 0, x̄_{[n]\S} = 0 and F^⊤x̄ = 0, then Qx̄ = FF^⊤x̄ = 0, and for any (x, z, t) ∈ X we find that

lim_{λ→0+} [ (1 − λ)·(x, z, t) + λ·(x̄/λ, ê_S, 0) ] = (x + x̄, z, t),

where (x̄/λ, ê_S, 0) ∈ X for every λ > 0. In particular, the point (x + x̄, z, t), which may not satisfy the complementarity constraints, cannot be separated from cl conv(X), or from any closed relaxation. On the other hand, if matrix Q is full-rank, then F^⊤x = 0 ⟹ x = 0 (as shown in the proof of Proposition 1); therefore, the complementarity constraints are enforced by the conic constraint.
Recall that π_S : R^n → R^S is the projection onto the subspace indexed by S. We now consider the natural convex relaxation of (11), obtained by dropping constraint (11d), and show that it is ideal under a technical condition on F and the set Z, as stated in Theorem 2 below.

Theorem 2. If col(F) = ∩_{ê_S∈Z} π_S^{−1}(col(F_S)), then

cl conv(X) = { (x, z, t) : ∃W ∈ R^{k×k} such that (11b) and (11c) hold }.
Proof. Clearly, constraints (11b), (11c) define a closed convex set. Consider the two optimization problems

min_{x,z} { a^⊤x + b^⊤z + (1/2)‖F^⊤x‖²₂ : x • (e − z) = 0, z ∈ Z }    (12)

and

min_{x,z,t,W} { a^⊤x + b^⊤z + (1/2)t : (11b), (11c) }.    (13)

It suffices to show that problems (12) and (13) always attain the same optimal value. Consider the following two cases:
• FF^†a ≠ a: In other words, a is not in the column space of F, i.e., a ∉ col(F). In this case, by the condition col(F) = ∩_{ê_S∈Z} π_S^{−1}(col(F_S)), there exists ê_S ∈ Z such that a_S ∉ col(F_S). Let z be such that z_i = 1 for all i ∈ S. Since a_S ∉ col(F_S), there exists x such that x_i = 0 for all i ∈ [n]\S, x_S is in the orthogonal complement of col(F_S) and a_S^⊤x_S < 0. Clearly, z and x satisfy the constraint x_i(1 − z_i) = 0 for all i = 1, …, n, and the same holds for λx for every λ > 0. Since, by construction, F^⊤(λx) = 0 and a^⊤(λx) = λ·a_S^⊤x_S → −∞ as λ → +∞, problem (12) is unbounded, and since problem (13) is a convex relaxation of (12), problem (13) is unbounded as well.
• FF^†a = a: For problem (13), we can project out t using the relation t = (F^⊤x)^⊤W^†(F^⊤x), which holds at optimality by Lemma 1. Therefore, problem (13) is equivalent to

min_{x,z,W}  a^⊤x + b^⊤z + (1/2)(F^⊤x)^⊤W^†(F^⊤x)  s.t.  WW^†F^⊤x = F^⊤x, (z, W) ∈ P_F.    (15)

Using identical arguments as in the proof of Theorem 1, we find that there exists ê_S ∈ Z such that (z*, W*) = (ê_S, F̂_S^†F̂_S) is optimal for (15). We now construct an optimal solution for (12). Let x* be defined as x*_S = −(F_S^†)^⊤F_S^†a_S and x*_{[n]\S} = 0, and observe that (x*, z*) is feasible for (12), with objective value b^⊤z* − (1/2)‖F_S^†a_S‖²₂. Substituting W* = F̂_S^†F̂_S into (15) shows that the optimal value of problem (13) equals this value as well. Hence, we conclude that the optimal values of problems (12) and (13) coincide. □
Remark 5. From the first case of the proof of Theorem 2, one sees that the technical condition col(F) = ∩_{ê_S∈Z} π_S^{−1}(col(F_S)) is equivalent to stating that the mixed-integer optimization problem and the proposed convex relaxation are unbounded at the same time. The condition is automatically satisfied if e ∈ Z. Moreover, if matrix Q is rank-one, then this condition is equivalent to the nondecomposability condition on Z given in [34]. If it fails to hold, the convexification presented is still valid but may be weak: the convex relaxation may be unbounded even if the mixed-integer optimization problem is bounded. We provide an example illustrating this phenomenon in §3.2.3.

Remark 6. An immediate consequence of Theorem 2 is that if matrix Q is rank-deficient, i.e., k < n, then the extended formulation describing cl conv(X) is simpler than in the full-rank case, i.e., it has fewer additional variables and lower-dimensional conic constraints.
We now illustrate Theorem 2 by providing an alternative proof of the main result of [7] using our unifying framework.

3.2.1. Rank-one quadratic functions. Consider the rank-one set

X_R1 = { (x, z, t) ∈ R^n × {0,1}^n × R : (h^⊤x)² ≤ t, x • (e − z) = 0 },

where we assume h_i ≠ 0 for all i ∈ [n].

Proposition 4 ([7]). cl conv(X_R1) = { (x, z, t) ∈ R^n × [0,1]^n × R₊ : (h^⊤x)² ≤ t·min{1, e^⊤z} }.

Proof. In the case of a rank-one function, we have F = h and W ∈ R. Note that the pseudoinverse of the vector ĥ_S is given by ĥ_S^† = ĥ_S^⊤/‖ĥ_S‖²₂ for S ≠ ∅, and in particular we find that ĥ_S^†ĥ_S = 1 if S ≠ ∅, and ĥ_S^†ĥ_S = 0 otherwise. Thus, ĥ_S^†ĥ_S = max{z_1, …, z_n} at z = ê_S, and P_F is described by the linearization 0 ≤ W ≤ min{1, e^⊤z}. Projecting out the variable W, we arrive at the result. □
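The key scalar identity in the proof above, ĥ_S^†ĥ_S = max{z_1, …, z_n} at z = ê_S, can be verified exhaustively on a small instance. The numpy sketch below (illustrative n and random h with nonzero entries) enumerates all supports S.

```python
import numpy as np
from itertools import combinations

# Rank-one case: with F = h (a column vector with nonzero entries),
# pinv(hat(h)_S) @ hat(h)_S is a 1x1 projection, equal to 1 when S is
# nonempty and 0 when S is empty, i.e. max{z_1, ..., z_n} at z = e_S.
rng = np.random.default_rng(4)
n = 4
h = rng.standard_normal((n, 1))
while np.any(h == 0):                    # ensure h_i != 0 for all i
    h = rng.standard_normal((n, 1))

for k in range(n + 1):
    for S in combinations(range(n), k):
        h_hat = np.zeros_like(h)
        h_hat[list(S), :] = h[list(S), :]
        W = (np.linalg.pinv(h_hat) @ h_hat).item()
        z = np.zeros(n)
        z[list(S)] = 1
        assert np.isclose(W, z.max())    # = max{z_1, ..., z_n}
print("h_S† h_S equals max{z_i} at every vertex")
```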
We discuss generalizations of X R1 with arbitrary constraints on the indicator variables in Section 4.

3.2.2. An example with a rank-two quadratic function. In order to illustrate how convexification methods for polyhedra can be directly utilized to convexify the mixed-integer nonlinear set X, we consider a special rank-two quadratic function of three variables and the associated set X₃. The extreme points of P_F are given in Table 2. Using PORTA [1] to switch from the extreme point representation of P_F to its inequality description, we obtain the closure of the convex hull of X₃.

Table 2. Extreme points of P_F corresponding to set X₃.
3.2.3. An example where the technical condition fails. Consider the set

X_C1R1 = { (x, z, t) ∈ R^n × {0,1}^n × R : (h^⊤x)² ≤ t, x • (e − z) = 0, e^⊤z ≤ 1 },

where h_i ≠ 0 for all i ∈ [n]. In this case, F = h, col(F_{i}) = R and π_{i}^{−1}(col(F_{i})) = R^n for every i ∈ [n]. Thus, ∩_{ê_S∈Z} π_S^{−1}(col(F_S)) = R^n, while col(F) = {x ∈ R^n : x = λh for some λ ∈ R}, and the technical assumption is not satisfied.
The relaxation induced by (11b), (11c), (11e), which is constructed as outlined in Proposition 4, results in the set defined by the bound constraints 0 ≤ z ≤ 1, e^⊤z ≤ 1 and t ≥ (h^⊤x)²/(e^⊤z). Moreover, the corresponding optimization problem min { a^⊤x + b^⊤z + (1/2)t } over this set is unbounded whenever a ∉ col(h): taking x with h^⊤x = 0 and a^⊤x < 0, the point (λx, z, 0) is feasible for every λ > 0. In contrast, cl conv(X_C1R1) is described via the constraint t ≥ Σ_{i=1}^n h_i²x_i²/z_i [33, 34] (similar to the result described in §3.1.2), and the corresponding optimization problem is always bounded.

Convexification in the original space
We now turn our attention to describing cl conv(X) in the original space of variables. The discussion of this section is based on projecting out the matrix variable W in the canonical description of cl conv(X) given in Theorem 1 for Q ≻ 0. Identical arguments hold for the representation in Theorem 2 for low-rank matrices.
Suppose that a minimal description of the polyhedron P is given by the facet-defining inequalities and equalities

⟨Γ_i, W⟩ ≤ γ_i^⊤z + β_i,  i = 1, …, m₁,  and  ⟨Γ_i, W⟩ = γ_i^⊤z + β_i,  i = m₁ + 1, …, m,    (16)

where Γ_i ∈ R^{n×n}, β_i ∈ R and γ_i ∈ R^n. Theorem 3 describes cl conv(X) in the original space of variables. Note that, in practice, a complete description may not be explicitly available, in which case one can use a partial description to derive valid inequalities. Before we give the description in the original space, we define the set of feasible coefficients used to derive the inequalities: let

Y = { y ∈ R^{m₁}_+ × R^{m−m₁} : Σ_{i=1}^m y_iΓ_i ⪰ 0, Σ_{i=1}^m tr(Γ_i)y_i ≤ 1 }.

In the proof, strong duality holds since there exists (z, W) ∈ P that satisfies the facet-defining inequalities strictly, and we can always increase λ to find a strictly feasible solution to the corresponding minimization problem. Substituting V = W − xx^⊤/t + λI, the optimization problem simplifies and, letting y ∈ R^{m₁}_+ × R^{m−m₁} denote the dual variables, we find an equivalent representation of the form (20). In particular, inequality (20a) is valid for any fixed feasible y. Multiplying both sides of the inequality by t, we find the equivalent conic quadratic representation

x^⊤( Σ_{i=1}^m y_iΓ_i )x ≤ t·( y^⊤β + ( Σ_{i=1}^m y_iγ_i )^⊤z ).    (21)

Note that the validity of inequalities (21) implies that y^⊤β + (Σ_{i=1}^m y_iγ_i)^⊤z ≥ 0 for any primal feasible z and dual feasible y; dividing both sides of the inequality by y^⊤β + (Σ_{i=1}^m y_iγ_i)^⊤z, the theorem is proven. Note that even if inequalities (16) are not facet-defining or are insufficient to describe P, the corresponding inequalities (23) remain valid for cl conv(X).
We also state the analogous result for low-rank matrices, without proof; the set of feasible coefficients is defined analogously in terms of the facet-defining inequalities of P_F. We now illustrate Theorem 3 for the set X_2×2 discussed in §3.1.1.
Example 2 (Description of cl conv(X_2×2) in the original space). From Proposition 2, we find that for X_2×2 a minimal description of the polyhedron P is given by the bound constraints 0 ≤ z ≤ 1 together with (9) and (10). Then, an application of Theorem 3 yields inequality (24). Note that the variables y₁, y₂ are originally free, as dual variables of equality constraints; however, the nonnegativity constraints are imposed due to the positive semidefiniteness constraint in Y. In Appendix A we provide an independent verification that inequality (24) is indeed valid, and that it reduces to the quadratic inequality t ≥ d₁x₁² + d₂x₂² − 2x₁x₂ at integral z.

From Theorem 3, we see that cl conv(X) can be described by an infinite number of fractional quadratic/affine inequalities (23). More importantly, the convex hull is finitely generated: the infinite family of quadratic and affine functions is obtained from conic combinations of a finite number of base matrices Γ_i and vectors (γ_i, β_i), which correspond precisely to the minimal description of P. To solve the resulting semi-infinite problem in practice, one can employ a delayed cut generation scheme: at each iteration, the problem with a subset of the inequalities (22) is solved to obtain a point (t̄, x̄, z̄); then the separation problem of finding a maximally violated inequality (i.e., a vector y) at (t̄, x̄, z̄), if one exists, is a convex optimization problem, given by the inner maximization in (23).
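The delayed cut-generation loop mentioned above has the usual solve/separate/add structure. The following self-contained Python sketch illustrates that structure on a deliberately simple stand-in problem, approximating min { t : t ≥ x², 1 ≤ x ≤ 2 } by tangent cuts t ≥ 2x̄x − x̄²; the toy objective, the crude grid "LP solver", and the oracle are illustrative placeholders for the conic inequalities (23) and their separation problem, not the paper's algorithm.

```python
import numpy as np

# Generic delayed cut-generation loop on a toy model: approximate
# min { t : t >= x^2, 1 <= x <= 2 } with linear tangent cuts
# t >= 2*x_bar*x - x_bar^2, adding a violated cut each round.
def solve_lp(cuts):
    # minimize t s.t. t >= al*x + be for (al, be) in cuts, 1 <= x <= 2
    best = None
    for x in np.linspace(1, 2, 201):           # crude LP stand-in on a grid
        t = max((al * x + be for al, be in cuts), default=-10.0)
        if best is None or t < best[1]:
            best = (x, t)
    return best

cuts = []
for _ in range(20):
    x_bar, t_bar = solve_lp(cuts)
    if x_bar ** 2 - t_bar <= 1e-4:             # separation: is t >= x^2 violated?
        break
    cuts.append((2 * x_bar, -x_bar ** 2))      # add tangent cut at x_bar
assert abs(x_bar - 1.0) < 1e-2 and abs(t_bar - 1.0) < 1e-2
print("cut loop converged to (x, t) близко (1, 1)"[:38] or "")
```

In the setting of Theorem 3, `solve_lp` would be the master problem over a subset of the inequalities (22), and the separation step would solve the convex inner maximization in (23) over y ∈ Y.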
Example 3 (Rank-one function with constraints). Given Z ⊆ {0,1}^n, consider the set

X_R1^Z = { (x, z, t) ∈ R^n × Z × R : (h^⊤x)² ≤ t, x • (e − z) = 0 },

that is, a rank-one function with arbitrary constraints on the indicator variables z defined by Z. As discussed in the proof of Proposition 4, P_F ⊆ R^{n+1} has one additional variable W ∈ R which, at integer points, is given by W = max{z_1, …, z_n}. For simplicity, assume that 0 ∈ Z, and that both conv(Z) and conv(Z\{0}) are full-dimensional. Finally, consider all facet-defining inequalities of conv(Z\{0}) of the form γ_i^⊤z ≥ 1 (that is, inequalities that cut off the point 0), for i = 1, …, m. Now consider the inequalities

W ≤ γ_i^⊤z,  i = 1, …, m.    (25)

First, observe that inequalities (25) are valid for P_F: at an integer point (ê_S, 1) with S ≠ ∅ we have γ_i^⊤ê_S ≥ 1 = W, while at (0, 0) they hold trivially. Second, note that inequalities (25) are facet-defining for P_F. Indeed, given i ∈ [m], consider the set Z_i = {z ∈ Z\{0} : γ_i^⊤z = 1}; since γ_i^⊤z ≥ 1 is facet-defining for conv(Z\{0}), there are n affinely independent points {z_j}_{j=1}^n with z_j ∈ Z_i. Thus, we find that the points {(z_j, 1)}_{j=1}^n and (0, 0) are n + 1 affinely independent points satisfying (25) at equality. Moreover, one can easily verify that the inequality W ≤ 1 is facet-defining as well. Thus, from (23) (adapted to the factored representation discussed in §3.2), we conclude that the inequality

t ≥ (h^⊤x)² / min{ 1, min_{i∈[m]} γ_i^⊤z }    (26)

is valid for cl conv(X_R1^Z). Moreover, an optimal solution of the inner optimization problem in (26) corresponds to setting y_i = 1 for i ∈ arg min_{i∈[m]}{γ_i^⊤z}, and we conclude that the inequalities t ≥ (h^⊤x)² and t ≥ (h^⊤x)²/(γ_i^⊤z), i ∈ [m], are valid for cl conv(X_R1^Z). Indeed, as shown in [34], these inequalities along with z ∈ conv(Z) fully describe cl conv(X_R1^Z) (when a nondecomposability condition holds).
Connection with decomposition methods. From Theorem 3, we see that cl conv(X) is obtained by adding conic quadratic inequalities with a simpler quadratic structure x^⊤Γx (corresponding to inequalities describing P). In particular, the intuition is similar to that of the convexifications obtained from decompositions (3). We now show how the theory presented in this paper sheds light on the strength of the aforementioned decompositions. Suppose that inequalities (16), which we repeat for convenience,

⟨Γ_i, W⟩ ≤ γ_i^⊤z + β_i,  i = 1, …, m,

are valid for P and, additionally, Γ_i ⪰ 0 for all i ∈ [m]. Since P is not full-dimensional in general, these positive semidefiniteness conditions may not be as restrictive as they initially seem.
Example 4 (Description of cl conv(X_2×2), continued). None of the matrices appearing in the facets of P for cl conv(X_2×2) given in Example 2 are positive semidefinite. Nonetheless, the inequalities below also describe P (we abuse notation and encode using the variables y how each inequality is obtained). In particular, the last two inequalities satisfy positive semidefiniteness. Moreover, the relaxation of the first two equalities obtained by replacing them with inequalities also satisfies positive semidefiniteness. Finally, if Q is sufficiently diagonally dominant and d₁d₂ ≥ 4, then the third and fourth inequalities satisfy positive semidefiniteness as well.

Now suppose that in (23) we fix the multipliers y, with λ small enough to ensure that the constraint Σ_{i=1}^m tr(Γ_i)y_i ≤ 1 is satisfied. Then inequality (23) reduces to the relaxation obtained from (3). We make the following two important observations.
Observation 1. Relaxations obtained by fixing a given decomposition (3) [20,21] are, in general, insufficient to describe cl conv(X). Indeed, from Theorem 3, describing cl conv(X) requires one inequality per extreme point of the region Y, whereas a given decomposition corresponds to a single point in this region.
Observation 2. On the other hand, the strong "optimal" or "dynamic" relaxations [7,18,35], where the decomposition is not fixed but instead chosen dynamically, are excessive for describing cl conv(X). Indeed, they are of the form (23) for every possible (rank-one, 2 × 2, remainder) matrix, and are not finitely generated, whereas our results imply that the necessary inequalities are finitely generated.

We conclude this section with an analysis of rank-one decompositions, where we assume for simplicity that Q ≻ 0: given a subset T ⊆ 2^{[n]}, rank-one relaxations are given by (28), where R = Q − Σ_{T∈T} ĥ_T ĥ_T^⊤ ⪰ 0 and the ĥ_T ∈ R^n are given vectors that are zero in the entries not indexed by T. Relaxation (28) can be interpreted as a decomposition obtained from valid inequalities for P of the form (29), involving the rank-one matrix ĥ_T ĥ_T^⊤ and a coefficient vector γ ≥ 0.

Proposition 5. If inequality (29) is valid for P and tight at an optimal solution of problem (30), then it defines a face of P of dimension at least dim(P₀) + 1, where P₀ denotes the face of P induced by the points at which (29) holds trivially at equality.

Proof. There are dim(P₀) + 1 affinely independent points in P₀, and all of them satisfy (29) at equality. Letting S* be an optimal solution of problem (30), the point (ê_{S*}, Q̂_{S*}^{−1}) is an additional affinely independent point satisfying (29) at equality. Note that if optimization problem (30) has multiple optimal solutions, then one can find additional affinely independent points. □

In particular, (29) is guaranteed to define a high-dimensional face of P if |T| is small. Indeed, inequalities (29) were found to be particularly effective computationally when T = {T ⊆ [n] : |T| ≤ κ} for some small κ [7], although a theoretical justification of this observation had been missing until now.
Remark 7 (Description of cl conv(X_2×2), continued). Consider again the facet-defining inequalities given in Example 4. The last two inequalities correspond to a rank-one strengthening with |T| = 1, which leads to relaxations of X_2×2 similar to the perspective relaxation. Thus, we may argue that the perspective relaxation is required to describe cl conv(X_2×2).

A Mixed-integer Linear Formulation for P
The polyhedron P can (in theory) be studied using standard methods from mixed-integer linear optimization. However, the vertex representation of P is often not convenient, as most techniques require that the polyhedron be described explicitly via linear inequalities. Thus, in this section, we present such a mixed-integer linear formulation for the vertices of the polytope P when the Hessian matrix Q is positive definite.
First, we describe the linear equalities necessary for P .Throughout this section, for ease of exposition, for a given S ⊆ [n], we permute the rows and columns of Q such that indices in S appear first.
Proposition 6. Every (z, W) ∈ P satisfies the equalities

Σ_{j=1}^n Q_ij W_ij = z_i,  i ∈ [n].    (31)

Proof. Observe that the i-th diagonal entry of Q̂_S^{−1}Q is one if i ∈ S and zero otherwise. Since at all extreme points of P we have z = ê_S and W = Q̂_S^{−1}, and since Σ_{j=1}^n Q_ij W_ij is the i-th diagonal entry of WQ (by the symmetry of Q and W), equalities (31) hold at all extreme points of P, and hence on all of P. □

Since P satisfies n linearly independent equalities, we immediately get insights into the dimension of P.
Corollary 1. dim(P) ≤ n(n + 1)/2. Moreover, if Q is dense and Z contains 0 and ê_{{i,j}} for all i ≤ j, then dim(P) = n(n + 1)/2.

Proof. Polyhedron P has n + n² variables, but the symmetry constraints W_ij = W_ji and the equalities (31) imply the upper bound on the dimension. If Q is dense, the points (ê_{{i,j}}, Q̂_{{i,j}}^{−1}), i ≤ j, are n(n + 1)/2 affinely independent points of P, because each such point is the unique one among them satisfying W_ij ≠ 0. Together with the point (0, 0), we find the required n(n + 1)/2 + 1 affinely independent points in P. □
From Corollary 1, we see that (under mild conditions) there are no other equalities in the description of P. To construct a mixed-integer linear formulation for the vertices of P, we use big-M constraints; Lemmas 2 and 3 identify valid values for the coefficients M.

Lemma 2. For any S ⊆ [n], the entries of Q_S^{-1} are bounded in absolute value by λ_max(Q^{-1}).

Proof. Since switching the order of matrix multiplication does not change the set of nonzero eigenvalues, the nonzero eigenvalues of Q_S^{-1} Q are those of a matrix that, after the permutation fixed above, is block upper triangular with maximum eigenvalue one. We conclude that λ_max(Q^{-1}) gives a uniform bound on the diagonal elements of Q_S^{-1}; since Q_S^{-1} is positive semidefinite, λ_max(Q^{-1}) also bounds the absolute values of its off-diagonal elements. Next, we define M in (33) and prove, in the following lemma, that M bounds the off-diagonal elements of Q_S^{-1} Q for any S ⊆ [n].
where the last inequality follows from Lemma 2.
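The uniform bound attributed to Lemma 2 can be tested by enumeration. A sketch assuming the bound is λ_max(Q^{-1}) on every entry of Q_S^{-1}, as suggested by the proof above:

```python
import numpy as np
from itertools import combinations

# Sketch of the bound suggested by Lemma 2: for every S, the entries of
# Q_S^{-1} are bounded in absolute value by lambda_max(Q^{-1}).
rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + 0.5 * np.eye(n)

lam = np.linalg.eigvalsh(np.linalg.inv(Q)).max()   # lambda_max(Q^{-1})
for r in range(1, n + 1):
    for S in combinations(range(n), r):
        QS_inv = np.linalg.inv(Q[np.ix_(S, S)])
        assert np.abs(QS_inv).max() <= lam + 1e-9
```

The bound follows from eigenvalue interlacing: λ_min(Q_S) ≥ λ_min(Q), so λ_max(Q_S^{-1}) ≤ λ_max(Q^{-1}), which dominates every entry of the positive semidefinite matrix Q_S^{-1}.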
One can make a few observations about the extreme points {(ê_S, Q_S^{-1})}_{ê_S ∈ Z} of P. At an extreme point of P, W = Q_S^{-1} for some S; thus, for any extreme point (z, W) ∈ P, W_ij is nonzero only if z_i = z_j = 1. Moreover, for any S ⊆ [n], the off-diagonal entries in the i-th row of QW are all zero if i ∈ S. These two observations lead to the formulation in the following proposition.
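Both observations can be verified numerically; a minimal sketch with randomly generated data (the padded-inverse construction is an assumption matching the definition of Q_S^{-1} above):

```python
import numpy as np

# Sketch of the two observations: at a vertex (e_S, Q_S^{-1}) of P,
# (i) W_ij != 0 only if z_i = z_j = 1, and (ii) for i in S the off-diagonal
# entries of the i-th row of QW vanish (row i of QW equals the unit vector e_i).
rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)

S = [1, 3, 4]
W = np.zeros((n, n))
W[np.ix_(S, S)] = np.linalg.inv(Q[np.ix_(S, S)])
z = np.isin(np.arange(n), S).astype(float)

# (i) the support of W is contained in S x S
assert all(z[i] * z[j] == 1 for i, j in zip(*np.nonzero(np.abs(W) > 1e-12)))
# (ii) rows of QW indexed by S are unit vectors
QW = Q @ W
for i in S:
    assert np.allclose(QW[i], np.eye(n)[i])
```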
Proposition 7. The extreme points of P are described by the constraints below.

Proof. For any z = ê_S ∈ Z, the constraint implies that W_ij = 0 if either i or j is not in S. For i ∈ S, inequalities (34) and (35) imply the required equalities. The off-diagonal elements in the i-th row of QW are all zero if i ∈ S; otherwise (if i ∉ S), they are bounded by M according to Lemma 3. In other words, the corresponding constraints hold. Moreover, thanks to Lemma 2, the remaining constraints hold at W = Q_S^{-1} and z = ê_S as well.

Proposition 7 allows us to give a mixed-integer linear formulation for the MIQO problem (1). Substituting the mixed-integer linear representation of P from Proposition 7 into the equivalent formulation (8) of MIQO, we arrive at an explicit mixed-integer linear formulation (37) for (1), where M is defined in (33). We point out that the mixed-integer representation of P in Proposition 7 relies on big-M constraints and is therefore not a strong formulation. Nonetheless, advanced mixed-integer linear optimization solvers have a plethora of built-in techniques to improve such formulations. Preliminary computations using Gurobi indicate the following: (1) the natural relaxation of (37) is very weak and, therefore, (37) performs worse than alternative (nonlinear) formulations for problem (1) in most cases; (2) in some cases, however, and notably when the matrix Q is sparse, Gurobi improves the relaxation in presolve to the point where the problems are solved at the root node, faster than with existing formulations for (1). This illustrates that, in some cases, existing methods can substantially improve even weak relaxations due to the polyhedrality of P, whereas similar improvements are not available for nonlinear formulations.
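Independently of formulation (37), the equivalence between MIQO and optimizing a linear function over the vertices (ê_S, Q_S^{-1}) of P can be illustrated by brute-force enumeration; a sketch under a cardinality constraint, with all data randomly generated:

```python
import numpy as np
from itertools import combinations

# Brute-force sketch of MIQO with a cardinality constraint |S| <= r:
# for fixed support S the minimizer is x_S = -Q_S^{-1} a_S, with objective
# b^T e_S - (1/2) a^T W a, where W = Q_S^{-1} padded with zeros; hence MIQO
# amounts to optimizing a linear function of (z, W) over the vertices of P.
rng = np.random.default_rng(4)
n, r = 5, 2
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)
a = rng.standard_normal(n)
b = 0.1 * np.ones(n)

best_val, best_S = 0.0, ()           # S = empty set gives objective 0
for size in range(1, r + 1):
    for S in combinations(range(n), size):
        W = np.zeros((n, n))
        W[np.ix_(S, S)] = np.linalg.inv(Q[np.ix_(S, S)])
        val = b @ np.isin(np.arange(n), S) - 0.5 * a @ W @ a
        if val < best_val:
            best_val, best_S = val, S
print(best_val, best_S)
```

This enumeration is exponential in general; the point of formulation (37) is precisely to delegate the search over supports to a mixed-integer linear solver.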
Detailed computational results are presented in Appendix B. Overall, the results illustrate the potential benefits of reducing convexification to the description of a polyhedral set, but they also indicate that much work remains to derive better relaxations of P.

Conclusion
In this paper, we first describe the convex hull of the epigraph of a convex quadratic function with indicators in an extended space; it is given by one positive semidefinite constraint and an exponential system of linear inequalities describing a polytope, P (or P_F). We then derive the convex hull description in the original space as a semi-infinite conic quadratic program. Furthermore, we give a compact mixed-integer linear representation of the vertices of the polytope P, which results in the first compact mixed-integer linear formulation of MIQO problems. While this is a weak formulation, our preliminary computational experience indicates that, for a class of sparse problems, off-the-shelf solvers are able to leverage advances in MILO to improve the formulation substantially, making it competitive with, if not better than, state-of-the-art approaches. To translate our theoretical developments into effective practical methods, it is crucial to exploit the structure of P. In ongoing work, we explore the case where Q is a Stieltjes matrix, for which P has a special structure that allows us to use our results directly, without resorting to the MILO formulation. Our results provide a unifying framework for several convex relaxations of MIQO problems in the literature and can also be used to evaluate their strength.
Natural: The natural reformulation, where we replace the nonconvex constraint x_i(1 − z_i) = 0 in (1) with the big-M constraint |x_i| ≤ 5‖x*‖_∞ z_i, where x* denotes the optimal solution of the problem without binary variables or cardinality constraints. Observe that 5‖x*‖_∞ is not guaranteed to be a valid bound on |x_i|; thus, this formulation may produce suboptimal solutions for (1).
Perspective: The perspective reformulation [3,15,19,23], where we extract a diagonal term diag(δ) from Q and add the perspective constraints. We choose δ to be the minimum eigenvalue of Q in our experiments.
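As a sanity check on the diagonal extraction underlying the perspective reformulation, the sketch below (random data, assuming every entry of δ equals the scalar λ_min(Q)) verifies that the remainder Q − δI remains positive semidefinite:

```python
import numpy as np

# Sketch of the diagonal extraction used by the perspective reformulation:
# with delta = lambda_min(Q), the remainder Q - delta * I stays PSD, and each
# extracted term delta * x_i^2 admits a perspective constraint x_i^2 <= s_i z_i.
rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + 0.1 * np.eye(n)

delta = np.linalg.eigvalsh(Q).min()
R = Q - delta * np.eye(n)
assert delta > 0
assert np.linalg.eigvalsh(R).min() >= -1e-9   # remainder is PSD
```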
In all experiments, Z is defined by a cardinality constraint, i.e., Z = {z ∈ {0,1}^n : Σ_{i=1}^n z_i ≤ r}, where r = kn for a given sparsity parameter 0 < k ≤ 1, and b = 0. The mixed-integer optimization problems are solved with Gurobi 9.0 on a laptop with an Intel(R) Core(TM) i7-8750H 2.20 GHz CPU and 32 GB RAM. We set the time limit to 30 minutes and use the default Gurobi parameter settings.
B.1. Best subset selection. In this section, we solve the best subset selection problem (2) with varying k on the benchmark datasets in Table 3, available from the UCI machine learning repository. The performance measures considered are the solution time, the number of branch-and-bound nodes explored, and the initial optimality gap of the continuous relaxation. We also record the optimality gaps attained at the root node after presolve (in parentheses). Denoting the optimal objective value of a continuous relaxation by LB and the exact optimal value by OPT, the initial optimality gap is computed as % gap = 100 × (OPT − LB)/OPT. For instances that hit the time limit, we report the average end gap in parentheses. Table 4 shows the performance of the different formulations on these benchmark datasets. We observe that the relaxation quality of MILO is poor, with optimality gaps well above 100% (in the range of 10^3% to 10^7%). Indeed, even though the objective of (2) has a trivial lower bound of 0, the objective values produced by the continuous relaxation of MILO are negative in all cases. The poor relaxation quality leads to large numbers of branch-and-bound nodes and long solution times. However, for the special case of k = 0.1 on the first three datasets, Gurobi is able to close almost all of the gap at the root node and solve the problems with little or no branching. Thus, while the results clearly indicate that, at the moment, standard methods are better than the MILO formulation in the context of a general MIQO, in some cases solvers may be able to exploit the polyhedrality of MILO. In the next section, we present
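To make the gap computation above concrete, a tiny sketch (the numbers are hypothetical, not taken from Table 4) shows how a negative relaxation bound LB inflates the reported gap well beyond 100%:

```python
# Initial optimality gap as defined above: % gap = 100 * (OPT - LB) / OPT.
def optimality_gap(opt, lb):
    return 100.0 * (opt - lb) / opt

# A negative relaxation bound LB with a small positive OPT inflates the gap:
assert optimality_gap(1.0, -99.0) == 10000.0   # a gap of 10^4 %
assert optimality_gap(1.0, 0.5) == 50.0
```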
experiments showcasing this phenomenon.
B.2. Inference with graphical models. Given a graph G = (V, E), we consider the MIQO problem (41), which arises in the sparse inference problem for a two-dimensional Gaussian Markov random field (GMRF); see [30] for an in-depth discussion.
The graph G considered in our experiments is a two-dimensional 10 × 10 grid. The corresponding Hessian matrix Q in problem (41) is sparse: each row has at most five nonzero entries (including the diagonal element). We use the random instances from [26], available at https://sites.google.com/usc.edu/gomez/data, where y_i = x_i + N(0, σ) is a noisy observation of x_i, and three randomly sampled 3 × 3 blocks of x are set to be nonzero. Note that σ affects both the noise level of y and the diagonal dominance of Q in (41), with small noise values σ resulting in problems with larger diagonal dominance. We test σ = 0.1, 0.2, 0.3, 0.4, 0.5 and sparsity levels k = 0.1, 0.2, 0.3, 0.4, 0.5. For each σ, we use five randomly generated instances and report average statistics. A solution time of 1800 seconds means Gurobi hit the time limit, in which case we report the best optimality gap in the following parentheses. For MILO, the gap after Gurobi's presolve is reported in the following parentheses.
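A sketch of the grid structure described above (the weights are placeholders chosen for diagonal dominance; the actual Q comes from the GMRF instances of [26]):

```python
import numpy as np

# Hypothetical sketch of the grid sparsity pattern behind problem (41):
# on a 10 x 10 grid, each variable couples only with its (up to 4) neighbors,
# so each row of Q has at most five nonzeros.  The weights below are
# placeholders; the actual Q is determined by the GMRF instances.
side = 10
n = side * side
Q = np.zeros((n, n))
for i in range(side):
    for j in range(side):
        u = i * side + j
        Q[u, u] = 5.0                        # diagonally dominant placeholder
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < side and 0 <= nj < side:
                Q[u, ni * side + nj] = -1.0

assert (np.count_nonzero(Q, axis=1) <= 5).all()
assert np.linalg.eigvalsh(Q).min() > 0       # placeholder Q is positive definite
```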
Table 5 summarizes the results. As in the experiments reported in §B.1, the continuous relaxation of MILO is the weakest among the three formulations, with gaps well over 100%. However, in this case, Gurobi closes virtually all of the optimality gap on all instances, and the problems are solved very quickly, with at most one branch-and-bound node. The overall performance is significantly better than that of the natural MIQO formulation, and also better than the perspective reformulation on these instances.
We conjecture that the sparsity of Q, which leads to sparsity in the linear constraints of the MILO formulation, allows Gurobi to perform significant bound tightening in presolve. In contrast, Gurobi is unable to achieve a similar improvement for a nonlinear formulation. This clearly showcases the benefit, in such cases, of reducing the convexification of X to the description of a polyhedron. A superscript i indicates that i out of five instances hit the time limit; for instances reaching the time limit, the average best optimality gap is reported in the parentheses following the solution time.

Table 4. Performance of MILO, Natural, and Perspective on the datasets in Table 3.

Table 5. Performance of MILO, Natural, and Perspective formulations on graphical models.