Supermodularity and valid inequalities for quadratic optimization with indicators

We study the minimization of a rank-one quadratic with indicators and show that the underlying set function obtained by projecting out the continuous variables is supermodular. Although supermodular minimization is, in general, difficult, the specific set function for the rank-one quadratic can be minimized in linear time. We show that the convex hull of the epigraph of the quadratic can be obtained from inequalities for the underlying supermodular set function by lifting them into nonlinear inequalities in the original space of variables. Explicit forms of the convex-hull description are given, both in the original space of variables and in an extended formulation via conic quadratic-representable inequalities, along with a polynomial separation algorithm. Computational experiments indicate that the lifted supermodular inequalities in conic quadratic form are quite effective in reducing the integrality gap for quadratic optimization with indicators.


INTRODUCTION
Consider the convex quadratic optimization problem with indicators

min { a^T x + b^T y + y^T Q y : y_i(1 − x_i) = 0, i ∈ N, x ∈ {0,1}^n, y ∈ R^n_+ },   (1)

where a, b ∈ R^n and Q ∈ R^{n×n} is a symmetric positive semidefinite matrix. For each i = 1, . . ., n, the binary variable x_i, along with the complementarity constraint y_i(1 − x_i) = 0, indicates whether y_i may take positive values. Problem (1) arises in numerous practical applications, including portfolio optimization [16], signal/image denoising [13,14], best subset selection [15,20], and unit commitment [25].
Constructing strong convex relaxations for non-convex optimization problems is critical in devising effective solution approaches for them. Natural convex relaxations of (1), where the complementarity constraints y_i(1 − x_i) = 0 are linearized using the so-called "big-M" constraints y_i ≤ M x_i, are known to be weak [e.g., 39]. Therefore, there is an increasing effort in the literature to better understand and describe the epigraph of quadratic functions with indicator variables. Dong and Linderoth [21] describe lifted linear inequalities for (1) derived from its continuous quadratic optimization counterpart over bounded variables. Bienstock and Michalka [17] give a characterization of linear inequalities obtained by strengthening gradient inequalities of a convex objective function over a non-convex set.
The majority of the work toward constructing strong relaxations of (1) is based on the perspective reformulation [2,18,23,32,34,38,53,55]. The perspective reformulation, which may be seen as a consequence of the convexifications based on disjunctive programming derived in [19], is based on strengthening the epigraph of a univariate convex quadratic function y_i^2 ≤ t by using its perspective y_i^2/x_i ≤ t. The perspective strengthening can be applied to a general convex quadratic y^T Q y by writing it as y^T(Q − D)y + y^T D y for a diagonal matrix D ⪰ 0 with Q − D ⪰ 0, and simply reformulating each separable quadratic term D_ii y_i^2 as D_ii y_i^2/x_i [22,24,59]. While this approach is effective when Q is strongly diagonally dominant, it is ineffective otherwise, or inapplicable when Q is not full rank, as no such D exists.
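To make the diagonal-extraction step concrete, the following is a minimal numerical sketch of one simple valid choice, D = λ_min(Q)·I (our illustrative choice, not the paper's prescription; stronger splits can be computed via SDP), together with an evaluation of the resulting perspective terms:

```python
import numpy as np

def diagonal_split(Q):
    """Split a PSD matrix Q as (Q - D) + D with D >= 0 diagonal and
    Q - D PSD, using the smallest eigenvalue of Q (one simple choice)."""
    lam_min = np.linalg.eigvalsh(Q)[0]           # eigenvalues sorted ascending
    D = np.diag(np.full(Q.shape[0], max(lam_min, 0.0)))
    return Q - D, D

def perspective_value(Q, y, x, eps=1e-12):
    """Evaluate y'(Q - D)y + sum_i D_ii * y_i^2 / x_i (perspective terms),
    with the convention a/0 = inf if a > 0 and 0/0 = 0."""
    R, D = diagonal_split(Q)
    d = np.diag(D)
    persp = sum(
        di * yi**2 / xi if xi > eps else (0.0 if abs(yi) < eps else np.inf)
        for di, yi, xi in zip(d, y, x)
    )
    return y @ R @ y + persp

Q = np.array([[2.0, 1.0], [1.0, 2.0]])           # lambda_min = 1, so D = I
print(perspective_value(Q, np.array([1.0, 1.0]), np.array([1.0, 1.0])))
```

At binary x the perspective value coincides with y^T Q y, and for fractional x it only increases, which is what makes the reformulation a valid strengthening.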
To address the limitations of the perspective reformulation, a recent stream of research focuses on constructing strong relaxations of the epigraphs of simple but multi-variable quadratic functions. Jeon et al [35] use linear lifting to construct valid inequalities for the epigraphs of two-variable quadratic functions. Frangioni et al [26] use extended formulations based on disjunctive programming to derive stronger relaxations of the epigraphs of two-variable functions, and study heuristics and semidefinite programming (SDP) approaches to extract such two-variable terms from Q. The disjunctive approach results in a substantial increase in the size of the formulations, which limits its use to small instances. Atamtürk and Gómez [6] describe the convex hull of the epigraph of the two-variable quadratic function (y_1 − y_2)^2 ≤ t in the original space of variables, and Atamtürk et al [13] generalize this result to convex two-variable quadratic functions a_1 y_1^2 − 2 y_1 y_2 + a_2 y_2^2 ≤ t and show how to optimally decompose an M-matrix (PSD with non-positive off-diagonals) Q into such two-variable terms; the results indicate that such formulations considerably improve the convex relaxations when Q is an M-matrix, but the relaxation quality degrades when Q has positive off-diagonal entries. Han et al [33] give SDP formulations for (1) based on convex-hull descriptions of the 2×2 case. These SDP formulations require O(n^2) additional variables and constraints, which may not scale to large problems. Atamtürk and Gómez [7] give the convex-hull description of a rank-one function with free continuous variables, and propose an SDP formulation to tackle quadratic optimization problems with free variables arising in sparse regression. Wei et al [50,51] extend those results, deriving ideal formulations for rank-one functions with arbitrary constraints on the indicator variables x. These formulations are shown to be effective in sparse regression problems; however, as they do not account for the non-negativity constraints on the continuous variables, they are weak for (1). The rank-one quadratic set studied in this paper addresses this gap and properly generalizes the perspective strengthening of a univariate quadratic to higher dimensions.
In the context of discrete optimization, submodularity/supermodularity plays a critical role in the design of algorithms [27,31,43] and in constructing convex relaxations of discrete problems [1,5,10,41,47,54,56,57,58]. Exploiting submodularity in settings that also involve continuous variables typically requires specialized arguments, e.g., see [12,36,48]. A notable exception is Wolsey [52], who presents a systematic approach for exploiting submodularity in fixed-charge network problems. As submodularity arises in combinatorial optimization, where the convex hulls of the sets under study are polyhedral, there are few papers utilizing submodularity to describe non-polyhedral convex hulls [8], and those sets typically involve some degree of separability between continuous and discrete variables. In this paper, we show how to generalize the valid inequalities proposed in [52] to convexify non-polyhedral sets, where the continuous variables are linked with the binary variables via indicator constraints.
Contributions. Here, we study the mixed-integer epigraph of a rank-one quadratic function with indicator variables and non-negative continuous variables:

X = { (x, y, t) ∈ {0,1}^N × R^N_+ × R : (∑_{i∈N_+} y_i − ∑_{i∈N_−} y_i)^2 ≤ t, y_i(1 − x_i) = 0, i ∈ N },

where (N_+, N_−) is a partition of N := {1, . . ., n}. Observe that any rank-one quadratic of the form (c^T y)^2 ≤ t with c_i ≠ 0 for all i ∈ N can be written as in X by scaling the continuous variables. If all coefficients of c are of the same sign, then either N_+ = ∅ or N_− = ∅, and X reduces to the simpler form

X_+ = { (x, y, t) ∈ {0,1}^N × R^N_+ × R : (∑_{i∈N} y_i)^2 ≤ t, y_i(1 − x_i) = 0, i ∈ N }.

To the best of our knowledge, the convex hull structure of X or X_+ has not been studied before. Interestingly, optimization of a linear function over X can be done in polynomial time (§4.2).
Our motivation for studying X stems from constructing strong convex relaxations for problem (1) by writing the convex quadratic y^T Q y as a sum of rank-one quadratics. Especially in large-scale applications, it is effective to state Q as the sum of a low-rank matrix and a diagonal matrix. Specifically, suppose that Q = FF^T + D, where F ∈ R^{n×r} and D ∈ R^{n×n} is a (possibly zero) non-negative diagonal matrix. Such decompositions can be constructed in numerous ways, including singular-value decomposition, Cholesky decomposition, or via factor models. Letting F_j denote the j-th column of F, adding auxiliary variables t_j, j = 1, . . ., r, and using the perspective reformulation, problem (1) can be cast as formulation (2). Formulation (2) arises naturally, for example, in portfolio risk minimization [16], where the covariance matrix Q is the sum of a low-rank factor covariance matrix and an idiosyncratic (diagonal) variance matrix. When the entries of the diagonal matrix D are small, the perspective reformulation is not effective in strengthening the formulation. However, noting that (x, F_j ∘ y, t_j) ∈ X, where (F_j ∘ y)_i = F_ij y_i, for each j = 1, . . ., r, one can employ strong relaxations based on the rank-one quadratic with indicators, X. Our approach of decomposing y^T Q y into a sum of rank-one quadratics and utilizing strong relaxations of epigraphs of rank-one quadratics is analogous to employing cuts separately from individual rows of a constraint matrix Ax ≤ b in mixed-integer linear programming.
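One exact way to produce a decomposition Q = FF^T + D is to peel off the smallest eigenvalue as the diagonal part and factor the remainder by eigendecomposition. This is a minimal sketch under our own simplifying choice (D = λ_min(Q)·I); factor models or SDP-based approaches yield other splits:

```python
import numpy as np

def factor_plus_diagonal(Q, tol=1e-9):
    """Exact decomposition Q = F F^T + D with D = lambda_min(Q) * I >= 0.
    Columns F_j of F give the rank-one terms (F_j' y)^2 in the sum."""
    lam = max(np.linalg.eigvalsh(Q)[0], 0.0)
    D = lam * np.eye(Q.shape[0])
    w, V = np.linalg.eigh(Q - D)                 # Q - D is PSD by construction
    keep = w > tol                               # drop (numerically) zero modes
    F = V[:, keep] * np.sqrt(w[keep])            # scale eigenvectors columnwise
    return F, D
```

Each column F_j then supplies a point (x, F_j ∘ y, t_j) in the rank-one set X, so the strong relaxations developed in the paper can be applied term by term.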
In this paper, we present a generic framework for obtaining valid inequalities for mixed-integer nonlinear optimization problems by exploiting supermodularity of the underlying set function. To do so, we project out the continuous variables, derive valid inequalities for the corresponding pure-integer set, and then lift these inequalities to the space of continuous variables as in Nguyen et al [42] and Richard and Tawarmalani [46]. It turns out that for the rank-one quadratic with indicators, the corresponding set function is supermodular and retains much of the structure of X. The lifted supermodular inequalities derived in this paper are nonlinear in both the continuous and discrete variables.
We show that this approach encompasses several previously known convexifications for quadratic optimization with indicator variables. Moreover, the well-known inequalities in the mixed-integer linear optimization literature given in [52], which include flow cover inequalities as a special case, can also be obtained via the lifted supermodular inequalities.
Finally, and more importantly, we show that the lifted supermodular inequalities and bound constraints are sufficient to describe cl conv(X). Such convex-hull descriptions of high-dimensional nonlinear sets are rare in the literature. In particular, we give a characterization in the original space of variables. This description is defined by a valid piecewise function with exponentially many pieces; therefore, it cannot be used by convex optimization solvers directly. To overcome this difficulty, we also give a conic quadratic representable description in an extended space, with exponentially many valid conic quadratic inequalities, along with a polynomial-time separation algorithm.
The rank-one quadratic sets X and X_+ appear very similar to their relaxation X_f, in which the non-negativity constraints y ≥ 0 on the continuous variables are dropped.
However, while only one additional inequality, (∑_{i∈N} y_i)^2 / ∑_{i∈N} x_i ≤ t, is needed to describe cl conv(X_f) [7], the convex hulls of X and X_+ are substantially more complicated and rich. Indeed, cl conv(X_f) provides a weak relaxation of cl conv(X_+), as illustrated in the next example.

Example 1. Consider set X_+ with n = 3. For the relaxation X_f, the closure of the convex hull is described by 0 ≤ x ≤ 1 and the inequality t ≥ (y_1 + y_2 + y_3)^2 / min{1, x_1 + x_2 + x_3}. Figure 1 (A) depicts this inequality as a function of (x_1, y_1) for fixed x_2 = 0.6, x_3 = 0.3, y_2 = 0.5, and y_3 = 0.2. In Proposition 8, we give the function f describing cl conv(X_+). Figure 1 (B) depicts f(x, y) (truncated at 5) as a function of (x_1, y_1) when the other variables are fixed as before. We find that cl conv(X_f) is a very weak relaxation of cl conv(X_+) for low values of x_1. For example, for x_1 = 0.01 and y_1 = 1, we find that (1 + 0.5 + 0.2)^2 / (0.01 + 0.6 + 0.3) ≈ 3.18, whereas f(x, y) ≈ 100.55. The computation of f for this example is described after Proposition 8.

Outline. The rest of the paper is organized as follows. In §2 we review the valid inequalities for supermodular set functions and present the general form of the lifted supermodular inequalities. In §3 we re-derive known ideal formulations in the literature for quadratic optimization using the lifted supermodular inequalities. In §4 we show that the lifted supermodular inequalities are sufficient to describe the convex hull of X. In §5 we provide the explicit form of the lifted supermodular inequalities for X, both in the original space of variables and in conic quadratic representable form in an extended space, and discuss the separation problem. In §6 we present computational results, and in §7 we conclude the paper.
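Returning to Example 1, the relaxation value reported there is easy to reproduce (only the cl conv(X_f) bound is recomputed here; the convex-hull value f(x, y) ≈ 100.55 requires Proposition 8 and is taken from the text):

```python
# Point from Example 1: x = (0.01, 0.6, 0.3), y = (1, 0.5, 0.2).
x = [0.01, 0.6, 0.3]
y = [1.0, 0.5, 0.2]

# Bound from cl conv(X_f): t >= (y_1 + y_2 + y_3)^2 / min{1, x_1 + x_2 + x_3}.
bound = sum(y) ** 2 / min(1.0, sum(x))
print(round(bound, 2))  # -> 3.18
```

The gap to f(x, y) ≈ 100.55 at this point illustrates how much is lost by ignoring the non-negativity constraints.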
Notation. For a set S ⊆ N, define x^S as the indicator vector of S, and define S_x as the support set of a vector x ∈ {0,1}^N. Abusing notation, we use x^S and S_x interchangeably; e.g., given a set function g : 2^N → R, we may equivalently write g(S) or g(x^S). To simplify the notation, given i ∈ N and S ⊆ N, we write S ∪ i instead of S ∪ {i} and S \ i instead of S \ {i}. For a set Y ⊆ R^N, let conv(Y) denote the convex hull of Y and cl conv(Y) denote its closure. We adopt the convention that a/0 = ∞ if a > 0 and a/0 = 0 if a = 0. For a ∈ R, let a_+ = max{a, 0}. For a vector c ∈ R^N and a set S ⊆ N, we let c(S) = ∑_{i∈S} c_i and max_c(S) = max_{i∈S} c_i (by convention, max_c(∅) = 0).

PRELIMINARIES
In this section we cover a few preliminary results for the paper and, at the end, give the general form of the lifted supermodular inequalities (Theorem 1).

2.1. Supermodularity and valid inequalities. A set function g : 2^N → R is supermodular if

g(S ∪ i) − g(S) ≤ g(T ∪ i) − g(T) for all S ⊆ T ⊆ N and i ∈ N \ T,

i.e., ρ(i, S) ≤ ρ(i, T), where ρ(i, S) = g(S ∪ i) − g(S) is the increment function.
Proposition 1 (Nemhauser et al [41]). If g is a supermodular function, then

(1) g(T) ≥ g(S) + ∑_{i∈T\S} ρ(i, S) − ∑_{i∈S\T} ρ(i, N \ i), and
(2) g(T) ≥ g(S) + ∑_{i∈T\S} ρ(i, ∅) − ∑_{i∈S\T} ρ(i, S \ i),

for all S, T ⊆ N. As a direct consequence of Proposition 1, one can construct valid inequalities for the epigraph of a supermodular function g, i.e.,

Z = { (x, t) ∈ {0,1}^N × R : g(x) ≤ t }.

Specifically, for any S ⊆ N, the linear supermodular inequalities [40]

t ≥ g(S) + ∑_{i∈N\S} ρ(i, S) x_i − ∑_{i∈S} ρ(i, N \ i)(1 − x_i),   (3a)
t ≥ g(S) + ∑_{i∈N\S} ρ(i, ∅) x_i − ∑_{i∈S} ρ(i, S \ i)(1 − x_i),   (3b)

are valid for Z.
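For small n, supermodularity of a candidate set function can be verified by brute force. The helper below is our own illustrative check (not from the paper) that the increments ρ(i, S) are nondecreasing in S; the example function is of the form −max_a(S)^2/4 that arises later for the rank-one set:

```python
from itertools import combinations

def is_supermodular(g, n, tol=1e-9):
    """Brute-force check that g(S u i) - g(S) <= g(T u i) - g(T) for all
    S subset of T and i not in T (increments are nondecreasing in S)."""
    N = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(N, r)]
    rho = lambda i, S: g(S | {i}) - g(S)
    for S in subsets:
        for T in subsets:
            if S <= T:
                for i in N:
                    if i not in T and rho(i, S) > rho(i, T) + tol:
                        return False
    return True

# Example: g(S) = -max_{i in S} a_i^2 / 4 with a >= 0 (cf. the projection
# function derived for the rank-one set in Section 4).
a = [3.0, 1.0, 2.0]
g = lambda S: -max((a[i] for i in S), default=0.0) ** 2 / 4
print(is_supermodular(g, 3))  # -> True
```

The check enumerates all 2^n subsets twice, so it is only a debugging aid, but it is handy for confirming (or refuting) supermodularity of small examples before attempting a proof.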
2.2. Lifted supermodular inequalities. We now describe a family of lifted supermodular inequalities, using a lifting approach similar to the ones used in [28,46]. Let h : {0,1}^N × R^N → R ∪ {∞} be a function defined over a mixed 0–1 domain and consider its epigraph

H = { (x, y, t) ∈ {0,1}^N × R^N × R : h(x, y) ≤ t }.

Observe that H allows for arbitrary constraints, which can be encoded via the function h. For example, non-negativity and complementarity constraints can be included by letting h(x, y) = ∞ whenever y_i < 0 or y_i(1 − x_i) ≠ 0 for some i ∈ N.
For α ∈ R^N, define the set function g_α : {0,1}^N → R ∪ {∞, −∞} as

g_α(x) = min_{y∈R^N} { h(x, y) − α^T y },   (4)

and let B ⊆ R^N be the set of values of α for which problem (4) is bounded, i.e.,

B = { α ∈ R^N : g_α(x) > −∞ for all x ∈ {0,1}^N }.

Although supermodularity is defined for set functions only, we propose in Definition 1 below an extension to functions involving continuous variables as well.
Definition 1. Function h is supermodular if the set function g_α defined in (4) is supermodular for all α ∈ B.
Remark 1. Suppose that h does not depend on the continuous variables y, i.e., h(x, y) = g(x). In this case problem (4) is unbounded unless α = 0, i.e., B = {0}, and we find that h(x, y) is supermodular if and only if g_0(x) = g(x) is supermodular. Thus, Definition 1 includes the usual definition of supermodularity for set functions as a special case.
Proposition 2. If function h is supermodular, then for any α ∈ B and S ⊆ N, the inequalities

t ≥ α^T y + g_α(S) + ∑_{i∈N\S} ρ_α(i, S) x_i − ∑_{i∈S} ρ_α(i, N \ i)(1 − x_i),   (5a)
t ≥ α^T y + g_α(S) + ∑_{i∈N\S} ρ_α(i, ∅) x_i − ∑_{i∈S} ρ_α(i, S \ i)(1 − x_i),   (5b)

are valid for H, where ρ_α(i, S) = g_α(S ∪ i) − g_α(S).
Proof. For any α ∈ B, S ⊆ N, and (x, y, t) ∈ H, we find

t − α^T y ≥ h(x, y) − α^T y ≥ g_α(x) ≥ g_α(S) + ∑_{i∈N\S} ρ_α(i, S) x_i − ∑_{i∈S} ρ_α(i, N \ i)(1 − x_i),

where the first inequality follows directly from the definition of H, the second inequality follows by minimizing h(x, y) − α^T y with respect to y, and the third inequality follows from the validity of (3a). Thus, by adding α^T y to both sides, we find that inequality (5a) is valid. The validity of (5b) is proven identically.
Since inequalities (5) are valid for any α ∈ B, one can obtain stronger valid inequalities by optimally choosing vector α.
Theorem 1 (Lifted supermodular inequalities). If h is supermodular, then for any S ⊆ N, the lifted supermodular inequalities

t ≥ sup_{α∈B} { α^T y + g_α(S) + ∑_{i∈N\S} ρ_α(i, S) x_i − ∑_{i∈S} ρ_α(i, N \ i)(1 − x_i) },   (6a)
t ≥ sup_{α∈B} { α^T y + g_α(S) + ∑_{i∈N\S} ρ_α(i, ∅) x_i − ∑_{i∈S} ρ_α(i, S \ i)(1 − x_i) },   (6b)

are valid for H.
Observe that while inequalities (5) are linear, inequalities (6) are nonlinear in x and y. Moreover, each inequality (6) is convex, since it is defined as a supremum of linear functions. In addition, if the base supermodular inequalities (3) are strong for the convex hull of epi g_α, then the lifted supermodular inequalities (6) are strong for H as well, as formalized next. Given α ∈ B, define

G_α = { (x, t) ∈ {0,1}^N × R : g_α(x) ≤ t }.

Note that conv(G_α) is a polyhedron. Theorem 2 below is a direct consequence of Theorem 1 in [46].

Theorem 2. Let h be supermodular. If, for every α ∈ B, the supermodular inequalities (3) and the bound constraints 0 ≤ x ≤ 1 describe conv(G_α), then the lifted supermodular inequalities (6) and the bound constraints describe cl conv(H).
Although Definition 1 may appear too restrictive to arise in practice, we show in §2.3 that supermodular functions are in fact widespread in a class of well-studied problems in mixed-integer linear optimization. In §3 we show that several existing results for quadratic optimization with indicators can be obtained as lifted supermodular inequalities. Perhaps more surprisingly, for the rank-one quadratic with indicators we show in §4 that the conditions in Definition 1 and Theorem 2 are satisfied as well.

2.3. Supermodular inequalities and fixed-charge networks
Wolsey [52] uses the fixed-charge set FC(x) to describe network structures arising in flow problems with fixed charges on the arcs: N_+ denotes the incoming arcs of a given subgraph, N_− denotes the outgoing arcs, A_+ ∪ A_− denotes the internal arcs of the subgraph, and b represents the supply/demand of the subgraph. The associated set function is supermodular, and inequalities (5) and (6) are valid. Moreover, Wolsey [52] shows that the linear supermodular inequalities (5) with α ∈ {−1, 0, 1}^N include as special cases well-known inequalities for mixed-integer linear optimization such as flow cover inequalities [44,49] and inequalities for capacitated lot-sizing [9,45]; several other classes of inequalities for fixed-charge network flow problems are special cases as well [4,11,12]. Therefore, the inequalities presented in this paper can be interpreted as nonlinear generalizations of the aforementioned inequalities.

PREVIOUS RESULTS AS LIFTED SUPERMODULAR INEQUALITIES
In order to illustrate the approach, in this section, we show how existing results for quadratic optimization with indicators can be derived using the lifted supermodular inequalities (6).
3.1. The single-variable case. Consider, first, the single-variable case

X_1 = { (x, y, t) ∈ {0,1} × R_+ × R : y^2 ≤ t, y(1 − x) = 0 },

whose convex-hull description is given by the perspective reformulation [2,19,23,32]: t ≥ y^2/x along with the bound constraints. We now derive the perspective reformulation as a special case, in fact, using a modular inequality. Note that g_α(0) = 0 and

g_α(1) = min_{y∈R_+} { y^2 − αy } = −(α_+)^2/4,

since y* = α/2 if α ≥ 0 and y* = 0 otherwise. Thus, g_α is a modular function for any α ∈ R, and inequalities (3) reduce to t ≥ −(α_+)^2/4 · x. Then, we find that inequalities (6) reduce to the perspective of y^2:

t ≥ sup_{α∈R} { αy − (α_+)^2/4 · x } = y^2/x.

3.2. The rank-one case with free continuous variables. Consider the relaxation of X obtained by dropping the non-negativity constraints y ≥ 0:

X_f = { (x, y, t) ∈ {0,1}^N × R^N × R : (∑_{i∈N} y_i)^2 ≤ t, y_i(1 − x_i) = 0, i ∈ N }.

Observe that any rank-one quadratic constraint of the form (∑_{i∈N} c_i y_i)^2 ≤ t with c_i ≠ 0 can be transformed into the form given in X_f by scaling the continuous variables (so that |c_i| = 1) and negating variables as ȳ_i := −y_i if c_i < 0. The closure of the convex hull of X_f is derived in [7], and the effectiveness of the resulting inequalities is demonstrated on sparse regression problems. We now re-derive the description of cl conv(X_f) using lifted supermodular inequalities.
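The closing identity of §3.1 above — the supremum of the lifted linear cuts recovering the perspective function y^2/x — can be checked numerically. This is a small self-contained sketch; the grid and the sample point are our own choices:

```python
import numpy as np

# Each alpha >= 0 gives a valid linear cut t >= alpha*y - (alpha^2/4)*x;
# their pointwise supremum is the perspective function y^2/x.
def lifted_sup(y, x, alphas):
    return np.max(alphas * y - (alphas ** 2 / 4.0) * x)

y, x = 0.8, 0.4
alphas = np.linspace(0.0, 20.0, 200001)          # fine grid over alpha >= 0
print(abs(lifted_sup(y, x, alphas) - y * y / x) < 1e-4)  # -> True
```

The maximizing cut occurs at α* = 2y/x, where the linear function αy − (α^2/4)x attains the value y^2/x.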
For S ⊆ N, we have

g_α(x^S) = min_{y∈R^S} { y(S)^2 − ∑_{i∈S} α_i y_i }.

It is easy to see that g_α(x^S) = −∞ unless α_i = α_j for all i ≠ j; see [7]. Therefore, letting ᾱ = α_i for all i ∈ N, we find that

g_ᾱ(x^S) = −ᾱ^2/4 if S ≠ ∅, and g_ᾱ(∅) = 0,

where the optimal solution is found by setting y(S) = ᾱ/2. The function g_ᾱ is supermodular, since ρ_ᾱ(i, ∅) = −ᾱ^2/4 and ρ_ᾱ(i, S) = 0 for any S ≠ ∅.
As we shall see in §4, once the non-negativity constraints y ≥ 0 are incorporated, conv(X) is substantially more complex than conv(X_f). Nonetheless, as shown in Example 1, the resulting convexification is substantially stronger as well.
3.3. The rank-one case with a negative off-diagonal. Consider the special case of X with two continuous variables (N = {1, 2}) and a negative off-diagonal:

X_2^− = { (x, y, t) ∈ {0,1}^2 × R^2_+ × R : (y_1 − y_2)^2 ≤ t, y_i(1 − x_i) = 0, i = 1, 2 }.

Observe that any quadratic constraint of the form (c_1 y_1 − c_2 y_2)^2 ≤ t with c_1, c_2 > 0 can be brought into this form by scaling the continuous variables. In particular, g_α is supermodular (and in fact modular) for any fixed α ∈ B. Letting S = ∅, inequality (6a) reduces to the maximization problem (7). An optimal solution of (7) can be found as follows. If y_1 ≥ y_2, then set α_1 > 0 and α_2 = −α_1 < 0; the optimal value then follows in closed form. The case y_2 ≥ y_1 is identical. The resulting valid piecewise inequality, along with the bound constraints 0 ≤ x ≤ 1, 0 ≤ y, describes cl conv(X_2^−) [6]. We point out that a conic quadratic representation for cl conv(X_2^−) and generalizations to (not necessarily rank-one) quadratic functions with negative off-diagonals are given in [13].
3.4. Outlier detection with temporal data. In the context of outlier detection with temporal data, Gómez [29] studies a set X_T defined via constants a_1, a_2 > 0. While we refer the reader to [29] for details on the derivation of cl conv(X_T), we point out that it can in fact be described by lifted supermodular inequalities. Indeed, in this case, the function g_α takes the form

g_α(x) = K_1(α) − K_2(α) max{x_1, x_2},

where K_1(α) and K_2(α) are constants that do not depend on x and K_2(α) ≥ 0. Since max{x_1, x_2} is a submodular function, it follows that g_α is supermodular.

CONVEX HULL VIA LIFTED SUPERMODULAR INEQUALITIES
We now turn our attention to the rank-one sets X and X_+. This section is devoted to showing that the lifted supermodular inequalities (6) are sufficient to describe cl conv(X) and cl conv(X_+). By Theorem 2, it suffices to derive an explicit form of the projection function g_α and show that inequalities (3) describe the convex hull of its epigraph G_α. The rest of this section is organized as follows. In §4.1 we derive the set function g_α defined in (4) for the rank-one quadratic function and then show that it is supermodular. In §4.2 we describe the convex hull of G_α using only a small subset of the supermodular inequalities (3).

4.1. The set function g_α. We present the derivation of the set function g_α for X_+ and X separately, and then verify that g_α is indeed supermodular.
4.1.1. The set function for X_+. For S ⊆ N, we have

g_α(x^S) = min_{y∈R^S_+} { y(S)^2 − ∑_{i∈S} α_i y_i }.   (9)

Note that (9) is bounded for all α ∈ R^S; thus B = R^N. Since y_i = 0 in any optimal solution whenever α_i < 0, we assume for simplicity that α ≥ 0 and B = R^N_+. From the KKT conditions corresponding to variable y_k ≥ 0 in (9), we find that

2y(S) ≥ α_k,   (10)

and, by complementary slackness, (10) holds at equality whenever y_k > 0. Moreover, let j ∈ S be such that α_j = max_α(S); setting y_j = α_j/2 and y_i = 0 for i ∈ S \ j, we obtain a feasible solution for (9) that satisfies all dual feasibility conditions (10) and complementary slackness, and is therefore optimal for the convex optimization problem (9). Thus, we conclude that

g_α(x^S) = −max_α(S)^2/4.

4.1.2. The set function for X. If S ∩ N_− = ∅ or S ∩ N_+ = ∅, then g_α(x^S) is obtained as in §4.1.1.
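The closed form above can be sanity-checked numerically; this is an illustrative script of ours (the sampling range and seed are arbitrary choices):

```python
import numpy as np

# Closed form derived above: min_{y >= 0} y(S)^2 - alpha' y = -max_alpha(S)^2/4
# for alpha >= 0, attained by putting all mass on an argmax index.
rng = np.random.default_rng(0)
alpha = np.array([1.5, 0.7, 2.0])            # alpha >= 0 on S = {1, 2, 3}
closed_form = -alpha.max() ** 2 / 4          # = -1.0 here

# (i) the candidate y_j = alpha_j / 2 on j = argmax attains the closed form
y_star = np.zeros(3)
y_star[alpha.argmax()] = alpha.max() / 2
attained = y_star.sum() ** 2 - alpha @ y_star

# (ii) random feasible points never fall below the closed form
Y = rng.uniform(0.0, 3.0, size=(200000, 3))
vals = Y.sum(axis=1) ** 2 - Y @ alpha
print(np.isclose(attained, closed_form), vals.min() >= closed_form - 1e-9)
```

Part (ii) only certifies a one-sided bound by sampling, of course; the two-line KKT argument in the text is what proves optimality.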
Proof. Let p ∈ arg max_{i∈S_+} α_i and q ∈ arg max_{i∈S_−} α_i. If α_p + α_q > 0, then e_p + e_q is an unbounded direction. Otherwise, the problem is bounded, where the second inequality follows from α_p + α_q ≤ 0.
Note that for (12) to hold, if there exists j ∈ S_− such that α_j ≥ 0, then α_i ≤ 0 for all i ∈ S_+. Therefore, either α_i ≤ 0 for all i ∈ S_+ or α_j ≤ 0 for all j ∈ S_−. Also note that we may equivalently rewrite (12) as α_i + α_j ≤ 0 for all i ∈ S_+, j ∈ S_−. First, assume that α_j ≤ 0 for all j ∈ S_−. In this case, there exists an optimal solution of (11) with y(S_−) = 0, and (11) reduces to (9). Then, we may assume that α_i ≥ 0 for all i ∈ S_+ as in §4.1.1, and arrive at

g_α(x^S) = −max_α(S_+)^2/4.

By symmetry, if α_i ≤ 0 for all i ∈ S_+, we may assume that α_j ≥ 0 for all j ∈ S_− and

g_α(x^S) = −max_α(S_−)^2/4.

From the discussion above, we see that α can be restricted accordingly in (6). It is convenient to partition B into two sets so that B = B_+ ∪ B_−, where

B_+ = { α ∈ B : α_j ≤ 0 for all j ∈ N_− } and B_− = { α ∈ B : α_i ≤ 0 for all i ∈ N_+ },

and analyze the inequalities separately for each set. Therefore, instead of studying inequalities (6) directly, one can equivalently study their relaxation where either α ∈ B_+ or α ∈ B_−; consequently, each inequality (6) corresponds to (the maximum of) two simpler inequalities. Since the sets B_+ and B_− are symmetric, and inequalities (6) corresponding to α ∈ B_− are simply the inequalities where the roles of N_+ and N_− are interchanged (and α ∈ B_+), the analysis and derivation of the inequalities is simplified. Therefore, in the sequel, we derive the inequalities for α ∈ B_+ only and then state the inequalities corresponding to B_− by interchanging N_+ and N_−.

Supermodularity. For α ∈ B_+, the set function g_α(x) for X is monotone non-increasing; it is also supermodular, as max_α(S_+) is submodular. The case α ∈ B_− is analogous.

4.2. Convex hull of epi g_α. In this section we show that a small subset of the supermodular inequalities (3a) is sufficient to describe the convex hull of the epigraph of the set function g_α, i.e.,

G_α = { (x, t) ∈ {0,1}^N × R : g_α(x) ≤ t }.

Given nonempty S ⊆ N, let ℓ ∈ arg max_{i∈S} α_i and k ∈ arg max_{i∈N\ℓ} α_i, and consider the corresponding inequalities (14). If S = ∅, then the valid inequalities (3a) reduce to an inequality that can also be obtained by setting S = N \ ℓ (or by choosing any S ⊆ N \ ℓ such that k ∈ S). Therefore, when considering inequalities (14), we can assume without loss of generality that there exists k ∈ arg max_{i∈N} α_i such that k ∈ S and, thus, the case α_ℓ ≥ α_k can be ignored.
Remark 3. Suppose that the variables are indexed such that α_1 ≤ · · · ≤ α_n, let α_0 = 0, and let ℓ = max_{i∈S} i if S ≠ ∅ and ℓ = 0 otherwise. Observe that we can assume without loss of generality that i ∈ S for all i ≤ ℓ, since inequalities (14) are the same whether i ∈ S or not. Therefore, it follows that there are only n distinct inequalities (14). We now show that inequalities (14) characterize the convex hull of G_α.
Finally, for (μ̄, γ̄), the objective function (17a) is of the form (15). To verify that (μ̄, γ̄) is optimal for (17), we construct a primal solution λ̄ feasible for (16) satisfying complementary slackness. The greedy algorithm for constructing λ̄ is presented in Algorithm 1 and illustrated with an example in Figure 3.
We now check that constraint (16c) is satisfied. At the end of the algorithm, ∑_{S⊆N} λ̄_S = Λ (since the variable Λ is updated each time λ̄ is updated). Moreover, at the end of the first cycle (line 13), we have Λ = ∑_{i=ℓ+1}^n x_i. If ℓ = 0, then Λ = 1 trivially (line 16); otherwise, at the end of the second cycle (line 22), an additional value of x_ℓ = 1 − ∑_{i=ℓ+1}^n x_i (line 18) is added to Λ. Hence, at the end of the algorithm, Λ = 1.

Next, we verify that constraints (16b) are satisfied. For i ∈ {1, . . ., ℓ − 1}, at any point in the algorithm, we have ∑_{S⊆N: i∈S} λ̄_S = x_i − x̃_i. Since, at any point, x̃_i = (x_i − Λ)_+ and Λ = 1 at the end of the algorithm, it follows that ∑_{S⊆N: i∈S} λ̄_S = x_i. For i ∈ {ℓ + 1, . . ., n} we also have ∑_{S⊆N: i∈S} λ̄_S = x_i − x̃_i, and x̃_i = 0 at the end (line 13). The case i = ℓ > 0 is verified similarly.

Finally, to check that λ̄ satisfies complementary slackness, it suffices to observe that all updates of λ̄ correspond to sets S such that exactly one element of S is greater than ℓ (line 10), or to sets S with no element greater than ℓ and with ℓ ∈ S (line 20); in both cases the corresponding dual constraints are satisfied at equality. Therefore, we conclude that λ̄ and (μ̄, γ̄) are an optimal primal–dual pair. Since problem (17) admits, for any x ∈ [0, 1]^N, an optimal solution of the form (15), it follows that inequalities (14) and the bound constraints describe conv(G_α).

Algorithm 1: Greedy algorithm for problem (16). Input: x_1, . . ., x_n. Output: λ̄ optimal for (16). [The pseudocode of Algorithm 1 is not recoverable from the source.]
Finally, we obtain the main result of this section: that the (nonlinear) lifted supermodular inequalities are sufficient to describe the closure of the convex hull of X.
Proof.Follows immediately from Proposition 5 and Theorem 2.
Remark 4. We end this section with the remark that optimization of a linear function over X can be done easily using the projection function g_α. Consider

min { a^T x + b^T y + t : (x, y, t) ∈ X }.

Projecting out the continuous variables using g_α (with α = −b), the problem reduces to

min_{x∈{0,1}^N} a^T x + g_{−b}(x),

which can be solved in linear time.

EXPLICIT FORM OF THE LIFTED SUPERMODULAR INEQUALITIES
In this section we derive explicit forms of the lifted supermodular inequalities (18)–(19). In §5.1 we describe the inequalities in the original space of variables and discuss how to solve the separation problem. In §5.2 we provide conic quadratic representable inequalities in an extended space, which can then be implemented with off-the-shelf conic solvers.

5.1. Inequalities and separation in the original space of variables.

5.1.1. Lifted inequalities for X. We first present the inequalities for the more general set X. Finding a closed-form expression for the lifted supermodular inequalities (18) for all S_+ ⊆ N_+ amounts to solving the maximum lifting problem (20); Proposition 7 identifies conditions (21) under which inequality (20) reduces to the closed form (22). Below we state two remarks on Proposition 7, and then we prove the result.
Remark 5. Inequalities (22) are neither valid for cl conv(X) nor convex for all (x, y) ∈ [0, 1]^N × R^N_+. Indeed, if condition (21a) is not satisfied, then (22) may not be convex. Moreover, suppose that L = {j} and U = {k} for some j, k ∈ S_+: setting x_i = y_i = 0 for all i ∈ N \ {j, k}, x_j = x_k = 1, y_j, y_k > 0, and t = (y_j + y_k)^2 is feasible for X, but this point is cut off by inequality (22). In fact, if (x, y, t) ∈ cl conv(X), then (22) is guaranteed to hold only when conditions (21a), (21b), (21d), (21e), and (21g) are satisfied. Conditions (21c) and (21f) do not affect the validity of (22), but if they are not satisfied then (22) is weak, i.e., a stronger inequality can be obtained from another choice of L and U.

Remark 6. If y(N_+) < y(N_−), there exist no L and U satisfying condition (21d) in Proposition 7. However, in this case, the roles of N_+ and N_− can be interchanged to satisfy (21d); interchanging N_+ and N_− is equivalent to letting α ∈ B_−.

Proof of Proposition 7. Let us define auxiliary variables β, γ ∈ R as β = max_α(N_−) and γ = max_α(S_+), respectively. Then, the lifting problem (20) can be written as problem (23), where constraints (23b) and (23c) enforce the definitions of γ and β, and constraints (23d) and (23e) enforce that α ∈ B_+. First, observe that there exists an optimal solution of (23) with γ ≤ α_i for all i ∈ N_+: if α_i < γ for some i ∈ N_+, then setting α_i = γ results in a feasible solution with improved objective value. Therefore, the value of S_+ is completely determined by γ, since S_+ = {i ∈ N_+ : α_i ≤ γ}. Also note that α_i = β for all i ∈ N_−: if α_i < β for some i ∈ N_−, then setting α_i = β results in an improved objective value. We now consider two cases.

Case 1.
Suppose that in an optimal solution of (23) we have γ = −β, which implies that α_i = γ for all i ∈ N_+ and α_i = −γ for all i ∈ N_−. In this case, (23) reduces to a single-variable concave maximization in γ which, after optimizing for γ, further reduces to the original rank-one quadratic inequality t ≥ (y(N_+) − y(N_−))^2.
Case 2. Now suppose that γ < −β in an optimal solution. Let L = {i ∈ N_+ : α_i = γ} and U = {i ∈ N_+ : α_i = −β}. Then, from the discussion above, (23) reduces to (24). Observe that for (L, U, γ) to correspond to an optimal solution, we must have 1 − x(N_+ \ L) ≥ 0 (otherwise, γ can be increased to another α_i while improving the objective value) and y(U) − y(N_−) ≥ 0 (otherwise, −β can be decreased to another α_i while improving the objective value). When both conditions are satisfied, the first-order conditions determine γ and β, and (24) simplifies to (22). The constraints γ < α_i are satisfied for all i ∈ N_+ \ (L ∪ U) if and only if (21b) holds, the constraints α_i ≤ −β are satisfied for all i ∈ N_+ \ (L ∪ U) if and only if (21e) holds, and the constraint γ < −β, which may not be implied if N_+ \ (L ∪ U) = ∅, is satisfied if and only if (21g) holds.
Finally, we verify that the first-order conditions are satisfied for j ∈ L, that is, that setting α_j > γ results in a worse solution. If condition (21c) does not hold for some j ∈ L, then increasing α_j from γ = 2y(L)/(1 − x(N_+ \ L)) to 2y_j/x_j improves the objective value. Similarly, we verify the first-order conditions for j ∈ U: if condition (21f) does not hold for some j ∈ U, then α_j can be decreased from β = (y(U) − y(N_−))/x(U) to improve the objective value.

5.1.2. Lifted inequalities for X_+. We now present the inequalities for X_+, which can be interpreted as a special case of the inequalities for X given in §5.1.1. Recall that for the set X_+, the set B used in (6a) is simply B = R^N (and we can assume B = R^N_+ without loss of generality), and a closed-form expression for (6a) requires solving the corresponding lifting problem. Note that in the proof of Proposition 7, the set U corresponds to the set of variables in N_+ for which the constraint α_i ≤ −max_α(N_−) is tight in an optimal solution of (23). Intuitively, the set X_+ can be interpreted as the special case of X where N_+ = N and N_− = ∅, and such constraints can be dropped from the lifting problem. Therefore, we may assume U = ∅ in Proposition 7. Proposition 8 formalizes this intuition; note, however, that it is slightly stronger, as, unlike Proposition 7, it guarantees the existence of a set satisfying the conditions of the proposition.
Proposition 8. For X+, there exists a (possibly empty) set L ⊆ N satisfying conditions (26) for which inequality (25) holds. The proof of Proposition 8 is given in Appendix A.
Figure 1 plots the minimum values of t as a function of (x_1, y_1) for cl conv(X_f) and cl conv(X+).
5.1.3. Separation. We now consider the separation problem for inequalities (20) and (25): given a point (x̄, ȳ) ∈ [0, 1]^N × R^N_+, find sets L, U ⊆ N+ satisfying the conditions in Proposition 7, or a set L ⊆ N satisfying the conditions in Proposition 8, respectively.
Separation for (20). First, as pointed out in Remark 6, we check whether ȳ(N+) ≥ ȳ(N−) or ȳ(N+) < ȳ(N−); in the first case, we use the conditions in Proposition 7 directly, and in the second, we interchange the roles of N+ and N− so that ȳ(N+) ≥ ȳ(N−). Next, index the variables so that ȳ_1/x̄_1 ≤ ȳ_2/x̄_2 ≤ … ≤ ȳ_m/x̄_m, where m = |N+|; this is done in O(m log m) time by sorting. It follows from the conditions in Proposition 7 that if such sets L, U ⊆ N+ exist, then L = {i ∈ N+ : i ≤ ℓ} and U = {i ∈ N+ : i ≥ u} for some ℓ, u ∈ {1, …, m} with ℓ < u. Therefore, one can simply enumerate all m(m − 1)/2 possible values of (ℓ, u) and verify whether conditions (21) are satisfied for each candidate pair of sets L and U. Hence, the separation algorithm runs in O(n²) time.
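The enumeration above can be sketched in Python. The function name and the predicate `satisfies_21`, standing in for conditions (21) (which are not reproduced here), are hypothetical; this is a minimal sketch under those assumptions, not the paper's implementation:

```python
def separate_20(xbar, ybar, satisfies_21):
    """Search for sets L, U over N+ as in Proposition 7.

    xbar, ybar         -- values of (x̄, ȳ) restricted to N+ (x̄_i > 0 assumed)
    satisfies_21(L, U) -- hypothetical predicate checking conditions (21)
    Returns a pair (L, U) passing the check, or None.
    """
    m = len(xbar)
    # index variables so that ȳ_1/x̄_1 <= ... <= ȳ_m/x̄_m -- O(m log m)
    order = sorted(range(m), key=lambda i: ybar[i] / xbar[i])
    # by Proposition 7, L is a prefix and U is a suffix of the sorted order,
    # so it suffices to enumerate the m(m-1)/2 pairs (l, u) with l < u
    for l in range(m):
        for u in range(l + 1, m):
            L, U = set(order[:l + 1]), set(order[u:])
            if satisfies_21(L, U):
                return L, U
    return None
```

With running prefix sums of x̄ and ȳ maintained along the two loops, each candidate check takes constant time, matching the stated O(n²) bound.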
Separation for (25). First, index the variables so that ȳ_1/x̄_1 ≤ ȳ_2/x̄_2 ≤ … ≤ ȳ_n/x̄_n; this can be accomplished in O(n log n) time by sorting. It follows from the conditions in Proposition 8 that L = {i ∈ N : i ≤ ℓ} for some ℓ ∈ {1, …, n}. Therefore, one can simply enumerate all n possible values of ℓ and verify whether conditions (26) are satisfied for each candidate set L. Since the sorting step dominates the complexity, the separation algorithm runs in O(n log n) time.
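A sketch of this prefix enumeration, under the same caveats (the predicate `satisfies_26` standing in for conditions (26) is hypothetical):

```python
def separate_25(xbar, ybar, satisfies_26):
    """Search for a set L as in Proposition 8.

    satisfies_26(L) -- hypothetical predicate checking conditions (26);
    with running prefix sums of x̄ and ȳ, each check can be done in O(1),
    so the initial sort dominates and the routine runs in O(n log n).
    """
    order = sorted(range(len(xbar)), key=lambda i: ybar[i] / xbar[i])
    for l in range(len(order) + 1):   # L is a (possibly empty) prefix
        L = set(order[:l])
        if satisfies_26(L):
            return L
    return None
```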

5.2. Conic quadratic valid inequalities in an extended formulation. Inequalities (22) and (27), given in the original space of variables, are valid only over restricted parts of the domain. They are neither valid nor convex over the entire domain of the variables; e.g., (22) is not convex whenever x(N+ \ L) ≥ 1. Such inequalities are therefore difficult for optimization solvers to use directly. To address this challenge, in this section we give valid conic quadratic reformulations in an extended space, which can readily be used by conic quadratic solvers.
For a partitioning (L, R, U) of N+, consider inequality (28). Note that each inequality (28) requires O(n) additional variables and constraints. Moreover, although not explicitly enforced, it is easy to verify that there exists an optimal solution to (28) with λ_i ≤ y_i and λ_0 ≤ y(L). Inequalities (28) are convex, as they involve linear constraints and sums of ratios of convex quadratic terms over nonnegative linear terms, and are thus conic quadratic representable [3, 37]. We show, in Proposition 9, that inequalities (28) imply the strong formulations described in Proposition 7, and, in Proposition 10, that they are valid for X.
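The representability claim can be made concrete for a single ratio term: a constraint t ≥ y²/x with x ≥ 0, the building block of the sums appearing in these inequalities, is exactly a rotated second-order cone constraint. The helper below is an illustrative numerical check of that equivalence, not part of the paper's formulation:

```python
import math

# t >= y^2/x with t, x >= 0 ("quadratic over linear") is the rotated cone
# condition t*x >= y^2, which in standard second-order cone form reads
#   || (2y, t - x) ||_2 <= t + x,
# since 4y^2 + (t - x)^2 <= (t + x)^2 simplifies to y^2 <= t*x.
def in_rotated_soc(t, x, y, tol=1e-9):
    """Check the second-order cone form of t >= y^2/x (with t, x >= 0)."""
    return t >= -tol and x >= -tol and math.hypot(2 * y, t - x) <= t + x + tol
```

This is the standard transformation conic quadratic solvers apply internally, which is why sums of such ratios are directly usable in extended formulations.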
Note that when λ = µ = 0 and λ_0 = µ_0 = ζ = 0, inequality (28) reduces to (22). Thus, if sets L, U satisfy the conditions of Proposition 7 for a given (x, y), then there exists t ∈ R such that (x, y, t) ∈ conv(X) and (28) holds at equality. It remains to prove that inequalities (28) do not cut off any points in X for any choice of partition (L, R, U).
Proposition 10. For any partitioning (L, R, U) of N+, inequalities (28) are valid for X.
Note that constraint (30), together with the assumptions y(N−) < y_i for all i ∈ R and y(N−) < y(L), implies that λ_i ≤ y_i and λ_0 ≤ y(L). Observe that we can choose µ accordingly: indeed, for any feasible λ we have µ_i ≥ 0. For this choice of µ, substituting 1 − |R| + µ(R) and µ_i in (29a) with their respective values, (29a) reduces to an inequality that is valid since y(L) + y(R) = y(N+).
Case 5. y(N+) ≥ y(N−), x(U) = 0, x(R) ≥ 1, y(N−) < y(L), but y(N−) ≥ y_j for some j ∈ R: In this case, y_i = 0 for all i ∈ U, and we set µ_0 = 0. Note that, in (28), we can set λ_j = y_j and µ_j = x_j, resulting in an inequality of the same form as (28) but with R̂ = R \ {j} and ŷ(N−) = y(N−) − y_j. Repeating this process sequentially, we obtain λ_i = y_i and µ_i = x_i for some subset T ⊆ R such that y(N−) − y(T) ≤ y_i for all i ∈ R \ T; applying a strategy similar to that of Case 4 then yields a valid inequality.
Case 6. y(N+) ≥ y(N−), x(U) = 0, x(R) ≥ 1, and y(N−) ≥ y(L): In this case, we can set λ_0 = y(L) and µ_0 = 0, and (28) reduces accordingly. Moreover, if y(N−) − y(L) ≥ y_j for some j ∈ R, then we can set λ_j = y_j and µ_j = x_j as in Case 5. Repeating this process, we obtain an inequality of the same form in which y(N−) − y(L) − y(T) < y_i for all i ∈ R \ T, and therefore x_i = 1 for all i ∈ R \ T. Note that constraint (31d) and y(N−) − y(L) − y(T) < y_i imply that λ_i < y_i in any feasible solution. Then, for all i ∈ R \ T, we can choose µ_i so that µ_i ≤ x_i and constraint (31b) is satisfied. Substituting x_i − λ_i, i ∈ R \ T, with their explicit form in (31a), we obtain an equivalent form that is valid.
To derive the corresponding lifted inequalities for B−, it suffices to interchange N+ and N−. Thus, for a partitioning (L, R, U) of N−, we obtain the corresponding conic quadratic inequalities. The main result of the paper is stated below.
For the positive case X+ with N− = ∅ and a partitioning (L, R) of N, inequalities (28) reduce to (33). Note that each inequality (33) also requires O(n) additional variables and constraints, but is significantly simpler than (28).

COMPUTATIONAL EXPERIMENTS
In this section, we test the computational effectiveness of the conic quadratic inequalities given in §5.2 in solving convex quadratic minimization problems with indicators. In particular, we solve portfolio optimization problems with fixed charges. All experiments are run with the CPLEX 12.8 solver on a laptop with a 1.80 GHz Intel® Core™ i7 CPU and 16 GB main memory, on a single thread. We use the CPLEX default settings but turn on the numerical emphasis parameter, unless stated otherwise. The data for the instances and the problem formulations in .lp format can be found online at https://sites.google.com/usc.edu/gomez/data.
where F ∈ R^{n×r} with r < n, and a, b, d ∈ R^N_+. We test two classes of instances, general and positive, where F has both positive and negative entries or only non-negative entries, respectively. Note that constraints (34d) are in fact a big-M reformulation of the complementarity constraint y_i(1 − x_i) = 0: indeed, constraint (34b) and y ≥ 0 imply the upper bound y ≤ 1. The parameters are generated as follows; we use the notation Y ∼ U[ℓ, u] to mean "Y is generated from a continuous uniform distribution between ℓ and u".
F: Let ρ be a weight parameter. We set F = EG, where E ∈ R^{n×r}_+ is an exposure matrix such that E_ij = 0 with probability 0.8 and E_ij ∼ U[0, 1] otherwise, and G ∈ R^{r×r} with G_ij ∼ U[ρ, 1]. If ρ ≥ 0, then F is guaranteed to be non-negative, and we refer to such instances as positive; otherwise, for ρ < 0, we refer to the instances as general. It is well documented in the literature that, for matrices with large diagonal dominance, the perspective reformulation achieves close to 100% gap improvement. Therefore, we choose a low diagonal dominance δ = 0.01 to generate instances that are hard for the perspective reformulation. In our computations, unless stated otherwise, we use n = 200 and β = (e^T b)/n.
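A sketch of the generation procedure, following the distributions described here and in §6.1; the function name, default arguments, and returned tuple are illustrative choices rather than the authors' code:

```python
import numpy as np

def generate_instance(n=200, r=5, rho=-0.5, delta=0.01, alpha=1.0, seed=0):
    """Generate F, d^2, b, a as described in the text (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # exposure matrix E: E_ij = 0 with probability 0.8, else U[0, 1]
    E = np.where(rng.random((n, r)) < 0.8, 0.0, rng.random((n, r)))
    G = rng.uniform(rho, 1.0, size=(r, r))        # G_ij ~ U[rho, 1]
    F = E @ G                                     # factor matrix, rank <= r
    diag_FFt = np.sum(F * F, axis=1)              # diagonal of F F^T
    v = diag_FFt.mean()                           # average diagonal element
    d2 = rng.uniform(0.0, v * delta, size=n)      # d_i^2 ~ U[0, v*delta]
    b = rng.uniform(0.25, 0.75, size=n) * (diag_FFt + d2)
    a = np.full(n, alpha * b.sum() / n)           # a_i = alpha * (e^T b) / n
    return F, d2, b, a
```

With rho >= 0 every entry of F is non-negative (a positive instance); with rho < 0 the columns of G mix signs, producing the general instances.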
6.2. Methods. We test the following methods:
• Perspective: Problem (34) formulated via the perspective reformulation (36), with q = F^T y (36b).
• Supermodular: Problem (34) formulated as (37), where F_j denotes the j-th column of F; additionally, lifted supermodular inequalities (28) are added to strengthen the relaxation.
Note that the convex relaxation of (37) without any additional inequalities is equivalent to the convex relaxation of (36).
Cuts (28) (for general instances) or (33) (for positive instances) for the method Supermodular are added as follows:
(1) We solve the convex relaxation of (37) to obtain a solution (x̄, ȳ, t̄). By default, the convex relaxation is solved with an interior point method.
(2) We find a most violated inequality (28) or (33) for each constraint (37b) using the separation algorithm given in §5.1.3. Denote by ν̄_j the right-hand-side value of (22) or (27) if sets L and U satisfying (21) exist; otherwise, let ν̄_j = −∞.
(3) Let ε = 10⁻³ be a precision parameter. The inequalities found in step (2) are added if either t̄_j < ε and ν̄_j − t̄_j > ε, or t̄_j ≥ ε and (ν̄_j − t̄_j)/t̄_j > ε. At most r inequalities are added per iteration, one for each constraint (37b).
(4) This process is repeated until either no inequality is added in step (3) or the maximum number of cuts (3r) is reached.
We point out that the convexification based on X_f [7], described in Proposition ??, is not effective with formulation (37), since t_j ≥ (F_j^T y)²/min{1, e^T x} reduces to t_j ≥ (F_j^T y)² due to (34b) and (34d).
6.3. Results. Tables 1-4 present the results for ρ ∈ {−1, −0.5, −0.2, 0}. They show, for different ranks r and values of the fixed cost parameter α, the optimal objective value (opt) and, for each method, the optimal objective value of the convex relaxation (val), the integrality gap (gap) computed as gap = (opt − val)/opt × 100, the improvement (imp) of Supermodular over Perspective computed as imp = (gap_Persp. − gap_Supermod.)/gap_Persp., the time required to solve the relaxation in seconds (time), and the number of cuts added (cuts). The optimal solutions are computed using the CPLEX branch-and-bound method. The values opt and val are scaled so that, in a given instance, opt = 100. Each row corresponds to the average of five instances generated with the same parameters. First, note that Perspective achieves only a very modest improvement over Basic due to the low diagonal dominance parameter δ = 0.01. We also point out that instances with smaller
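Steps (1)-(4) can be summarized as the following loop; the three callables are hypothetical stand-ins for the relaxation solver and the separation routine of §5.1.3, so this is a sketch of the control flow rather than the actual implementation:

```python
def add_supermodular_cuts(solve_relaxation, separate, add_cut, r, eps=1e-3):
    """Sketch of the cut-generation loop in steps (1)-(4).

    solve_relaxation() -> tbar      : values t̄_j from the relaxation of (37)
    separate(j)        -> nu_j|None : rhs of (22)/(27) if sets L, U exist
    add_cut(j, nu)                  : registers the violated inequality
    Returns the total number of cuts added.
    """
    max_cuts = 3 * r                      # cut budget from step (4)
    total = 0
    while total < max_cuts:
        tbar = solve_relaxation()         # step (1)
        added = 0
        for j in range(r):                # step (2): one cut per constraint (37b)
            nu = separate(j)
            if nu is None:
                continue
            # step (3): absolute violation if t̄_j is tiny, else relative
            violated = (nu - tbar[j] > eps) if tbar[j] < eps \
                       else ((nu - tbar[j]) / tbar[j] > eps)
            if violated:
                add_cut(j, nu)
                added += 1
                total += 1
        if added == 0:                    # step (4): stop when no cut is added
            break
    return total
```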
positive weight ρ have weaker natural convex relaxations, i.e., Basic has larger gaps; a similar phenomenon was observed in [26].
The relative performance of all methods on rank-one instances, r = 1, is virtually identical regardless of the value of the weight parameter ρ. In particular, Supermodular substantially improves upon Basic and Perspective: it achieves 0% gaps in instances with α ≤ 10, and reduces the gap from 35% to 6% in instances with α = 50.
Our interpretation of the dependence of the strength of the formulation on ρ is as follows. For instances with small values of ρ, it is possible to reduce the systematic risk y^T(FF^T)y of the portfolio close to zero due to negative correlations; exploiting this structure within branch-and-bound methods may require tailored implementations, which are not supported by current off-the-shelf branch-and-bound solvers.
6.4. On the performance with off-the-shelf branch-and-bound solvers. We also experimented with solving the formulations Supermodular, obtained after adding cuts, with the CPLEX branch-and-bound algorithm. However, note that inequalities (28) and, to a lesser degree, inequalities (33) involve several ratios that can result in division by zero; from the proof of Proposition 10, we see that this is in fact the case in many scenarios. Therefore, while we did not observe any particular numerical difficulties when solving the convex relaxations (via interior point methods), in a small subset of the instances the branch-and-bound method (based on linear outer approximations) ran into numerical issues leading to incorrect solutions. Table 5 reports results on the two instances exhibiting such pathological behavior. It shows, for each instance, method, and CPLEX setting, the bound on the optimal solution reported by CPLEX when solving the convex relaxation via interior point methods (barrier, corresponding to a lower bound), and the lower and upper bounds reported after running the branch-and-bound algorithm for one hour. We do not scale the solutions reported in Table 5. The tested settings are default CPLEX (def), default CPLEX with numerical emphasis enabled (+num), and CPLEX with numerical emphasis enabled and presolve and CPLEX cuts disabled (+num-pc). In the first instance shown in Table 5, when using Supermodular with the default CPLEX settings, the solution reported is worse than the optimal solution by 30%. Enabling the numerical emphasis option improves the solution, but it is still 10% worse than the solution reported by Perspective. Nonetheless, if presolve and CPLEX cuts are disabled, then both solutions coincide. The second instance shown in Table 5 exhibits the opposite behavior: with the default settings, independently of the numerical emphasis, the solutions obtained by Perspective and Supermodular coincide; however, if
presolve and CPLEX cuts are disabled, then the lower bound obtained after one hour of branch-and-bound with the Supermodular method already precludes finding the correct solution. We point out that pathological behavior of conic quadratic branch-and-bound solvers has been observed in the past for other nonlinear mixed-integer problems with a large number of variables; see, for example, [6, 13, 26, 30].

CONCLUSIONS
In this paper, we describe the convex hull of the epigraph of a rank-one quadratic function with indicator variables. To do so, we first describe the convex hull of an underlying supermodular set function in a lower-dimensional space, and then maximally lift the resulting facets into nonlinear inequalities in the original space of variables. The approach is broadly applicable: most of the existing results concerning convexifications of convex quadratic functions with indicator variables can be obtained in this way, as can several well-known classes of facet-defining inequalities for mixed-integer linear problems.

FIGURE 2. Depiction of B+ and B− in a two-dimensional example with N+ = {1} and N− = {2}. The upper-right shaded region (triangle) corresponds to the region where g_α(x) = −∞; the lower-left shaded region (square) corresponds to the region discarded, as optimal solutions of (6) can be found in either B+ or B−.

6.1. Instances. We consider optimization problems of the form (34). The remaining parameters are generated as follows.
d: Let δ be a diagonal dominance parameter. Define v = (1/n) ∑ᵢ₌₁ⁿ (FF^T)_ii to be the average diagonal element of FF^T; then d_i² ∼ U[0, vδ].
b: We generate entries b_i ∼ U[0.25, 0.75] × ((FF^T)_ii + d_i²). Note that if the terms b_i and (FF^T)_ii + d_i² are interpreted as the expectation and variance of a random variable, then the expectations are approximately proportional to the standard deviations. This relation aims to avoid trivial instances, where one term dominates the other.
a: Let α be a fixed cost parameter and a_i = α(e^T b)/n, i ∈ N, where e is an n-dimensional vector of ones.

TABLE 5. Examples of pathological behavior in branch-and-bound.