Abstract
We study the minimization of a rank-one quadratic with indicators and show that the underlying set function obtained by projecting out the continuous variables is supermodular. Although supermodular minimization is, in general, difficult, the specific set function for the rank-one quadratic can be minimized in linear time. We show that the convex hull of the epigraph of the quadratic can be obtained from inequalities for the underlying supermodular set function by lifting them into nonlinear inequalities in the original space of variables. Explicit forms of the convex-hull description are given, both in the original space of variables and in an extended formulation via conic quadratic-representable inequalities, along with a polynomial separation algorithm. Computational experiments indicate that the lifted supermodular inequalities in conic quadratic form are quite effective in reducing the integrality gap for quadratic optimization with indicators.
1 Introduction
Consider the convex quadratic optimization problem with indicators
where \(a, b \in \mathbb {R}^n\) and \(Q\in \mathbb {R}^{n\times n}\) is a symmetric positive semi-definite matrix. For each \(i=1, \ldots , n\), the binary variable \(x_i\), along with the complementarity constraint \(y_i (1-x_i)=0\), indicates whether \(y_i\) may take positive values. Problem (1) arises in numerous practical applications, including portfolio optimization [16], signal/image denoising [13, 14], best subset selection [15, 20, 34], and unit commitment [25].
Constructing strong convex relaxations for non-convex optimization problems is critical in devising effective solution approaches for them. Natural convex relaxations of (1), where the complementarity constraints \(y_i(1-x_i)=0\) are linearized using the so-called “big-M” constraints \(y_i\le Mx_i\), are known to be weak [see, e.g., 40]. Therefore, there is an increasing effort in the literature to better understand and describe the epigraph of quadratic functions with indicator variables. Dong and Linderoth [21] describe lifted linear inequalities for (1) from its continuous quadratic optimization counterpart over bounded variables. Bienstock and Michalka [17] give a characterization of linear inequalities obtained by strengthening gradient inequalities of a convex objective function over a non-convex set.
The majority of the work toward constructing strong relaxations of (1) is based on the perspective reformulation [2, 18, 23, 32, 35, 39, 55, 57]. The perspective reformulation, which may be seen as a consequence of the convexifications based on disjunctive programming derived in [19], is based on strengthening the epigraph of a univariate convex quadratic function \(y_i^2\le t\) by using its perspective \(y_i^2/x_i\le t\). The perspective strengthening can be applied to a general convex quadratic \(y'Qy\), by writing it as \(y' (Q-D) y + y'Dy\) for a diagonal matrix \(D \succ 0\) and \(Q-D\succeq 0\), and simply reformulating each separable quadratic term \(D_{ii}y_i^2\) as \(D_{ii}y_i^2/x_i\) [22, 24, 61]. While this approach is effective when Q is strongly diagonally dominant, it is ineffective otherwise, or inapplicable when Q is not full rank, as no such D exists.
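To make the contrast concrete, the following sketch (values chosen arbitrarily for illustration) compares the big-M bound on t with the perspective bound for a single term \(y_i^2\le t\) at a fractional indicator value:

```python
# Compare the big-M relaxation (which keeps only t >= y^2) with the
# perspective relaxation (t >= y^2 / x) at a fractional indicator x.
# M, x, y below are arbitrary illustrative values.

def big_m_bound(y):
    return y ** 2                # big-M linearization retains only t >= y^2

def perspective_bound(y, x):
    # perspective of y^2; conventions a/0 = inf if a > 0 and 0/0 = 0,
    # matching the paper's notation section
    if x == 0:
        return 0.0 if y == 0 else float("inf")
    return y ** 2 / x

x, y = 0.1, 1.0                  # fractional indicator, y <= M*x with M = 10
print(big_m_bound(y))            # 1.0
print(perspective_bound(y, x))   # 10.0 -- a 10x stronger lower bound on t
```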
To address the limitations of the perspective reformulation, a recent stream of research focuses on constructing strong relaxations of the epigraphs of simple but multi-variable quadratic functions. Jeon et al. [36] use linear lifting to construct valid inequalities for the epigraphs of two-variable quadratic functions. Frangioni et al. [26] use extended formulations based on disjunctive programming to derive stronger relaxations of the epigraph of two-variable functions. They study heuristics and semi-definite programming (SDP) approaches to extract from Q such two-variable terms. The disjunctive approach results in a substantial increase in the size of the formulations, which limits its use to small instances. Atamtürk and Gómez [6] describe the convex hull of the epigraph of the two-variable quadratic function \((y_1-y_2)^2\le t\) in the original space of variables, and Atamtürk et al. [13] generalize this result to convex two-variable quadratic functions \(a_1y_1^2-2y_1y_2+a_2y_2^2\le t\) and show how to optimally decompose an M-matrix (psd with non-positive off-diagonals) Q into such two-variable terms; their numerical results indicate that such formulations considerably improve the convex relaxations when Q is an M-matrix, but the relaxation quality degrades when Q has positive off-diagonal entries. Han et al. [33] give SDP formulations for (1) based on convex-hull descriptions of the \(2\times 2\) case. These SDP formulations require \(O(n^2)\) additional variables and constraints, which may not scale to large problems. Wei et al. [53] give an extended formulation via a single SDP constraint and linear inequalities. Atamtürk and Gómez [7] give the convex hull description of a rank-one function with free continuous variables, and propose an SDP formulation to tackle quadratic optimization problems with free variables arising in sparse regression. Wei et al. 
[51, 52] extend those results, deriving ideal formulations for rank-one functions with arbitrary constraints on the indicator variables x. These formulations are shown to be effective in sparse regression problems; however, as they do not account for the non-negativity constraints on the continuous variables, they are weak for (1). The rank-one quadratic set studied in this paper addresses this gap and properly generalizes the perspective strengthening of a univariate quadratic to higher dimensions.
In the context of discrete optimization, submodularity/supermodularity plays a critical role in the design of algorithms [27, 31, 44] and in constructing convex relaxations of discrete problems [1, 5, 10, 42, 48, 56, 58, 59, 60]. Exploiting submodularity in settings involving continuous variables as well typically requires specialized arguments, e.g., see [12, 37, 49]. A notable exception is Wolsey [54], which presents a systematic approach for exploiting submodularity in fixed-charge network problems. Since submodularity arises mostly in combinatorial optimization, where the convex hulls of the sets under study are polyhedral, there are few papers utilizing submodularity to describe non-polyhedral convex hulls [8], and those sets typically involve some degree of separability between continuous and discrete variables. In this paper, we show how to generalize the valid inequalities proposed in [54] to convexify non-polyhedral sets, where the continuous variables are linked with the binary variables via indicator constraints.
1.1 Contributions
Here, we study the mixed-integer epigraph of a rank-one quadratic function with indicator variables and non-negative continuous variables:
where \((N^+,N^-)\) is a partition of \(N := \{1, \ldots ,n\}\). Observe that any rank-one quadratic of the form \(\left( c'y\right) ^2\le t\) with \(c_i\ne 0\) for all \(i\in N\) can be written as in X by scaling the continuous variables. If all coefficients of c are of the same sign, then either \(N^+=\emptyset \) or \(N^-=\emptyset \), and X reduces to the simpler form
To the best of our knowledge, the convex hull structure of X or \(X_+\) has not been studied before. Interestingly, optimization of a linear function over X can be done in linear time (Sect. 4.2).
Our motivation for studying X stems from constructing strong convex relaxations for problem (1) by writing the convex quadratic \(y'Qy\) as a sum of rank-one quadratics. Especially in large-scale applications, it is effective to state Q as a sum of a low-rank matrix and a diagonal matrix. Specifically, suppose that \(Q=FF'+D\), where \(F\in \mathbb {R}^{n\times r}\) and \(D\in \mathbb {R}^{n\times n}\) is a (possibly zero) nonnegative diagonal matrix. Such decompositions can be constructed in numerous ways, including singular-value decomposition, Cholesky decomposition, or via factor models. Letting \(F_j\) denote the j-th column of F, adding auxiliary variables \(t\in \mathbb {R}^r\), and using the perspective reformulation, problem (1) can be cast as
Formulation (2) arises naturally, for example, in portfolio risk minimization [16], where the covariance matrix Q is the sum of a low-rank factor covariance matrix and an idiosyncratic (diagonal) variance matrix. When the entries of the diagonal matrix D are small, the perspective reformulation is not effective in strengthening the formulation. However, noting that \((x,F_j\circ y,t_j)\in X\), where \((F_j\circ y)_i=F_{ij}y_i\), for each \(j=1,\ldots ,r\), one can employ strong relaxations based on the rank-one quadratic with indicators. Our approach for decomposing \(y'Qy\) into a sum of rank-one quadratics and utilizing strong relaxations of epigraphs of rank-one quadratics is analogous to employing cuts separately from individual rows of a constraint matrix \(Ax \le b\) in mixed-integer linear programming.
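As an illustration of such a decomposition (a sketch assuming NumPy; the data are synthetic, not from the paper), one can build Q from given factor loadings F and idiosyncratic variances D and verify that \(y'Qy\) splits into r rank-one terms plus a separable part:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2
F = rng.normal(size=(n, r))          # hypothetical factor loadings
d = rng.uniform(0.01, 0.1, size=n)   # small idiosyncratic variances (diag of D)
Q = F @ F.T + np.diag(d)             # Q = FF' + D as in formulation (2)

# Q is positive semidefinite by construction
assert np.all(np.linalg.eigvalsh(Q) >= -1e-9)

# y'Qy decomposes into r rank-one terms (F_j . y)^2 = t_j plus a
# separable part handled by the perspective reformulation
y = rng.uniform(size=n)
rank_one_terms = [(F[:, j] @ y) ** 2 for j in range(r)]
separable = np.sum(d * y ** 2)
assert np.isclose(y @ Q @ y, sum(rank_one_terms) + separable)
```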
In this paper, we present a generic framework for obtaining valid inequalities for mixed-integer nonlinear optimization problems by exploiting supermodularity of the underlying set function. To do so, we project out the continuous variables, derive valid inequalities for the corresponding pure integer set, and then lift these inequalities to the space of continuous variables as in Nguyen et al. [43] and Richard and Tawarmalani [47]. It turns out that, for the rank-one quadratic with indicators, the corresponding set function is supermodular and retains much of the structure of X. The lifted supermodular inequalities derived in this paper are nonlinear in both the continuous and discrete variables.
We show that this approach encompasses several previously known convexifications for quadratic optimization with indicator variables. Moreover, the well-known inequalities in the mixed-integer linear optimization literature given in [54], which include flow cover inequalities as a special case, can also be obtained via the lifted supermodular inequalities.
Finally, and more importantly, we show that the lifted supermodular inequalities and bound constraints are sufficient to describe \(\text {cl conv}(X)\). Such convex hull descriptions of high-dimensional nonlinear sets are rare in the literature. In particular, we give a characterization in the original space of variables. This description involves a piecewise valid function with exponentially many pieces; therefore, it cannot be used directly by convex optimization solvers. To overcome this difficulty, we also give a conic quadratic representable description in an extended space, with exponentially many valid conic quadratic inequalities, along with a polynomial-time separation algorithm.
The rank-one quadratic sets X and \(X_+\) appear very similar to their relaxation
where the non-negativity constraints on the continuous variables \(y \ge 0\) are dropped. However, while only one additional inequality \(\frac{\left( \sum _{i\in N} y_i\right) ^2}{\sum _{i \in N} x_i} \le t\) is needed to describe \(\text {cl conv}(X_f)\) [7], the convex hulls of X and \(X_+\) are substantially more complicated and rich. Indeed, \(\text {cl conv}(X_f)\) provides a weak relaxation for \(\text {cl conv}(X_+)\), as illustrated in the next example.
Example 1
Consider set \(X_+\) with \(n=3\). For the relaxation \(X_f\), the closure of the convex hull is described by \(0\le x \le 1\) and inequality \(t\ge \frac{(y_1+y_2+y_3)^2}{\min \{1,x_1+x_2+x_3\}}\). Figure 1a depicts this inequality as a function of \((x_1,y_1)\) for \(x_2=0.6\), \(x_3=0.3\), \(y_2=0.5\), and \(y_3=0.2\) (fixed). In Proposition 8, we give the function f describing \(\text {cl conv}(X_+)\). Figure 1b depicts f(x, y) (truncated at 5) as a function of \((x_1,y_1)\) when other variables are fixed as before.
We find that \(\text {cl conv}(X_f)\) is a very weak relaxation of \(\text {cl conv}(X_+)\) for low values of \(x_1\). For example, for \({x_1}=0.01\) and \({y_1}=1\), we find that \(\frac{(1+0.5+0.2)^2}{0.01+0.6+0.3}\approx 3.18\), whereas \(f(x,y)\approx 100.55\). The computation of f for this example is described after Proposition 8. \(\square \)
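The \(X_f\) bound in this example can be reproduced with a few lines (a sketch; computing f itself requires Proposition 8):

```python
# Reproduce the cl conv(X_f) lower bound from Example 1:
#   t >= (y1 + y2 + y3)^2 / min{1, x1 + x2 + x3}
x = [0.01, 0.6, 0.3]
y = [1.0, 0.5, 0.2]
bound = sum(y) ** 2 / min(1.0, sum(x))
print(round(bound, 2))  # 3.18, far below f(x, y) ~ 100.55
```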
1.2 Outline
The rest of the paper is organized as follows. In Sect. 2 we review the valid inequalities for supermodular set functions and present the general form of the lifted supermodular inequalities. In Sect. 3 we re-derive known ideal formulations in the literature for quadratic optimization using the lifted supermodular inequalities. In Sect. 4 we show that the lifted supermodular inequalities are sufficient to describe the convex hull of X. In Sect. 5 we provide the explicit form of the lifted supermodular inequalities for X, both in the original space of variables and in a conic quadratic representable form in an extended space, and discuss the separation problem. In Sect. 6 we present computational results, and in Sect. 7 we conclude the paper.
1.3 Notation
For a set \(S\subseteq N\), define \(x_S\) as the indicator vector of S. By abusing notation, given a set function \(g:2^N\rightarrow \mathbb {R}\), we may equivalently write g(S) or \(g(x_S)\). To simplify the notation, given \(i\in N\) and \(S\subseteq N\), we write \(S\cup i\) instead of \(S\cup \{i\}\) and \(S{\setminus } i\) instead of \(S{\setminus }\{i\}\). For a set \(Y\subseteq \mathbb {R}^N\), let \(\text {conv}(Y)\) denote the convex hull of Y and \(\text {cl conv}(Y)\) denote its closure. We adopt the convention that \(a/0=\infty \) if \(a>0\) and \(a/0=0\) if \(a=0\). For \(a\in \mathbb {R}\), let \(a_+=\max \{a,0\}\). For a vector \(c\in \mathbb {R}^N\) and a set \(S\subseteq N\), we let \(c(S)=\sum _{i\in S}c_i\), \(\max _c(S)=\max _{i\in S}c_i\) (by convention, \(\max _c(\emptyset )=0\)) and \(c_S\) be the subvector of c induced by S. For an optimization problem with variables x, an optimal solution is denoted by \(x^*\).
2 Preliminaries
In this section we cover a few preliminary results for the paper and, at the end, give the general form of the lifted supermodular inequalities (Theorem 1).
2.1 Supermodularity and valid inequalities
A set function \(g:2^N\rightarrow \mathbb {R}\) is supermodular if \(\rho (i,S)\le \rho (i,T)\) for all \(i\in N\) and all \(S\subseteq T\subseteq N{\setminus } i\), where \(\rho (i,S)=g(S\cup i)-g(S)\) is the increment function.
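For small ground sets, the increment condition \(\rho (i,S)\le \rho (i,T)\) for \(S\subseteq T\subseteq N{\setminus } i\) can be verified by brute force; a sketch (the test functions below are illustrative, not from the paper):

```python
from itertools import combinations

def subsets(ground):
    """Yield all subsets of a frozenset as frozensets."""
    for k in range(len(ground) + 1):
        yield from (frozenset(c) for c in combinations(ground, k))

def is_supermodular(g, ground):
    """Check rho(i, S) <= rho(i, T) for all S subseteq T subseteq N \\ i."""
    rho = lambda i, S: g(S | {i}) - g(S)
    for i in ground:
        rest = ground - {i}
        for S in subsets(rest):
            for T in subsets(rest):
                if S <= T and rho(i, S) > rho(i, T) + 1e-9:
                    return False
    return True

N = frozenset(range(4))
assert is_supermodular(lambda S: len(S) ** 2, N)        # |S|^2: supermodular
assert not is_supermodular(lambda S: len(S) ** 0.5, N)  # sqrt|S|: strictly submodular
```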
Proposition 1
(Nemhauser et al. [42]) If g is a supermodular function, then
(1) \(g(T)\ge g(S)+\sum \limits _{i\in T{\setminus } S}\rho (i,S)-\sum \limits _{i\in S{\setminus } T}\rho (i,N{\setminus } i )\) for all \(S,T\subseteq N\);
(2) \(g(T)\ge g(S)+\sum \limits _{i\in T{\setminus } S}\rho (i,\emptyset )-\sum \limits _{i\in S{\setminus } T}\rho (i,S{\setminus } i )\) for all \(S,T\subseteq N\).
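Both families of bounds can be sanity-checked exhaustively on a small instance; a sketch using the supermodular function \(g(S)=|S|^2\) (an illustrative choice):

```python
from itertools import combinations

N = frozenset(range(4))
g = lambda S: len(S) ** 2                  # supermodular: increments 2|S|+1
rho = lambda i, S: g(S | {i}) - g(S)

def subsets(ground):
    for k in range(len(ground) + 1):
        yield from (frozenset(c) for c in combinations(ground, k))

for S in subsets(N):
    for T in subsets(N):
        # inequality (1): expand around S using rho(i, S) and rho(i, N \ i)
        lb1 = g(S) + sum(rho(i, S) for i in T - S) \
                   - sum(rho(i, N - {i}) for i in S - T)
        # inequality (2): expand around S using rho(i, empty) and rho(i, S \ i)
        lb2 = g(S) + sum(rho(i, frozenset()) for i in T - S) \
                   - sum(rho(i, S - {i}) for i in S - T)
        assert g(T) >= lb1 - 1e-9 and g(T) >= lb2 - 1e-9
print("Proposition 1 verified for g(S) = |S|^2")
```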
As a direct consequence of Proposition 1, one can construct valid inequalities for the epigraph of a supermodular function g, i.e.,
Specifically, for any \(S\subseteq N\), the linear supermodular inequalities [41]
are valid for Z.
2.2 Lifted supermodular inequalities
We now describe a family of lifted supermodular inequalities, using a lifting approach similar to the ones used in [28, 47]. Let \(h:\{0,1\}^N\times \mathbb {R}^N\rightarrow \mathbb {R}\cup \{\infty \}\) be a function defined over a mixed 0-1 domain and consider its epigraph
Observe that H allows for arbitrary constraints, which can be encoded via function h. For example, nonnegativity and complementarity constraints can be included by letting \(h(x,y)=\infty \) whenever \(y_i<0\) or \(y_i(1-x_i)\ne 0\) for some \(i\in N\).
For \(\alpha \in \mathbb {R}^N\), define the set function \(g_\alpha :\{0,1\}^N\rightarrow \mathbb {R}\cup \{\infty ,-\infty \}\) as
and let \(B\subseteq \mathbb {R}^N\) be the set of values of \(\alpha \) for which problem (4) is bounded for all \(x\in \{0,1\}^N\), i.e.,
Although supermodularity is defined for set functions only, we propose in Definition 1 below an extension for functions involving continuous variables as well.
Definition 1
Function h is supermodular if the set function \(g_\alpha \) defined in (4) is supermodular for all \(\alpha \in B\).
Remark 1
Suppose that h does not depend on the continuous variables y, i.e., \(h(x,y)=g(x)\). In this case problem (4) is unbounded unless \(\alpha =0\), i.e., \(B=\{0\}\), and we find that h(x, y) is supermodular if and only if \(g_0(x)=g(x)\) is supermodular. Thus, Definition 1 includes the usual definition of supermodularity for set functions as a special case. \(\square \)
Proposition 2
If function h is supermodular, then for any \(\alpha \in B\) and \(S\subseteq N\), the inequalities
are valid for H, where \(\rho _\alpha (i,S)=g_\alpha (S\cup i)-g_\alpha (S)\).
Proof
For any \(\alpha \in B\), \(S\subseteq N\), and \((x,y,t)\in H\), we find
where the first inequality follows directly from the definition of H, the second inequality follows by minimizing \(h(x,y)-\alpha 'y\) with respect to y, and the third inequality follows from the validity of (3a). Thus, by adding \(\alpha 'y\) on both sides, we find that inequality (5a) is valid. The validity of (5b) is proven identically. \(\square \)
Since inequalities (5) are valid for any \(\alpha \in B\), one can obtain stronger valid inequalities by optimally choosing vector \(\alpha \).
Theorem 1
(Lifted supermodular inequalities) If h is supermodular, then for any \(S\subseteq N\), the lifted supermodular inequalities
are valid for H.
Observe that while inequalities (5) are linear, inequalities (6) are nonlinear in x and y. Moreover, each inequality (6) is convex since it is defined as a supremum of linear inequalities. In addition, if the base supermodular inequalities (3) are strong for the convex hull of epi \(g_\alpha \), then the lifted supermodular inequalities (6) are strong for H as well, as formalized next. Given \(\alpha \in B\), define
Note that \(\text {conv}(G_\alpha )\) is a polyhedron. Theorem 2 below is a direct consequence of Theorem 1 in [47].
Theorem 2
([47]) If inequalities (3) and bound constraints \(0\le x\le 1\) describe \(\text {conv}(G_\alpha )\) for all \(\alpha \in B\), then the lifted supermodular inequalities (6) and bound constraints \(0\le x\le 1\) describe \(\text {cl conv}(H)\).
Although Definition 1 may appear to be too restrictive to arise in practice, we show in Sect. 2.3 that supermodular functions are in fact widespread in a class of well-studied problems in mixed-integer linear optimization. In Sect. 3 we show that several existing results for quadratic optimization with indicators can be obtained as lifted supermodular inequalities. Perhaps, more surprisingly, for the rank-one quadratic with indicators
we show in Sect. 4 that conditions in Definition 1 and Theorem 2 are satisfied as well.
2.3 Supermodular inequalities and fixed-charge networks
Given \(b\in \mathbb {R}\), \(u\in \mathbb {R}_+^N\), and a partition \(N=N^+\cup N^-\cup A^+\cup A^-\), define for all \(x\in \{0,1\}^N\) the fixed-charge network set
Wolsey [54] uses FC(x) to describe network structures arising in flow problems with fixed charges on the arcs: \(N^+\) denotes the incoming arcs into a given subgraph, \(N^-\) denotes the outgoing arcs, \(A^+\cup A^-\) denotes the internal arcs in the subgraph, and b represents the supply/demand of the subgraph. Finally, define
Proposition 3
([54]) For any \(\alpha \in \mathbb {R}^N\), the function
is submodular.
It follows that the function \(g_\alpha (x)=-v_\alpha (x)=\min _{y\in \mathbb {R}_+^N}-\alpha 'y+h(x,y)\) is supermodular, and inequalities (5) and (6) are valid. Moreover, Wolsey [54] shows that the linear supermodular inequalities (5) with \(\alpha \in \{-1,0,1\}^N\) include as special cases well-known inequalities for mixed-integer linear optimization such as flow-cover inequalities [45, 50] and inequalities for capacitated lot-sizing [9, 46]; several other classes for fixed-charge network flow problems are special cases as well [4, 11, 12]. Therefore, the inequalities presented in this paper can be interpreted as nonlinear generalizations of the aforementioned inequalities.
3 Previous results as lifted supermodular inequalities
In order to illustrate the approach, in this section, we show how existing results for quadratic optimization with indicators can be derived using the lifted supermodular inequalities (6).
3.1 The single-variable case
Consider, first, the single-variable case
for which \(\text {cl conv}(X^1)\) is given by the perspective reformulation [2, 19, 23, 32]:
Note that \( \text {cl conv}(X^1) \subseteq {\mathbb {R}}^2\times ({\mathbb {R}}\cup \infty )\). We now derive the perspective reformulation as a special case, in fact, using a modular inequality. Observe that \(g_\alpha (0)=0\) and \(g_\alpha (1) =\min _{y\in \mathbb {R}_+}-\alpha y+y^2 = -\frac{\alpha _+^2}{4}\) since \(y^*=\alpha /2\) if \(\alpha \ge 0\) and \(y^*=0\) otherwise. Thus, \(g_\alpha \) is a modular function for any \(\alpha \in \mathbb {R}\), and inequalities (3) reduce to
Then, we find that inequalities (6) reduce to the perspective of \(y^2\):
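This reduction can be verified numerically: for fixed \(0<x\le 1\) and \(y\ge 0\), the supremum of \(\alpha y-\frac{\alpha _+^2}{4}x\) is attained at \(\alpha ^*=2y/x\ge 0\) and equals the perspective value \(y^2/x\). A sketch (grid search over \(\alpha \ge 0\), NumPy assumed):

```python
import numpy as np

def lifted_value(x, y, grid=np.linspace(0, 100, 200001)):
    # sup_alpha { alpha*y - (alpha_+)^2 / 4 * x }, evaluated on a grid;
    # for y >= 0 the supremum is attained at a nonnegative alpha
    return np.max(grid * y - grid ** 2 / 4 * x)

x, y = 0.5, 1.0
assert abs(lifted_value(x, y) - y ** 2 / x) < 1e-4   # equals perspective y^2/x
```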
3.2 The rank-one case with free continuous variables
Consider the relaxation of X obtained by dropping the non-negativity constraints \(y \ge 0\):
Observe that any rank-one quadratic constraint of the form \(\left( \sum _{i\in N}c_iy_i\right) ^2\le t\) with \(c_i\ne 0\) can be transformed into the form given in \(X_f\) by scaling the continuous variables (so that \(|c_i|=1\)) and negating variables as \({\bar{y}}_i:=-y_i\) if \(c_i<0\). The closure of the convex hull of \(X_f\) is derived in [7], and the effectiveness of the resulting inequalities is demonstrated on sparse regression problems. We now re-derive the description of \(\text {cl conv}(X_f)\) using lifted supermodular inequalities.
For \(S \subseteq N\), we have
It is easy to see that \(g_\alpha (x_S)=-\infty \) unless \(\alpha _i=\alpha _j\) for all \(i\ne j\), see [7]. Therefore, letting \(\bar{\alpha }=\alpha _i\) for all \(i\in N\), we find that
where the optimal solution is found by setting \(y(S)=\bar{\alpha }/2\). The function \(g_\alpha \) is supermodular since \(\rho _{\alpha }(i,\emptyset )=-\bar{\alpha }^2/4\le 0\) and \(\rho _{\alpha }(i,S)=0\) for any \(S\ne \emptyset \), so the increments are nondecreasing.
Letting \(S=\{1\}\), inequality (6a) reduces to
Also letting \(S=\{1\}\), inequality (6b) reduces to
These two supermodular inequalities are indeed sufficient to describe \(\text {conv}(X_f)\) [7]. As we shall see in Sect. 4, incorporating the non-negativity constraints \(y \ge 0\), \(\text {conv}(X)\) is substantially more complex than \(\text {conv}(X_f)\). Nonetheless, as shown in Example 1, the resulting convexification is substantially stronger as well.
3.3 The rank-one case with a negative off-diagonal
Consider the special case of X with two continuous variables (\(N=\{1,2\}\)) with a negative off-diagonal:
Observe that any quadratic constraint of the form \(\left( c_1y_1-c_2y_2\right) ^2\le t\) with \(c_1, c_2>0\) can be written as in \(X_-^2\) by scaling the continuous variables.
For \(\alpha \in \mathbb {R}^2\), observe that if \(\alpha _1 + \alpha _2 > 0\),
is unbounded. Otherwise,
In particular, \(g_\alpha \) is supermodular (and in fact modular) for any fixed \(\alpha \) such that \(\alpha _1+\alpha _2{\le } 0\): for any \(i=1,2\) and \(S\subseteq N{\setminus } i\), \(\rho _\alpha (i,S)=-\frac{\max \{0,\alpha _i\}^2}{4}\). Letting \(S=\emptyset \), inequality (6a) reduces to
An optimal solution of (7) can be found as follows. If \(y_1\ge y_2\), then set \(\alpha _1>0\) and \(\alpha _2=-\alpha _1 < 0\). Moreover, in this case, the optimal value is given by
The case \(y_2\ge y_1\) is identical. The resulting piecewise valid inequality
along with the bound constraints \(0\le x\le 1\), \(0\le y\), describe \(\text {cl conv}(X_-^2)\) [6]. We point out that a conic quadratic representation for \(\text {cl conv}(X_-^2)\) and generalizations to (not necessarily rank-one) quadratic functions with negative off-diagonals are given in [13].
3.4 Outlier detection with temporal data
In the context of outlier detection with temporal data, Gómez [30] studies the set
where \(a_1,a_2>0\) are constants. While we refer the reader to [30] for details on the derivation of \(\text {cl conv}(X_T)\), we point out that it can in fact be described by lifted supermodular inequalities. Indeed, in this case, function \(g_\alpha \) is given by
where \(K_1(\alpha )\) and \(K_2(\alpha )\) are constants that do not depend on x and \(K_2(\alpha )\ge 0\). Since \(\max \{x_1,x_2\}\) is a submodular function, it follows that \(g_\alpha \) is supermodular.
4 Convex hull via lifted supermodular inequalities
We now turn our attention to the rank-one sets X and \(X_+\). This section is devoted to showing that the lifted supermodular inequalities (6) are sufficient to describe \(\text {cl conv}(X)\) and \(\text {cl conv}(X_+)\). By Theorem 2, it suffices to derive an explicit form of the projection function \(g_\alpha \) and show that inequalities (3) describe the convex hull of its epigraph \(G_\alpha \). The rest of this section is organized as follows. In Sect. 4.1 we derive the set function \(g_\alpha \) defined in (4) for the rank-one quadratic function and then show that it is supermodular. In Sect. 4.2 we describe the convex hull of \(G_\alpha \) using only a small subset of the supermodular inequalities (3).
4.1 The set function \(g_\alpha \)
We present the derivation of set function \(g_\alpha \) for \(X_+\) and X separately, and then verify that \(g_\alpha \) is indeed supermodular.
4.1.1 Derivation for \(X_+\)
For \(X_+\),
Therefore, for \(S \subseteq N\),
Note that (9) is bounded for all \(\alpha \in \mathbb {R}^S\), thus \(B=\mathbb {R}^N\). Since \(y_i=0\) in any optimal solution whenever \(\alpha _i<0\), we assume for simplicity that \(\alpha \ge 0\), i.e., \(B=\mathbb {R}_+^N\). From the KKT conditions corresponding to variable \(y_k\ge 0\) in (9), we find that
and, by complementary slackness, (10) holds at equality whenever \(y_k>0\). Moreover, let \(j\in S\) be such that \(\alpha _j=\max _\alpha (S)\); setting \(y_j=\alpha _j/2\) and \(y_i=0\) for \(i\in S{\setminus } j\), we find a feasible solution for (9) that satisfies all dual feasibility conditions (10) and complementary slackness, and therefore is optimal for the convex optimization problem (9). Thus, we conclude that \(g_\alpha (x_S)=-\max _\alpha (S)^2/4\).
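The conclusion \(g_\alpha (x_S)=-\max _\alpha (S)^2/4\) can be checked numerically: for a fixed total \(s=y(S)\), the objective of (9) is minimized by allocating all of s to an index attaining \(\max _\alpha (S)\), leaving a one-dimensional problem. A sketch with illustrative data (NumPy assumed):

```python
import numpy as np

alpha = np.array([0.3, 1.2, 0.7])     # hypothetical alpha >= 0 on S
a_max = alpha.max()

# For fixed s = y(S), the objective y(S)^2 - alpha'y is minimized by
# putting all mass on argmax alpha, giving s^2 - a_max * s;
# minimize over s >= 0 on a fine grid.
s = np.linspace(0, 5, 500001)
val = np.min(s ** 2 - a_max * s)

assert abs(val - (-a_max ** 2 / 4)) < 1e-6   # g_alpha(x_S) = -max_alpha(S)^2 / 4
```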
4.1.2 Derivation for X
For the general case of X,
Therefore, for \(S \subseteq N^+ \cup N^-\),
If \(S\cap N^-=\emptyset \) or \(S\cap N^+=\emptyset \), then we find from Sect. 4.1.1 that \(g_\alpha (x_S)= -\max _\alpha (S)^2/4\). Now let \(S^+:=S\cap N^+\) and \(S^-:=S\cap N^-\), and assume \(S^+\ne \emptyset \) and \(S^-\ne \emptyset \). We first state conditions under which (11) is bounded, and then we provide the explicit description of \(g_\alpha \).
Lemma 1
Problem (11) is bounded if and only if
Proof
Let \(p = {{\,\mathrm{arg\,max}\,}}_{i \in S^+} \alpha _i\) and \(q = {{\,\mathrm{arg\,max}\,}}_{i \in S^-} \alpha _i\). If \(\alpha _p + \alpha _q > 0\) , then \(e_p + e_q\) is an unbounded direction. Otherwise,
where the second inequality follows from \(\alpha _p + \alpha _q \le 0\). \(\square \)
Note that we may equivalently rewrite (12) as \(\alpha _i+\alpha _j\le 0,\text { for all } i\in S^+,\; j\in S^-\), and in particular,
Proposition 4
Function \(g_\alpha \) is given by
Proof
If \(\alpha \le 0\), then \(g_\alpha (x_S)\ge 0\) and the lower bound can be obtained by setting \(y=0\). We now assume \(\alpha \not \le 0\). Note that for (12) to hold, if there exists \(j\in S^-\) such that \(\alpha _j\ge 0\), then \(\alpha _i\le 0\) for all \(i\in S^+\), and vice versa. Therefore, either \(\alpha _i\le 0\) for all \(i\in S^+\) or \(\alpha _j\le 0\) for all \(j\in S^-\).
First, assume that \(\alpha _j\le 0\) for all \(j\in S^-\). In this case, there exists an optimal solution of (11) where \(y(S^-)=0\) and (11) reduces to (9). Then, we may assume that \(\alpha _i\ge 0\) for all \(i\in S^+\) as in Sect. 4.1.1, and arrive at
By symmetry, if \(\alpha _i\le 0\) for all \(i\in S^+\), we may assume that \(\alpha _j\ge 0\) for all \(j\in S^-\) and
\(\square \)
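The case analysis above can be verified numerically on a small instance satisfying (12). For fixed totals \(y(S^+)\) and \(y(S^-)\), the best allocation puts all mass of each side on its largest coefficient, leaving a two-variable problem; a sketch (illustrative data, NumPy assumed):

```python
import numpy as np

# Hypothetical data with S+ = {0,1}, S- = {2,3}; condition (12) holds:
# max alpha over S+ plus max alpha over S- is <= 0.
ap = np.array([0.5, 2.0])     # alpha on S+
am = np.array([-3.0, -2.5])   # alpha on S-

# For fixed totals s+ = y(S+), s- = y(S-), putting all mass of each side
# on its argmax leaves:  min (s+ - s-)^2 - max(ap)*s+ - max(am)*s-
sp = np.linspace(0, 5, 501)
sm = np.linspace(0, 5, 501)
SP, SM = np.meshgrid(sp, sm)
val = np.min((SP - SM) ** 2 - ap.max() * SP - am.max() * SM)

# Here alpha <= 0 on S-, so y(S-) = 0 is optimal and the value matches
# the formula from Sect. 4.1.1:  -max_alpha(S+)^2 / 4 = -1.0
assert abs(val - (-ap.max() ** 2 / 4)) < 1e-6
```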
Observe that if \(\alpha _i\le 0\) for all \(i\in S^-\) and there exists \(j\in S^+\) such that \(\alpha _j<0\), then setting \(\alpha _j=0\) does not change the function \(g_\alpha \). Thus, we can assume without loss of generality in optimization problem (6) that
It is convenient to partition B into two sets so that \(B=B^+\cup B^-\), where
and analyze the inequalities separately for each set. Figure 2 depicts regions \(B^+\) and \(B^-\) for a two-dimensional case.
Therefore, instead of studying inequalities (6) directly, one can equivalently study their relaxation where either \(\alpha \in B^+\) or \(\alpha \in B^-\); consequently, each inequality (6) corresponds to (the maximum of) two simpler inequalities. Since the sets \(B^+\) and \(B^-\) are symmetric, and inequalities (6) corresponding to \(\alpha \in B^-\) are simply inequalities where the role of \(N^+\) and \(N^-\) is interchanged (and \(\alpha \in B^+\)), the analysis and derivation of the inequalities is simplified. Therefore, in the sequel, we will derive the inequalities for \(\alpha \in B^+\) only and then state the inequalities corresponding to \(B^-\) by interchanging \(N^+\) and \(N^-\).
4.1.3 Supermodularity
For \(\alpha \in B^+\), the set function \(g_\alpha (x)\) for X is monotone non-increasing; it is also supermodular, since \(\max _\alpha (S^+)^2=\max _{i\in S^+}\alpha _i^2\) is submodular (a maximum of nonnegative modular functions), so its negative is supermodular. The case for \(\alpha \in B^-\) is analogous.
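Both properties can be confirmed by brute force on a small instance; a sketch (the data are hypothetical, and the explicit form of \(g_\alpha \) below follows the case \(\alpha \in B^+\) with \(\alpha \ge 0\) on \(N^+\)):

```python
from itertools import combinations

# N+ = {0,1,2}, N- = {3,4}; alpha in B+ (alpha >= 0 on N+, <= 0 on N-),
# so g_alpha(S) = -max_{i in S cap N+} alpha_i^2 / 4 (max over empty set = 0).
alpha = {0: 0.4, 1: 1.0, 2: 0.7, 3: -2.0, 4: -1.5}
Nplus = {0, 1, 2}
N = set(alpha)

def g(S):
    return -max((alpha[i] ** 2 for i in S & Nplus), default=0.0) / 4

def subsets(ground):
    for k in range(len(ground) + 1):
        yield from (set(c) for c in combinations(sorted(ground), k))

rho = lambda i, S: g(S | {i}) - g(S)
for i in N:
    for S in subsets(N - {i}):
        for T in subsets(N - {i}):
            if S <= T:
                assert rho(i, S) <= rho(i, T) + 1e-9  # increments nondecreasing
print("g_alpha is supermodular on this instance")
```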
4.2 Convex hull of epi \(g_\alpha \)
In this section we show that a small subset of the supermodular inequalities (3a) are sufficient to describe the convex hull of the epigraph of the set function \(g_\alpha \), i.e.,
where \(\alpha \ge 0\) – observe that since x is binary, \(\left( \max _{i\in N}\{\alpha _ix_i\}\right) ^2=\max _{i\in N}\{\alpha _i^2x_i\}\).
Given nonempty \( S\subseteq N\), let \(\ell \in {{\,\mathrm{arg\,max}\,}}_{i\in S}\{\alpha _i\}\), \(k\in {{\,\mathrm{arg\,max}\,}}_{i\in N{\setminus } \ell }\{\alpha _i\}\), and \(T=\left\{ i\in N{\setminus } S:\alpha _i>\alpha _\ell \right\} \); observe that \(T=\emptyset \) if and only if \(\alpha _\ell \ge \alpha _k\). Then, valid inequalities (3a) for \(G_\alpha \) reduce to
If \(S=\emptyset \), then valid inequalities (3a) reduce to
Remark 2
Observe that if \(\alpha _\ell \ge \alpha _k\), then the inequality
can also be obtained by setting \(S=N{\setminus } \ell \) (or by choosing any \(S\subseteq N{\setminus } \ell \) such that \(k\in S\)). Therefore, when considering inequalities (14), we can assume without loss of generality that there exists \(k\in {{\,\mathrm{arg\,max}\,}}_{i\in N}\{\alpha _i\}\) such that \(k\not \in S\) and, thus, the case \(\alpha _\ell \ge \alpha _k\) can be ignored. \(\square \)
Remark 3
Suppose that the variables are indexed such that \(\alpha _1\le \cdots \le \alpha _n\), let \(\alpha _0=0\), and let \(\ell = \max _{i\in S}\{i \}\) if \(S\ne \emptyset \) and \(\ell =0\) otherwise. Observe that we can assume without loss of generality that \(i\in S\) for all \(i\le \ell \), since inequalities (14) are the same whether \(i\in S\) or not. Therefore, it follows that there are only n inequalities (14) given by
\(\square \)
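Both Remark 3 and the sufficiency claim of the next proposition can be checked by brute force on binary points; the sketch below assumes the \(\ell \)-th inequality (15) takes the form \(t\ge -\frac{\alpha _\ell ^2}{4}-\sum _{i>\ell }\frac{\alpha _i^2-\alpha _\ell ^2}{4}x_i\), as follows from the dual solution constructed in the proof of Proposition 5:

```python
from itertools import product

# Hypothetical sorted data alpha_1 <= ... <= alpha_n (index 0 is alpha_0 = 0).
alpha = [0.0, 0.3, 0.8, 1.1, 2.0]
n = len(alpha) - 1

def ineq(l, x):
    # assumed form of the l-th inequality (15):
    #   t >= -alpha_l^2/4 - sum_{i>l} (alpha_i^2 - alpha_l^2)/4 * x_i
    return (-alpha[l] ** 2
            - sum((alpha[i] ** 2 - alpha[l] ** 2) * x[i - 1]
                  for i in range(l + 1, n + 1))) / 4

for x in product([0, 1], repeat=n):
    # g_alpha at a binary point: -max_i alpha_i^2 x_i / 4
    g = -max((alpha[i] ** 2 * x[i - 1] for i in range(1, n + 1)), default=0) / 4
    bounds = [ineq(l, x) for l in range(n)]        # l = 0, ..., n-1
    assert all(b <= g + 1e-9 for b in bounds)      # each inequality is valid
    assert abs(max(bounds) - g) < 1e-9             # and collectively tight
print("the n inequalities describe g_alpha on binary points")
```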
We now show that inequalities (14) characterize the convex hull of \(G_\alpha \).
Proposition 5
Inequalities (14) and bound constraints describe \(\text {conv}(G_\alpha )\).
Proof
Let \((x,t)\in [0,1]^N\times \mathbb {R}\). By definition, \((x,t)\in \text {conv}(G_\alpha )\) if and only
where constraints (16b) can be restated as \(x=\sum _{i\in S}\lambda _Sx_S\). From linear programming duality, we find the equivalent condition
Any feasible solution \((\mu ,\gamma )\) of (17) yields a valid inequality for \(\text {conv}(G_\alpha )\). Moreover, characterizing the optimal solutions of (17) (for all \(x\in [0,1]^N\)) results in the convex hull description of \(G_\alpha \).
Suppose, without loss of generality, that \(\alpha _1\le \ldots \le \alpha _n\), let \(\alpha _0=0\), and let \(\ell \in \{0,\ldots ,n-1\}\) be the smallest index such that \(\sum _{i=\ell +1}^nx_i\le 1\); thus, if \(\ell >0\), then \(\sum _{i=\ell }^nx_i>1\). We claim that the dual solution given by \(\hat{\gamma }=-\frac{\alpha _\ell ^2}{4}\), \(\hat{\mu }_i=0\) for \(i\le \ell \) and \(\hat{\mu }_i=-\frac{\alpha _i^2-\alpha _\ell ^2}{4}\) for \(i>\ell \) is optimal for (17).
First, we verify that \((\hat{\mu },\hat{\gamma })\) is feasible for (17). Observe that for any \(S\subseteq \{1,\ldots ,\ell \}\), constraint (17b) reduces to \(-\frac{\alpha _\ell ^2}{4}\le -\frac{\max _\alpha (S)^2}{4}\), which is indeed satisfied. For any S whose maximum element \(j\) satisfies \(j>\ell \), we find that (17b) reduces to \(\sum _{i\in S:i\ne j}\hat{\mu }_i\le 0\); since \(\hat{\mu }\le 0\), the constraint is satisfied. For \(S=\emptyset \), constraint (17b) reduces to \(\gamma \le 0\), which is satisfied. To verify complementary slackness later, note that constraints (17b) corresponding to sets (a) \(S = T \cup \{j\}\), where \(T \subseteq \{1, \ldots , \ell \}\) and \(j > \ell \) (i.e., containing exactly one element greater than \(\ell \)), and (b) \(S = T \cup \{\ell \}\), where \(T \subseteq \{1, \ldots , \ell -1\}\) (i.e., containing \(\ell \) but no greater element) are satisfied at equality.
Finally, for \((\hat{\mu },\hat{\gamma })\), the objective function (17a) is of the form (15):
To verify that \((\hat{\mu }, \hat{\gamma })\) is optimal for (17), we construct a primal solution \(\hat{\lambda }\) feasible for (16) satisfying complementary slackness. The greedy algorithm for constructing \(\hat{\lambda }\) is presented in Algorithm 1 and illustrated with an example in Fig. 3.
We now check that constraint (16c) is satisfied. At the end of the algorithm, \(\sum _{S\subseteq N}\hat{\lambda }_S=\Lambda \) (since variable \(\Lambda \) is updated each time \(\hat{\lambda }\) is updated). Moreover, at the end of the first cycle (line 13) we have \(\Lambda =\sum _{i=\ell +1}^n x_i\). If \(\ell =0\), then \(\Lambda =1\) trivially (line 16); otherwise, at the end of the second cycle (line 22) an additional value of \(\hat{x}_\ell =1-\sum _{i=\ell +1}^nx_i\) (line 18) is added to \(\Lambda \). Hence, at the end of the algorithm
Next, we verify that constraints (16b) are satisfied. For \(i\in \{1,\ldots ,\ell -1\}\), at any point in the algorithm, we have that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i-\hat{x}_i\). Since, at any point, \(\hat{x}_i=\left( x_i-\Lambda \right) _+\) and \(\Lambda =1\) at the end of the algorithm, it follows that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i\). For \(i\in \{\ell +1,\ldots ,n\}\) we also have that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i-\hat{x}_i\), and \(\hat{x}_i=0\) at the end (line 13). Finally, for \(i=\ell >0\), we have that
Finally, to check that \(\hat{\lambda }\) satisfies complementary slackness, it suffices to observe that all updates of \(\hat{\lambda }\) correspond to sets S such that exactly one element of S is greater than \(\ell \) (line 10), or to sets S with no element greater than \(\ell \) and where \(\ell \in S\) (line 20), where the corresponding dual constraints are satisfied at equality.
Therefore, we conclude that \(\hat{\lambda }\) and \((\hat{\mu },\hat{\gamma })\) are an optimal primal-dual pair. Since problem (17) admits for any \(x\in [0,1]^N\) an optimal solution of the form (15), it follows that inequalities (14) and bound constraints describe \(\text {conv}(G_\alpha )\). \(\square \)
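The dual certificate used in the proof of Proposition 5 is easy to compute. Below is a minimal Python sketch (function name ours) that, assuming \(\alpha \) is given in nondecreasing order and \(x\in [0,1]^n\), finds the index \(\ell \) and the dual solution \((\hat{\mu },\hat{\gamma })\) as defined in the proof.

```python
def dual_solution(alpha, x):
    """Construct (ell, gamma_hat, mu_hat) from the proof of Proposition 5.
    Assumes alpha is sorted in nondecreasing order, alpha_0 = 0, and
    x in [0,1]^n.  ell is returned 1-indexed, matching the proof."""
    n = len(alpha)
    # ell: smallest index in {0, ..., n-1} with sum_{i=ell+1}^n x_i <= 1
    ell = 0
    while ell < n and sum(x[ell:]) > 1:
        ell += 1
    a_ell = alpha[ell - 1] if ell > 0 else 0.0  # alpha_0 = 0
    gamma_hat = -a_ell ** 2 / 4
    # mu_hat_i = 0 for i <= ell, and -(alpha_i^2 - alpha_ell^2)/4 for i > ell
    mu_hat = [0.0 if i < ell else -(alpha[i] ** 2 - a_ell ** 2) / 4
              for i in range(n)]
    return ell, gamma_hat, mu_hat
```

This is only the dual side; the primal certificate \(\hat{\lambda }\) requires the greedy construction of Algorithm 1, which is not reproduced here.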
Finally, we obtain the main result of this section: that the (nonlinear) lifted supermodular inequalities
are sufficient to describe the closure of the convex hull of X.
Proposition 6
Lifted supermodular inequalities (18)–(19) and the bound constraints \(0\le x\le 1\), \(y \ge 0\) describe \(\text {cl conv}(X)\).
Proof
Follows immediately from Proposition 5 and Theorem 2. \(\square \)
Remark 4
We end this section with the remark that optimization of a linear function over X can be done easily using the projection function \(g_\alpha \). Consider
Projecting out the continuous variables using \(g_\alpha \), the problem reduces to
Assume without loss of generality that \(\beta \ge 0\) (otherwise, set \(x_i=1\) whenever \(\beta _i<0\)). Then an optimal solution of (20) corresponds to either setting \(x=0\), or setting a single variable \(x_i=1\) where \(i\in {{\,\mathrm{arg\,max}\,}}_{i\in N}\beta _i {+ \alpha _i/4}\). Identifying such an index can be done in O(n) time. \(\square \)
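The O(n) procedure in Remark 4 can be sketched as follows. This is a sketch only: the function name is ours, and the final choice between \(x=0\) and the single-variable candidate requires evaluating (20), which is not reproduced here.

```python
def optimize_linear(beta, alpha):
    """Sketch of the O(n) step in Remark 4.  After the preprocessing
    step (fix x_i = 1 whenever beta_i < 0), an optimal solution either
    sets x = 0 or a single x_i = 1 with i maximizing beta_i + alpha_i/4.
    Returns (indices fixed to one, candidate index)."""
    fixed = [i for i, b in enumerate(beta) if b < 0]   # x_i = 1 w.l.o.g.
    rest = [i for i, b in enumerate(beta) if b >= 0]
    # single O(n) pass to identify the candidate index
    best = max(rest, key=lambda i: beta[i] + alpha[i] / 4) if rest else None
    return fixed, best
```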
5 Explicit form of the lifted supermodular inequalities
In this section we derive explicit forms of the lifted supermodular inequalities (18)–(19). In Sect. 5.1 we describe the inequalities in the original space of variables, and describe how to solve the separation problem. In Sect. 5.2 we provide conic quadratic representable inequalities in an extended space, which can then be implemented with off-the-shelf conic solvers.
5.1 Inequalities and separation in the original space of variables
5.1.1 Lifted inequalities for X
We first present the inequalities for the more general set X. Finding a closed form expression for the lifted supermodular inequalities (18) for all \(S^+ \subseteq N^+\) amounts to solving the maximum lifting problem
We now give a closed form expression for (21). Let \(m=|N^+|\), and given \((\bar{x},\bar{y})\in [0,1]^N\times \mathbb {R}_+^N\), index variables in \(N^+\) so that \(\bar{y}_{(1)}/ \bar{x}_{(1)}\le \bar{y}_{(2)}/ \bar{x}_{(2)}\le \cdots \le \bar{y}_{(m)}/\bar{x}_{(m)}\).
Proposition 7
Given \(({\bar{x}},{\bar{y}},{\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), if there exist indexes \(0\le \kappa _1<\kappa _2\le m+1\) such that the (possibly empty) sets \(L=\left\{ (i)\in N^+: i\le \kappa _1\right\} \) and \(U=\left\{ (i)\in N^+: i\ge \kappa _2\right\} \) satisfy
then inequality (21) is satisfied if and only if
otherwise, inequality (21) is satisfied if and only if \({\bar{t}}\ge \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )^2\).
Below we state two remarks on Proposition 7, and then we prove the result.
Remark 5
Inequalities (23), when sets L and U are fixed, are neither valid for \(\text {cl conv}(X)\) nor convex for all \((x,y)\in [0,1]^N\times \mathbb {R}_+^N\). Indeed, if condition (22a) is not satisfied, then (23) may not be convex. Moreover, suppose that \(L=\{j\}\) and \(U=\{k\}\) for some \(j,k\in S^+\): note that setting \(x_i=y_i=0\) for all \(i\in N{\setminus }\{j,k\}\), \(x_j=x_k=1\), \(y_j,y_k>0\), and \(t=(y_j+y_k)^2\) is feasible for X, but this point is cut off by inequality (23) since \(\frac{y(L)^2}{1-x(N^+{\setminus } L)}=\frac{y_j^2}{1-x_k}=\infty \).
In fact, if \((x,y,t)\in \text {cl conv}(X)\), then (23) holds only when conditions (22a), (22b), (22d), (22e), and (22g) are satisfied. Conditions (22c) and (22f) do not affect the validity of (23) but if they are not satisfied then (23) is weak, i.e., a stronger inequality can be obtained from another choice of L and U. \(\square \)
Remark 6
If \({\bar{y}}(N^+)<{\bar{y}}(N^-)\), then condition (22d) in Proposition 7 cannot be satisfied. However, in this case, the role of \(N^+\) and \(N^-\) can be interchanged to satisfy (22d); interchanging \(N^+\) and \(N^-\) is equivalent to letting \(\alpha \in B^-\). \(\square \)
Proof of Proposition 7
Let us define auxiliary variables \(\beta ,\gamma \in \mathbb {R}\) as \(\beta =\max _\alpha (N^-)\) and \(\gamma =\max _\alpha (S^+)\), respectively. Then, inequality (21) reduces to
where constraints (24b) and (24c) enforce the definitions of \(\gamma \) and \(\beta \), and constraints (24d) and (24e) enforce that \(\alpha \in B^+\).
First, observe that there exists an optimal solution of (24) with \(\gamma \le \alpha _i\) for all \(i\in N^+\): if \(\alpha _i<\gamma \) for some \(i\in N^+\), then setting \(\alpha _i=\gamma \) results in a feasible solution with improved objective value. Therefore, the set \(S^+\) is completely determined by \(\gamma \) since \(S^+=\left\{ i\in N^+: \alpha _i\le \gamma \right\} \). Also note that \(\alpha _i=\beta \) for all \(i\in N^-\): if \(\alpha _i<\beta \) for some \(i\in N^-\), then setting \(\alpha _i=\beta \) results in an improved (or identical) objective value. We now consider two cases:
Case 1 Suppose in an optimal solution of (24) we have \(\gamma =-\beta \), which implies that \(\alpha _i=\gamma \) for all \(i\in N^+\) and \(\alpha _i=-\gamma \) for all \(i\in N^-\). In this case, (24) simplifies to \( {\bar{t}}\ge \max _{\gamma \in \mathbb {R}_+} \gamma \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )-\frac{\gamma ^2}{4}, \) which, after optimizing for \(\gamma \), further reduces to the original rank-one quadratic inequality \({\bar{t}}\ge \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )^2.\)
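The one-dimensional maximization over \(\gamma \) in Case 1 can be made explicit. Writing \(s={\bar{y}}(N^+)-{\bar{y}}(N^-)\ge 0\) (an abbreviation introduced here for readability), the first-order condition gives

$$\begin{aligned} \frac{d}{d\gamma }\left( \gamma s-\frac{\gamma ^2}{4}\right) =s-\frac{\gamma }{2}=0 \quad \Longrightarrow \quad \gamma ^*=2s, \qquad \gamma ^*s-\frac{(\gamma ^*)^2}{4}=2s^2-s^2=s^2, \end{aligned}$$

recovering \({\bar{t}}\ge s^2=\big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )^2\).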
Case 2 Now suppose \(\gamma <-\beta \) in an optimal solution. Let \(L=\left\{ i\in N^+:\alpha _i=\gamma \right\} \) and \(U=\{i\in N^+:\alpha _i=-\beta \}\). Then, from the discussion above, (24) reduces to
Observe that for \((L,U,\gamma )\) to correspond to an optimal solution, we must have \(1-{\bar{x}}(N^+{\setminus } L)\ge 0\) (otherwise, \(\gamma \) can be increased to another \(\alpha _i\) while improving the objective value) and \({\bar{y}}(U)-{\bar{y}}(N^-)\ge 0\) (otherwise, \(-\beta \) can be decreased to another \(\alpha _i\) while improving the objective value). When both conditions are satisfied, from first-order conditions we see that \(\alpha _i=2{\bar{y}_i}/{\bar{x}_i}\) for \(i\in N^+{\setminus } (L\cup U)\), \(\gamma = 2 {\bar{y}}(L)/\big (1-{\bar{x}}(N^+{\setminus } L)\big )\) and \(\beta =-2\big ({\bar{y}}(U)-{\bar{y}}(N^-)\big )/{\bar{x}}(U)\), and (25) simplifies to (23). The constraints \(\gamma <\alpha _i\) are satisfied for all \(i\in N^+{\setminus } (L\cup U)\) if and only if (22b) hold, constraints \(\alpha _i\le -\beta \) are satisfied for all \(i\in N^+{\setminus } (L\cup U)\) if and only if (22e) hold, and constraint \(\gamma <-\beta \), which may not be implied if \(N^+{\setminus } (L\cup U)=\emptyset \), is satisfied if and only if (22g) holds.
Finally, we verify that the first-order conditions are satisfied for \(j\in L\), that is, setting \(\alpha _j>\gamma \) results in a worse solution. If condition (22c)
does not hold for some \(j\in L\), then increasing \(\alpha _j\) from \(\gamma =2\frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}\) to \(2{\bar{y}_j}/{\bar{x}_j}\) improves the objective value. Similarly, we verify the first-order conditions for \(j\in U\): if condition (22f)
does not hold for some \(j\in U\), then \(\alpha _j\) can be decreased from \(-\beta = 2\big ({\bar{y}}(U)-{\bar{y}}(N^-)\big )/{\bar{x}}(U)\) to improve the objective value.
Note that conditions (22b) and (22c) together imply that \(\bar{y}_i/\bar{x}_i< \bar{y}_j/\bar{x}_j\) whenever \(i\in L\) and \(j\not \in L\); in other words, if \(L\ne \emptyset \), then \(L=\left\{ (1),(2),\dots ,(\kappa _1)\right\} \) for some \(1\le \kappa _1\le m\). Similarly, from conditions (22e) and (22f), we conclude that either \(U=\emptyset \) or \(U=\left\{ (\kappa _2),(\kappa _2+1),\dots ,(m)\right\} \) for some \(1\le \kappa _2\le m\). \(\square \)
5.1.2 Lifted inequalities for \(X_+\)
We now present the inequalities for \(X_+\), which can be interpreted as a special case of the inequalities for X given in Sect. 5.1.1. Recall that for set \(X_+\), the set B used in (6a) is simply \(B=\mathbb {R}^N\) (we can assume \(B=\mathbb {R}_+^N\) without loss of generality) and a closed form expression for (6a) requires solving the lifting problem
Note that in the proof of Proposition 7, set U corresponds to the set of variables in \(N^+\) for which constraint \(\alpha _i\le -\max _{\alpha }(N^-)\) is tight in an optimal solution of (24). Intuitively, set \(X_+\) can be interpreted as a special case of X where \(N^+=N\) and \(N^-=\emptyset \), and such constraints can be dropped from the lifting problem. Therefore, we may assume \(U=\emptyset \) in Proposition 7. Proposition 8 formalizes this intuition; note, however, that it is slightly stronger since, unlike Proposition 7, it guarantees the existence of a set satisfying the conditions of the proposition. As in Proposition 7, index the variables in N so that \(\bar{y}_{(1)}/ \bar{x}_{(1)}\le \bar{y}_{(2)}/ \bar{x}_{(2)}\le \cdots \le \bar{y}_{(n)}/\bar{x}_{(n)}\).
Proposition 8
Given \(({\bar{x}},{\bar{y}},{\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), there exists an index \(0\le \kappa \le n\) such that the (possibly empty) set \(L=\left\{ (i)\in N: i\le \kappa \right\} \) satisfies
and inequality (26) is satisfied if and only if
The proof of Proposition 8 is given in “Appendix A”.
Example 2
(cont) Consider \(X_+\) with \(n=3\), and assume \(x_2=0.6\), \(x_3=0.3\), \(y_2=0.5\) and \(y_3=0.2\). Note that \(y_2/x_2\approx 0.83>0.67\approx y_3/x_3\). We now compute the minimum values of t such that \((x,y,t)\in \text {cl conv}(X_+)\), for different values of \((x_1,y_1)\).
-
Let \((x_1,y_1)=(0.01,1)\) and \(y_1/x_1=100\). Then \(L=\emptyset \) satisfies all conditions (27): \(x(N)=0.91<1\), conditions (27b) are trivially satisfied since \(y(\emptyset )=0\), and conditions (27c) are void. In this case, we find that \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge 1^2/0.01+0.5^2/0.6+0.2^2/0.3\approx 100.55\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 1+0.5+0.2\right) ^2/0.91\approx 3.18\).
-
Let \((x_1,y_1)=(0.1,0.5)\) and \(y_1/x_1=5\). Then \(L=\{3\}\) satisfies all conditions (27): \(x_1+x_2=0.7<1\), \(0.2/0.3\approx 0.67<y_2/x_2\) and \(0.2/0.3\approx 0.67= y_3/x_3\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge 0.2^2/0.3+0.5^2/0.1+0.5^2/0.6\approx 3.05\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 0.5+0.5+0.2\right) ^2/1= 1.44\).
-
Let \((x_1,y_1)=(0.4,0.1)\) and \(y_1/x_1=0.25\). Then \(L=\{1,3\}\) satisfies all conditions (27): \(x_2=0.6<1\), \((0.1+0.2)/0.4=0.75<y_2/x_2\) and \((0.1+0.2)/0.4=0.75\ge y_3/x_3\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge (0.1+0.2)^2/0.4+0.5^2/0.6\approx 0.642\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 0.1+0.5+0.2\right) ^2= 0.640\).
-
Let \((x_1,y_1)=(0.5,0.2)\) and \(y_1/x_1=0.4\). Then \(L=\{1,2,3\}\) satisfies all conditions (27): (27a) is trivially satisfied, (27b) is void and \((0.2+0.5+0.2)/1=0.9\ge y_2/x_2\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge (0.2+0.5+0.2)^2= 0.81\), which coincides with \(\text {cl conv}(X_f)\) and the natural inequality \(t\ge y(N)^2\).
Figure 1 plots the minimum values of t as a function of \((x_1,y_1)\) for \(\text {cl conv}(X_f)\) and \(\text {cl conv}(X_+)\). \(\square \)
5.1.3 Separation
We now consider the separation problem for inequalities (21) and (26), i.e., given a point \((\bar{x},\bar{y}, {\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), finding sets \(L,U\subseteq N^+\) satisfying the conditions in Proposition 7 or finding \(L\subseteq N\) satisfying the conditions in Proposition 8, respectively.
Separation for (21) First, as pointed out in Remark 6, we verify whether \(\bar{y}(N^+)\ge \bar{y}(N^-)\) or \(\bar{y}(N^+)<\bar{y}(N^-)\); in the first case, we use the conditions in Proposition 7 directly, and in the second one, we interchange the roles of \(N^+\) and \(N^-\) so that \(\bar{y}(N^+)\ge \bar{y}(N^-)\). Next, indexing the variables so that \({\bar{y}_{(1)}/ \bar{x}_{(1)}\le \cdots \le \bar{y}_{(m)}/\bar{x}_{(m)}}\), where \(m=|N^+|\), can be done in \(O(m\log m)\) by sorting. Finally, one can simply enumerate all \(O(m^2)\) possible values of \({(\kappa _1,\kappa _2)}\) and verify whether conditions (22) are satisfied for each candidate pair of sets L and U. Hence, the separation algorithm runs in \(O(n^2)\) time.
Separation for (26) First, indexing the variables so that \({\bar{y}_{(1)}/ \bar{x}_{(1)}\le \cdots \le \bar{y}_{(n)}/\bar{x}_{(n)}}\) can be accomplished in \(O(n\log n)\) time by sorting. Then, one can simply enumerate all n possible values of \({\kappa }\) and verify whether conditions (27) are satisfied for each candidate set L. Since the sorting step dominates the complexity, the separation algorithm runs in \(O(n\log n)\).
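The separation procedure for (26) can be sketched in Python. Note that conditions (27) and the right-hand side of (28) are not reproduced in this text; the forms used below are reconstructed from Example 2, so the sketch should be read under that assumption (it also assumes \(\bar{x}_i>0\) for all i, and uses a small tolerance for the boundary cases).

```python
def separate(x, y, tol=1e-9):
    """O(n log n) separation sketch for inequalities (26), following
    Proposition 8.  Reconstruction from Example 2: with L the first
    kappa indices in nondecreasing order of y_i/x_i, we require
    x(N\\L) < 1 and the ratio r = y(L)/(1 - x(N\\L)) to satisfy
    r < y_i/x_i for i not in L and r >= y_i/x_i for i in L; the
    inequality then reads
        t >= y(L)^2/(1 - x(N\\L)) + sum_{i not in L} y_i^2/x_i.
    Assumes x_i > 0 for all i.  Returns (L, rhs) or None."""
    n = len(x)
    order = sorted(range(n), key=lambda i: y[i] / x[i])  # sorting step
    for kappa in range(n + 1):                           # n+1 candidates
        L, rest = order[:kappa], order[kappa:]
        denom = 1.0 - sum(x[i] for i in rest)            # 1 - x(N\L)
        if denom <= tol:
            continue                                     # (27a) fails
        r = sum(y[i] for i in L) / denom
        if any(r >= y[i] / x[i] + tol for i in rest):
            continue                                     # (27b) fails
        if any(r < y[i] / x[i] - tol for i in L):
            continue                                     # (27c) fails
        rhs = sum(y[i] for i in L) ** 2 / denom \
            + sum(y[i] ** 2 / x[i] for i in rest)
        return sorted(L), rhs
    return None
```

On the second bullet of Example 2 (\(x=(0.1,0.6,0.3)\), \(y=(0.5,0.5,0.2)\)), the sketch recovers \(L=\{3\}\) and the bound \(t\ge 3.05\), matching the values reported there.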
5.2 Conic quadratic valid inequalities in an extended formulation
Inequalities (23) and (28) given in the original space of variables are valid only over restricted parts of the domain. They are neither valid nor convex over the entire domain of the variables; e.g., (23) is not convex whenever \(x(N^+{\setminus } L)\ge 1\). Thus, such inequalities are difficult for optimization solvers to utilize directly. To address this challenge, in this section we give valid conic quadratic reformulations in an extended space, which can be readily used by conic quadratic solvers.
For a partitioning (L, R, U) of \(N^+\) consider the inequality
Note that each inequality (29) requires O(n) additional variables and constraints. Moreover, although not explicitly enforced, it is easy to verify that there exists an optimal solution to (29) with \(\lambda _i\le y_i\) and \(\lambda _0\le y(L)\). Inequalities (29) are convex as they involve linear constraints and sums of ratios of convex quadratic terms and nonnegative linear terms, thus conic quadratic representable [3, 38]. We show, in Proposition 9, that inequalities (29) imply the strong formulations described in Proposition 7, and, in Proposition 10, that they are valid for X.
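For concreteness, the conic quadratic representability invoked here (see [3, 38]) follows from the standard rotated second-order cone representation of a quadratic-over-linear term: for \(u,v\ge 0\),

$$\begin{aligned} \frac{w^2}{v}\le u \quad \Longleftrightarrow \quad w^2\le uv \quad \Longleftrightarrow \quad \left\| \big (2w,\; u-v\big )\right\| _2\le u+v, \end{aligned}$$

so each ratio term in (29) can be passed to a conic quadratic solver after introducing one epigraph variable.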
Proposition 9
If conditions (22a), (22b), (22d), (22e) and (22g) are satisfied, then \(\lambda =\mu = 0\) and \(\lambda _0=\mu _0=\zeta =0\) in an optimal solution of (29).
Proof
Observe that \(\zeta \) does not appear in any constraint of (29). Thus, since \(y(U)-y(N^-)\ge 0\) and \(\lambda ,\lambda _0\ge 0\), it follows that \(\zeta =0\) in an optimal solution. Moreover, since (22a) is satisfied, setting \(\mu =0\) is feasible for (29). Finally, we find that the KKT conditions are satisfied for \(\lambda =\mu =0\) and \(\lambda _0=\mu _0=0\) if
The KKT condition above for \(\lambda _0\) is precisely (22g). Since \(x(R)+x(U)\le 1\) by (22a), and \(y(U)-y(N^-)\ge 0\) by (22d), the KKT condition for \(\mu _0\) is equivalent to \(\frac{y(L)}{1-x(R)-x(U)}+\frac{y(U)-y(N^-)}{x(U)}\ge 0\), and thus reduces to (22g). The KKT conditions for \(\lambda _i\) are satisfied since (22e) holds. Finally, the KKT conditions for \(\mu _i\) can be equivalently stated as \(\frac{y(L)}{1-x(R)-x(U)}\le \frac{y_i}{x_i}\) (since \(x(R)+x(U)\le 1\) and \(x,y \ge 0\)), which are satisfied since (22b) holds. \(\square \)
Note that when \(\lambda =\mu =0\) and \(\lambda _0=\mu _0=\zeta =0\), inequality (29) reduces to (23). Thus, if sets L, U satisfy the conditions of Proposition 7 for a given (x, y), then there exists \(t\in \mathbb {R}\) such that \((x,y,t)\in \text {conv}(X)\) and (29) holds at equality. It remains to prove that inequalities (29) do not cut off any points in X for any choice of partition (L, R, U).
Proposition 10
For any partitioning (L, R, U) of \(N^+\), inequalities (29) are valid for X.
Proof
It suffices to show that for any \((x,y)\in X\), i.e., \(x_i\in \{0,1\}\) and \(y_i(1-x_i)=0\) for all \(i\in N\), there exists \((\lambda ,\mu ,\lambda _0,\mu _0, \zeta )\) satisfying (29b)–(29e) such that inequality (29a) is valid. We prove the result by cases.
Case 1 \(y(N^+)< y(N^-)\): In this case, we can set \(\lambda _i=y_i\) and \(\mu _i=x_i\) for \(i\in R\), \(\lambda _0=y(L)\), \(\mu _0=x(U)\), \(\zeta =y(N^-)-y(U)-y(L)-y(R)\), and inequality (29a) reduces to \(t\ge 0\), which is valid.
Case 2 \(y(N^+)\ge y(N^-)\), \(x(R)=0\) and \(x(U)=0\): In this case, \(y_i=0\), \(i\in R\cup U\). Setting \(\mu _i=\lambda _i=0\) for \(i\in R\), \(\lambda _0=y(N^-)\), \(\mu _0=0\) and \(\zeta =0\), we find that inequality (29a) reduces to \(t\ge \big (y(L)-y(N^-)\big )^2=\big (y(N^+)-y(N^-)\big )^2\), which is valid.
Case 3 \(y(N^+)\ge y(N^-)\) and \(x(U)\ge 1\): Setting \(\lambda _i=y_i\) and \(\mu _i=x_i\) for \(i\in R\), \(\lambda _0=y(L)\), \(\mu _0=x(U)-1\), and \(\zeta =0\), inequality (29a) reduces to \(t\ge \big (y(N^+)-y(N^-)\big )^2\), which is valid.
Case 4 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), \(y(N^-)<y_i\) for all \(i\in R\) and \(y(N^-)<y(L)\): In this case, \(y_i=0\) for all \(i\in U\) and \(x_i=1\) for all \(i\in R\); we can set \(\mu _0=0\), and inequality (29) reduces to
Constraint (30d) is obtained since the denominator of the third term in (29a) is zero, thus constraining the numerator to vanish as well. Moreover, since variable \(\zeta \ge 0\) only appears in (30d), after projecting \(\zeta \) out we find that constraint (30d) reduces to
Note that constraint (31), and assumptions \(y(N^-)<y_i\) for all \(i\in R\) and \(y(N^-)<y(L)\), imply that \(\lambda _i\le y_i\) and \(\lambda _0\le y(L)\). Observe that we can set
Indeed, for any feasible \(\lambda \), \(y(L)+y(R)-\lambda (\bar{R})-\lambda _0\ge y(L)+y(R)-y(N^-)\ge 0\); thus \(\mu _i\le 1\). Moreover,
thus \(\mu _i\ge 0\). For this choice of \(\mu \), we find that
Finally, substituting \(1-|R|+\mu (R)\) and \(\mu _i\) in (30a) with their respective values, (30a) reduces to
and since \(y(L)+y(R)=y(N^+)\), this inequality is valid.
Case 5 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), \(y(N^-)< y(L)\) but \(y(N^-)\ge y_j\) for some \(j\in R\): In this case, \(y_i=0\) for all \(i\in U\), and we set \(\mu _0=0\). Note that, in (29), we can set \(\lambda _j=y_j\) and \(\mu _j=x_j\), resulting in the inequality
This inequality is of the same form as (29) but with \(\hat{R}=R{\setminus } j\) and \(\hat{y}(N^-)=y(N^-) - y_j\). After repeating this process sequentially so that \(\lambda _i=y_i\) and \(\mu _i=x_i\) for some subset \(T \subseteq R\), such that \(y(N^-)-y(T)\le y_i\) for all \(i\in R{\setminus } T\), and applying a similar strategy as in Case 4, we obtain an inequality of the form
which is valid.
Case 6 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), and \(y(N^-)\ge y(L)\): In this case, we can set \(\lambda _0=y(L)\), \(\mu _0=0\), and (29) reduces to
Moreover, if \(y(N^-)-y(L)\ge y_j\) for some \(j\in R\), then we can set \(\lambda _j=y_j\), \(\mu _j=x_j\) as done in Case 5. After repeating this process, we obtain an inequality of the form
where \(y(N^-)-y(L)-y(T)<y_i\) for all \(i\in R{\setminus } T\), and therefore \(x_i=1\) for all \(i\in R{\setminus } T\).
Note that constraint (32d) and \(y(N^-)-y(L)-y(T)<y_i\) imply that \(\lambda _i< y_i\) in any feasible solution. Then, for all \(i \in R{\setminus } T\), we can set
Clearly, \(\mu _i\le x_i\). Moreover, for all \(i \in R{\setminus } T\),
thus \(\mu _i\ge 0\). Finally,
and constraint (32b) is satisfied. Substituting \(x_i-\lambda _i\), \( i \in R {\setminus } T\), with their explicit form in (32a), we find the equivalent form
which is valid. \(\square \)
To derive the corresponding lifted inequalities for \(B^-\), it suffices to interchange \(N^+\) and \(N^-\). Therefore, for a partitioning (L, R, U) of \(N^-\), we find the conic quadratic inequalities:
One of the main results of the paper, that is, an explicit description of \(\text {cl conv}(X)\) via a finite number of conic quadratic inequalities, is stated below.
Theorem 3
\(\text {cl conv}(X)\) is given by bound constraints \(0 \le x\le 1\), \(y \ge 0\), and inequalities (29) and (33).
For the positive case of \(X_+\) with \(N^- =\emptyset \), for a partitioning (L, R) of N, inequalities (29) reduce to
Note that each inequality (34) also requires O(n) additional variables and constraints but is significantly simpler compared to (29).
Theorem 4
\(\text {cl conv}(X_+)\) is given by bound constraints \(0 \le x \le 1\), \(y \ge 0\), and inequalities (34).
6 Computational experiments
In this section, we test the computational effectiveness of the conic quadratic inequalities given in Sect. 5.2 in solving convex quadratic minimization problems with indicators. In particular, we solve portfolio optimization problems with fixed-charges. All experiments are run with the CPLEX 12.8 solver on a laptop with a 1.80GHz Intel® Core™ i7 CPU and 16 GB main memory on a single thread. We use CPLEX default settings but turn on the numerical emphasis parameter, unless stated otherwise. The data for the instances and problem formulations in .lp format can be found online at https://sites.google.com/usc.edu/gomez/data.
6.1 Instances
We consider optimization problems of the form
where \(F\in \mathbb {R}_+^{n\times r}\) with \(r<n\), \(a,b,d\in \mathbb {R}_+^N\). We test two classes of instances, general and positive, where either F has both positive and negative entries, or F has only non-negative entries, respectively. Note that constraints (35d) are in fact a big-M reformulation of complementarity constraint \(y_i(1-x_i)=0\): indeed, constraint (35b) and \(y\ge 0\) imply the upper bound \(y\le 1\). The parameters are generated as follows, where the notation \(Y\sim U[\ell ,u]\) means “Y is generated from a continuous uniform distribution between \(\ell \) and u”:
- F:
-
Let \(\rho \) be a positive weight parameter. Matrix \(F=EG\), where \(E\in \mathbb {R}_+^{n\times r}\) is an exposure matrix such that \(E_{ij}=0\) with probability 0.8 and \(E_{ij}\sim U[0,1]\) otherwise, and \(G\in \mathbb {R}^{r\times r}\) with \(G_{ij}\sim U[\rho ,1]\). If \(\rho \ge 0\), then matrix F is guaranteed to be nonnegative, and we refer to such instances as positive. Otherwise, for \(\rho <0\), we refer to the instances as general.
- d:
-
Let \(\delta \) be a diagonal dominance parameter. Define \(v=(1/n)\sum _{i=1}^n (FF')_{ii}\) to be the average diagonal element of \(FF'\); then \(d_i^2\sim U[0,v\delta ]\).
- b:
-
We generate entries \(b_i\sim U[0.25,0.75]\times \sqrt{(FF')_{ii}+d_i^2}\). Note that if the terms \(b_i\) and \(((FF')_{ii}+d_i^2)\) are interpreted as the expectation and variance of a random variable, then expectations are approximately proportional to the standard deviations. This relation aims to avoid trivial instances, where one term dominates the other.
- a:
-
Let \(\omega \) be a fixed cost parameter and \(a_i={\omega }(e'b)/n\), \(i\in N\), where e is an n-dimensional vector of ones.
It is well-documented in the literature that for matrices with large diagonal dominance the perspective reformulation achieves close to \(100\%\) gap improvement. Therefore, we choose a low diagonal dominance \(\delta =0.01\) to generate instances hard for the perspective reformulation. In our computations, unless stated otherwise, we use \(n=200\) and \(\beta =(e'b)/n\).
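The instance generation above can be sketched in pure Python. This is a reconstruction from the description in this section (the actual instances are available at the URL given earlier); the function name and seeding are ours.

```python
import random

def generate_instance(n, r, rho, delta, omega, seed=0):
    """Sketch of the instance generator in Sect. 6.1.  E is n x r with
    entries 0 w.p. 0.8 and U[0,1] otherwise; G is r x r with U[rho,1]
    entries; F = E G.  Then d_i^2 ~ U[0, v*delta] with v the average
    diagonal of FF'; b_i ~ U[0.25,0.75]*sqrt((FF')_ii + d_i^2); and
    a_i = omega*(e'b)/n."""
    rng = random.Random(seed)
    E = [[0.0 if rng.random() < 0.8 else rng.random() for _ in range(r)]
         for _ in range(n)]
    G = [[rng.uniform(rho, 1.0) for _ in range(r)] for _ in range(r)]
    F = [[sum(E[i][k] * G[k][j] for k in range(r)) for j in range(r)]
         for i in range(n)]
    diag = [sum(F[i][j] ** 2 for j in range(r)) for i in range(n)]  # (FF')_ii
    v = sum(diag) / n
    d = [rng.uniform(0.0, v * delta) ** 0.5 for _ in range(n)]
    b = [rng.uniform(0.25, 0.75) * (diag[i] + d[i] ** 2) ** 0.5
         for i in range(n)]
    a_val = omega * sum(b) / n
    return F, d, b, [a_val] * n
```

For \(\rho \ge 0\) the generated F is entrywise nonnegative, matching the definition of positive instances.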
6.2 Methods
We test the following methods:
-
Basic : Problem (35) formulated as
$$\begin{aligned} \min \;&\Vert q\Vert _2^2+\sum _{i=1}^n(d_iy_i)^2 \end{aligned}$$(36a)$$\begin{aligned} \text {s.t.}\;&q=F'y \end{aligned}$$(36b)$$\begin{aligned}&(35b)-(35d) \end{aligned}$$(36c)$$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\; q\in \mathbb {R}^r. \end{aligned}$$(36d) -
Perspective : Problem (35) formulated as
$$\begin{aligned} \min \;&\Vert q\Vert _2^2+\sum _{i=1}^nd_i^2p_i \end{aligned}$$(37a)$$\begin{aligned} \text {s.t.}\;&q=F'y \end{aligned}$$(37b)$$\begin{aligned}&y_i^2\le p_ix_i,{} & {} i=1,\ldots ,n \end{aligned}$$(37c)$$\begin{aligned}&(35b)-(35d) \end{aligned}$$(37d)$$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\;{} & {} p\in \mathbb {R}_+^n,\; q\in \mathbb {R}^r. \end{aligned}$$(37e) -
Supermodular : Problem (35) formulated as
$$\begin{aligned} \min \;&\sum _{j=1}^r t_j+\sum _{i=1}^nd_i^2p_i \end{aligned}$$(38a)$$\begin{aligned} \text {s.t.}\;&\left( F_j'y\right) ^2\le t_j,{} & {} j=1,\ldots ,r \end{aligned}$$(38b)$$\begin{aligned}&y_i^2\le p_ix_i,{} & {} i=1,\ldots ,n \end{aligned}$$(38c)$$\begin{aligned}&(35b)-(35d) \end{aligned}$$(38d)$$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\;{p\in \mathbb {R}_+^n},\;{} & {} t\in \mathbb {R}_+^r, \end{aligned}$$(38e)where \(F_j\) denotes the j-th column of F. Additionally, lifted supermodular inequalities (29) are added to strengthen the relaxations. Note that the convex relaxation of (38) without any additional inequalities is equivalent to the convex relaxation of (37).
Cuts (29) (for general instances) or (34) (for positive instances) for method Supermodular are added as follows:
-
(1)
We solve the convex relaxation of (38) to obtain a solution \((\bar{x}, \bar{y}, \bar{t})\). By default, the convex relaxation is solved with an interior point method.
-
(2)
We find a most violated inequality (29) or (34) for each constraint (38b) using the separation algorithm given in Sect. 5.1.3. Denote by \(\bar{\nu }_j\) the rhs value of (23) or (28) if sets L and U satisfying (22) exist; otherwise, let \(\bar{\nu }_j=-\infty \).
-
(3)
Let \(\epsilon =10^{-3}\) be a precision parameter. Inequalities found in step (2) are added if either \(\bar{t}_j<\epsilon \) and \((\bar{\nu }_j-\bar{t}_j)>\epsilon \); or \(\bar{t}_j\ge \epsilon \) and \((\bar{\nu }_j-\bar{t}_j)/\bar{t}_j>\epsilon \). At most r inequalities are added per iteration, one for each constraint (38b).
-
(4)
This process is repeated until either no inequality is added in step (3) or the maximum number of cuts (3r) is reached.
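The acceptance test in step (3) is worth isolating as a small predicate; a sketch of that logic (function name ours) follows.

```python
def should_add_cut(t_j, nu_j, eps=1e-3):
    """Violation test from step (3): add the cut for constraint j if the
    absolute violation exceeds eps when t_j is tiny, and the relative
    violation exceeds eps otherwise.  nu_j = -inf encodes 'no valid
    (L, U) found' in step (2)."""
    if nu_j == float("-inf"):
        return False
    if t_j < eps:
        return nu_j - t_j > eps
    return (nu_j - t_j) / t_j > eps
```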
We point out that convexification based on \(X_f\) [7], described in Sect. 3.2, is not effective with formulation (38) since \(t_j\ge (F_j'y)^2/\min \{1,e'x\}\) reduces to \(t_j\ge (F_j'y)^2\) due to (35b) and (35d).
6.3 Results
Tables 1, 2, 3 and 4 present the results for \(\rho \in \{-1,-0.5,-0.2,0\}\). They show, for different ranks r and values of the fixed cost parameter \({\omega }\), the optimal objective value (opt) and, for each method, the optimal objective value for the convex relaxation (val), the integrality gap (gap) computed as \(\texttt {gap}=\frac{\texttt {opt}-\texttt {val}}{\texttt {opt}}\times 100\), the improvement (imp) of Supermodular over Perspective computed as
the time required to solve the relaxation in seconds (time) and the number of cuts added (cuts). The optimal solutions are computed using the CPLEX branch-and-bound method with the Perspective formulation. The values opt and val are scaled so that, in a given instance, \(\texttt {opt}=100\). Each row corresponds to the average of five instances generated with the same parameters.
First note that Perspective achieves only a very modest improvement over Basic due to the low diagonal dominance parameter \(\delta =0.01\). We also point out that instances with smaller values of the positive weight parameter \(\rho \) have weaker natural convex relaxations, i.e., Basic has larger gaps; a similar phenomenon was observed in [26].
The relative performance of all methods in rank-one instances, \(r=1\), is virtually identical regardless of the value of the positive weight parameter \(\rho \). In particular, Supermodular substantially improves upon Basic and Perspective : it achieves \(0\%\) gaps in instances with \({\omega }\le 10\), and reduces the gap from 35% to 6% in instances with \({\omega }=50\).
In instances with \(r\ge 5\), the relative performance of Supermodular depends on the positive weight parameter \(\rho \): for larger values of \(\rho \), more cuts are added and Supermodular results in higher quality formulations. For example, in instances with \(r=5\), \({\omega }=50\), the improvements achieved by Supermodular are 40.3% (\(\rho =-1\)), 53.2% (\(\rho =-0.5\)), 62.0% (\(\rho =-0.2\)) and 72.7% (\(\rho =0\)). Similar behavior can be observed for other combinations of parameters with \(r\ge 5\).
Our interpretation of the dependence of the formulation strength on \(\rho \) is as follows. For instances with small values of \(\rho \), it is possible to reduce the systematic risk of the portfolio \(y'(FF')y\) close to zero due to negative correlations, i.e., to achieve a “perfect hedge”, although this may be unrealistic in practice. In such instances, the idiosyncratic risk \(\sum _{i=1}^n (d_iy_i)^2\) and constraints (35b)–(35d), which limit diversification, are the most important components behind the portfolio variance. In contrast, as \(\rho \) increases, it is increasingly difficult to reduce the systematic risk (and altogether impossible for \(\rho \ge 0\)). In such instances, the systematic risk \(y'(FF')y\) accounts for the majority of the variance of the portfolio. Thus, the lifted supermodular inequalities, which exploit the structure induced by the systematic risk, are particularly effective in the latter class of instances.
Figure 4 depicts the integrality gap of the different formulations as a function of the rank for instances with \(\rho =0\). We see that Supermodular achieves a large (\(> 70\%\)) improvement over Perspective, especially in the challenging low-rank settings. The improvement is also significant (44%) for high-rank settings with \(r=35\).
Finally, to evaluate the computational burden associated with the formulations, we plot in Fig. 5 the time in seconds (on a logarithmic scale) required to solve the convex relaxations of each method for different dimensions n. Each point in Fig. 5 corresponds to an average of 15 portfolio optimization instances generated with parameters \(r=10\), \(\delta =0.01\) and \({\omega }\in \{2,10,50\}\) (5 instances for each value of \({\omega }\)). The time for Supermodular includes the total time spent generating cuts and repeatedly solving the convex relaxations.
We see that, in general, formulation Basic is an order-of-magnitude faster than Perspective, which in turn is an order-of-magnitude faster than Supermodular. Nonetheless, the computation times for Supermodular are adequate for many applications, solving instances with \(n=1000\) in under four seconds on average.
Contrary to expectations, Supermodular is faster for general instances than for positive instances, despite the larger and more complex inequalities (29) used for the general case; for \(n=1000\), Supermodular runs in 1.9 s in general instances versus 3.8 s in positive instances. This counter-intuitive behavior is explained by the number of cuts added, as several more violated cuts are found in instances with large values of \(\rho \), leading to larger convex formulations and the need to resolve them more times; for \(n=1000\), 20 cuts are added in each instance with \(\rho =0\), whereas on average only 3.7 cuts are added in instances with \(\rho =-1\).
The computation times are especially promising for tackling large-scale quadratic optimization problems with indicators, where alternatives for constructing strong convex relaxations (often based on decomposition of matrix \(FF'+D\) into lower-dimensional terms) may not scale. For example, Frangioni et al. [26] solve convex relaxations of instances up to \(n=50\), Han et al. [33] solve relaxations for instances up to \(n=150\), and Atamtürk and Gómez [6] report that solving the convex relaxation of quadratic instances with \(n=200\) requires up to 1000 s. All of these methods require adding \(O(n^2)\) variables and constraints to the formulations to achieve strengthening. In contrast, the supermodular inequalities (29) and (34) yield formulations with O(nr) additional variables and constraints, which can be solved efficiently even if n is large, provided that the rank r is sufficiently small: in our computations, instances with \(r=10\) and \(n=1000\) are solved in under 4 s. Nonetheless, as discussed in the next section, even if the convex relaxations can be solved easily, incorporating the proposed convexification in branch-and-bound methods may require tailored implementations, not supported by current off-the-shelf branch-and-bound solvers.
6.4 On the performance with off-the-shelf branch-and-bound solvers
We also experimented with solving the formulations Supermodular obtained after adding cuts with CPLEX's branch-and-bound algorithm. However, note that inequalities (29) and, to a lesser degree, inequalities (34), involve several ratios that can result in division by 0—from the proof of Proposition 10, we see that this is in fact the case in many scenarios. Therefore, while we did not observe any particular numerical difficulties when solving the convex relaxations (via interior point methods), in a small subset of the instances we observed that the branch-and-bound method (based on linear outer approximations) ran into numerical issues leading to incorrect solutions.
Table 5 reports the results on the two instances that exhibit such pathological behavior. It shows, for each instance, method, and CPLEX setting, the bound on the optimal objective value reported by CPLEX when solving the convex relaxation via interior point methods (barrier; a lower bound), and the lower and upper bounds reported after running the branch-and-bound algorithm for one hour. We do not scale the solutions reported in Table 5. The tested settings are default CPLEX (def), default CPLEX with numerical emphasis enabled (+num), and CPLEX with numerical emphasis enabled and presolve and CPLEX cuts disabled (+num-pc).
In the first instance shown in Table 5, when using Supermodular with the default CPLEX settings, the solution reported is worse than the optimal solution by 30%. By enabling the numerical emphasis option, the solution improves but is still 10% worse than the solution reported by Perspective. Nonetheless, if presolve and CPLEX cuts are disabled, then both solutions coincide. The second instance shown in Table 5 exhibits the opposite behavior: with the default settings, independently of the numerical emphasis, the solutions obtained by Perspective and Supermodular coincide; however, if presolve and CPLEX cuts are disabled, then the lower bound obtained after one hour of branch-and-bound with the Supermodular method already precludes finding the correct solution. We point out that pathological behavior of conic quadratic branch-and-bound solvers has been observed in the past for other nonlinear mixed-integer problems with a large number of variables; see, for example, [6, 13, 26, 29].
7 Conclusions
In this paper we describe the convex hull of the epigraph of a rank-one quadratic function with indicator variables. To do so, we first describe the convex hull of an underlying supermodular set function in a lower-dimensional space, and then maximally lift the resulting facets into nonlinear inequalities in the original space of variables. The approach is broadly applicable, as most of the existing results concerning convexifications of convex quadratic functions with indicator variables can be obtained in this way, as well as several well-known classes of facet-defining inequalities for mixed-integer linear problems.
References
Ahmed, S., Atamtürk, A.: Maximizing a class of submodular utility functions. Math. Program. 128(1–2), 149–169 (2011)
Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Oper. Res. Lett. 37, 187–191 (2009)
Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95, 3–51 (2003)
Atamtürk, A.: Flow pack facets of the single node fixed-charge flow polytope. Oper. Res. Lett. 29, 107–114 (2001)
Atamtürk, A., Bhardwaj, A.: Supermodular covering knapsack polytope. Discret. Optim. 18, 74–86 (2015)
Atamtürk, A., Gómez, A.: Strong formulations for quadratic optimization with M-matrices and indicator variables. Math. Program. 170, 141–176 (2018)
Atamtürk, A., Gómez, A.: Rank-one convexification for sparse regression (2019). arXiv:1901.10334
Atamtürk, A., Gómez, A.: Submodularity in conic quadratic mixed 0–1 optimization. Oper. Res. 68(2), 609–630 (2020)
Atamtürk, A., Muñoz, J.C.: A study of the lot-sizing polytope. Math. Program. 99, 443–465 (2004)
Atamtürk, A., Narayanan, V.: Submodular function minimization and polarity. Math. Program. 196, 57–67 (2022)
Atamtürk, A., Nemhauser, G.L., Savelsbergh, M.W.P.: Valid inequalities for problems with additive variable upper bounds. Math. Program. 91, 145–162 (2001)
Atamtürk, A., Küçükyavuz, S., Tezel, B.: Path cover and path pack inequalities for the capacitated fixed-charge network flow problem. SIAM J. Optim. 27(3), 1943–1976 (2017)
Atamtürk, A., Gómez, A., Han, S.: Sparse and smooth signal estimation: convexification of \(\ell _0\)-formulations. J. Mach. Learn. Res. 22, 1–4 (2021)
Bach, F.: Submodular functions: from discrete to continuous domains. Math. Program. 175, 419–459 (2019)
Bertsimas, D., King, A.: OR Forum—an algorithmic approach to linear regression. Oper. Res. 64, 2–16 (2015)
Bienstock, D.: Computational study of a family of mixed-integer quadratic programming problems. Math. Program. 74(2), 121–140 (1996)
Bienstock, D., Michalka, A.: Cutting-planes for optimization of convex functions over nonconvex sets. SIAM J. Optim. 24, 643–677 (2014)
Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Math. Program. 151, 191–223 (2015)
Ceria, S., Soares, J.: Convex programming for disjunctive convex optimization. Math. Program. 86, 595–614 (1999)
Cozad, A., Sahinidis, N.V., Miller, D.C.: A combined first-principles and data-driven approach to model building. Comput. Chem. Eng. 73, 116–127 (2015)
Dong, H., Linderoth, J.: On valid inequalities for quadratic programming with continuous variables and binary indicators. In: Goemans, M., Correa, J. (eds.) Proceedings of IPCO 2013, pp. 169–180. Springer, Berlin (2013)
Dong, H., Chen, K., Linderoth, J.: Regularization vs. relaxation: a conic optimization perspective of statistical variable selection (2015). arXiv:1510.06083
Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program. 106, 225–236 (2006)
Frangioni, A., Gentile, C.: SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Oper. Res. Lett. 35, 181–185 (2007)
Frangioni, A., Gentile, C., Lacalandra, F.: Tighter approximated MILP formulations for unit commitment problems. IEEE Trans. Power Syst. 24(1), 105–113 (2009)
Frangioni, A., Gentile, C., Hungerford, J.: Decompositions of semidefinite matrices and the perspective reformulation of nonseparable quadratic programs. Math. Oper. Res. 45(1), 15–33 (2020)
Fujishige, S.: Submodular Functions and Optimization, vol. 58. Elsevier, Amsterdam (2005)
Gómez, A.: Submodularity and valid inequalities in nonlinear optimization with indicator variables (2018)
Gómez, A.: Strong formulations for conic quadratic optimization with indicator variables. Math. Program. 188, 193–226 (2020)
Gómez, A.: Outlier detection in time series via mixed-integer conic quadratic optimization. SIAM J. Optim. 31, 1897–1925 (2021)
Grötschel, M., Lovász, L., Schrijver, A.: The ellipsoid method and its consequences in combinatorial optimization. Combinatorica 1, 169–197 (1981)
Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124, 183–205 (2010)
Han, S., Gómez, A., Atamtürk, A.: 2x2 convexifications for convex quadratic optimization with indicator variables (2020). arXiv:2004.07448
Hazimeh, H., Mazumder, R., Saab, A.: Sparse regression at scale: branch-and-bound rooted in first-order optimization. Math. Program. 196, 347–388 (2022)
Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off’’ constraints. Comput. Optim. Appl. 52, 537–558 (2012)
Jeon, H., Linderoth, J., Miller, A.: Quadratic cone cutting surfaces for quadratic programs with on-off constraints. Discret. Optim. 24, 32–50 (2017)
Kılınç-Karzan, F., Küçükyavuz, S., Lee, D.: Joint chance-constrained programs and the intersection of mixing sets through a submodularity lens. Math. Program. 195, 283–326 (2022)
Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284, 193–228 (1998)
Mahajan, A., Leyffer, S., Linderoth, J., Luedtke, J., Munson, T.: Minotaur: a mixed-integer nonlinear optimization toolkit. ANL/MCS-P8010-0817, Argonne National Lab (2017)
Manzour, H., Küçükyavuz, S., Shojaie, A.: Integer programming for learning directed acyclic graphs from continuous data. INFORMS J. Optim. 3, 46–73 (2020)
Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley, Hoboken (1988)
Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions—I. Math. Program. 14, 265–294 (1978)
Nguyen, T.T., Richard, J.P.P., Tawarmalani, M.: Deriving convex hulls through lifting and projection. Math. Program. 169(2), 377–415 (2018)
Orlin, J.B.: A faster strongly polynomial time algorithm for submodular function minimization. Math. Program. 118, 237–251 (2009)
Padberg, M.W., Van Roy, T.J., Wolsey, L.A.: Valid linear inequalities for fixed charge problems. Oper. Res. 33(4), 842–861 (1985)
Pochet, Y.: Valid inequalities and separation for capacitated economic lot sizing. Oper. Res. Lett. 7, 109–115 (1988)
Richard, J.P.P., Tawarmalani, M.: Lifting inequalities: a framework for generating strong cuts for nonlinear programs. Math. Program. 121, 61–104 (2010)
Shi, X., Prokopyev, O.A., Zeng, B.: Sequence independent lifting for the set of submodular maximization problem. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 378–390. Springer (2020)
Tjandraatmadja, C., Anderson, R., Huchette, J., Ma, W., Patel, K.K., Vielma, J.P.: The convex relaxation barrier, revisited: tightened single-neuron relaxations for neural network verification. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 21675–21686. Curran Associates Inc., Red Hook (2020)
Van Roy, T.J., Wolsey, L.A.: Valid inequalities for mixed 0–1 programs. Discret. Appl. Math. 14, 199–213 (1986)
Wei, L., Gómez, A., Küçükyavuz, S.: On the convexification of constrained quadratic optimization problems with indicator variables. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 433–447. Springer (2020)
Wei, L., Gómez, A., Küçükyavuz, S.: Ideal formulations for constrained convex optimization problems with indicator variables. Math. Program. 192, 57–88 (2022)
Wei, L., Atamtürk, A., Gómez, A., Küçükyavuz, S.: On the convex hull of convex quadratic optimization problems with indicators (2022). arXiv:2201.00387
Wolsey, L.A.: Submodularity and valid inequalities in capacitated fixed charge networks. Oper. Res. Lett. 8, 119–124 (1989)
Wu, B., Sun, X., Li, D., Zheng, X.: Quadratic convex reformulations for semicontinuous quadratic programming. SIAM J. Optim. 27, 1531–1553 (2017)
Wu, H.H., Küçükyavuz, S.: Maximizing influence in social networks: a two-stage stochastic programming approach that exploits submodularity (2015). arXiv:1512.04180
Xie, W., Deng, X.: Scalable algorithms for the sparse ridge regression. SIAM J. Optim. 30, 3359–3386 (2020)
Yu, J., Ahmed, S.: Maximizing a class of submodular utility functions with constraints. Math. Program. 162(1–2), 145–164 (2017)
Yu, J., Ahmed, S.: Polyhedral results for a class of cardinality constrained submodular minimization problems. Discret. Optim. 24, 87–102 (2017)
Yu, Q., Küçükyavuz, S.: A polyhedral approach to bisubmodular function minimization. Oper. Res. Lett. 49, 5–10 (2021)
Zheng, X., Sun, X., Li, D.: Improving the performance of MIQP solvers for quadratic programs with cardinality and minimum threshold constraints: a semidefinite program approach. INFORMS J. Comput. 26, 690–703 (2014)
Acknowledgements
Alper Atamtürk is supported, in part, by NSF AI Institute for Advances in Optimization Award 2112533, NSF Grant 1807260 and DOD ONR grant 12951270. A Gómez is supported, in part, by NSF Grants 1818700 and 1930582.
Appendix A
Proof of Proposition 8
In order to solve problem (26) we introduce an auxiliary variable \(\gamma \in \mathbb {R}_+\) such that \(\gamma =\max _\alpha (S^+)\). Then, inequality (26) reduces to
where constraint (39b) enforces the definition of \(\gamma \).
Note that there exists an optimal solution of (39) where \(\gamma \le \alpha _i\) for all \(i\in N\): if \(\alpha _i<\gamma \) for some \(i\in N\), then setting \(\alpha _i=\gamma \) yields a feasible solution with improved objective value. Therefore, S is completely determined by \(\gamma \), since \(S=\left\{ i\in N: \alpha _i\le \gamma \right\} \).
Now, let \(L=\left\{ i\in N:\alpha _i=\gamma \right\} \) in a solution of (39). From the discussion above, we find that (39) reduces to
Observe that for \((L,\gamma )\) to correspond to an optimal solution, we require that \(1-{\bar{x}}(N{\setminus } L)\ge 0\) (otherwise \(\gamma \) can be increased and set to an upper bound while improving the objective value). When this condition is satisfied, we find, by taking derivatives of the objective and setting them to 0, that \(\alpha _i=2{\bar{y}_i}/{\bar{x}_i}\) for \(i\in N{\setminus } L\) and \(\gamma = 2 {\bar{y}}(L)/\big (1-{\bar{x}}(N{\setminus } L)\big )\), and (40) simplifies to (28). Note, however, that \((\alpha ,\gamma )\) may not satisfy constraints (40b) for a given choice of set \(L\subseteq N\). The constraints are satisfied if and only if \(\gamma <\alpha _i\) for all \(i\in N{\setminus } L\), i.e., if and only if conditions (27b) are satisfied.
In order for L to be optimal we require condition (27c), i.e.,
Indeed, if this condition is not satisfied for some \(j\in L\), then increasing \(\alpha _j\) from \(\gamma =2\frac{{\bar{y}}(L)}{1-{\bar{x}}(N{\setminus } L)}\ge 2\frac{{\bar{y}_j}}{{\bar{x}_j}}\) to \(2{\bar{y}_j}/{\bar{x}_j}\) (or setting it to \(\beta \) if \(\beta <2{\bar{y}_j}/{\bar{x}_j}\)) results in a better objective value.
Finally, note that conditions (27b) and (27c) together imply that \(\bar{y}_i/\bar{x}_i< \bar{y}_j/\bar{x}_j\) whenever \(i\in L\) and \(j\not \in L\); in other words, if \(L\ne \emptyset \), then \(L=\left\{ (1),(2),\ldots ,(\kappa )\right\} \) for some \(1\le \kappa \le n\). \(\square \)
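The final observation of the proof suggests a simple search: since any optimal nonempty L is a prefix \(\left\{ (1),\ldots ,(\kappa )\right\} \) of the items sorted by the ratio \(\bar{y}_i/\bar{x}_i\), it suffices to scan the \(\le n\) prefixes. The sketch below (illustrative; it only enumerates the candidate pairs \((L,\gamma )\) and omits the checks of conditions (27b)–(27c) and the cap \(\beta \), whose exact forms are stated in the proposition) assumes \(\bar{x}_i>0\) for all i:

```python
def candidate_prefixes(x_bar, y_bar):
    """Enumerate the candidate sets L = {(1),...,(kappa)} from the proof:
    prefixes of the items sorted by the ratio y_i / x_i.  For each prefix
    with 1 - x(N \\ L) > 0, report gamma = 2*y(L) / (1 - x(N \\ L))."""
    n = len(x_bar)
    order = sorted(range(n), key=lambda i: y_bar[i] / x_bar[i])
    candidates = []
    y_L = 0.0                      # running sum y(L)
    x_rest = sum(x_bar)            # running sum x(N \ L)
    for kappa, i in enumerate(order, start=1):
        y_L += y_bar[i]
        x_rest -= x_bar[i]
        if 1.0 - x_rest > 0.0:     # required for optimality of (L, gamma)
            gamma = 2.0 * y_L / (1.0 - x_rest)
            candidates.append((order[:kappa], gamma))
    return candidates

# Tiny example: ratios y/x are 0.2, 0.5, 1.0, so prefixes grow as
# {0}, {0,1}, {0,1,2}.
cands = candidate_prefixes([0.5, 0.4, 0.3], [0.1, 0.2, 0.3])
```

After sorting, the scan maintains \(\bar{y}(L)\) and \(\bar{x}(N{\setminus } L)\) incrementally, so the whole enumeration runs in \(O(n\log n)\) time, dominated by the sort.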
Cite this article
Atamtürk, A., Gómez, A. Supermodularity and valid inequalities for quadratic optimization with indicators. Math. Program. 201, 295–338 (2023). https://doi.org/10.1007/s10107-022-01908-2
Keywords
- Quadratic optimization
- Supermodular inequalities
- Perspective formulation
- Conic quadratic cuts
- Convex piecewise valid inequalities
- Lifting