1 Introduction

Consider the convex quadratic optimization problem with indicators

$$\begin{aligned} \min \! \big \{a'x\!+\!b'y\!+\!y'Qy\! : \ y_i(1\!-\!x_i)=0, i=1,\ldots ,n; \ (x,y) \in \{0,1\}^n \! \times \mathbb {R}_+^n\big \}\qquad \end{aligned}$$
(1)

where \(a, b \in \mathbb {R}^n\) and \(Q\in \mathbb {R}^{n\times n}\) is a symmetric positive semi-definite matrix. For each \(i=1, \ldots , n\), the binary variable \(x_i\), along with the complementarity constraint \(y_i (1-x_i)=0\), indicates whether \(y_i\) may take positive values. Problem (1) arises in numerous practical applications, including portfolio optimization [16], signal/image denoising [13, 14], best subset selection [15, 20, 34], and unit commitment [25].

Constructing strong convex relaxations for non-convex optimization problems is critical in devising effective solution approaches for them. Natural convex relaxations of (1), where the complementarity constraints \(y_i(1-x_i)=0\) are linearized using the so-called “big-M” constraints \(y_i\le Mx_i\), are known to be weak [e.g., 40]. Therefore, there is an increasing effort in the literature to better understand and describe the epigraph of quadratic functions with indicator variables. Dong and Linderoth [21] describe lifted linear inequalities for (1) from its continuous quadratic optimization counterpart over bounded variables. Bienstock and Michalka [17] give a characterization of linear inequalities obtained by strengthening gradient inequalities of a convex objective function over a non-convex set.

The majority of the work toward constructing strong relaxations of (1) is based on the perspective reformulation [2, 18, 23, 32, 35, 39, 55, 57]. The perspective reformulation, which may be seen as a consequence of the convexifications based on disjunctive programming derived in [19], is based on strengthening the epigraph of a univariate convex quadratic function \(y_i^2\le t\) by using its perspective \(y_i^2/x_i\le t\). The perspective strengthening can be applied to a general convex quadratic \(y'Qy\) by writing it as \(y' (Q-D) y + y'Dy\) for a diagonal matrix \(D \succ 0\) with \(Q-D\succeq 0\), and simply reformulating each separable quadratic term \(D_{ii}y_i^2\) as \(D_{ii}y_i^2/x_i\) [22, 24, 61]. While this approach is effective when Q is strongly diagonally dominant, it is ineffective otherwise, and inapplicable when Q is not full rank, as no such D exists.

To address the limitations of the perspective reformulation, a recent stream of research focuses on constructing strong relaxations of the epigraphs of simple but multi-variable quadratic functions. Jeon et al. [36] use linear lifting to construct valid inequalities for the epigraphs of two-variable quadratic functions. Frangioni et al. [26] use extended formulations based on disjunctive programming to derive stronger relaxations of the epigraph of two-variable functions. They study heuristics and semi-definite programming (SDP) approaches to extract from Q such two-variable terms. The disjunctive approach results in a substantial increase in the size of the formulations, which limits its use to small instances. Atamtürk and Gómez [6] describe the convex hull of the epigraph of the two-variable quadratic function \((y_1-y_2)^2\le t\) in the original space of variables, and Atamtürk et al. [13] generalize this result to convex two-variable quadratic functions \(a_1y_1^2-2y_1y_2+a_2y_2^2\le t\) and show how to optimally decompose an M-matrix (psd with non-positive off-diagonals) Q into such two-variable terms; their numerical results indicate that such formulations considerably improve the convex relaxations when Q is an M-matrix, but the relaxation quality degrades when Q has positive off-diagonal entries. Han et al. [33] give SDP formulations for (1) based on convex-hull descriptions of the 2×2 case. These SDP formulations require \(O(n^2)\) additional variables and constraints, which may not scale to large problems. Wei et al. [53] give an extended formulation via a single SDP constraint and linear inequalities. Atamtürk and Gómez [7] give the convex hull description of a rank-one function with free continuous variables, and propose an SDP formulation to tackle quadratic optimization problems with free variables arising in sparse regression. Wei et al. [51, 52] extend those results, deriving ideal formulations for rank-one functions with arbitrary constraints on the indicator variables x. These formulations are shown to be effective in sparse regression problems; however, as they do not account for the non-negativity constraints on the continuous variables, they are weak for (1). The rank-one quadratic set studied in this paper addresses this gap and properly generalizes the perspective strengthening of a univariate quadratic to higher dimensions.

In the context of discrete optimization, submodularity/supermodularity plays a critical role in the design of algorithms [27, 31, 44] and in constructing convex relaxations of discrete problems [1, 5, 10, 42, 48, 56, 58,59,60]. Exploiting submodularity in settings that also involve continuous variables typically requires specialized arguments, e.g., see [12, 37, 49]. A notable exception is Wolsey [54], who presents a systematic approach for exploiting submodularity in fixed-charge network problems. Since submodularity arises mostly in combinatorial optimization, where the convex hulls of the sets under study are polyhedral, there are few papers utilizing submodularity to describe non-polyhedral convex hulls [8], and those sets typically involve some degree of separability between continuous and discrete variables. In this paper, we show how to generalize the valid inequalities proposed in [54] to convexify non-polyhedral sets, where the continuous variables are linked with the binary variables via indicator constraints.

1.1 Contributions

Here, we study the mixed-integer epigraph of a rank-one quadratic function with indicator variables and non-negative continuous variables:

$$\begin{aligned} X&=\left\{ (x,y,t)\in \{0,1\}^N\times \mathbb {R}_+^N\times \mathbb {R}_+: \left( \sum _{i\in N^+}y_i-\sum _{i\in N^-}y_i\right) ^2\le t,\; y_i(1-x_i)=0, \ i\in N\right\} , \end{aligned}$$

where \((N^+,N^-)\) is a partition of \(N := \{1, \ldots ,n\}\). Observe that any rank-one quadratic of the form \(\left( c'y\right) ^2\le t\) with \(c_i\ne 0\) for all \(i\in N\) can be written as in X by scaling the continuous variables. If all coefficients of c are of the same sign, then either \(N^+=\emptyset \) or \(N^-=\emptyset \), and X reduces to the simpler form

$$\begin{aligned} X_+=\left\{ (x,y,t)\in \{0,1\}^N\times \mathbb {R}_+^N\times \mathbb {R}_+: \left( \sum _{i\in N}y_i\right) ^2\le t,\; y_i(1-x_i)=0, \ i\in N\right\} \cdot \end{aligned}$$

To the best of our knowledge, the convex hull structure of X or \(X_+\) has not been studied before. Interestingly, optimization of a linear function over X can be done in linear time (Sect. 4.2).

Our motivation for studying X stems from constructing strong convex relaxations for problem (1) by writing the convex quadratic \(y'Qy\) as a sum of rank-one quadratics. Especially in large-scale applications, it is effective to state Q as a sum of a low-rank matrix and a diagonal matrix. Specifically, suppose that \(Q=FF'+D\), where \(F\in \mathbb {R}^{n\times r}\) and \(D\in \mathbb {R}^{n\times n}\) is a (possibly zero) nonnegative diagonal matrix. Such decompositions can be constructed in numerous ways, including singular-value decomposition, Cholesky decomposition, or via factor models. Letting \(F_j\) denote the j-th column of F, adding auxiliary variables \(t_j\in \mathbb {R}\), \(j=1,\dots ,r\), and using the perspective reformulation, problem (1) can be cast as

$$\begin{aligned} \min _{x,y,t}\;&{a'x + b'y +} \sum _{j=1}^rt_j+\sum _{i=1}^nD_{ii}\frac{y_i^2}{x_i} \end{aligned}$$
(2a)
$$\begin{aligned} \text {s.t.}\;&(F_j'y)^2\le t_j, \ j =1, \ldots , r \end{aligned}$$
(2b)
$$\begin{aligned}&(x,y)\in \{0,1\}^N\times \mathbb {R}_+^n, \ {t \in \mathbb {R}^r}. \end{aligned}$$
(2c)

Formulation (2) arises naturally, for example, in portfolio risk minimization [16], where the covariance matrix Q is the sum of a low-rank factor covariance matrix and an idiosyncratic (diagonal) variance matrix. When the entries of the diagonal matrix D are small, the perspective reformulation is not effective in strengthening the formulation. However, noting that \((x,F_j\circ y,t_j)\in X\), where \((F_j\circ y)_i=F_{ij}y_i\), for each \(j=1,\ldots ,r\), one can employ strong relaxations based on the rank-one quadratic with indicators. Our approach for decomposing \(y'Qy\) into a sum of rank-one quadratics and utilizing strong relaxations of epigraphs of rank-one quadratics is analogous to employing cuts separately from individual rows of a constraint matrix \(Ax \le b\) in mixed-integer linear programming.
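For concreteness, the following sketch (ours, assuming Python with numpy; the function name and the \(\lambda _{\min }\)-based choice of D are illustrative and not prescribed by the text) builds one valid decomposition \(Q=FF'+D\) by shifting the spectrum of Q.

```python
import numpy as np

def low_rank_plus_diagonal(Q, tol=1e-9):
    """Return (F, D) with Q = F F' + D, D a nonnegative diagonal matrix.

    A simple (not necessarily best) choice: put the smallest eigenvalue of Q
    on the diagonal, so that Q - D remains positive semi-definite, and factor
    the remainder through its eigendecomposition.
    """
    Q = np.asarray(Q, dtype=float)
    eigval, eigvec = np.linalg.eigh(Q)          # Q symmetric psd, ascending eigenvalues
    d = max(eigval[0], 0.0)                     # lambda_min(Q), clipped at 0
    D = d * np.eye(Q.shape[0])
    shifted = eigval - d                        # spectrum of Q - D (nonnegative)
    keep = shifted > tol                        # surviving columns -> rank r
    F = eigvec[:, keep] * np.sqrt(shifted[keep])
    return F, D

# toy usage: Q with a strong two-factor structure plus small diagonal noise
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 2))
Q = G @ G.T + 0.05 * np.eye(6)
F, D = low_rank_plus_diagonal(Q)
assert np.allclose(F @ F.T + D, Q, atol=1e-8)
print("rank of F F':", F.shape[1])
```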

In this paper, we present a generic framework for obtaining valid inequalities for mixed-integer nonlinear optimization problems by exploiting supermodularity of the underlying set function. To do so, we project out the continuous variables and derive valid inequalities for the corresponding pure integer set, and then lift these inequalities to the space of continuous variables as in Nguyen et al. [43] and Richard and Tawarmalani [47]. It turns out that for the rank-one quadratic with indicators, the corresponding set function is supermodular and retains much of the structure of X. The lifted supermodular inequalities derived in this paper are nonlinear in both the continuous and discrete variables.

We show that this approach encompasses several previously known convexifications for quadratic optimization with indicator variables. Moreover, the well-known inequalities in the mixed-integer linear optimization literature given in [54], which include flow cover inequalities as a special case, can also be obtained via the lifted supermodular inequalities.

Finally, and more importantly, we show that the lifted supermodular inequalities and bound constraints are sufficient to describe \(\text {cl conv}(X)\). Such convex hull descriptions of high-dimensional nonlinear sets are rare in the literature. In particular, we give a characterization in the original space of variables. This description is given by a piecewise valid function with exponentially many pieces; therefore, it cannot be used by convex optimization solvers directly. To overcome this difficulty, we also give a conic quadratic representable description in an extended space, with exponentially many valid conic quadratic inequalities, along with a polynomial-time separation algorithm.

The rank-one quadratic sets X and \(X_+\) appear very similar to their relaxation

$$\begin{aligned} X_f =\left\{ (x,y,t)\in \{0,1\}^N\times \mathbb {R}^N\times \mathbb {R}: \left( \sum _{i\in N}y_i\right) ^2\le t,\ y_i(1-x_i)=0, \ i\in N\right\} , \end{aligned}$$

where the non-negativity constraints on the continuous variables \(y \ge 0\) are dropped. However, while only one additional inequality \(\frac{\left( \sum _{i\in N} y_i\right) ^2}{\sum _{i \in N} x_i} \le t\) is needed to describe \(\text {cl conv}(X_f)\) [7], the convex hulls of X and \(X_+\) are substantially more complicated and rich. Indeed, \(\text {cl conv}(X_f)\) provides a weak relaxation for \(\text {cl conv}(X_+)\), as illustrated in the next example.

Example 1

Consider set \(X_+\) with \(n=3\). For the relaxation \(X_f\), the closure of the convex hull is described by \(0\le x \le 1\) and the inequality \(t\ge \frac{(y_1+y_2+y_3)^2}{\min \{1,x_1+x_2+x_3\}}\). Figure 1a depicts this inequality as a function of \((x_1,y_1)\) for \(x_2=0.6\), \(x_3=0.3\), \(y_2=0.5\), and \(y_3=0.2\) (fixed). In Proposition 8, we give the function f describing \(\text {cl conv}(X_+)\). Figure 1b depicts \(f(x,y)\) (truncated at 5) as a function of \((x_1,y_1)\) when the other variables are fixed as before.

We find that \(\text {cl conv}(X_f)\) is a very weak relaxation of \(\text {cl conv}(X_+)\) for low values of \(x_1\). For example, for \({x_1}=0.01\) and \({y_1}=1\), we find that \(\frac{(1+0.5+0.2)^2}{0.01+0.6+0.3}\approx 3.18\), whereas \(f(x,y)\approx 100.55\). The computation of f for this example is described after Proposition 8. \(\square \)

Fig. 1 Comparison of \(\text {cl conv}(X_f)\) and \(\text {cl conv}(X_+)\). Variables \(x_2=0.6\), \(x_3=0.3\), \(y_2=0.5\), and \(y_3=0.2\) are fixed

1.2 Outline

The rest of the paper is organized as follows. In Sect. 2 we review the valid inequalities for supermodular set functions and present the general form of the lifted supermodular inequalities. In Sect. 3 we re-derive known ideal formulations in the literature for quadratic optimization using the lifted supermodular inequalities. In Sect. 4 we show that the lifted supermodular inequalities are sufficient to describe the convex hull of X. In Sect. 5 we provide the explicit form of the lifted supermodular inequalities for X, both in the original space of variables and in a conic quadratic representable form in an extended space, and discuss the separation problem. In Sect. 6 we present computational results, and in Sect. 7 we conclude the paper.

1.3 Notation

For a set \(S\subseteq N\), define \(x_S\) as the indicator vector of S. By abusing notation, given a set function \(g:2^N\rightarrow \mathbb {R}\), we may equivalently write g(S) or \(g(x_S)\). To simplify the notation, given \(i\in N\) and \(S\subseteq N\), we write \(S\cup i\) instead of \(S\cup \{i\}\) and \(S{\setminus } i\) instead of \(S{\setminus }\{i\}\). For a set \(Y\subseteq \mathbb {R}^N\), let \(\text {conv}(Y)\) denote the convex hull of Y and \(\text {cl conv}\)(Y) denote its closure. We adopt the convention that \(a/0=\infty \) if \(a>0\) and \(a/0=0\) if \(a=0\). For \(a\in \mathbb {R}\), let \(a_+=\max \{a,0\}\). For a vector \(c\in \mathbb {R}^N\) and a set \(S\subseteq N\), we let \(c(S)=\sum _{i\in S}c_i\), \(\max _c(S)=\max _{i\in S}c_i\) (by convention, \(\max _c(\emptyset )=0\)) and \(c_S\) be the subvector of c induced by S. For an optimization problem with variables x, an optimal solution is denoted by \(x^*\).

2 Preliminaries

In this section we cover a few preliminary results for the paper and, at the end, give the general form of the lifted supermodular inequalities (Theorem 1).

2.1 Supermodularity and valid inequalities

A set function \(g:2^N\rightarrow \mathbb {R}\) is supermodular if

$$\begin{aligned} \rho (i,S)\le \rho (i,T) \quad \forall i\in N\text { and } \forall S\subseteq T\subseteq N{\setminus } i, \end{aligned}$$

where \(\rho (i,S)=g(S\cup i)-g(S)\) is the increment function.
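For small ground sets, the increment condition can be checked by brute force; the following helper (ours, for illustration only) does exactly that and is applied to the set function \(-\max _{i\in S}\alpha _i^2/4\) that reappears in Sect. 4.

```python
from itertools import combinations

def subsets(elements):
    """All subsets of a collection, as frozensets."""
    elements = list(elements)
    return [frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)]

def is_supermodular(g, N, tol=1e-9):
    """Brute-force check of rho(i, S) <= rho(i, T) for all i and S subseteq T subseteq N \ {i}.

    g maps a frozenset S subseteq N to a real number; exponential in |N|.
    """
    rho = lambda i, S: g(S | {i}) - g(S)
    for i in N:
        rest = [j for j in N if j != i]
        for T in subsets(rest):
            for S in subsets(T):
                if rho(i, S) > rho(i, T) + tol:
                    return False
    return True

# example: g(S) = -max_{i in S} alpha_i^2 / 4 (with max over the empty set = 0)
alpha = {1: 1.0, 2: 2.0, 3: 0.5}
g = lambda S: -max((alpha[i] ** 2 for i in S), default=0.0) / 4
print(is_supermodular(g, {1, 2, 3}))   # True
```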

Proposition 1

(Nemhauser et al. [42]) If g is a supermodular function, then

  1. \(g(T)\ge g(S)+\sum \limits _{i\in T{\setminus } S}\rho (i,S)-\sum \limits _{i\in S{\setminus } T}\rho (i,N{\setminus } i )\) for all \(S,T\subseteq N\);

  2. \(g(T)\ge g(S)+\sum \limits _{i\in T{\setminus } S}\rho (i,\emptyset )-\sum \limits _{i\in S{\setminus } T}\rho (i,S{\setminus } i )\) for all \(S,T\subseteq N\).

As a direct consequence of Proposition 1, one can construct valid inequalities for the epigraph of a supermodular function g, i.e.,

$$\begin{aligned} Z=\left\{ (x,t)\in \{0,1\}^N\times \mathbb {R}: g(x)\le t\right\} . \end{aligned}$$

Specifically, for any \(S\subseteq N\), the linear supermodular inequalities [41]

$$\begin{aligned} g(S)+\sum \limits _{i\in N{\setminus } S}\rho (i,S)x_i-\sum \limits _{i\in S}\rho (i,N{\setminus } i )(1-x_i)&\le t,\text { and} \end{aligned}$$
(3a)
$$\begin{aligned} g(S)+\sum \limits _{i\in N{\setminus } S}\rho (i,\emptyset )x_i-\sum \limits _{i\in S}\rho (i,S{\setminus } i )(1-x_i)&\le t \end{aligned}$$
(3b)

are valid for Z.
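Generating the coefficients of (3a) and (3b) from a given set function is mechanical; a small helper (ours, with an illustrative interface) is sketched below, returning the inequality in the form \(t\ge \text {const}+\sum _i \text {coef}_i\, x_i\).

```python
def supermodular_inequality(g, N, S, variant="a"):
    """Coefficients of inequality (3a) or (3b) for a supermodular g, returned as
    (constant, {i: coefficient of x_i}), meaning  t >= constant + sum_i coef_i * x_i.
    """
    N, S = frozenset(N), frozenset(S)
    rho = lambda i, T: g(T | {i}) - g(T)
    const, coef = g(S), {}
    for i in N - S:
        coef[i] = rho(i, S) if variant == "a" else rho(i, frozenset())
    for i in S:                       # the (1 - x_i) terms: fold constants into const
        r = rho(i, N - {i}) if variant == "a" else rho(i, S - {i})
        const -= r
        coef[i] = r
    return const, coef

# inequality (3a) with S = {2} for g(S) = -max_{i in S} alpha_i^2 / 4
alpha = {1: 1.0, 2: 2.0, 3: 0.5}
g = lambda S: -max((alpha[i] ** 2 for i in S), default=0.0) / 4
print(supermodular_inequality(g, {1, 2, 3}, {2}))
# t >= -0.25 - 0.75*x_2  (the x_1, x_3 coefficients vanish for this choice of S)
```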

2.2 Lifted supermodular inequalities

We now describe a family of lifted supermodular inequalities, using a lifting approach similar to the ones used in [28, 47]. Let \(h:\{0,1\}^N\times \mathbb {R}^N\rightarrow \mathbb {R}\cup \{\infty \}\) be a function defined over a mixed 0-1 domain and consider its epigraph

$$\begin{aligned} H=\left\{ (x,y,t)\in \{0,1\}^N\times \mathbb {R}^N\times \mathbb {R}: h(x,y)\le t\right\} . \end{aligned}$$

Observe that H allows for arbitrary constraints, which can be encoded via function h. For example, nonnegativity and complementarity constraints can be included by letting \(h(x,y)=\infty \) whenever \(y_i<0\) or \(y_i(1-x_i)\ne 0\) for some \(i\in N\).

For \(\alpha \in \mathbb {R}^N\), define the set function \(g_\alpha :\{0,1\}^N\rightarrow \mathbb {R}\cup \{\infty ,-\infty \}\) as

$$\begin{aligned} g_\alpha (x)=\min _{y\in \mathbb {R}^N}-\alpha 'y+h(x,y), \end{aligned}$$
(4)

and let \(B\subseteq \mathbb {R}^N\) be the set of values of \(\alpha \) for which problem (4) is bounded for all \(x\in \{0,1\}^N\), i.e.,

$$\begin{aligned} B=\left\{ \alpha \in \mathbb {R}^N: |g_\alpha (x)|<\infty , \ \forall x\in \{0,1\}^N \right\} . \end{aligned}$$

Although supermodularity is defined for set functions only, we propose in Definition 1 below an extension for functions involving continuous variables as well.

Definition 1

Function h is supermodular if the set function \(g_\alpha \) defined in (4) is supermodular for all \(\alpha \in B\).

Remark 1

Suppose that h does not depend on the continuous variables y, i.e., \(h(x,y)=g(x)\). In this case problem (4) is unbounded unless \(\alpha =0\), i.e., \(B=\{0\}\), and we find that \(h(x,y)\) is supermodular if and only if \(g_0(x)=g(x)\) is supermodular. Thus, Definition 1 includes the usual definition of supermodularity for set functions as a special case. \(\square \)

Proposition 2

If function h is supermodular, then for any \(\alpha \in B\) and \(S\subseteq N\), the inequalities

$$\begin{aligned} \alpha 'y+g_\alpha (S)+\sum \limits _{i\in N{\setminus } S}\rho _\alpha (i,S)x_i-\sum \limits _{i\in S}\rho _\alpha (i,N{\setminus } i )(1-x_i)&\le t,\text { and} \end{aligned}$$
(5a)
$$\begin{aligned} \alpha 'y+g_\alpha (S)+\sum \limits _{i\in N{\setminus } S}\rho _\alpha (i,\emptyset )x_i-\sum \limits _{i\in S}\rho _\alpha (i,S{\setminus } i )(1-x_i)&\le t \end{aligned}$$
(5b)

are valid for H, where \(\rho _\alpha (i,S)=g_\alpha (S\cup i)-g_\alpha (S)\).

Proof

For any \(\alpha \in B\), \(S\subseteq N\), and \((x,y,t)\in H\), we find

$$\begin{aligned} t-\alpha 'y\ge h(x,y)-\alpha 'y\ge g_\alpha (x)\ge g_\alpha (S)+\sum \limits _{i\in N{\setminus } S}\rho _\alpha (i,S)x_i-\sum \limits _{i\in S}\rho _\alpha (i,N{\setminus } i )(1-x_i), \end{aligned}$$

where the first inequality follows directly from the definition of H, the second inequality follows by minimizing \(h(x,y)-\alpha 'y\) with respect to y, and the third inequality follows from the validity of (3a). Thus, by adding \(\alpha 'y\) to both sides, we find that inequality (5a) is valid. The validity of (5b) is proven identically. \(\square \)

Since inequalities (5) are valid for any \(\alpha \in B\), one can obtain stronger valid inequalities by optimally choosing vector \(\alpha \).

Theorem 1

(Lifted supermodular inequalities) If h is supermodular, then for any \(S\subseteq N\), the lifted supermodular inequalities

$$\begin{aligned} \max _{\alpha \in B}\;g_\alpha (S)+\sum \limits _{i\in N{\setminus } S}\rho _\alpha (i,S)x_i-\sum \limits _{i\in S}\rho _\alpha (i,N{\setminus } i)(1-x_i)+\alpha 'y&\le t,\text { and} \end{aligned}$$
(6a)
$$\begin{aligned} \max _{\alpha \in B}\;g_\alpha (S)+\sum \limits _{i\in N{\setminus } S}\rho _\alpha (i,\emptyset )x_i-\sum \limits _{i\in S}\rho _\alpha (i,S{\setminus } i)(1-x_i)+\alpha 'y&\le t \end{aligned}$$
(6b)

are valid for H.

Observe that while inequalities (5) are linear, inequalities (6) are nonlinear in x and y. Moreover, each inequality (6) is convex since its left-hand side is defined as a supremum of linear functions. In addition, if the base supermodular inequalities (3) are strong for the convex hull of epi \(g_\alpha \), then the lifted supermodular inequalities (6) are strong for H as well, as formalized next. Given \(\alpha \in B\), define

$$\begin{aligned} G_\alpha =\left\{ (x,t)\in \{0,1\}^N\times \mathbb {R}: g_\alpha (x)\le t\right\} . \end{aligned}$$

Note that \(\text {conv}(G_\alpha )\) is a polyhedron. Theorem 2 below is a direct consequence of Theorem 1 in [47].

Theorem 2

([47]) If inequalities (3) and bound constraints \(0\le x\le 1\) describe \(\text {conv}(G_\alpha )\) for all \(\alpha \in B\), then the lifted supermodular inequalities (6) and bound constraints \(0\le x\le 1\) describe \(\text {cl conv}(H)\).

Although Definition 1 may appear to be too restrictive to arise in practice, we show in Sect. 2.3 that supermodular functions are in fact widespread in a class of well-studied problems in mixed-integer linear optimization. In Sect. 3 we show that several existing results for quadratic optimization with indicators can be obtained as lifted supermodular inequalities. Perhaps more surprisingly, for the rank-one quadratic with indicators

$$\begin{aligned} h(x,y)={\left\{ \begin{array}{ll} \big (y(N^+) - y(N^-) \big )^2 &{}\text {if }y\ge 0\text { and }y_i(1-x_i)=0, \ \forall i\in N^+ \cup N^-\\ \infty &{}\text {otherwise,} \end{array}\right. } \end{aligned}$$

we show in Sect. 4 that conditions in Definition 1 and Theorem 2 are satisfied as well.

2.3 Supermodular inequalities and fixed-charge networks

Given \(b\in \mathbb {R}\), \(u\in \mathbb {R}_+^N\), and a partition \(N=N^+\cup N^-\cup A^+\cup A^-\), define for all \(x\in \{0,1\}^N\) the fixed-charge network set

$$\begin{aligned}FC(x)=\Big \{y\in \mathbb {R}_+^N: \ {}&y(N^+)+y( A^+)-y(A^-)-y(N^-)\le b,\; y_i\le u_i, \; i\in N,\\&y_i(1-x_i)=0, \ i\in N^+,\; y_ix_i=0, \ i\in N^- \Big \} \cdot \end{aligned}$$

Wolsey [54] uses FC(x) to describe network structures arising in flow problems with fixed charges on the arcs: \(N^+\) denotes the incoming arcs into a given subgraph, \(N^-\) denotes the outgoing arcs, \(A^+\cup A^-\) denotes the internal arcs of the subgraph, and b represents the supply/demand of the subgraph. Finally, define

$$\begin{aligned} h(x,y)={\left\{ \begin{array}{ll}0 &{} \text {if }y\in FC(x)\\ \infty &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$

Proposition 3

([54]) For any \(\alpha \in \mathbb {R}^N\), the function

$$\begin{aligned} v_\alpha (x)=\max _{y\in \mathbb {R}_+^N}\alpha 'y-h(x,y) \end{aligned}$$

is submodular.

It follows that the function \(g_\alpha (x)=-v_\alpha (x)=\min _{y\in \mathbb {R}_+^N}-\alpha 'y+h(x,y)\) is supermodular, and inequalities (5) and (6) are valid. Moreover, Wolsey [54] shows that the linear supermodular inequalities (5) with \(\alpha \in \{-1,0,1\}^N\) include as special cases well-known inequalities for mixed-integer linear optimization such as flow-cover inequalities [45, 50] and inequalities for capacitated lot-sizing [9, 46]; several other classes for fixed-charge network flow problems are special cases as well [4, 11, 12]. Therefore, the inequalities presented in this paper can be interpreted as nonlinear generalizations of the aforementioned inequalities.

3 Previous results as lifted supermodular inequalities

In order to illustrate the approach, in this section, we show how existing results for quadratic optimization with indicators can be derived using the lifted supermodular inequalities (6).

3.1 The single-variable case

Consider, first, the single-variable case

$$\begin{aligned} X^1=\left\{ (x,y,t)\in \{0,1\}\times \mathbb {R}_+\times \mathbb {R}: y^2\le t,\; y(1-x)=0 \right\} \end{aligned}$$

for which \(\text {cl conv}(X^1)\) is given by the perspective reformulation [2, 19, 23, 32]:

$$\begin{aligned} \text {cl conv}(X^1)=\left\{ (x,y,t)\in [0,1]\times \mathbb {R}_+\times {(\mathbb {R}\cup \infty )}: \frac{y^2}{x}\le t \right\} . \end{aligned}$$

Note that \( \text {cl conv}(X^1) \subseteq {\mathbb {R}}^2\times ({\mathbb {R}}\cup \infty )\). We now derive the perspective reformulation as a special case, using in fact a modular inequality. Note that \(g_\alpha (0)=0\) and \(g_\alpha (1) =\min _{y\in \mathbb {R}_+}-\alpha y+y^2 = -\frac{\alpha _+^2}{4}\) since \(y^*=\alpha /2\) if \(\alpha \ge 0\) and \(y^*=0\) otherwise. Thus, \(g_\alpha \) is a modular function for any \(\alpha \in \mathbb {R}\), and inequalities (3) reduce to

$$\begin{aligned} t \ge -\frac{1}{4}\alpha _+^2x. \end{aligned}$$

Then, we find that inequalities (6) reduce to the perspective of \(y^2\):

$$\begin{aligned} t\ge \max _{\alpha \in \mathbb {R}}\;\alpha y-\frac{\alpha _+^2}{4}x=\frac{y^2}{x}\cdot \end{aligned}$$
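As a quick numerical sanity check (our snippet, not part of the derivation), maximizing the base inequality over \(\alpha \) on a grid indeed recovers \(y^2/x\) at a fractional point.

```python
import numpy as np

x, y = 0.4, 0.3                                  # a fractional point with x > 0
alphas = np.linspace(-5, 5, 100001)              # crude grid over alpha
lifted = np.max(alphas * y - np.maximum(alphas, 0.0) ** 2 / 4 * x)
print(lifted, y ** 2 / x)                        # both approximately 0.225
```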

3.2 The rank-one case with free continuous variables

Consider the relaxation of X obtained by dropping the non-negativity constraints \(y \ge 0\):

$$\begin{aligned} X_f=\big \{(x,y,t)\in \{0,1\}^N\times \mathbb {R}^N\times \mathbb {R}: y(N)^2\le t,\; y_i(1-x_i)=0, \ \forall i\in N\big \} \cdot \end{aligned}$$

Observe that any rank-one quadratic constraint of the form \(\left( \sum _{i\in N}c_iy_i\right) ^2\le t\) with \(c_i\ne 0\) can be transformed into the form given in \(X_f\) by scaling the continuous variables (so that \(|c_i|=1\)) and negating variables as \({\bar{y}}_i:=-y_i\) if \(c_i<0\). The closure of the convex hull of \(X_f\) is derived in [7], and the effectiveness of the resulting inequalities is demonstrated on sparse regression problems. We now re-derive the description of \(\text {cl conv}(X_f)\) using lifted supermodular inequalities.

For \(S \subseteq N\), we have

$$\begin{aligned}g_\alpha ({S})&=\min _{y\in \mathbb {R}^S}-\alpha 'y+y(S)^2.\end{aligned}$$

It is easy to see that \(g_\alpha (x_S)=-\infty \) for some \(S\subseteq N\) unless \(\alpha _i=\alpha _j\) for all \(i\ne j\); see [7]. Therefore, letting \(\bar{\alpha }=\alpha _i\) for all \(i\in N\), we find that

$$\begin{aligned}g_{\bar{\alpha }}(x_S)&=\min _{y\in \mathbb {R}^S}-\bar{\alpha }y(S)+y(S)^2={\left\{ \begin{array}{ll}0 &{} \text {if }S=\emptyset \\ -\bar{\alpha }^2/4&{}\text {otherwise,}\end{array}\right. }\end{aligned}$$

where the optimal solution is found by setting \(y(S)=\bar{\alpha }/2\). The function \(g_\alpha \) is supermodular since \(\rho _{\bar{\alpha }}(i,\emptyset )=-\bar{\alpha }^2/4\) and \(\rho _{\bar{\alpha }}(i,S)=0\) for any \(S\ne \emptyset \).

Letting \(S=\{1\}\), inequality (6a) reduces to

$$\begin{aligned} t\ge \max _{\bar{\alpha }\in \mathbb {R}}\;\bar{\alpha }\, y(N)-\frac{\bar{\alpha }^2}{4}=y(N)^2. \end{aligned}$$

Also letting \(S=\{1\}\), inequality (6b) reduces to

$$\begin{aligned} t\ge \max _{\bar{\alpha }\in \mathbb {R}}\;\bar{\alpha }\, y(N)-\frac{\bar{\alpha }^2}{4}\,x(N)=\frac{y(N)^2}{x(N)}\cdot \end{aligned}$$

These two supermodular inequalities are indeed sufficient to describe \(\text {cl conv}(X_f)\) [7]. As we shall see in Sect. 4, once the non-negativity constraints \(y \ge 0\) are incorporated, \(\text {conv}(X)\) is substantially more complex than \(\text {conv}(X_f)\). Nonetheless, as shown in Example 1, the resulting convexification is substantially stronger as well.

3.3 The rank-one case with a negative off-diagonal

Consider the special case of X with two continuous variables (\(N=\{1,2\}\)) with a negative off-diagonal:

$$\begin{aligned} X_-^2=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}_+^2\times \mathbb {R}: (y_1-y_2)^2\le t,\; y_i(1-x_i)=0, \ i=1,2\right\} . \end{aligned}$$

Observe that any quadratic constraint of the form \(\left( c_1y_1-c_2y_2\right) ^2\le t\) with \(c_1, c_2>0\) can be written as in \(X_-^2\) by scaling the continuous variables.

For \(\alpha \in \mathbb {R}^2\), observe that if \(\alpha _1 + \alpha _2 > 0\),

$$\begin{aligned} g_\alpha ({\{1,2\}})=\min _{y \in \mathbb {R}^2_+} -\alpha _1y_1-\alpha _2y_2+(y_1-y_2)^2 \end{aligned}$$

is unbounded. Otherwise,

$$\begin{aligned} g_\alpha (\emptyset )&=0,\\ g_\alpha (\{1\})&=-\frac{\alpha _1^2}{4}\text { if }\alpha _1\ge 0 \text { and }g_\alpha (\{1\})=0\text { otherwise},\\ g_\alpha (\{2\})&=-\frac{\alpha _2^2}{4}\text { if }\alpha _2\ge 0 \text { and }g_\alpha (\{2\})=0\text { otherwise},\\ g_\alpha (\{1,2\})&={\left\{ \begin{array}{ll}-\frac{\alpha _1^2}{4}&{}\text {if }\alpha _1\ge 0 \\ -\frac{\alpha _2^2}{4}&{}\text {if }\alpha _2\ge 0 \\ 0 &{} \text {if }\alpha _1\le 0 \text { and }\alpha _2\le 0.\end{array}\right. } \end{aligned}$$

In particular, \(g_\alpha \) is supermodular (and in fact modular) for any fixed \(\alpha \) such that \(\alpha _1+\alpha _2{\le } 0\): for any \(i=1,2\) and \(S\subseteq N{\setminus } i\), \(\rho _\alpha (i,S)=-\frac{\max \{0,\alpha _i\}^2}{4}\). Letting \(S=\emptyset \), inequality (6a) reduces to

$$\begin{aligned} \max _{\alpha _1+\alpha _2\le 0}\;-\frac{\max \{0,\alpha _1\}^2}{4}x_1-\frac{\max \{0,\alpha _2\}^2}{4}x_2+\alpha _1y_1+\alpha _2y_2&\le t. \end{aligned}$$
(7)

An optimal solution of (7) can be found as follows. If \(y_1\ge y_2\), then it is optimal to set \(\alpha _2=-\alpha _1\) with \(\alpha _1\ge 0\). Moreover, in this case, the optimal value is given by

$$\begin{aligned} \max -\frac{\alpha _1^2}{4}x_1+\alpha _1(y_1-y_2)=\frac{(y_1-y_2)^2}{x_1}. \end{aligned}$$

The case \(y_2\ge y_1\) is identical. The resulting piecewise valid inequality

$$\begin{aligned} t\ge {\left\{ \begin{array}{ll}\frac{(y_1-y_2)^2}{x_1} &{} \text {if }y_1\ge y_2\\ \frac{(y_1-y_2)^2}{x_2} &{} \text {if }y_2\ge y_1\end{array}\right. } \end{aligned}$$
(8)

along with the bound constraints \(0\le x\le 1\), \(0\le y\) describe \(\text {cl conv}(X_-^2)\) [6]. We point out that a conic quadratic representation for \(\text {cl conv}(X_-^2)\) and generalizations to (not necessarily rank-one) quadratic functions with negative off-diagonals are given in [13].
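A direct evaluation of the right-hand side of (8), using the convention \(a/0=\infty \) for \(a>0\) and \(0/0=0\), can be sketched as follows (our helper).

```python
def rank_one_neg_offdiag_bound(x1, x2, y1, y2):
    """Right-hand side of the piecewise inequality (8) for cl conv(X_-^2)."""
    num = (y1 - y2) ** 2
    den = x1 if y1 >= y2 else x2
    if den == 0.0:
        return 0.0 if num == 0.0 else float("inf")
    return num / den

print(rank_one_neg_offdiag_bound(0.5, 1.0, 0.4, 0.1))   # (0.3)^2 / 0.5 = 0.18
```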

3.4 Outlier detection with temporal data

In the context of outlier detection with temporal data, Gómez [30] studies the set

$$\begin{aligned}X_T=\Big \{(x,y,t)\in \{0,1\}^2\times \mathbb {R}^4\times \mathbb {R}:&\frac{a_1}{2}(y_3-y_1)^2+(y_3-y_4)^2+\frac{a_2}{2}(y_4-y_2)^2\le t,\\&y_1(1-x_1)=0,\; y_2(1-x_2)=0\Big \} \end{aligned}$$

where \(a_1,a_2>0\) are constants. While we refer the reader to [30] for details on the derivation of \(\text {cl conv}(X_T)\), we point out that it can in fact be described by lifted supermodular inequalities. Indeed, in this case, function \(g_\alpha \) is given by

$$\begin{aligned} g_\alpha (x)=K_1(\alpha )-K_2(\alpha )\max \{x_1,x_2\}, \end{aligned}$$

where \(K_1(\alpha )\) and \(K_2(\alpha )\) are constants that do not depend on x and \(K_2(\alpha )\ge 0\). Since \(\max \{x_1,x_2\}\) is a submodular function, it follows that \(g_\alpha \) is supermodular.

4 Convex hull via lifted supermodular inequalities

We now turn our attention to the rank-one sets X and \(X_+\). This section is devoted to showing that the lifted supermodular inequalities (6) are sufficient to describe \(\text {cl conv}(X)\) and \(\text {cl conv}(X_+)\). By Theorem 2, it suffices to derive an explicit form of the projection function \(g_\alpha \) and show that inequalities (3) describe the convex hull of its epigraph \(G_\alpha \). The rest of this section is organized as follows. In Sect. 4.1 we derive the set function \(g_\alpha \) defined in (4) for the rank-one quadratic function and then show that it is supermodular. In Sect. 4.2 we describe the convex hull of \(G_\alpha \) using only a small subset of the supermodular inequalities (3).

4.1 The set function \(g_\alpha \)

We present the derivation of set function \(g_\alpha \) for \(X_+\) and X separately, and then verify that \(g_\alpha \) is indeed supermodular.

4.1.1 Derivation for \(X_+\)

For \(X_+\),

$$\begin{aligned} h(x,y)={\left\{ \begin{array}{ll} y(N)^2 &{}\text {if }y\ge 0\text { and }y_i(1-x_i)=0, \ \forall i\in N\\ \infty &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$

Therefore, for \(S \subseteq N\),

$$\begin{aligned} g_\alpha (x_S)&=\min _{y\in \mathbb {R}_+^S}-{\alpha _S}'y_S+y(S)^2. \end{aligned}$$
(9)

Note that (9) is bounded for all \(\alpha \in \mathbb {R}^S\), thus \(B=\mathbb {R}^N\). Since, for \(\alpha _i<0\), \(y_i=0\) in any optimal solution, we assume for simplicity that \(\alpha \ge 0\) and \(B=\mathbb {R}_+^N\). From the KKT conditions corresponding to variable \(y_k\ge 0\) in (9), we find that

$$\begin{aligned}&2y(S)\ge \alpha _k, \end{aligned}$$
(10)

and, by complementary slackness, (10) holds at equality whenever \(y_k>0\). Moreover, let \(j\in S\) be such that \(\alpha _j=\max _\alpha (S)\); setting \(y_j=\alpha _j/2\) and \(y_i=0\) for \(i\in S{\setminus } j\), we find a feasible solution for (9) that satisfies all dual feasibility conditions (10) and complementary slackness, and therefore is optimal for the convex optimization problem (9). Thus, we conclude that

$$\begin{aligned} g_\alpha (x_S)=-\frac{\max _\alpha (S)^2}{4} \cdot \end{aligned}$$

4.1.2 Derivation for X

For the general case of X,

$$\begin{aligned} h(x,y)={\left\{ \begin{array}{ll} \big (y(N^+) - y(N^-) \big )^2 &{}\text {if }y\ge 0\text { and }y_i(1-x_i)=0, \ \forall i\in N^+ \cup N^-\\ \infty &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$

Therefore, for \(S \subseteq N^+ \cup N^-\),

$$\begin{aligned} g_\alpha (x_S)&=\min _{y\in \mathbb {R}_+^S}-\alpha 'y+\Big (y(N^+\cap S)-y(N^-\cap S)\Big )^2. \end{aligned}$$
(11)

If \(S\cap N^-=\emptyset \) or \(S\cap N^+=\emptyset \), then we find from Sect. 4.1.1 that \(g_\alpha (x_S)= -\max _\alpha (S)^2/4\). Now let \(S^+:=S\cap N^+\) and \(S^-:=S\cap N^-\), and assume \(S^+\ne \emptyset \) and \(S^-\ne \emptyset \). We first state conditions under which (11) is bounded, and then we provide the explicit description of \(g_\alpha \).

Lemma 1

Problem (11) is bounded if and only if

$$\begin{aligned} \max _\alpha (S^+)\le -\max _\alpha (S^-). \end{aligned}$$
(12)

Proof

Let \(p \in {{\,\mathrm{arg\,max}\,}}_{i \in S^+} \alpha _i\) and \(q \in {{\,\mathrm{arg\,max}\,}}_{i \in S^-} \alpha _i\). If \(\alpha _p + \alpha _q > 0\), then \(e_p + e_q\) is an unbounded direction. Otherwise,

$$\begin{aligned} -\alpha 'y+\Big (y(S^+)-y(S^-)\Big )^2&\ge -\alpha _p y(S^+) - \alpha _q y(S^-) + \Big (y(S^+)-y(S^-)\Big )^2 \\&\ge -\alpha _p (y(S^+)-y(S^-)) + \Big (y(S^+)-y(S^-)\Big )^2 \\&\ge -\alpha _p^2/4, \end{aligned}$$

where the second inequality follows from \(\alpha _p + \alpha _q \le 0\). \(\square \)

Note that we may equivalently rewrite (12) as \(\alpha _i+\alpha _j\le 0,\text { for all } i\in S^+,\; j\in S^-\), and in particular,

$$\begin{aligned} B=\left\{ \alpha \in \mathbb {R}^N: \alpha _i+\alpha _j\le 0 \ \text { for all } i\in N^+,j\in N^-\right\} . \end{aligned}$$

Proposition 4

Function \(g_\alpha \) is given by

$$\begin{aligned} g_\alpha (x_S)= {\left\{ \begin{array}{ll}0&{}\text {if }\alpha \le 0\\ -\max _\alpha (S^+)^2/4&{}\text {if }\alpha \not \le 0, (12)\,\text { and }\,\alpha _i\le 0\,\text { for all }\,i\in S^-\\ -\max _\alpha (S^-)^2/4&{}\text {if }\alpha \not \le 0, (12)\,\hbox { and }\,\alpha _i\le 0\,\text { for all }\,i\in S^+\\ -\infty &{}\text {otherwise.}\end{array}\right. } \end{aligned}$$

Proof

If \(\alpha \le 0\), then \(g_\alpha (x_S)\ge 0\) and the lower bound can be obtained by setting \(y=0\). We now assume \(\alpha \not \le 0\). Note that for (12) to hold, if there exists \(j\in S^-\) such that \(\alpha _j\ge 0\), then \(\alpha _i\le 0\) for all \(i\in S^+\), and vice versa. Therefore, either \(\alpha _i\le 0\) for all \(i\in S^+\) or \(\alpha _j\le 0\) for all \(j\in S^-\).

First, assume that \(\alpha _j\le 0\) for all \(j\in S^-\). In this case, there exists an optimal solution of (11) where \(y(S^-)=0\) and (11) reduces to (9). Then, we may assume that \(\alpha _i\ge 0\) for all \(i\in S^+\) as in Sect. 4.1.1, and arrive at

$$\begin{aligned}g_\alpha (x_S)&=-\frac{\max _\alpha (S^+)^2}{4} \cdot \end{aligned}$$

By symmetry, if \(\alpha _i\le 0\) for all \(i\in S^+\), we may assume that \(\alpha _j\ge 0\) for all \(j\in S^-\) and

$$\begin{aligned}g_\alpha (x_S)&=-\frac{\max _\alpha (S^-)^2}{4} \cdot \end{aligned}$$

\(\square \)

Observe that if \(\alpha _i\le 0\) for all \(i\in S^-\) and there exists \(j\in S^+\) such that \(\alpha _j<0\), then setting \(\alpha _j=0\) does not change the function \(g_\alpha \). Thus we can assume without loss of generality in optimization problem (6) that

$$\begin{aligned} B=\left\{ \alpha \in \mathbb {R}^N:\alpha _i\alpha _j\le 0 \text { and } \alpha _i+\alpha _j\le 0 \ \text { for all } i\in N^+,j\in N^-\right\} . \end{aligned}$$

It is convenient to partition B into two sets so that \(B=B^+\cup B^-\), where

$$\begin{aligned}B^+&=\left\{ \alpha \in \mathbb {R}^N:\alpha _i\ge 0 \ \forall i\in N^+,\alpha _j\le 0 \ \forall j\in N^-,\text { and } \alpha _i+\alpha _j\le 0 \ \forall i\in N^+,j\in N^-\right\} \\ B^-&=\left\{ \alpha \in \mathbb {R}^N:\alpha _i\le 0 \ \forall i\in N^+,\alpha _j\ge 0 \ \forall j\in N^-,\text { and } \alpha _i+\alpha _j\le 0 \ \forall i\in N^+,j\in N^-\right\} \end{aligned}$$

and analyze the inequalities separately for each set. Figure 2 depicts regions \(B^+\) and \(B^-\) for a two-dimensional case.

Fig. 2 Depiction of \(B^+\) and \(B^-\) in a two-dimensional example with \(N^+=\{1\}\) and \(N^-=\{2\}\). The upper right shaded region (triangle) corresponds to the region where \(g_\alpha (x)=-\infty \); the lower left shaded region (square) corresponds to the region discarded, as equivalent solutions of (6) can be found in either \(B^+\) or \(B^-\)

Therefore, instead of studying inequalities (6) directly, one can equivalently study their relaxation where either \(\alpha \in B^+\) or \(\alpha \in B^-\); consequently, each inequality (6) corresponds to (the maximum of) two simpler inequalities. Since the sets \(B^+\) and \(B^-\) are symmetric, and inequalities (6) corresponding to \(\alpha \in B^-\) are simply inequalities where the role of \(N^+\) and \(N^-\) is interchanged (and \(\alpha \in B^+\)), the analysis and derivation of the inequalities is simplified. Therefore, in the sequel, we will derive the inequalities for \(\alpha \in B^+\) only and then state the inequalities corresponding to \(B^-\) by interchanging \(N^+\) and \(N^-\).

4.1.3 Supermodularity

For \(\alpha \in B^+\), the set function \(g_\alpha (x)\) for X is monotone non-increasing; moreover, it is supermodular, since \(\max _\alpha (S^+)\) is submodular. The case \(\alpha \in B^-\) is analogous.

4.2 Convex hull of epi \(g_\alpha \)

In this section we show that a small subset of the supermodular inequalities (3a) are sufficient to describe the convex hull of the epigraph of the set function \(g_\alpha \), i.e.,

$$\begin{aligned} G_\alpha =\left\{ (x,t)\in \{0,1\}^N\times \mathbb {R}: -\frac{\max _{i\in N}\{\alpha _i^2x_i\}}{4}\le t\right\} , \end{aligned}$$
(13)

where \(\alpha \ge 0\) – observe that since x is binary, \(\left( \max _{i\in N}\{\alpha _ix_i\}\right) ^2=\max _{i\in N}\{\alpha _i^2x_i\}\).

Given nonempty \(S\subseteq N\), let \(\ell \in {{\,\mathrm{arg\,max}\,}}_{i\in S}\{\alpha _i\}\), \(k\in {{\,\mathrm{arg\,max}\,}}_{i\in N{\setminus } \ell }\{\alpha _i\}\), and \(T=\left\{ i\in N{\setminus } S:\alpha _i>\alpha _\ell \right\} \); observe that \(T=\emptyset \) if and only if \(\alpha _\ell \ge \alpha _k\). Then, valid inequalities (3a) for \(G_\alpha \) reduce to

$$\begin{aligned} t\ge {\left\{ \begin{array}{ll}-\frac{\alpha _\ell ^2}{4}+\frac{\alpha _\ell ^2-\alpha _k^2}{4}(1-x_\ell )&{}\text {if }\quad \,\alpha _\ell \ge \alpha _k\\ -\frac{\alpha _\ell ^2}{4}-\sum \limits _{i\in T}\frac{\alpha _i^2-\alpha _\ell ^2}{4}x_i &{} \text {if }\quad \, \alpha _\ell \le \alpha _k. \end{array}\right. } \end{aligned}$$
(14)

If \(S=\emptyset \), then valid inequalities (3a) reduce to

$$\begin{aligned} t\ge -\sum \limits _{i\in N}\frac{\alpha _i^2}{4}x_i. \end{aligned}$$

Remark 2

Observe that if \(\alpha _\ell \ge \alpha _k\), then the inequality

$$\begin{aligned} t\ge -\frac{\alpha _\ell ^2}{4}+\frac{\alpha _\ell ^2-\alpha _k^2}{4}(1-x_\ell )=-\frac{\alpha _k^2}{4}-\frac{\alpha _\ell ^2-\alpha _k^2}{4}x_\ell \end{aligned}$$

can also be obtained by setting \(S=N{\setminus } \ell \) (or by choosing any \(S\subseteq N{\setminus } \ell \) such that \(k\in S\)). Therefore, when considering inequalities (14), we can assume without loss of generality that there exists \(k\in {{\,\mathrm{arg\,max}\,}}_{i\in N}\{\alpha _i\}\) such that \(k\not \in S\) and, thus, the case \(\alpha _\ell \ge \alpha _k\) can be ignored. \(\square \)

Remark 3

Suppose that the variables are indexed such that \(\alpha _1\le \cdots \le \alpha _n\), let \(\alpha _0=0\), and let \(\ell = \max _{i\in S}\{i \}\) if \(S\ne \emptyset \) and \(\ell =0\) otherwise. Observe that we can assume without loss of generality that \(i\in S\) for all \(i\le \ell \), since inequalities (14) are the same whether \(i\in S\) or not. Therefore, it follows that there are only n inequalities (14) given by

$$\begin{aligned} t \ge -\frac{\alpha _\ell ^2}{4}-\sum _{i=\ell +1}^n\frac{\alpha _i^2-\alpha _\ell ^2}{4}x_i, \quad \ell =0,\ldots ,n-1.\end{aligned}$$
(15)

\(\square \)
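Enumerating the n non-dominated inequalities of Remark 3 is straightforward; a short sketch (ours) lists the coefficient vectors of (15) for a sorted \(\alpha \).

```python
def inequalities_15(alpha):
    """Coefficients of the n inequalities (15), assuming alpha sorted ascending.

    Returns a list of (constant, coefficients) with  t >= constant + sum_i coef_i * x_i.
    """
    n = len(alpha)
    a = [0.0] + list(alpha)                        # a[0] = alpha_0 = 0
    ineqs = []
    for ell in range(0, n):                        # ell = 0, ..., n-1
        const = -a[ell] ** 2 / 4
        coef = [0.0] * n
        for i in range(ell + 1, n + 1):            # i = ell+1, ..., n (1-indexed)
            coef[i - 1] = -(a[i] ** 2 - a[ell] ** 2) / 4
        ineqs.append((const, coef))
    return ineqs

for const, coef in inequalities_15([0.5, 1.0, 2.0]):
    print(const, coef)
```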

We now show that inequalities (14) characterize the convex hull of \(G_\alpha \).

Proposition 5

Inequalities (14) and bound constraints describe \(\text {conv}(G_\alpha )\).

Proof

Let \((x,t)\in [0,1]^N\times \mathbb {R}\). By definition, \((x,t)\in \text {conv}(G_\alpha )\) if and only if

$$\begin{aligned} t\ge \min _{\lambda }\;&-\sum _{S\subseteq N}\frac{\max _\alpha (S)^2}{4}\lambda _S \end{aligned}$$
(16a)
$$\begin{aligned} \text {s.t.}\;&\sum _{S\subseteq N: i\in S}\lambda _S=x_i, \ \ i\in N \end{aligned}$$
(16b)
$$\begin{aligned}&\sum _{S\subseteq N}\lambda _S=1 \end{aligned}$$
(16c)
$$\begin{aligned}&\lambda _S\ge 0, \ \ S\subseteq N, \end{aligned}$$
(16d)

where constraints (16b) can be restated as \(x=\sum _{S\subseteq N}\lambda _Sx_S\). From linear programming duality, we find the equivalent condition

$$\begin{aligned} t\ge \max _{\mu ,\gamma }\;&\sum _{i\in N}x_i\mu _i+\gamma \end{aligned}$$
(17a)
$$\begin{aligned} \text {s.t.}\;&\sum _{i\in S}\mu _i+\gamma \le -\frac{\max _\alpha (S)^2}{4}, \ \ S\subseteq N \end{aligned}$$
(17b)
$$\begin{aligned}&\mu \in \mathbb {R}^N,\; \gamma \in \mathbb {R}. \end{aligned}$$
(17c)

Any feasible solution \((\mu ,\gamma )\) of (17) yields a valid inequality for \(\text {conv}(G_\alpha )\). Moreover, characterizing the optimal solutions of (17) (for all \(x\in [0,1]^N\)) results in the convex hull description of \(G_\alpha \).

Suppose, without loss of generality, that \(\alpha _1\le \ldots \le \alpha _n\), let \(\alpha _0=0\), and let \(\ell \in \{0,\ldots ,n-1\}\) be the smallest index such that \(\sum _{i=\ell +1}^nx_i\le 1\); thus, if \(\ell >0\), then \(\sum _{i=\ell }^nx_i>1\). We claim that the dual solution given by \(\hat{\gamma }=-\frac{\alpha _\ell ^2}{4}\), \(\hat{\mu }_i=0\) for \(i\le \ell \) and \(\hat{\mu }_i=-\frac{\alpha _i^2-\alpha _\ell ^2}{4}\) for \(i>\ell \) is optimal for (17).

First, we verify that \((\hat{\mu },\hat{\gamma })\) is feasible for (17). Observe that for any \(S\subseteq \{1,\ldots ,\ell \}\), constraint (17b) reduces to \(-\frac{\alpha _\ell ^2}{4}\le -\frac{\max _\alpha (S)^2}{4}\), which is indeed satisfied. For any S such that the maximum element \(j>\ell \), we find that (17b) reduces to \(\sum _{i\in S:i\ne j}\hat{\mu }_i\le 0\); since \(\hat{\mu }\le 0\), the constraint is satisfied. For \(S=\emptyset \), constraint (17b) reduces to \(\gamma \le 0\), which is satisfied. To verify complementary slackness (later), note that constraints (17b) corresponding to sets (a) \(S = T \cup \{j\}\), where \(T \subseteq \{1, \ldots , \ell \}\) and \(j > \ell \) (i.e., containing exactly one element greater than \(\ell \)), and (b) \(S = T \cup \{\ell \}\), where \(T \subseteq \{1, \ldots , \ell -1\}\) (i.e., containing \(\ell \) but no greater element) are satisfied at equality.

Finally, for \((\hat{\mu },\hat{\gamma })\), the objective function (17a) is of the form (15):

$$\begin{aligned} t\ge -\frac{\alpha _\ell ^2}{4}-\sum _{i=\ell +1}^n\frac{\alpha _i^2-\alpha _\ell ^2}{4}x_i. \end{aligned}$$

To verify that \((\hat{\mu }, \hat{\gamma })\) is optimal for (17), we construct a primal solution \(\hat{\lambda }\) feasible for (16) satisfying complementary slackness. The greedy algorithm for constructing \(\hat{\lambda }\) is presented in Algorithm 1 and illustrated with an example in Fig. 3.

Fig. 3 Algorithm 1 with \(x =(1, 0.2, 0.5, 0.6, 0.3)\) and \(\ell =3\)

[Algorithm 1: greedy construction of \(\hat{\lambda }\); see Fig. 3 for an illustration]

We now check that constraint (16c) is satisfied. At the end of the algorithm, \(\sum _{S\subseteq N}\hat{\lambda }_S=\Lambda \) (since variable \(\Lambda \) is updated each time \(\hat{\lambda }\) is updated). Moreover, at the end of the first cycle (line 13) we have \(\Lambda =\sum _{i=\ell +1}^n x_i\). If \(\ell =0\), then \(\Lambda =1\) trivially (line 16); otherwise, at the end of the second cycle (line 22) an additional value of \(\hat{x}_\ell =1-\sum _{i=\ell +1}^nx_i\) (line 18) is added to \(\Lambda \). Hence, at the end of the algorithm

$$\begin{aligned} \Lambda =\sum _{S\subseteq N}\hat{\lambda }_S=\sum _{i=\ell +1}^n x_i+\left( 1-\sum _{i=\ell +1}^nx_i\right) =1. \end{aligned}$$

Next, we verify that constraints (16b) are satisfied. For \(i\in \{1,\ldots ,\ell -1\}\), at any point in the algorithm, we have that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i-\hat{x}_i\). Since, at any point, \(\hat{x}_i=\left( x_i-\Lambda \right) _+\) and \(\Lambda =1\) at the end of the algorithm, it follows that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i\). For \(i\in \{\ell +1,\ldots ,n\}\) we also have that \(\sum _{S\subseteq N: i\in S}\lambda _S=x_i-\hat{x}_i\), and \(\hat{x}_i=0\) at the end (line 13). Finally, for \(i=\ell >0\), we have that

$$\begin{aligned} \sum _{S\subseteq N: \ell \in S}\lambda _S= \left( x_\ell -\left( 1-\sum _{i=\ell +1}^n x_i\right) \right) +\left( 1-\sum _{i=\ell +1}^nx_i\right) =x_\ell . \end{aligned}$$

Finally, to check that \(\hat{\lambda }\) satisfies complementary slackness, it suffices to observe that all updates of \(\hat{\lambda }\) correspond to sets S such that exactly one element of S is greater than \(\ell \) (line 10), or to sets S with no element greater than \(\ell \) and where \(\ell \in S\) (line 20), where the corresponding dual constraints are satisfied at equality.

Therefore, we conclude that \(\hat{\lambda }\) and \((\hat{\mu },\hat{\gamma })\) are an optimal primal-dual pair. Since problem (17) admits, for any \(x\in [0,1]^N\), an optimal solution of the form (15), it follows that those inequalities and bound constraints describe \(\text {conv}(G_\alpha )\). \(\square \)
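For small n, Proposition 5 can also be checked numerically: solve the linear program (16) over all subsets and compare with the tightest inequality (15). A rough sketch (ours, assuming scipy) follows.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def lp_value(alpha, x):
    """Optimal value of (16): the smallest t with (x, t) in conv(G_alpha)."""
    n, sets = len(alpha), subsets(len(alpha))
    c = np.array([-max((alpha[i] ** 2 for i in S), default=0.0) / 4 for S in sets])
    A_eq = np.vstack([[1.0 if i in S else 0.0 for S in sets] for i in range(n)]
                     + [np.ones(len(sets))])
    b_eq = np.append(x, 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

def best_inequality_15(alpha, x):
    """max over ell of the right-hand side of (15), for alpha sorted ascending."""
    a = [0.0] + list(alpha)
    return max(-a[l] ** 2 / 4
               - sum((a[i] ** 2 - a[l] ** 2) / 4 * x[i - 1] for i in range(l + 1, len(a)))
               for l in range(len(alpha)))

alpha = [0.5, 1.0, 2.0, 3.0]                      # sorted ascending
x = [0.3, 0.7, 0.2, 0.4]
print(lp_value(alpha, x), best_inequality_15(alpha, x))   # the two values should coincide
```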

Finally, we obtain the main result of this section: that the (nonlinear) lifted supermodular inequalities

$$\begin{aligned} t\ge & {} \max _{\alpha \in B^+}-\frac{\max _\alpha (S^+)^2}{4}-\sum \limits _{i\in N^+{\setminus } S^+}\frac{\left( \alpha _i^2-\max _\alpha (S^+)^2\right) _+}{4}x_i+\alpha 'y, \ \ \forall S^+\subseteq N^+ \end{aligned}$$
(18)
$$\begin{aligned} t\ge & {} \max _{\alpha \in B^-}-\frac{\max _\alpha (S^-)^2}{4}-\sum \limits _{i\in N^-{\setminus } S^-}\frac{\left( \alpha _i^2-\max _\alpha (S^-)^2\right) _+}{4}x_i+\alpha 'y, \ \ \forall S^-\subseteq N^- \end{aligned}$$
(19)

are sufficient to describe the closure of the convex hull of X.

Proposition 6

Lifted supermodular inequalities (18)–(19) and the bound constraints \(0\le x\le 1\), \(y \ge 0\) describe \(\text {cl conv}(X)\).

Proof

Follows immediately from Proposition 5 and Theorem 2. \(\square \)

Remark 4

We end this section with the remark that optimization of a linear function over X can be done easily using the projection function \(g_\alpha \). Consider

$$\begin{aligned} \min \big \{ - \alpha ' y + \beta ' x + t: (x,y,t) \in X \big \} \cdot \end{aligned}$$

Projecting out the continuous variables using \(g_\alpha \), the problem reduces to

$$\begin{aligned} \min _{x \in \{0,1\}^N} \beta ' x - \max _{i \in N} \{\alpha _i^2 x_i\}/4. \end{aligned}$$
(20)

Assume without loss of generality that \(\beta \ge 0\) (otherwise, set \(x_i=1\) whenever \(\beta _i<0\)). Then an optimal solution of (20) corresponds to either setting \(x=0\), or setting a single variable \(x_i=1\) where \(i\in {{\,\mathrm{arg\,max}\,}}_{i\in N}\left\{ \alpha _i^2/4-\beta _i\right\} \), whichever yields the smaller objective value. Identifying such an index can be done in O(n). \(\square \)
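A sketch of the resulting O(n) rule (our code; it uses the simplified projection \(g_\alpha (x)=-\max _{i\in N}\{\alpha _i^2x_i\}/4\) appearing in (20)) is given below.

```python
def minimize_linear_over_X(alpha, beta):
    """Solve min{ -alpha'y + beta'x + t : (x, y, t) in X } via Remark 4.

    Assumes the components of beta are already nonnegative (otherwise fix the
    corresponding x_i = 1 first, as noted in the remark).
    """
    n = len(alpha)
    # value of (20) when a single x_i is set to 1:  beta_i - alpha_i^2 / 4
    best_i = max(range(n), key=lambda i: alpha[i] ** 2 / 4 - beta[i])
    best_gain = alpha[best_i] ** 2 / 4 - beta[best_i]
    if best_gain <= 0:
        return 0.0, None          # x = 0 is optimal
    return -best_gain, best_i     # optimal value of (20) and the index set to 1

print(minimize_linear_over_X(alpha=[1.0, 3.0, 2.0], beta=[0.1, 5.0, 0.3]))
```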

5 Explicit form of the lifted supermodular inequalities

In this section we derive explicit forms of the lifted supermodular inequalities (18)–(19). In Sect. 5.1 we describe the inequalities in the original space of variables, and describe how to solve the separation problem. In Sect. 5.2 we provide conic quadratic representable inequalities in an extended space, which can then be implemented with off-the-shelf conic solvers.

5.1 Inequalities and separation in the original space of variables

5.1.1 Lifted inequalities for X

We first present the inequalities for the more general set X. Finding a closed form expression for the lifted supermodular inequalities (18) for all \(S^+ \subseteq N^+\) amounts to solving the maximum lifting problem

$$\begin{aligned} t\ge \max _{S^+\subseteq N^+, \alpha \in B^+}-\frac{\max _\alpha (S^+)^2}{4}-\sum \limits _{i\in N^+{\setminus } S^+}\frac{\left( \alpha _i^2-\max _\alpha (S^+)^2\right) _+}{4}x_i+\alpha 'y. \end{aligned}$$
(21)

We now give a closed form expression for (21). Let \(m=|N^+|\), and given \((\bar{x},\bar{y})\in [0,1]^N\times \mathbb {R}_+^N\), index variables in \(N^+\) so that \(\bar{y}_{(1)}/ \bar{x}_{(1)}\le \bar{y}_{(2)}/ \bar{x}_{(2)}\le \cdots \le \bar{y}_{(m)}/\bar{x}_{(m)}\).

Proposition 7

Given \(({\bar{x}},{\bar{y}},{\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), if there exist indexes \(0\le \kappa _1<\kappa _2\le m+1\) such that the (possibly empty) sets \(L=\left\{ (i)\in N^+: i\le \kappa _1\right\} \) and \(U=\left\{ (i)\in N^+: i\ge \kappa _2\right\} \) satisfy

$$\begin{aligned}&1-{\bar{x}}(N^+{\setminus } L)\ge 0 \end{aligned}$$
(22a)
$$\begin{aligned}&\frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}< \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in N^+{\setminus } L \end{aligned}$$
(22b)
$$\begin{aligned}&\frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}\ge \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in L \end{aligned}$$
(22c)
$$\begin{aligned}&{\bar{y}}(U)-{\bar{y}}(N^-)\ge 0 \end{aligned}$$
(22d)
$$\begin{aligned}&\frac{{\bar{y}}(U)-{\bar{y}}(N^-)}{{\bar{x}}(U)}> \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in N^+{\setminus } U \end{aligned}$$
(22e)
$$\begin{aligned}&\frac{{\bar{y}}(U)-{\bar{y}}(N^-)}{{\bar{x}}(U)}\le \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in U \end{aligned}$$
(22f)
$$\begin{aligned}&\frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}< \frac{{\bar{y}}(U)-{\bar{y}}(N^-)}{{\bar{x}}(U)}, \end{aligned}$$
(22g)

then inequality (21) is satisfied if and only if

$$\begin{aligned} {\bar{t}}\ge \frac{{\bar{y}}(L)^2}{1-{\bar{x}}(N^+{\setminus } L)}+\sum \limits _{i\in N^+{\setminus } (L\cup U)}\frac{{\bar{y}_i}^2}{{\bar{x}_i}}+\frac{\big ({\bar{y}}(U)-{\bar{y}}(N^-)\big )^2}{{\bar{x}}(U)}; \end{aligned}$$
(23)

otherwise, inequality (21) is satisfied if and only if \({\bar{t}}\ge \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )^2\).

Below we state two remarks on Proposition 7, and then we prove the result.

Remark 5

Inequalities (23), when sets L and U are fixed, are neither valid for \(\text {cl conv}(X)\) nor convex for all \((x,y)\in [0,1]^N\times \mathbb {R}_+^N\). Indeed, if condition (22a) is not satisfied, then (23) may not be convex. Moreover, suppose that \(L=\{j\}\) and \(U=\{k\}\) for some \(j,k\in N^+\): note that setting \(x_i=y_i=0\) for all \(i\in N{\setminus }\{j,k\}\), \(x_j=x_k=1\), \(y_j,y_k>0\), and \(t=(y_j+y_k)^2\) is feasible for X, but this point is cut off by inequality (23) since \(\frac{y(L)^2}{1-x(N^+{\setminus } L)}=\frac{y_j^2}{1-x_k}=\infty \).

In fact, if \((x,y,t)\in \text {cl conv}(X)\), then (23) holds only when conditions (22a), (22b), (22d), (22e), and (22g) are satisfied. Conditions (22c) and (22f) do not affect the validity of (23) but if they are not satisfied then (23) is weak, i.e., a stronger inequality can be obtained from another choice of L and U. \(\square \)

Remark 6

If \({\bar{y}}(N^+)<{\bar{y}}(N^-)\), then condition (22d) in Proposition 7 cannot be satisfied. However, in this case, the role of \(N^+\) and \(N^-\) can be interchanged to satisfy (22d); interchanging \(N^+\) and \(N^-\) is equivalent to letting \(\alpha \in B^-\). \(\square \)

Proof of Proposition 7

Let us define auxiliary variables \(\beta ,\gamma \in \mathbb {R}\) as \(\beta =\max _\alpha (N^-)\) and \(\gamma =\max _\alpha (S^+)\), respectively. Then, inequality (21) reduces to

$$\begin{aligned} {\bar{t}}\ge \max _{S^+\subseteq N^+}\max _{\alpha ,\beta ,\gamma }&-\frac{\gamma ^2}{4}-\sum \limits _{i\in N^+{\setminus } S^+}\frac{\left( \alpha _i^2-\gamma ^2\right) _+}{4}{\bar{x}_i}+\alpha '{\bar{y}} \end{aligned}$$
(24a)
$$\begin{aligned} \text {s.t.}\;&\alpha _i\le \gamma , \ \ \ \ \ \ \forall i\in S^+ \end{aligned}$$
(24b)
$$\begin{aligned}&\alpha _i \le \beta , \ \ \ \ \ \forall i\in N^- \end{aligned}$$
(24c)
$$\begin{aligned}&\beta \le -\alpha _i, \ \ \forall i\in N^+ \end{aligned}$$
(24d)
$$\begin{aligned}&\alpha \in \mathbb {R}^N,\;\gamma \in \mathbb {R}_+,\beta \in \mathbb {R}_-, \end{aligned}$$
(24e)

where constraints (24b) and (24c) enforce the definitions of \(\gamma \) and \(\beta \), and constraints (24d) and (24e) enforce that \(\alpha \in B^+\).

First, observe that there exists an optimal solution of (24) with \(\gamma \le \alpha _i\) for all \(i\in N^+\): if \(\alpha _i<\gamma \) for some \(i\in N^+\), then setting \(\alpha _i=\gamma \) results in a feasible solution with improved objective value. Therefore, the value of \(S^+\) is completely determined by \(\gamma \) since \(S^+=\left\{ i\in N^+: \alpha _i\le \gamma \right\} \). Also note that \(\alpha _i=\beta \) for all \(i\in N^-\): if \(\alpha _i<\beta \) for some \(i\in N^-\), then setting \(\alpha _i=\beta \) results in an improved (or identical) objective value. We now consider two cases:

Case 1 Suppose in an optimal solution of (24) we have \(\gamma =-\beta \), which implies that \(\alpha _i=\gamma \) for all \(i\in N^+\) and \(\alpha _i=-\gamma \) for all \(i\in N^-\). In this case, (24) simplifies to \( {\bar{t}}\ge \max _{\gamma \in \mathbb {R}_+} \gamma \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )-\frac{\gamma ^2}{4}, \) which, after optimizing for \(\gamma \), further reduces to the original rank-one quadratic inequality \({\bar{t}}\ge \big ({\bar{y}}(N^+)-{\bar{y}}(N^-)\big )^2.\)

Case 2 Now suppose \(\gamma <-\beta \) in an optimal solution. Let \(L=\left\{ i\in N^+:\alpha _i=\gamma \right\} \) and \(U=\{i\in N^+:\alpha _i=-\beta \}\). Then, from the discussion above, (24) reduces to

$$\begin{aligned} t\ge \max _{\alpha ,\beta ,\gamma }\;&\gamma \cdot {\bar{y}}(L)-\frac{\gamma ^2}{4}\big (1-{\bar{x}}(N^+{\setminus } L)\big )+\sum \limits _{i\in N^+{\setminus } (L\cup U)}\left( \alpha _i{\bar{y}_i}-\frac{\alpha _i^2}{4}{\bar{x}_i}\right) \nonumber \\&-\beta \big ({\bar{y}}(U)-{\bar{y}}(N^-)\big )-\frac{\beta ^2}{4} {\bar{x}}(U) \end{aligned}$$
(25a)
$$\begin{aligned} \text {s.t.}\;&\gamma<\alpha _i< -\beta , \ \ \ \forall i\in N^+{\setminus } (L\cup U) \end{aligned}$$
(25b)
$$\begin{aligned}&\alpha \in \mathbb {R}^N,\;\gamma ,\beta \in \mathbb {R}_+. \end{aligned}$$
(25c)

Observe that for \((L,U,\gamma )\) to correspond to an optimal solution, we must have \(1-{\bar{x}}(N^+{\setminus } L)\ge 0\) (otherwise, \(\gamma \) can be increased to another \(\alpha _i\) while improving the objective value) and \({\bar{y}}(U)-{\bar{y}}(N^-)\ge 0\) (otherwise, \(-\beta \) can be decreased to another \(\alpha _i\) while improving the objective value). When both conditions are satisfied, from first-order conditions we see that \(\alpha _i=2{\bar{y}_i}/{\bar{x}_i}\) for \(i\in N^+{\setminus } (L\cup U)\), \(\gamma = 2 {\bar{y}}(L)/\big (1-{\bar{x}}(N^+{\setminus } L)\big )\) and \(\beta =-2\big ({\bar{y}}(U)-{\bar{y}}(N^-)\big )/{\bar{x}}(U)\), and (25) simplifies to (23). The constraints \(\gamma <\alpha _i\) are satisfied for all \(i\in N^+{\setminus } (L\cup U)\) if and only if (22b) holds, constraints \(\alpha _i\le -\beta \) are satisfied for all \(i\in N^+{\setminus } (L\cup U)\) if and only if (22e) holds, and the constraint \(\gamma <-\beta \), which may not be implied if \(N^+{\setminus } (L\cup U)=\emptyset \), is satisfied if and only if (22g) holds.

Finally, we verify that the first-order conditions are satisfied for \(j\in L\), that is, setting \(\alpha _j>\gamma \) results in a worse solution. If condition (22c)

$$\begin{aligned} \frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}\ge \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad \forall i\in L \end{aligned}$$

does not hold for some \(j\in L\), then increasing \(\alpha _j\) from \(\gamma =2\frac{{\bar{y}}(L)}{1-{\bar{x}}(N^+{\setminus } L)}\) to \(2{\bar{y}_j}/{\bar{x}_j}\) improves the objective value. Similarly, we verify the first-order conditions for \(j\in U\): if condition (22f)

$$\begin{aligned} \frac{{\bar{y}}(U)-{\bar{y}}(N^-)}{{\bar{x}}(U)}\le \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad \forall i\in U \end{aligned}$$

does not hold for some \(j\in U\), then \(\alpha _j\) can be decreased from \(-\beta = 2\frac{{\bar{y}}(U)-{\bar{y}}(N^-)}{{\bar{x}}(U)}\) to improve the objective value.

Note that conditions (22b) and (22c) together imply that \(\bar{y}_i/\bar{x}_i< \bar{y}_j/\bar{x}_j\) whenever \(i\in L\) and \(j\not \in L\); in other words, if \(L\ne \emptyset \), then \(L=\left\{ (1),(2),\dots ,(\kappa _1)\right\} \) for some \(1\le \kappa _1\le m\). Similarly, from conditions (22e) and (22f), we conclude that either \(U=\emptyset \) or \(U=\left\{ (\kappa _2),(\kappa _2+1),\dots ,(m)\right\} \) for some \(1\le \kappa _2\le m\). \(\square \)

5.1.2 Lifted inequalities for \(X_+\)

We now present the inequalities for \(X_+\), which can be interpreted as special cases of the inequalities for X given in Sect. 5.1.1. Recall that for set \(X_+\), the set B used in (6a) is simply \(B=\mathbb {R}^N\) (we can assume \(B=\mathbb {R}_+^N\) without loss of generality), and a closed-form expression for (6a) requires solving the lifting problem

$$\begin{aligned} t\ge \max _{S\subseteq N}\max _{\alpha \in \mathbb {R}_+^N}-\frac{\max _\alpha (S)^2}{4}-\sum \limits _{i\in N{\setminus } S}\frac{\left( \alpha _i^2-\max _\alpha (S)^2\right) _+}{4}x_i+\alpha 'y. \end{aligned}$$
(26)

Note that in the proof of Proposition 7, set U corresponds to the set of variables in \(N^+\) where constraint \(\alpha _i\le -\max _{\alpha }(N^-)\) is tight in an optimal solution of (24). Intuitively, set \(X_+\) can be interpreted as a special case of X where \(N^+=N\) and \(N^-=\emptyset \), and such constraints can be dropped from the lifting problem. Therefore, we may assume \(U=\emptyset \) in Proposition 7. Proposition 8 formalizes this intuition; note, however, that it is slightly stronger since, unlike Proposition 7, it guarantees the existence of a set satisfying the conditions of the proposition. As in Proposition 7, we index the variables in N so that \(\bar{y}_{(1)}/ \bar{x}_{(1)}\le \bar{y}_{(2)}/ \bar{x}_{(2)}\le \cdots \le \bar{y}_{(n)}/\bar{x}_{(n)}\).

Proposition 8

Given \(({\bar{x}},{\bar{y}},{\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), there exists an index \(0\le \kappa \le n\) such that the (possibly empty) set \(L=\left\{ (i)\in N: i\le \kappa \right\} \) satisfies

$$\begin{aligned}&1-{\bar{x}}(N{\setminus } L)\ge 0 \end{aligned}$$
(27a)
$$\begin{aligned}&\frac{{\bar{y}}(L)}{1-{\bar{x}}(N{\setminus } L)}< \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in N{\setminus } L \end{aligned}$$
(27b)
$$\begin{aligned}&\frac{{\bar{y}}(L)}{1-{\bar{x}}(N{\setminus } L)}\ge \frac{{\bar{y}_i}}{{\bar{x}_i}}, \quad i\in L \end{aligned}$$
(27c)

and inequality (26) is satisfied if and only if

$$\begin{aligned} t\ge \frac{{\bar{y}}(L)^2}{1-{\bar{x}}(N{\setminus } L)}+\sum \limits _{i\in N{\setminus } L}\frac{{\bar{y}_i}^2}{{\bar{x}_i}}. \end{aligned}$$
(28)

The proof of Proposition 8 is given in “Appendix A”.

Example 2

(cont) Consider \(X_+\) with \(n=3\), and assume \(x_2=0.6\), \(x_3=0.3\), \(y_2=0.5\) and \(y_3=0.2\). Note that \(y_2/x_2\approx 0.83>0.67\approx y_3/x_3\). We now compute the minimum value of t such that \((x,y,t)\in \text {cl conv}(X_+)\) for different values of \((x_1,y_1)\).

  • Let \((x_1,y_1)=(0.01,1)\) and \(y_1/x_1=100\). Then \(L=\emptyset \) satisfies all conditions (27): \(x(N)=0.91<1\), conditions (27b) are trivially satisfied since \(y(\emptyset )=0\), and conditions (27c) are void. In this case, we find that \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge 1^2/0.01+0.5^2/0.6+0.2^2/0.3\approx 100.55\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 1+0.5+0.2\right) ^2/0.91\approx 3.18\).

  • Let \((x_1,y_1)=(0.1,0.5)\) and \(y_1/x_1=5\). Then \(L=\{3\}\) satisfies all conditions (27): \(x_1+x_2=0.7<1\), \(0.2/0.3\approx 0.67<y_2/x_2\) and \(0.2/0.3\approx 0.67= y_3/x_3\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge 0.2^2/0.3+0.5^2/0.1+0.5^2/0.6\approx 3.05\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 0.5+0.5+0.2\right) ^2/1= 1.44\).

  • Let \((x_1,y_1)=(0.4,0.1)\) and \(y_1/x_1=0.25\). Then \(L=\{1,3\}\) satisfies all conditions (27): \(x_2=0.6<1\), \((0.1+0.2)/0.4=0.75<y_2/x_2\) and \((0.1+0.2)/0.4=0.75\ge y_3/x_3\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge (0.1+0.2)^2/0.4+0.5^2/0.6\approx 0.642\). In contrast, \((x,y,t)\in \text {cl conv}(X_f)\) iff \(t\ge \left( 0.1+0.5+0.2\right) ^2= 0.640\).

  • Let \((x_1,y_1)=(0.5,0.2)\) and \(y_1/x_1=0.4\). Then \(L=\{1,2,3\}\) satisfies all conditions (27): (27a) is trivially satisfied, (27b) is void, and \((0.2+0.5+0.2)/1=0.9\ge y_2/x_2\). In this case, \((x,y,t)\in \text {cl conv}(X_+)\) iff \(t\ge (0.2+0.5+0.2)^2= 0.81\), which coincides with \(\text {cl conv}(X_f)\) and the natural inequality \(t\ge y(N)^2\).

Figure 1 plots the minimum values of t as a function of \((x_1,y_1)\) for \(\text {cl conv}(X_f)\) and \(\text {cl conv}(X_+)\). \(\square \)

5.1.3 Separation

We now consider the separation problem for inequalities (21) and (26), i.e., given a point \((\bar{x},\bar{y}, {\bar{t}})\in [0,1]^N\times \mathbb {R}_+^N{\times \mathbb {R}}\), finding sets \(L,U\subseteq N^+\) satisfying the conditions in Proposition 7 or finding \(L\subseteq N\) satisfying the conditions in Proposition 8, respectively.

Separation for (21) First, as pointed out in Remark 6, we verify whether \(\bar{y}(N^+)\ge \bar{y}(N^-)\) or \(\bar{y}(N^+)<\bar{y}(N^-)\); in the first case, we directly use the conditions in Proposition 7, and in the second, we interchange the roles of \(N^+\) and \(N^-\) so that \(\bar{y}(N^+)\ge \bar{y}(N^-)\). Next, indexing the variables so that \({\bar{y}_{(1)}/ \bar{x}_{(1)}\le \cdots \le \bar{y}_{(m)}/\bar{x}_{(m)}}\), where \(m=|N^+|\), can be done in \(O(m\log m)\) time by sorting. Finally, one can simply enumerate all \(m(m-1)/2\) possible values of \({(\kappa _1,\kappa _2)}\) and verify whether conditions (22) are satisfied for each candidate pair of sets L and U; maintaining prefix sums of \(\bar{x}\) and \(\bar{y}\), each candidate can be checked in constant time. Hence, the separation algorithm runs in \(O(n^2)\) time.

Separation for (26) First, indexing the variables so that \({\bar{y}_{(1)}/ \bar{x}_{(1)}\le \cdots \le \bar{y}_{(n)}/\bar{x}_{(n)}}\) can be accomplished in \(O(n\log n)\) time by sorting. Then, one can simply enumerate the \(n+1\) possible values of \({\kappa }\) and verify whether conditions (27) are satisfied for each candidate set L; maintaining prefix sums of \(\bar{x}\) and \(\bar{y}\), each candidate can be checked in constant time. Since the sorting step dominates the complexity, the separation algorithm runs in \(O(n\log n)\) time.
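To make the procedure concrete, the following Python sketch implements the separation routine for inequalities (26) based on Proposition 8. It is a minimal sketch under simplifying assumptions: \(\bar{x}>0\) componentwise (entries with \(\bar{x}_i=0\) would require the usual \(0/0=0\) convention), and the direct loop is used instead of the prefix-sum implementation, so it runs in quadratic rather than \(O(n\log n)\) time. The function name and interface are illustrative.

```python
import numpy as np

def separate_positive(x_bar, y_bar, t_bar, tol=1e-9):
    """Separation sketch for inequalities (26) via Proposition 8.

    Returns (violated, L, rhs): L is the index set of Proposition 8,
    rhs is the right-hand side of (28) evaluated at (x_bar, y_bar),
    and violated indicates whether t_bar < rhs (up to tol).
    """
    x_bar, y_bar = np.asarray(x_bar, float), np.asarray(y_bar, float)
    order = np.argsort(y_bar / x_bar)        # y_(1)/x_(1) <= ... <= y_(n)/x_(n)
    n = len(order)
    best_L, best_rhs = None, -np.inf
    for kappa in range(n + 1):               # candidate L = {(1), ..., (kappa)}
        L, rest = order[:kappa], order[kappa:]
        denom = 1.0 - x_bar[rest].sum()      # 1 - x(N \ L)
        if denom < -tol:                     # condition (27a) fails
            continue
        ratio = y_bar[L].sum() / max(denom, tol)
        if kappa < n and ratio >= y_bar[rest[0]] / x_bar[rest[0]] - tol:
            continue                         # condition (27b) fails
        if kappa > 0 and ratio < y_bar[L[-1]] / x_bar[L[-1]] - tol:
            continue                         # condition (27c) fails
        rhs = y_bar[L].sum() ** 2 / max(denom, tol) + np.sum(y_bar[rest] ** 2 / x_bar[rest])
        if rhs > best_rhs:
            best_L, best_rhs = list(L), rhs
    return best_rhs > t_bar + tol, best_L, best_rhs

# First bullet of Example 2: x_bar = (0.01, 0.6, 0.3), y_bar = (1, 0.5, 0.2)
# yields L = [] and rhs close to 100.55, matching the value reported there.
print(separate_positive([0.01, 0.6, 0.3], [1.0, 0.5, 0.2], t_bar=50.0))
```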

5.2 Conic quadratic valid inequalities in an extended formulation

Inequalities (23) and (28), given in the original space of variables, are valid only over restricted parts of the domain. They are neither valid nor convex over the entire domain of the variables; e.g., (23) is not convex whenever \(x(N^+{\setminus } L)\ge 1\). Thus, such inequalities are difficult for optimization solvers to utilize directly. To address this challenge, in this section we give valid conic quadratic reformulations in an extended space, which can be readily used by conic quadratic solvers.

For a partitioning (L, R, U) of \(N^+\), consider the inequality

$$\begin{aligned} t\ge \min _{\lambda ,\mu ,\zeta }\;&\frac{\Big (y(L)-\lambda _0\Big )^2}{1-x(R)-x(U)+\mu (R)+\mu _0}+\sum _{i\in R}\frac{(y_i-\lambda _i)^2}{x_i-\mu _i}+\frac{\Big (y(U)-y(N^-)+\lambda _0+\lambda (R)+\zeta \Big )^2}{x(U)-\mu _0} \end{aligned}$$
(29a)
$$\begin{aligned} \text {s.t.}\;&1-x(R)-x(U)+\mu (R)+\mu _0\ge 0 \end{aligned}$$
(29b)
$$\begin{aligned}&\mu _i\le x_i, \quad i\in R \end{aligned}$$
(29c)
$$\begin{aligned}&\mu _0\le x(U) \end{aligned}$$
(29d)
$$\begin{aligned}&\lambda ,\mu \in \mathbb {R}_+^{R},\; \lambda _0,\mu _0,\zeta \in \mathbb {R}_+. \end{aligned}$$
(29e)

Note that each inequality (29) requires O(n) additional variables and constraints. Moreover, although not explicitly enforced, it is easy to verify that there exists an optimal solution to (29) with \(\lambda _i\le y_i\) for \(i\in R\) and \(\lambda _0\le y(L)\). Inequalities (29) are convex as they involve linear constraints and sums of ratios of convex quadratic terms and nonnegative linear terms, and are thus conic quadratic representable [3, 38]. We show, in Proposition 9, that inequalities (29) imply the strong formulations described in Proposition 7, and, in Proposition 10, that they are valid for X.
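For completeness, we recall the standard second-order cone representation of such ratio terms [3, 38]: a ratio \(u^2/v\) with \(v\ge 0\) is bounded by an epigraph variable \(s\) if and only if

$$\begin{aligned} \frac{u^2}{v}\le s, \ v\ge 0 \quad \Longleftrightarrow \quad u^2\le s\,v,\ s,v\ge 0 \quad \Longleftrightarrow \quad \left\| \begin{pmatrix} 2u\\ v-s \end{pmatrix}\right\| _2\le v+s. \end{aligned}$$

Introducing one such epigraph variable for each ratio in (29a) therefore expresses every inequality (29) with O(n) (rotated) second-order cone constraints.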

Proposition 9

If conditions (22a), (22b), (22d), (22e) and (22g) are satisfied, then \(\lambda =\mu = 0\) and \(\lambda _0=\mu _0=\zeta =0\) in an optimal solution of (29).

Proof

Observe that \(\zeta \) does not appear in any constraint of (29). Thus, since \(y(U)-y(N^-)\ge 0\) and \(\lambda ,\lambda _0\ge 0\), it follows that \(\zeta =0\) in an optimal solution. Moreover, since (22a) is satisfied, setting \(\mu =0\) is feasible for (29). Finally, we find that the KKT conditions are satisfied for \(\lambda =\mu =0\) and \(\lambda _0=\mu _0=0\) if

$$\begin{aligned}&\frac{y(U)-y(N^-)}{x(U)}\ge \frac{y(L)}{1-x(R)-x(U)}, \qquad \left( \frac{y(U)-y(N^-)}{x(U)}\right) ^2\ge \left( \frac{y(L)}{1-x(R)-x(U)}\right) ^2, \\&\frac{y(U)-y(N^-)}{x(U)}\ge \frac{y_i}{x_i}, \qquad \frac{y_i^2}{x_i^2}\ge \left( \frac{y(L)}{1-x(R)-x(U)}\right) ^2, \qquad \forall i\in R, \end{aligned}$$

corresponding to stationarity with respect to \(\lambda _0\), \(\mu _0\), \(\lambda _i\) and \(\mu _i\), respectively.

The KKT condition above for \(\lambda _0\) is precisely (22g). Since \(x(R)+x(U)\le 1\) by (22a) and \(y(U)-y(N^-)\ge 0\) by (22d), we have \(\frac{y(L)}{1-x(R)-x(U)}+\frac{y(U)-y(N^-)}{x(U)}\ge 0\); factoring the difference of squares, the KKT condition for \(\mu _0\) thus also reduces to (22g). The KKT conditions for \(\lambda _i\) are satisfied since (22e) holds. Finally, the KKT conditions for \(\mu _i\) can be equivalently stated as \(\frac{y(L)}{1-x(R)-x(U)}\le \frac{y_i}{x_i}\) (since \(x(R)+x(U)\le 1\) and \(x,y \ge 0\)), which are satisfied since (22b) holds. \(\square \)

Note that when \(\lambda =\mu =0\) and \(\lambda _0=\mu _0=\zeta =0\), inequality (29) reduces to (23). Thus, if sets L, U satisfy the conditions of Proposition 7 for a given (x, y), then there exists \(t\in \mathbb {R}\) such that \((x,y,t)\in \text {conv}(X)\) and (29) holds at equality. It remains to prove that inequalities (29) do not cut off any points in X for any choice of partition (L, R, U).

Proposition 10

For any partitioning (L, R, U) of \(N^+\), inequalities (29) are valid for X.

Proof

It suffices to show that for any \((x,y)\in X\), i.e., \(x_i\in \{0,1\}\) and \(y_i(1-x_i)=0\) for all \(i\in N\), there exists \((\lambda ,\mu ,\lambda _0,\mu _0, \zeta )\) satisfying (29b)–(29e) for which the resulting inequality (29a) is valid. We prove the result by cases.

Case 1 \(y(N^+)< y(N^-)\): In this case, we can set \(\lambda _i=y_i\) and \(\mu _i=x_i\) for \(i\in R\), \(\lambda _0=y(L)\), \(\mu _0=x(U)\), \(\zeta =y(N^-)-y(U)-y(L)-y(R)\), and inequality (29a) reduces to \(t\ge 0\), which is valid.

Case 2 \(y(N^+)\ge y(N^-)\), \(x(R)=0\) and \(x(U)=0\): In this case, \(y_i=0\), \(i\in R\cup U\). Setting \(\mu _i=\lambda _i=0\) for \(i\in R\), \(\lambda _0=y(N^-)\), \(\mu _0=0\) and \(\zeta =0\), we find that inequality (29a) reduces to \(t\ge \big (y(L)-y(N^-)\big )^2=\big (y(N^+)-y(N^-)\big )^2\), which is valid.

Case 3 \(y(N^+)\ge y(N^-)\) and \(x(U)\ge 1\): Setting \(\lambda _i=y_i\) and \(\mu _i=x_i\) for \(i\in R\), \(\lambda _0=y(L)\), \(\mu _0=x(U)-1\), and \(\zeta =0\), inequality (29a) reduces to \(t\ge \big (y(N^+)-y(N^-)\big )^2\), which is valid.

Case 4 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), \(y(N^-)<y_i\) for all \(i\in R\) and \(y(N^-)<y(L)\): In this case, \(y_i=0\) for all \(i\in U\) and \(x_i=1\) for all \(i\in R\); we can set \(\mu _0=0\), and inequality (29) reduces to

$$\begin{aligned} t\ge \min _{\lambda ,\mu }\;&\frac{\Big (y(L)-\lambda _0\Big )^2}{1-|R|+\mu (R)}+\sum _{i\in R}\frac{(y_i-\lambda _i)^2}{1-\mu _i} \end{aligned}$$
(30a)
$$\begin{aligned} \text {s.t.}\;&1-|R|+\mu (R)\ge 0 \end{aligned}$$
(30b)
$$\begin{aligned}&\mu _i\le 1 \quad \forall i\in R \end{aligned}$$
(30c)
$$\begin{aligned}&-y(N^-)+\lambda _0+\lambda (R)+\zeta =0 \end{aligned}$$
(30d)
$$\begin{aligned}&\lambda ,\mu \in \mathbb {R}_+^{R},\; \lambda _0,\zeta \in \mathbb {R}_+. \end{aligned}$$
(30e)

Constraint (30d) is obtained since the denominator of the third term in (29a) is zero, thus constraining the numerator to vanish as well. Moreover, since variable \(\zeta \ge 0\) only appears in (30d), after projecting \(\zeta \) out we find that constraint (30d) reduces to

$$\begin{aligned} \lambda _0+\lambda (R)\le y(N^-). \end{aligned}$$
(31)

Note that constraint (31) and the assumptions \(y(N^-)<y_i\) for all \(i\in R\) and \(y(N^-)<y(L)\) imply that \(\lambda _i\le y_i\) and \(\lambda _0\le y(L)\). Observe that we can set

$$\begin{aligned} \mu _i= 1-\frac{y_i-\lambda _i}{y(L)+y(R)-\lambda (R)-\lambda _0}\quad \forall i\in R. \end{aligned}$$

Indeed, for any feasible \(\lambda \), \(y(L)+y(R)-\lambda (R)-\lambda _0\ge y(L)+y(R)-y(N^-)\ge 0\); thus \(\mu _i\le 1\). Moreover,

$$\begin{aligned} \frac{y_i-\lambda _i}{y(L)+y(R)-\lambda (R)-\lambda _0}\le \frac{y_i-\lambda _i}{y(L)+y(R{\setminus } i)+y_i-\lambda _i}\le 1 \end{aligned}$$

thus \(\mu _i\ge 0\). For this choice of \(\mu \), we find that

$$\begin{aligned} 1-|R|+\mu (R)=\frac{y(L)-\lambda _0}{y(L)+y(R)-\lambda (R)-\lambda _0}\ge 0. \end{aligned}$$

Finally, substituting \(1-|R|+\mu (R)\) and \(\mu _i\) in (30a) with their respective values, (30a) reduces to

$$\begin{aligned}t\ge&\min _{\lambda }\;\Big (y(L)-\lambda _0\Big )\Big (y(L)+y(R)-\lambda (R)-\lambda _0\Big )\\&+\Big (y(L)+y(R)-\lambda (R)-\lambda _0\Big )\sum _{i\in R}(y_i-\lambda _i)\\ \Leftrightarrow t\ge&\min _{\lambda }\;\Big (y(L)+y(R)-\lambda (R)-\lambda _0\Big )^2=\Big (y(L)+y(R)-y(N^-)\Big )^2, \end{aligned}$$

and since \(y(L)+y(R)=y(N^+)\), this inequality is valid.

Case 5 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), \(y(N^-)< y(L)\) but \(y(N^-)\ge y_j\) for some \(j\in R\): In this case, \(y_i=0\) for all \(i\in U\), and we set \(\mu _0=0\). Note that, in (29), we can set \(\lambda _j=y_j\) and \(\mu _j=x_j\), resulting in the inequality

$$\begin{aligned}t\ge \min _{\lambda ,\mu ,\zeta }\;&\frac{\Big (y(L)-\lambda _0\Big )^2}{1-x(R{\setminus } j)-x(U)+\mu (R{\setminus } j)}+\sum _{i\in R{\setminus } j}\frac{(y_i-\lambda _i)^2}{x_i-\mu _i}\\&+\frac{\Big (y(U)-y(N^-)+y_j+\lambda _0+\lambda (R{\setminus } j)+\zeta \Big )^2}{x(U)}\\ \text {s.t.}\;&1-x(R{\setminus } j)-x(U)+\mu (R{\setminus } j)\ge 0\\&\mu _i\le x_i \quad \quad \forall i\in R{\setminus } j\\&\lambda ,\mu \in \mathbb {R}_+^{R{\setminus } j},\; \lambda _0,\zeta \in \mathbb {R}_+. \end{aligned}$$

This inequality is of the same form as (29) but with \(\hat{R}=R{\setminus } j\) and \(\hat{y}(N^-)=y(N^-) - y_j\). After repeating this process sequentially so that \(\lambda _i=y_i\) and \(\mu _i=x_i\) for some subset \(T \subseteq R\) such that \(y(N^-)-y(T)\le y_i\) for all \(i\in R{\setminus } T\), and applying a similar strategy as in Case 4, we obtain an inequality of the form

$$\begin{aligned} t\ge \Big (y(L)+y(R{\setminus } T)-\big (y(N^-)- y(T)\big )\Big )^2=\Big (y(N^+)-y(N^-)\Big )^2, \end{aligned}$$

which is valid.

Case 6 \(y(N^+)\ge y(N^-)\), \(x(U)= 0\), \(x(R)\ge 1\), and \(y(N^-)\ge y(L)\): In this case, we can set \(\lambda _0=y(L)\), \(\mu _0=0\), and (29) reduces to

$$\begin{aligned}t\ge \min _{\lambda ,\mu }\;&\sum _{i\in R}\frac{(y_i-\lambda _i)^2}{x_i-\mu _i}\\ \text {s.t.}\;&1-x(R)+\mu (R)\ge 0\\&\mu _i\le x_i \quad \quad \forall i\in R\\&\lambda (R)\le y(N^-)-y(L)\\&\lambda ,\mu \in \mathbb {R}_+^{R}. \end{aligned}$$

Moreover, if \(y(N^-)-y(L)\ge y_j\) for some \(j\in R\), then we can set \(\lambda _j=y_j\), \(\mu _j=x_j\) as done in Case 5. After repeating this process, we obtain an inequality of the form

$$\begin{aligned} t\ge \min _{\lambda ,\mu }\;&\sum _{i\in R{\setminus } T}\frac{(y_i-\lambda _i)^2}{x_i-\mu _i} \end{aligned}$$
(32a)
$$\begin{aligned} \text {s.t.}\;&1-x(R{\setminus } T)+\mu (R{\setminus } T)\ge 0 \end{aligned}$$
(32b)
$$\begin{aligned}&\mu _i\le x_i \quad \forall i\in R{\setminus } T \end{aligned}$$
(32c)
$$\begin{aligned}&\lambda (R{\setminus } T)\le y(N^-)-y(L)-y(T) \end{aligned}$$
(32d)
$$\begin{aligned}&\lambda ,\mu \in \mathbb {R}_+^{R{\setminus } T}, \end{aligned}$$
(32e)

where \(y(N^-)-y(L)-y(T)<y_i\) for all \(i\in R{\setminus } T\), and therefore \(x_i=1\) for all \(i\in R{\setminus } T\).

Note that constraint (32d) and \(y(N^-)-y(L)-y(T)<y_i\) imply that \(\lambda _i< y_i\) in any feasible solution. Then, for all \(i \in R{\setminus } T\), we can set

$$\begin{aligned} \mu _i=x_i-\frac{y_i-\lambda _i}{y(R{\setminus } T)-\lambda (R{\setminus } T)}. \end{aligned}$$

Clearly, \(\mu _i\le x_i\). Moreover, for all \(i \in R{\setminus } T\),

$$\begin{aligned} \frac{y_i-\lambda _i}{y(R{\setminus } T)-\lambda (R{\setminus } T)}\le \frac{y_i-\lambda _i}{y(R{\setminus } (T \cup i))+y_i-\lambda _i}\le 1=x_i, \end{aligned}$$

thus \(\mu _i\ge 0\). Finally,

$$\begin{aligned} 1-x(R{\setminus } T)+\mu (R{\setminus } T)=1-\frac{y(R{\setminus } T)-\lambda (R{\setminus } T)}{y(R{\setminus } T)-\lambda (R{\setminus } T)}=0, \end{aligned}$$

and constraint (32b) is satisfied. Substituting \(x_i-\mu _i\), \( i \in R {\setminus } T\), with their explicit values in (32a), we find the equivalent form

$$\begin{aligned} t\ge \min _{\lambda }\;\Big (y(R{\setminus } T)-\lambda (R{\setminus } T)\Big )\sum _{i\in R{\setminus } T}(y_i-\lambda _i)=&\min _{\lambda }\;\Big (y(R{\setminus } T)-\lambda (R{\setminus } T)\Big )^2\\ =&\Big (y(N^+)-y(N^-)\Big )^2, \end{aligned}$$

which is valid. \(\square \)

To derive the corresponding lifted inequalities for \(B^-\), it suffices to interchange \(N^+\) and \(N^-\). Therefore, for a partitioning (L, R, U) of \(N^-\), we find the conic quadratic inequalities:

$$\begin{aligned} t\ge \min _{\lambda ,\mu ,\zeta }\;&\frac{\left( y(L) -\lambda _0\right) ^2}{1- x(R) - x(U) + \mu (R) + \mu _0}+ \sum \limits _{i\in R}\frac{(y_i-\lambda _i)^2}{x_i-\mu _i}+\frac{\left( y(U)-y(N^+) +\lambda _0+\lambda (R)+\zeta \right) ^2}{x(U)-\mu _0} \end{aligned}$$
(33a)
$$\begin{aligned} \text {s.t.}\;&1- x(R) -x(U)+ \mu (R)+\mu _0\ge 0 \end{aligned}$$
(33b)
$$\begin{aligned}&\mu _i\le x_i, \quad i\in R \end{aligned}$$
(33c)
$$\begin{aligned}&\mu _0\le x(U) \end{aligned}$$
(33d)
$$\begin{aligned}&\lambda ,\mu \in \mathbb {R}_+^{R},\; \lambda _0,\mu _0,\zeta \in \mathbb {R}_+. \end{aligned}$$
(33e)

One of the main results of the paper, that is, an explicit description of \(\text {cl conv}(X)\) via a finite number of conic quadratic inequalities, is stated below.

Theorem 3

\(\text {cl conv}(X)\) is given by bound constraints \(0 \le x\le 1\), \(y \ge 0\), and inequalities (29) and (33).

For the positive case \(X_+\), where \(N^- =\emptyset \), and a partitioning (L, R) of N, inequalities (29) reduce to

$$\begin{aligned} t\ge \min _{\mu }\;&\frac{y(L)^2}{1-x(R)+\mu (R)}+\sum _{i\in R}\frac{y_i^2}{x_i-\mu _i} \end{aligned}$$
(34a)
$$\begin{aligned} \text {s.t.}\;&1-x(R)+\mu (R)\ge 0 \end{aligned}$$
(34b)
$$\begin{aligned}&\mu _i\le x_i, \quad \quad i\in R \end{aligned}$$
(34c)
$$\begin{aligned}&\mu \in \mathbb {R}_+^{R}. \end{aligned}$$
(34d)

Note that each inequality (34) also requires O(n) additional variables and constraints but is significantly simpler than (29).

Theorem 4

\(\text {cl conv}(X_+)\) is given by bound constraints \(0 \le x \le 1\), \(y \ge 0\), and inequalities (34).
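As an illustration, the following Python/CVXPY sketch builds the constraints corresponding to a single inequality (34) for a given partition (L, R) of N; the function name and interface are ours, and for simplicity we assume L and R are nonempty lists of indices.

```python
import cvxpy as cp

def supermodular_cut_positive(x, y, t, L, R):
    """Convex constraints modeling inequality (34) for a partition (L, R).

    x, y are CVXPY variables of length n and t is a scalar CVXPY variable;
    mu plays the role of the lifting variables in (34), and quad_over_lin
    yields the conic-representable ratio terms.
    """
    mu = cp.Variable(len(R), nonneg=True)            # mu_i, i in R
    denom = 1 - cp.sum(x[R]) + cp.sum(mu)            # 1 - x(R) + mu(R)
    rhs = cp.quad_over_lin(cp.sum(y[L]), denom)      # y(L)^2 / (1 - x(R) + mu(R))
    rhs += sum(cp.quad_over_lin(y[i], x[i] - mu[k])  # y_i^2 / (x_i - mu_i)
               for k, i in enumerate(R))
    return [t >= rhs,        # (34a)
            denom >= 0,      # (34b)
            mu <= x[R]]      # (34c)
```

Adding the returned constraints, for suitable partitions, to a relaxation enforces the corresponding inequalities (34); by Theorem 4, the full collection together with the bound constraints describes \(\text {cl conv}(X_+)\).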

6 Computational experiments

In this section, we test the computational effectiveness of the conic quadratic inequalities given in Sect. 5.2 in solving convex quadratic minimization problems with indicators. In particular, we solve portfolio optimization problems with fixed charges. All experiments are run with the CPLEX 12.8 solver on a laptop with a 1.80 GHz Intel Core i7 CPU and 16 GB main memory, using a single thread. We use the CPLEX default settings but turn on the numerical emphasis parameter, unless stated otherwise. The data for the instances and the problem formulations in .lp format can be found online at https://sites.google.com/usc.edu/gomez/data.

6.1 Instances

We consider optimization problems of the form

$$\begin{aligned} \min _{y,x}\;&y'(FF')y+\sum _{i=1}^n(d_iy_i)^2 \end{aligned}$$
(35a)
$$\begin{aligned} \text {s.t.}\;&e'y=1 \end{aligned}$$
(35b)
$$\begin{aligned}&b'y-a'x\ge \beta \end{aligned}$$
(35c)
$$\begin{aligned}&y_i\le x_i, \ \ i\in N \end{aligned}$$
(35d)
$$\begin{aligned}&x\in \{0,1\}^N, y\in \mathbb {R}_+^N \end{aligned}$$
(35e)

where \(F\in \mathbb {R}^{n\times r}\) with \(r<n\) and \(a,b,d\in \mathbb {R}_+^N\). We test two classes of instances, general and positive, where either F has both positive and negative entries, or F has only non-negative entries, respectively. Note that constraints (35d) are in fact a big-M reformulation of the complementarity constraints \(y_i(1-x_i)=0\): indeed, constraint (35b) and \(y\ge 0\) imply the upper bound \(y\le 1\). The parameters are generated as follows (we use the notation \(Y\sim U[\ell ,u]\) to mean that Y is generated from a continuous uniform distribution between \(\ell \) and u):

F:

Let \(\rho \) be a positive weight parameter. Matrix \(F=EG\), where \(E\in \mathbb {R}_+^{n\times r}\) is an exposure matrix such that \(E_{ij}=0\) with probability 0.8 and \(E_{ij}\sim U[0,1]\) otherwise, and \(G\in \mathbb {R}^{r\times r}\) with \(G_{ij}\sim U[\rho ,1]\). If \(\rho \ge 0\), then matrix F is guaranteed to be non-negative, and we refer to such instances as positive. Otherwise, for \(\rho <0\), we refer to the instances as general.

d:

Let \(\delta \) be a diagonal dominance parameter. Define \(v=(1/n)\sum _{i=1}^n (FF')_{ii}\) to be the average diagonal element of \(FF'\); then \(d_i^2\sim U[0,v\delta ]\).

b:

We generate entries \(b_i\sim U[0.25,0.75]\times \sqrt{(FF')_{ii}+d_i^2}\). Note that if the terms \(b_i\) and \((FF')_{ii}+d_i^2\) are interpreted as the expectation and variance of a random variable, then the expectations are approximately proportional to the standard deviations. This relation aims to avoid trivial instances where one term dominates the other.

a:

Let \(\omega \) be a fixed cost parameter and \(a_i={\omega }(e'b)/n\), \(i\in N\), where e is an n-dimensional vector of ones.

It is well-documented in the literature that for matrices with large diagonal dominance the perspective reformulation achieves close to \(100\%\) gap improvement. Therefore, we choose a low diagonal dominance \(\delta =0.01\) to generate instances that are hard for the perspective reformulation. In our computations, unless stated otherwise, we use \(n=200\) and \(\beta =(e'b)/n\).
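The following numpy sketch reflects our reading of the generation procedure described above; the function name, seeding convention, and default arguments are illustrative.

```python
import numpy as np

def generate_instance(n=200, r=10, rho=-0.5, delta=0.01, omega=10, seed=0):
    """Generate (F, d, b, a, beta) for problem (35) as described above."""
    rng = np.random.default_rng(seed)
    # Exposure matrix E: each entry is 0 with probability 0.8, else U[0, 1].
    E = np.where(rng.random((n, r)) < 0.8, 0.0, rng.uniform(0.0, 1.0, (n, r)))
    G = rng.uniform(rho, 1.0, (r, r))          # G_ij ~ U[rho, 1]
    F = E @ G                                  # non-negative when rho >= 0
    diag_FFt = np.sum(F * F, axis=1)           # diagonal of F F'
    v = diag_FFt.mean()                        # average diagonal element of F F'
    d = np.sqrt(rng.uniform(0.0, v * delta, n))            # d_i^2 ~ U[0, v*delta]
    b = rng.uniform(0.25, 0.75, n) * np.sqrt(diag_FFt + d ** 2)
    a = np.full(n, omega * b.sum() / n)        # a_i = omega (e'b)/n
    beta = b.sum() / n
    return F, d, b, a, beta
```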

6.2 Methods

We test the following methods:

  • Basic : Problem (35) formulated as

    $$\begin{aligned} \min \;&\Vert q\Vert _2^2+\sum _{i=1}^n(d_iy_i)^2 \end{aligned}$$
    (36a)
    $$\begin{aligned} \text {s.t.}\;&q=F'y \end{aligned}$$
    (36b)
    $$\begin{aligned}&(35b)-(35d) \end{aligned}$$
    (36c)
    $$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\; q\in \mathbb {R}^r. \end{aligned}$$
    (36d)
  • Perspective : Problem (35) formulated as

    $$\begin{aligned} \min \;&\Vert q\Vert _2^2+\sum _{i=1}^nd_i^2p_i \end{aligned}$$
    (37a)
    $$\begin{aligned} \text {s.t.}\;&q=F'y \end{aligned}$$
    (37b)
    $$\begin{aligned}&y_i^2\le p_ix_i, \quad i=1,\ldots ,n \end{aligned}$$
    (37c)
    $$\begin{aligned}&(35b)-(35d) \end{aligned}$$
    (37d)
    $$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\; p\in \mathbb {R}_+^n,\; q\in \mathbb {R}^r. \end{aligned}$$
    (37e)
  • Supermodular : Problem (35) formulated as

    $$\begin{aligned} \min \;&\sum _{j=1}^r t_j+\sum _{i=1}^nd_i^2p_i \end{aligned}$$
    (38a)
    $$\begin{aligned} \text {s.t.}\;&\left( F_j'y\right) ^2\le t_j, \quad j=1,\ldots ,r \end{aligned}$$
    (38b)
    $$\begin{aligned}&y_i^2\le p_ix_i, \quad i=1,\ldots ,n \end{aligned}$$
    (38c)
    $$\begin{aligned}&(35b)-(35d) \end{aligned}$$
    (38d)
    $$\begin{aligned}&x\in \{0,1\}^n,\; y\in \mathbb {R}_+^n,\; p\in \mathbb {R}_+^n,\; t\in \mathbb {R}_+^r, \end{aligned}$$
    (38e)

    where \(F_j\) denotes the j-th column of F. Additionally, the lifted supermodular inequalities (29) (or (34) for positive instances) are added to strengthen the relaxation. Note that the convex relaxation of (38) without any additional inequalities is equivalent to the convex relaxation of (37); a schematic model of this relaxation is sketched below.
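The sketch below, referenced in the last item, shows one way to build the convex relaxation of (38) in Python/CVXPY before any lifted inequalities are added; the function name and interface are ours.

```python
import cvxpy as cp

def supermodular_relaxation(F, d, b, a, beta):
    """Convex relaxation of (38), with x relaxed to [0, 1]; a sketch."""
    n, r = F.shape
    x = cp.Variable(n)                   # relaxed indicator variables
    y = cp.Variable(n, nonneg=True)
    p = cp.Variable(n, nonneg=True)      # perspective epigraph variables
    t = cp.Variable(r, nonneg=True)      # epigraphs of the rank-one terms
    cons = [cp.sum(y) == 1,                                            # (35b)
            b @ y - a @ x >= beta,                                     # (35c)
            y <= x, x >= 0, x <= 1]                                    # (35d), bounds
    cons += [cp.square(F[:, j] @ y) <= t[j] for j in range(r)]         # (38b)
    cons += [cp.quad_over_lin(y[i], x[i]) <= p[i] for i in range(n)]   # (38c)
    obj = cp.Minimize(cp.sum(t) + (d ** 2) @ p)                        # (38a)
    return cp.Problem(obj, cons), (x, y, t)
```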

Cuts (29) (for general instances) or (34) (for positive instances) for method Supermodular are added as follows (a schematic version of this loop is sketched after the list):

  (1) We solve the convex relaxation of (38) to obtain a solution \((\bar{x}, \bar{y}, \bar{t})\). By default, the convex relaxation is solved with an interior point method.

  (2) We find a most violated inequality (29) or (34) for each constraint (38b) using the separation algorithm given in Sect. 5.1.3. Denote by \(\bar{\nu }_j\) the right-hand side value of (23) or (28) if sets L and U satisfying (22) exist; otherwise, let \(\bar{\nu }_j=-\infty \).

  (3) Let \(\epsilon =10^{-3}\) be a precision parameter. Inequalities found in step (2) are added if either \(\bar{t}_j<\epsilon \) and \((\bar{\nu }_j-\bar{t}_j)>\epsilon \), or \(\bar{t}_j\ge \epsilon \) and \((\bar{\nu }_j-\bar{t}_j)/\bar{t}_j>\epsilon \). At most r inequalities are added per iteration, one for each constraint (38b).

  (4) This process is repeated until either no inequality is added in step (3) or the maximum number of cuts (3r) is reached.
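A schematic version of steps (1)–(4) is given below; the callbacks solve_relaxation, separate and add_cut abstract the model-specific operations and are ours, not part of any solver API.

```python
import numpy as np

def cutting_plane_loop(solve_relaxation, separate, add_cut, r, eps=1e-3):
    """Steps (1)-(4): solve_relaxation() returns (x_bar, y_bar, t_bar) for the
    current relaxation of (38); separate(j, x_bar, y_bar) returns (cut, nu_j)
    for constraint j of (38b), with nu_j = -inf if no cut exists; add_cut(cut)
    appends the corresponding inequality (29) or (34) to the relaxation."""
    total_cuts = 0
    while total_cuts < 3 * r:                      # step (4): at most 3r cuts
        x_bar, y_bar, t_bar = solve_relaxation()   # step (1)
        added = 0
        for j in range(r):                         # step (2)
            cut, nu_j = separate(j, x_bar, y_bar)
            if nu_j == -np.inf:
                continue
            viol = nu_j - t_bar[j]                 # step (3): violation test
            if (t_bar[j] < eps and viol > eps) or \
               (t_bar[j] >= eps and viol / t_bar[j] > eps):
                add_cut(cut)
                added += 1
        total_cuts += added
        if added == 0:                             # step (4): no violated cut
            break
```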

We point out that convexification based on \(X_f\) [7], described in Sect. 3.2, is not effective with formulation (38) since \(t_j\ge (F_j'y)^2/\min \{1,e'x\}\) reduces to \(t_j\ge (F_j'y)^2\) due to (35b) and (35d).

6.3 Results

Tables 1, 2, 3 and 4 present the results for \(\rho \in \{-1,-0.5,-0.2,0\}\). They show, for different ranks r and values of the fixed cost parameter \({\omega }\), the optimal objective value (opt) and, for each method, the optimal objective value of the convex relaxation (val), the integrality gap (gap) computed as \(\texttt {gap}=\frac{\texttt {opt}-\texttt {val}}{\texttt {opt}}\times 100\), the improvement (imp) of Supermodular over Perspective computed as

$$\begin{aligned} \texttt {imp}=\frac{\texttt {gap}_{\texttt {Persp.}}-\texttt {gap}_\texttt {Supermod.}}{\texttt {gap}_{\texttt {Persp.}}}, \end{aligned}$$

the time required to solve the relaxation in seconds (time) and the number of cuts added (cuts). The optimal solutions are computed using the CPLEX branch-and-bound method with the Perspective formulation. The values opt and val are scaled so that, in a given instance, \(\texttt {opt}=100\). Each row corresponds to the average of five instances generated with the same parameters.

Table 1 Computational results for general instances, \(\rho =-1\)
Table 2 Computational results for general instances, \(\rho =-0.5\)
Table 3 Computational results for general instances, \(\rho =-0.2\)
Table 4 Computational results for positive instances, \(\rho =0\)

First note that Perspective achieves only a very modest improvement over Basic due to the low diagonal dominance parameter \(\delta =0.01\). We also point out that instances with smaller positive weight \(\rho \) have weaker natural convex relaxations, i.e., Basic has larger gaps—a similar phenomenon was observed in [26].

The relative performance of all methods in rank-one instances, \(r=1\), is virtually identical regardless of the value of the positive weight parameter \(\rho \). In particular, Supermodular substantially improves upon Basic and Perspective: it achieves \(0\%\) gaps in instances with \({\omega }\le 10\), and reduces the gap from 35 to 6% in instances with \({\omega }=50\).

In instances with \(r\ge 5\), the relative performance of Supermodular depends on the positive weight parameter \(\rho \): for larger values of \(\rho \), more cuts are added and Supermodular results in higher quality formulations. For example, in instances with \(r=5\), \({\omega }=50\), the improvements achieved by Supermodular are 40.3% (\(\rho =-1\)), 53.2% (\(\rho =-0.5\)), 62.0% (\(\rho =-0.2\)) and 72.7% (\(\rho =0\)). Similar behavior can be observed for other combinations of parameters with \(r\ge 5\).

Our interpretation of the dependence of the strength of the formulation on \(\rho \) is as follows. For instances with small values of \(\rho \), it is possible to reduce the systematic risk of the portfolio \(y'(FF')y\) close to zero due to negative correlations, i.e., to achieve a “perfect hedge", although this may be unrealistic in practice. In such instances, the idiosyncratic risk \(\sum _{i=1}^n (d_iy_i)^2\) and constraints (35b)–(35d), which limit diversification, are the most important components behind the portfolio variance. In contrast, as \(\rho \) increases, it is increasingly difficult to reduce the systematic risk (and altogether impossible for \(\rho \ge 0\)). Thus, in such instances, the systematic risk \(y'(FF')y\) accounts for the majority of the variance of the portfolio. Consequently, the lifted supermodular inequalities, which exploit the structure induced by the systematic risk, are particularly effective in the latter class of instances.

Figure 4 depicts the integrality gap of different formulations as a function of the rank for instances with \(\rho =0\). We see that Supermodular achieves a large (\(> 70\%\)) improvement over Perspective, especially in the challenging low-rank settings. The improvement remains significant (44%) for the high-rank setting with \(r=35\).

Fig. 4 Integrality gap vs matrix rank (\({\omega }=50\), \(\delta =0.01\), \(\rho =0\))

Finally, to evaluate the computational burden associated with the formulations, we plot in Fig. 5 the time in seconds (on a logarithmic scale) required to solve the convex relaxations of each method for different dimensions n. Each point in Fig. 5 corresponds to an average of 15 portfolio optimization instances generated with parameters \(r=10\), \(\delta =0.01\) and \({\omega }\in \{2,10,50\}\) (5 instances for each value of \({\omega }\)). The time for Supermodular includes the total time spent generating cuts and repeatedly solving the convex relaxations.

Fig. 5 Solution time vs problem dimension (\(r=10\), \(\delta =0.01\))

We see that, in general, formulation Basic is an order of magnitude faster than Perspective, which in turn is an order of magnitude faster than Supermodular. Nonetheless, the computation times for Supermodular are adequate for many applications, solving instances with \(n=1000\) in under four seconds on average.

Contrary to expectations, Supermodular is faster for general instances than for positive instances, despite the larger and more complex inequalities (29) used for the general case; for \(n=1000\), Supermodular runs in 1.9 s in general instances versus 3.8 s in positive instances. This counter-intuitive behavior is explained by the number of cuts added, as several more violated cuts are found in instances with large values of \(\rho \), leading to larger convex formulations and the need to resolve them more times; for \(n=1000\), 20 cuts are added in each instance with \(\rho =0\), whereas on average only 3.7 cuts are added in instances with \(\rho =-1\).

The computation times are especially promising for tackling large-scale quadratic optimization problems with indicators, where alternative approaches for constructing strong convex relaxations (often based on decomposing the matrix \(FF'+D\) into lower-dimensional terms) may not scale. For example, Frangioni et al. [26] solve convex relaxations of instances up to \(n=50\), Han et al. [33] solve relaxations for instances up to \(n=150\), and Atamtürk and Gómez [6] report that solving the convex relaxation of quadratic instances with \(n=200\) requires up to 1000 s. All of these methods require adding \(O(n^2)\) variables and constraints to the formulations to achieve strengthening. In contrast, the supermodular inequalities (29) and (34) yield formulations with O(nr) additional variables and constraints, which can be solved efficiently even if n is large, provided that the rank r is sufficiently small: in our computations, instances with \(r=10\) and \(n=1000\) are solved in under 4 s. Nonetheless, as discussed in the next section, even if the convex relaxations can be solved easily, incorporating the proposed convexification in branch-and-bound methods may require tailored implementations that are not supported by current off-the-shelf branch-and-bound solvers.

6.4 On the performance with off-the-shelf branch-and-bound solvers

We also experimented with solving the formulations for Supermodular, obtained after adding cuts, with the CPLEX branch-and-bound algorithm. However, note that inequalities (29) and, to a lesser degree, inequalities (34) involve several ratios that can result in division by zero; from the proof of Proposition 10, we see that this is in fact the case in many scenarios. Therefore, while we did not observe any particular numerical difficulties when solving the convex relaxations (via interior point methods), in a small subset of the instances we observed that the branch-and-bound method (based on linear outer approximations) resulted in numerical issues leading to incorrect solutions.

Table 5 reports the results on the two instances that exhibit such pathological behavior. It shows, for each instance, method, and CPLEX setting, the bound reported by CPLEX when solving the convex relaxation via interior point methods (barrier, corresponding to a lower bound), and the lower and upper bounds reported after running the branch-and-bound algorithm for one hour. The values in Table 5 are not scaled. The tested settings are default CPLEX (def), default CPLEX with numerical emphasis enabled (+num), and CPLEX with numerical emphasis enabled and presolve and CPLEX cuts disabled (+num-pc).

Table 5 Examples of pathological behavior in branch-and-bound

In the first instance shown in Table 5, when using Supermodular with the default CPLEX settings, the solution reported is worse than the optimal solution by 30%. By enabling the numerical emphasis option, the solution improves but is still 10% worse than the solution reported by Perspective. Nonetheless, if presolve and CPLEX cuts are disabled, then both solutions coincide. The second instance shown in Table 5 exhibits the opposite behavior: with the default settings, independently of the numerical emphasis, the solutions obtained by Perspective and Supermodular coincide; however, if presolve and CPLEX cuts are disabled, then the lower bound obtained after one hour of branch-and-bound with the Supermodular method already precludes finding the correct solution. We point out that pathological behavior of conic quadratic branch-and-bound solvers has been observed in the past for other nonlinear mixed-integer problems with a large number of variables; see, for example, [6, 13, 26, 29].

7 Conclusions

In this paper we describe the convex hull of the epigraph of a rank-one quadratic function with indicator variables. In order to do so, we first describe the convex hull of an underlying supermodular set function in a lower-dimensional space, and then maximally lift the resulting facets into nonlinear inequalities in the original space of variables. The approach is broadly applicable, as most of the existing results concerning convexifications of convex quadratic functions with indicator variables can be obtained in this way, as well as several well-known classes of facet-defining inequalities for mixed-integer linear problems.