1 Introduction

A standard quadratic program is an optimization problem in which a (possibly nonconvex) homogeneous quadratic function, also known as a quadratic form, is minimized over the unit simplex. An instance of a standard quadratic program is given by

$$\begin{aligned} \text {(StQP)} \quad \nu (Q) := \min \limits _{x \in {\varDelta }_n} x^T Q x, \end{aligned}$$

where \(Q \in {{\mathcal {S}}}^n\) and \({{\mathcal {S}}}^n\) denotes the set of \(n \times n\) real symmetric matrices, \(x \in {\mathbb {R}}^n\), and \({\varDelta }_n\) denotes the unit simplex in \({\mathbb {R}}^n\) given by

$$\begin{aligned} {\varDelta }_n := \left\{ x \in {\mathbb {R}}^n_+: e^T x= 1\right\} , \end{aligned}$$
(1)

where \(e \in {\mathbb {R}}^n\) is the vector of all ones and \({\mathbb {R}}^n_+\) denotes the nonnegative orthant in \({\mathbb {R}}^n\).

We remark that having a quadratic form in the objective function is not restrictive since the problem of minimizing a nonhomogeneous quadratic function over the unit simplex can be reformulated in the form of (StQP) using the following identity:

$$\begin{aligned} x^T Q x + 2 c^T x = x^T (Q + e c^T + c e^T) x, \quad \text {for each}~ x \in {\varDelta }_n. \end{aligned}$$
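As a quick numerical illustration (a sketch; the matrix Q, vector c, and point x below are arbitrary test data), the identity can be verified with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2                      # a symmetric matrix Q
c = rng.standard_normal(n)
e = np.ones(n)
x = rng.random(n)
x /= x.sum()                           # a point on the unit simplex (e^T x = 1)

lhs = x @ Q @ x + 2 * c @ x
rhs = x @ (Q + np.outer(e, c) + np.outer(c, e)) @ x
assert np.isclose(lhs, rhs)            # the identity holds on the simplex
```

The identity follows because \(x^T e c^T x = (e^T x)(c^T x) = c^T x\) whenever \(e^T x = 1\).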

Standard quadratic programs arise in a variety of applications ranging from the classical portfolio optimization problem [26] to population genetics [24]; from quadratic resource allocation [21] to selection in replicator dynamics and evolutionary game theory [5]. Moreover, a matrix \(Q \in {{\mathcal {S}}}^n\) is copositive if and only if \(\nu (Q) \ge 0\), so standard quadratic programs can be used to check whether a matrix is copositive. We refer the reader to the paper [3], in which the term standard quadratic program was coined, for further applications. The maximum stable set problem in graph theory [27] and its weighted version [17] can be formulated as instances of (StQP), which implies that (StQP) is, in general, NP-hard.

There is an extensive amount of literature on standard quadratic programs. In this paper, we are concerned with computing a global solution of (StQP). We therefore restrict our literature review to the exact solution approaches. All of these approaches, in general, are based on a branch-and-bound scheme and differ only in terms of the subroutines used for computing upper and lower bounds, and subdividing the feasible region. For instance, a DC (difference of two convex functions) programming approach is employed in [4] to compute a lower bound and a local optimization method is used to find an upper bound. Using a relation between global solutions of (StQP) and the set of cliques in an associated graph, referred to as the convexity graph (see Sect. 2.3), a branch-and-bound method based on an implicit enumeration of cliques is proposed in [29]. More recently, another branch-and-bound method was proposed in [25], in which both convex envelope estimators and polyhedral underestimators are employed for computing lower bounds, and an implicit enumeration of the KKT points is utilized using the relation with the set of cliques in the convexity graph. In another recent paper [10], a set of cutting planes is proposed in the context of a spatial branch-and-bound scheme.

Standard quadratic programs can also be solved by finite branch-and-bound methods proposed for solving more general nonconvex quadratic programming problems (see, e.g., [12, 13]). These approaches are based on an implicit enumeration of the complementarity constraints in the KKT conditions. The resulting subproblems are approximated by semidefinite relaxations or by polyhedral semidefinite relaxations. By a simple manipulation of the KKT conditions, a general quadratic program can be formulated as a linear program with complementarity constraints (LPCC) (see, e.g., [20] and the references therein) and the resulting LPCC can be solved by an enumerative scheme such as branch-and-bound. An LPCC can also be formulated as a mixed integer linear programming (MILP) problem and can be solved using Benders decomposition [20] or by branch-and-cut [32]. A similar MILP formulation is proposed in [31] under the assumption of a bounded feasible region. Alternatively, using the completely positive reformulation of (StQP) (see, e.g., [7]), adaptive inner and outer polyhedral approximations of completely positive programs can be employed [11]. Clearly, one can also use general purpose nonlinear programming solvers such as BARON [30] and Couenne [2].

In this paper, we propose globally solving a standard quadratic program by reformulating it as a mixed integer linear programming (MILP) problem. We choose MILP reformulations due to the existence of powerful state-of-the-art MILP solvers such as CPLEX [22] and Gurobi [18]. We propose two different MILP reformulations. Our first formulation is based on casting (StQP) as a linear program with complementarity constraints and linearizing the complementarity constraints by using binary variables and big-M constraints. We discuss how to obtain valid bounds for the big-M parameters by exploiting the structure of (StQP). The second formulation is obtained by replacing the quadratic objective function in (StQP) by an overestimating function given by the maximum of a finite number of linear functions associated with the positive components of a feasible solution, referred to as the support. We show that the overestimating function is exact at all KKT points of (StQP), which leads to the second MILP formulation by introducing binary variables for modeling the support of a feasible solution. We further show that our second MILP formulation is, in fact, an exact relaxation of the first one. Furthermore, using the relation between the support of a global minimizer of (StQP) and the set of cliques in the convexity graph, we propose a set of valid inequalities for both of our MILP formulations. We conduct extensive computational experiments to assess the performances of our MILP formulations in comparison with several other global solution approaches. The computational results indicate that the proposed MILP formulations consistently outperform other global solution approaches. Furthermore, especially on larger instances, we observe improvements of several orders of magnitude.

Our work is related to the previous work on reformulations of a quadratic program as an instance of a linear program with complementarity constraints (LPCC) [20, 31]. For a general quadratic program, the paper [20] proposes a two-stage LPCC approach. In the first stage, an LPCC is solved to determine if the quadratic program is bounded below, in which case a second LPCC is formulated to compute a global solution. The resulting complementarity problems are formulated as MILP problems and solved using a parameter-free approach via Benders decomposition, which eliminates the need for big-M parameters [19] (see [32] for a branch-and-cut approach). In contrast, the paper [31] explicitly uses big-M parameters. By using a Hoffman type error bound, the authors show that there exists a valid upper bound for the big-M parameters under the assumption of a bounded feasible region. They give a closed form expression of this bound for (StQP). Our first MILP formulation, which is based on a similar approach as in these previous MILP formulations, also employs big-M parameters as in [31]. In contrast with their approach, we exploit the specific structure of (StQP) in an attempt to obtain much tighter bounds for big-M parameters. Furthermore, our second MILP formulation is based on specifically taking advantage of the particular structure of (StQP). Therefore, in contrast with the previous approaches in the literature, we propose stronger MILP formulations for a more specific class of quadratic programs.

This paper is organized as follows. In Sect. 1.1, we briefly review our notation. Section 2 discusses several useful properties of standard quadratic programs. We present our MILP formulations as well as a set of valid inequalities in Sect. 3. Section 4 is devoted to the results of our computational experiments. We conclude the paper in Sect. 5.

1.1 Notation

We use \({\mathbb {R}}^n, {\mathbb {R}}^n_+\), and \({{\mathcal {S}}}^n\) to denote the n-dimensional Euclidean space, the nonnegative orthant, and the space of \(n \times n\) real symmetric matrices, respectively. For \(u \in {\mathbb {R}}^n\), we denote its jth component by \(u_j,~j = 1,\ldots ,n\). Similarly, \(U_{ij}\) denotes the (i, j) entry of a matrix \(U \in {{\mathcal {S}}}^n,~i = 1,\ldots ,n;~j = 1,\ldots ,n\). We denote the unit simplex in \({\mathbb {R}}^n\) by \({\varDelta }_n\). For any \(U \in {{\mathcal {S}}}^n\) and \(V \in {{\mathcal {S}}}^n\), the trace inner product of U and V is given by \(\langle U, V \rangle := \sum \limits _{i=1}^n \sum \limits _{j = 1}^n U_{ij} V_{ij}\). The unit vectors in \({\mathbb {R}}^n\) are denoted by \(e_j,~j = 1,\ldots ,n\). We reserve \(e \in {\mathbb {R}}^n\) and \(E = e e^T \in {{\mathcal {S}}}^n\) for the vector of all ones and the matrix of all ones, respectively. We use 0 to denote the real number zero, the vector of all zeroes as well as the matrix of all zeroes in the appropriate dimension, which will always be clear from the context. We use \(\text {conv}(\cdot )\) to denote the convex hull. We define the following convex cones in \({{\mathcal {S}}}^n\):

$$\begin{aligned} {{\mathcal {N}}}^n= & {} \left\{ M \in {{\mathcal {S}}}^n: M_{ij} \ge 0, \quad i = 1,\ldots ,n;~j = 1,\ldots ,n\right\} , \end{aligned}$$
(2)
$$\begin{aligned} {{\mathcal {S}}}^n_+= & {} \left\{ M \in {{\mathcal {S}}}^n: u^T M u \ge 0, \quad \forall ~u \in {\mathbb {R}}^n \right\} , \end{aligned}$$
(3)
$$\begin{aligned} \mathcal{COP}^n= & {} \left\{ M \in {{\mathcal {S}}}^n: u^T M u \ge 0, \quad \forall ~u \in {\mathbb {R}}^n_+ \right\} , \end{aligned}$$
(4)
$$\begin{aligned} \mathcal{CP}^n= & {} \text {conv}\left\{ u u^T: u \in {\mathbb {R}}^n_+\right\} , \end{aligned}$$
(5)
$$\begin{aligned} \mathcal{DNN}^n= & {} {{\mathcal {S}}}^n_+ \cap {{\mathcal {N}}}^n, \end{aligned}$$
(6)

namely, the cone of component-wise nonnegative matrices, the cone of positive semidefinite matrices, the cone of copositive matrices, the cone of completely positive matrices, and the cone of doubly nonnegative matrices, respectively. The following relations easily follow from these definitions:

$$\begin{aligned} \mathcal{CP}^n \subseteq \mathcal{DNN}^n \subseteq \mathcal{COP}^n. \end{aligned}$$
(7)

2 Preliminaries

In this section, we review several basic properties of standard quadratic programs that will be useful in the subsequent sections. We remark that these results can be found in the literature (see, e.g., [3, 8]). We include proofs of some of these results for the sake of completeness.

2.1 Optimality conditions

Since a standard quadratic program has linear constraints, constraint qualification is satisfied at every feasible solution. Given an instance of (StQP), if \(x \in {\varDelta }_n\) is an optimal solution, then there exist \(s \in {\mathbb {R}}^n\) and \(\lambda \in {\mathbb {R}}\) such that the following KKT conditions are satisfied:

$$\begin{aligned} Q x - \lambda e - s= & {} 0, \end{aligned}$$
(8)
$$\begin{aligned} e^T x= & {} 1, \end{aligned}$$
(9)
$$\begin{aligned} x\in & {} {\mathbb {R}}^n_+, \end{aligned}$$
(10)
$$\begin{aligned} s\in & {} {\mathbb {R}}^n_+, \end{aligned}$$
(11)
$$\begin{aligned} x_j s_j= & {} 0, \quad j = 1,\ldots ,n. \end{aligned}$$
(12)

We remark that the Lagrange multipliers are scaled by a factor of 1/2 in (8).

For an instance of (StQP), \(x \in {\varDelta }_n\) is said to be a KKT point if there exist \(s \in {\mathbb {R}}^n\) and \(\lambda \in {\mathbb {R}}\) such that conditions (8)–(12) are satisfied. For any KKT point \(x \in {\varDelta }_n\) of (StQP), it follows from (8), (9), (10), and (12) that (see also [16])

$$\begin{aligned} \lambda = x^T Q x \ge \nu (Q), \end{aligned}$$
(13)

where the inequality holds with equality if and only if x is a global minimizer of (StQP).

2.2 Properties of \(\nu (Q)\)

The following lemma presents several useful properties about the optimal value function \(\nu (\cdot )\).

Lemma 1

For any \(Q \in {{\mathcal {S}}}^n\), \(Q_1 \in {{\mathcal {S}}}^n\), \(Q_2 \in {{\mathcal {S}}}^n\), and \(\gamma \in {\mathbb {R}}\), the following relations are satisfied:

  1. (i)

    \(\nu (Q + \gamma ee^T) = \nu (Q) + \gamma \).

  2. (ii)

    If \(Q_1 - Q_2 \in {{\mathcal {N}}}^n\), then \(\nu (Q_1) \ge \nu (Q_2)\).

  3. (iii)

    If Q is a diagonal matrix with strictly positive diagonal entries \(Q_{11},\ldots ,Q_{nn}\), then

    $$\begin{aligned} \nu (Q) = \frac{1}{\sum \limits _{k=1}^n (1/Q_{kk})}. \end{aligned}$$
  4. (iv)

    Let \(\gamma _0 = \min \limits _{1\le i \le j \le n} Q_{ij}\) and \(\gamma _1 = \min \limits _{k=1,\ldots ,n} Q_{kk}\). Then, \(\gamma _0 \le \nu (Q) \le \gamma _1\).

Proof

  1. (i)

    For any \(Q \in {{\mathcal {S}}}^n\) and any \(\gamma \in {\mathbb {R}}\), we have

    $$\begin{aligned} \nu (Q + \gamma e e^T) = \min \{x^T (Q + \gamma e e^T) x: x \in {\varDelta }_n\} = \gamma + \min \{x^T Q x: x \in {\varDelta }_n\} = \nu (Q) + \gamma . \end{aligned}$$
  2. (ii)

    Let \(Q_1 - Q_2 \in {{\mathcal {N}}}^n\). Then, for any \(x \in {\varDelta }_n\), we have \(x^T (Q_1 - Q_2) x \ge 0\), which implies that

    $$\begin{aligned} x^T Q_1 x \ge x^T Q_2 x, \quad \forall ~x \in {\varDelta }_n, \end{aligned}$$

    from which the assertion follows.

  3. (iii)

    If Q is a diagonal matrix with strictly positive diagonal entries \(Q_{11},\ldots ,Q_{nn}\), then \(Q \in \mathcal {N}^n\), which implies that \(\nu (Q) \ge 0\). For any KKT point \(x \in {\varDelta }_n\), one obtains

    $$\begin{aligned} Q_{jj} x_j - \lambda - s_j = 0, \quad j = 1,\ldots ,n. \end{aligned}$$

    First, we claim that \(s_j = 0\) for each \(j = 1,\ldots ,n\). Indeed, if \(s_k > 0\) for some \(k \in \{1,\ldots ,n\}\), then \(x_k = 0\) by (12), and the kth equation above yields \(\lambda = -s_k < 0\); since \(\lambda = x^T Q x \ge \nu (Q)\) by (13), this contradicts \(\nu (Q) \ge 0\). Therefore, we obtain \(x_j = \lambda /Q_{jj}\) for each \(j = 1,\ldots ,n\). Combining this with \(e^T x = 1\) yields that the unique KKT point satisfies

    $$\begin{aligned} x_j = \frac{1/Q_{jj}}{\sum \limits _{k=1}^n (1/Q_{kk})}, \quad j = 1,\ldots ,n. \end{aligned}$$

    Substituting this solution in the objective function yields the result.

  4. (iv)

    Let \(\gamma _0 = \min \limits _{1\le i \le j \le n} Q_{ij}\). Then, \(Q - \gamma _0 ee^T \in {{\mathcal {N}}}^n\), which implies that \(0 \le \nu (Q - \gamma _0 ee^T) = \nu (Q) - \gamma _0\), where we used (i). Therefore, \(\gamma _0 \le \nu (Q)\). Furthermore, since \(e_k \in {\varDelta }_n\) for each \(k = 1,\ldots ,n\), we have \(\nu (Q) \le \min \limits _{k=1,\ldots ,n} e_k^T Q e_k = \min \limits _{k=1,\ldots ,n} Q_{kk} = \gamma _1\).

\(\square \)
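Parts (iii) and (iv) of Lemma 1 can be spot-checked numerically (a sketch; the diagonal matrix Q = diag(1, 2, 4) is arbitrary test data):

```python
import numpy as np

# Lemma 1(iii): for diagonal Q with positive diagonal,
# nu(Q) = 1 / sum_k (1/Q_kk), attained at x_j proportional to 1/Q_jj.
d = np.array([1.0, 2.0, 4.0])
Q = np.diag(d)
nu = 1.0 / np.sum(1.0 / d)               # claimed optimal value: 4/7
x_star = (1.0 / d) / np.sum(1.0 / d)     # claimed minimizer
assert np.isclose(x_star.sum(), 1.0)     # x_star lies on the unit simplex
assert np.isclose(x_star @ Q @ x_star, nu)

# Lemma 1(iv): gamma_0 <= nu(Q) <= gamma_1; also verify nu is a lower
# bound of the objective at random simplex points (empirical spot check).
g0, g1 = Q[np.triu_indices(3)].min(), np.diag(Q).min()
assert g0 <= nu <= g1
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.random(3)
    x /= x.sum()
    assert x @ Q @ x >= nu - 1e-12
```

Here \(\nu (Q) = 1/(1 + 1/2 + 1/4) = 4/7\), attained at \(x^* = (4/7, 2/7, 1/7)^T\).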

2.3 Properties of an optimal solution

In this section, we present a useful relation between an optimal solution of a standard quadratic program and a related graph.

Given an instance of (StQP) with \(Q \in {{\mathcal {S}}}^n\), we can associate with it an undirected graph \(G = (V,E)\), called the convexity graph of Q, where \(V = \{1,2,\ldots ,n\}\) with node j corresponding to the vertex of the unit simplex \(e_j,~j = 1,\ldots ,n\). There is an edge between node i and node j if the restriction of the quadratic form \(x^T Q x\) to the edge of the unit simplex between the vertices \(e_i\) and \(e_j\) is strictly convex, i.e.,

$$\begin{aligned} E = \{(i,j): Q_{ii} + Q_ {jj} - 2 Q_{ij} > 0, \quad 1 \le i < j \le n\}. \end{aligned}$$
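The edge set E can be computed directly from Q (a minimal sketch; the function name is ours and indices are 0-based):

```python
import numpy as np

def convexity_graph_edges(Q):
    """Edge set of the convexity graph of Q: (i, j) is an edge iff the
    quadratic form is strictly convex on the simplex edge between e_i and
    e_j, i.e., iff Q_ii + Q_jj - 2 Q_ij > 0."""
    n = Q.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if Q[i, i] + Q[j, j] - 2 * Q[i, j] > 0]
```

For example, \(Q = I_2\) yields the single edge (0, 1), whereas the all-ones matrix yields no edges since \(Q_{ii} + Q_{jj} - 2Q_{ij} = 0\).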

For \(x \in {\varDelta }_n\), we introduce the following index sets:

$$\begin{aligned} {{\mathcal {P}}}(x)= & {} \left\{ j \in \{1,\ldots ,n\}: x_j > 0 \right\} , \end{aligned}$$
(14)
$$\begin{aligned} {{\mathcal {Z}}}(x)= & {} \left\{ j \in \{1,\ldots ,n\}: x_j = 0 \right\} . \end{aligned}$$
(15)

The indices in \({{\mathcal {P}}}(x)\) are referred to as the support of x. For a given undirected graph \(G = (V,E)\), a set \(C \subseteq V\) of nodes is called a clique if each pair of nodes is connected by an edge. Similarly, a set \(S \subseteq V\) of nodes is called a stable set if no two nodes in S are connected by an edge. The following theorem [29] establishes a useful connection between the support of a global solution of (StQP) and the convexity graph \(G = (V,E)\).

Theorem 1

(Scozzari and Tardella, 2008) Given an instance of (StQP), let \(G = (V,E)\) denote the convexity graph of Q. Then, there exists a globally optimal solution \(x^* \in {\varDelta }_n\) of (StQP) such that the nodes corresponding to the indices in \({{\mathcal {P}}}(x^*)\) (i.e., the support of \(x^*\)) in G form a clique (or, equivalently, a stable set in the complement of G).

2.4 Lower bounds on \(\nu (Q)\)

In this section, given an instance of (StQP), we review two lower bounds on \(\nu (Q)\).

2.4.1 A simple lower bound

We start with a simple lower bound on \(\nu (Q)\). By Lemma 1(iv),

$$\begin{aligned} \nu (Q) \ge \gamma _0 = \min _{1\le i \le j \le n} Q_{ij}, \end{aligned}$$

with equality if there exists \(k \in \{1,\ldots ,n\}\) such that \(\gamma _0 = \gamma _1 = Q_{kk}\).

This lower bound can be slightly sharpened if the minimum entry of Q is not on the main diagonal, i.e. if \(\gamma _0 < \gamma _1\). In this case, we have \(Q - \gamma _0 e e^T \in {{\mathcal {N}}}^n\) with strictly positive diagonal elements, which can be decomposed as \(Q - \gamma _0 e e^T = D + F\), where \(D \in {{\mathcal {N}}}^n\) and \(F \in {{\mathcal {N}}}^n\) are such that D is a diagonal matrix with strictly positive entries given by \(D_{kk} = Q_{kk} - \gamma _0,~k = 1,\ldots ,n,\) along the main diagonal, and all diagonal entries of F are equal to zero. Since \((Q - \gamma _0 e e^T) - D = F \in {{\mathcal {N}}}^n\), it follows by Lemma 1(i), (ii), and (iii) that

$$\begin{aligned} \nu (Q - \gamma _0 e e^T) = \nu (Q) - \gamma _0 \ge \nu (D) = \frac{1}{\sum \limits _{k=1}^n (1/D_{kk})} = \frac{1}{\sum \limits _{k=1}^n (1/\left( Q_{kk} - \gamma _0\right) )}. \end{aligned}$$

This gives rise to the following lower bound on \(\nu (Q)\) (see, e.g., [8]):

$$\begin{aligned} \text {(LB1)} \quad \nu (Q) \ge \ell _1(Q) := \min \limits _{1\le i \le j \le n} Q_{ij}+ \frac{1}{\sum \limits _{k=1}^n \left( 1/(Q_{kk} - \min \limits _{1\le i \le j \le n} Q_{ij})\right) }, \end{aligned}$$

where we define \(1/0 = \infty \), \(1/\infty = 0\), and \(\beta + \infty = \infty \) for any \(\beta \in {\mathbb {R}}\). These definitions imply that \(\ell _1(Q) = \min _{1\le i \le j \le n} Q_{ij} = \gamma _0\) if and only if \(\min \limits _{1\le i \le j \le n} Q_{ij} = \min \limits _{k = 1,\ldots ,n} Q_{kk}\).
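The bound (LB1) translates into a few lines of code (a sketch; the function name is ours, and the division conventions above are handled by the early return):

```python
import numpy as np

def lb1(Q):
    """The lower bound ell_1(Q) of (LB1), using the convention that the
    bound collapses to gamma_0 when some Q_kk equals gamma_0."""
    g0 = Q[np.triu_indices_from(Q)].min()    # gamma_0 = min_{i <= j} Q_ij
    d = np.diag(Q) - g0
    if np.any(d == 0.0):                     # 1/(Q_kk - gamma_0) = inf case
        return g0
    return g0 + 1.0 / np.sum(1.0 / d)
```

For Q = diag(1, 2, 4), this gives \(\ell _1(Q) = 0 + 4/7 = \nu (Q)\), consistent with Lemma 1(iii).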

2.4.2 Lower bound from doubly nonnegative relaxation

In this section, we present another lower bound on \(\nu (Q)\) using an alternative formulation of (StQP).

A standard quadratic program can be equivalently reformulated as the following instance of a linear optimization problem over the convex cone of completely positive matrices [7]:

$$\begin{aligned} \text {(CPP)} \quad \nu (Q) = \min \{{\langle }Q, X {\rangle }: {\langle }E, X {\rangle }= 1, \quad X \in \mathcal{CP}^n\}, \end{aligned}$$

where \(X \in {{\mathcal {S}}}^n\) and \({{\mathcal {C}}}{{\mathcal {P}}}^n\) is given by (5). Despite the fact that (CPP) is a convex reformulation of (StQP), it remains NP-hard since the membership problem for the cone of completely positive matrices is intractable (see, e.g., [14]).

By (7), one can replace the intractable cone of completely positive matrices in (CPP) by the larger but tractable cone of doubly nonnegative matrices so as to obtain the following doubly nonnegative relaxation of (CPP):

$$\begin{aligned} \text {(DNN)} \quad \min \left\{ \langle Q, X \rangle : \langle E, X \rangle = 1, \quad X \in \mathcal{DNN}^n \right\} , \end{aligned}$$

where \(\mathcal{DNN}^n\) is given by (6).

Therefore, another lower bound on \(\nu (Q)\) is given by

$$\begin{aligned} \text {(LB2)} \quad \nu (Q) \ge \ell _2(Q) := \min \left\{ \langle Q, X \rangle : \langle E, X \rangle = 1, \quad X \in \mathcal{DNN}^n \right\} . \end{aligned}$$

By [8, Theorem 13], we have the following relation:

$$\begin{aligned} \ell _1(Q) \le \ell _2(Q) \le \nu (Q), \end{aligned}$$
(16)

i.e., the lower bound \(\ell _2(Q)\) is at least as tight as \(\ell _1(Q)\). By Lemma 1(iv) and (16), both lower bounds are exact if the minimum entry of Q lies along the diagonal. However, as illustrated by our computational results in Sect. 4, \(\ell _2(Q)\) is, in general, much tighter than \(\ell _1(Q)\).

Note that \(\ell _1(Q)\) can be computed in \(O(n^2)\) time whereas \(\ell _2(Q)\) requires solving a computationally expensive semidefinite program. We remark that there exist other lower bounds in the literature (see, e.g., [1, 6, 8, 28], and see [8] for a comparison of different lower bounds). Usually, there is a trade-off between the quality of the lower bound and its computational cost.

3 Mixed integer linear programming formulations

In this section, we present two different mixed integer linear programming (MILP) reformulations of standard quadratic programs. We then propose a set of inequalities that are valid for both formulations.

3.1 A formulation based on KKT conditions

Our first MILP formulation is obtained by exploiting the KKT conditions. We discuss how the nonlinear complementarity constraints can be linearized by employing binary variables. We also discuss how to obtain valid upper bounds for the big-M parameters that arise from this linearization.

Given an instance of (StQP), it follows from (13) that \(\lambda = x^T Q x\) for any KKT point \(x \in {\varDelta }_n\). Therefore, (StQP) can be equivalently formulated as the following linear program with complementarity constraints (see also [16]):

$$\begin{aligned} \begin{array}{lllrcl} &{}\text {(LPCC1)} &{}\quad \min &{} \lambda &{} \\ &{} &{} \quad \text {s.t.}&{}&{}&{}\\ &{}&{}&{} Q x - \lambda e - s &{}=&{} 0, \\ &{}&{} &{} e^T x &{} = &{} 1, \\ &{}&{} &{} x_j s_j &{} = &{} 0, \quad j = 1,\ldots ,n, \\ &{} &{}&{} x &{} \ge &{} 0, \\ &{} &{}&{} s &{} \ge &{} 0. \end{array} \end{aligned}$$

We can linearize the nonconvex complementarity constraints in (LPCC1) by using binary variables and big-M constraints, which gives rise to the following MILP reformulation of (StQP):

$$\begin{aligned} \begin{array}{lllrcll} &{}\text {(MILP1)} &{}\quad \min &{} \lambda &{}&{}&{} \quad (17)\\ &{} &{} \quad \text {s.t.}&{}&{}&{}&{}\\ &{}&{}&{} Q x - \lambda e - s &{}=&{} 0, &{} \quad (18)\\ &{}&{} &{} e^T x &{} = &{} 1, &{} \quad (19)\\ &{}&{} &{} x_j &{} \le &{} y_j, \quad j = 1,\ldots ,n, &{} \quad (20)\\ &{}&{} &{} s_j &{} \le &{} M_j \left( 1 - y_j\right) , \quad j = 1,\ldots ,n, &{} \quad (21)\\ &{} &{}&{} x &{} \ge &{} 0, &{} \quad (22)\\ &{} &{}&{} s &{} \ge &{} 0, &{} \quad (23)\\ &{} &{}&{} y_j &{} \in &{} \{0,1\}, \quad j = 1,\ldots ,n, &{} \quad (24) \end{array} \end{aligned}$$

where \(M_j > 0,~j = 1,\ldots ,n\), are sufficiently large big-M parameters.

Note that, by (20) and (21), the binary variable \(y_j\) ensures that \(x_j\) and \(s_j\) cannot simultaneously be positive for any \(j = 1,\ldots ,n\). In particular, if \(y_j = 1\), then \(s_j = 0\) by (21) and \(x_j\) is allowed to be positive. Since \(x \in {\varDelta }_n\), we have \(0 \le x_j \le 1\), which implies that (20) yields a valid upper bound on \(x_j\). On the other hand, if \(y_j = 0\), we have \(x_j = 0\) by (20). In this case, we need valid upper bounds on the variable \(s_j\) in (21), which we discuss next.

By (18),

$$\begin{aligned} s_j = e_j^T Q x - \lambda , \quad j = 1,\ldots ,n. \end{aligned}$$
(25)

We can obtain an upper bound on \(s_j\) by deriving an upper bound for each of the terms on the right-hand side of (25). For the first term, since \(x \in {\varDelta }_n\), we have

$$\begin{aligned} e_j^T Q x = x^T Q e_j \le \max _{i = 1,\ldots ,n} Q_{ij}, \quad j = 1,\ldots ,n. \end{aligned}$$
(26)

In order to bound the second term from above, any lower bound on \(\lambda \) can be employed. Since \(\lambda \ge \nu (Q)\) for any feasible solution of (MILP1), it follows that any lower bound on \(\nu (Q)\) can be used to obtain an upper bound on \(s_j,~j = 1,\ldots ,n\). Indeed, let \(\ell \) denote an arbitrary lower bound on \(\nu (Q)\). For any feasible solution \((x,y,s,\lambda )\) of (MILP1), it follows from (25) and (26) that

$$\begin{aligned} s_j = e_j^T Q x - \lambda \le \max _{i = 1,\ldots ,n} Q_{ij} - \nu (Q) \le \max _{i = 1,\ldots ,n} Q_{ij} - \ell , \quad j = 1,\ldots ,n, \end{aligned}$$

which implies that

$$\begin{aligned} M_j = \max _{i = 1,\ldots ,n} Q_{ij} - \ell , \quad j = 1,\ldots ,n \end{aligned}$$
(27)

would be a valid choice in (MILP1). In particular, we use \(\ell \in \{\ell _1(Q),\ell _2(Q)\}\) in our computational experiments.
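To make the construction concrete, the sketch below assembles (MILP1) with the big-M values (27) and hands it to SciPy's HiGHS-based MILP interface (`scipy.optimize.milp`, SciPy >= 1.9); the function name and the variable ordering are ours:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def solve_stqp_milp1(Q, ell):
    """Solve an StQP via the big-M MILP reformulation (MILP1).
    Variable order: x (n), s (n), y (n), lambda (1)."""
    n = Q.shape[0]
    M = Q.max(axis=0) - ell                  # M_j = max_i Q_ij - ell, cf. (27)
    nv = 3 * n + 1
    c = np.zeros(nv)
    c[-1] = 1.0                              # objective: minimize lambda

    # Q x - s - lambda e = 0   (stationarity, cf. (18))
    A_stat = np.hstack([Q, -np.eye(n), np.zeros((n, n)), -np.ones((n, 1))])
    # e^T x = 1                (simplex constraint)
    A_simp = np.zeros((1, nv)); A_simp[0, :n] = 1.0
    # x_j - y_j <= 0           (x_j = 0 whenever y_j = 0, cf. (20))
    A_xy = np.hstack([np.eye(n), np.zeros((n, n)), -np.eye(n), np.zeros((n, 1))])
    # s_j + M_j y_j <= M_j     (s_j = 0 whenever y_j = 1, cf. (21))
    A_sy = np.hstack([np.zeros((n, n)), np.eye(n), np.diag(M), np.zeros((n, 1))])

    A = np.vstack([A_stat, A_simp, A_xy, A_sy])
    lo = np.concatenate([np.zeros(n), [1.0], np.full(2 * n, -np.inf)])
    up = np.concatenate([np.zeros(n), [1.0], np.zeros(n), M])

    integrality = np.zeros(nv)
    integrality[2 * n:3 * n] = 1             # y binary; x, s, lambda continuous
    bounds = Bounds(np.concatenate([np.zeros(3 * n), [-np.inf]]),
                    np.concatenate([np.full(2 * n, np.inf), np.ones(n), [np.inf]]))
    res = milp(c=c, constraints=LinearConstraint(A, lo, up),
               integrality=integrality, bounds=bounds)
    return res.fun, res.x[:n]

# Example: Q = diag(1, 2, 4) with the trivial lower bound ell = 0;
# by Lemma 1(iii), nu(Q) = 4/7, attained at x = (4/7, 2/7, 1/7).
val, x_opt = solve_stqp_milp1(np.diag([1.0, 2.0, 4.0]), 0.0)
```

For this diagonal instance the only feasible binary vector is y = (1, 1, 1), and the solver returns the unique KKT point.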

3.2 An alternative formulation

In this section, we present an alternative MILP formulation. Given an instance of (StQP), we first derive an underestimator and an overestimator for the quadratic objective function. We then establish useful properties of these two functions, which form the basis of our second formulation.

We start with a lemma that presents an underestimator and an overestimatior for the objective function of (StQP), both of which depend on the support of a feasible solution defined in (14).

Lemma 2

For any \(Q \in {{\mathcal {S}}}^n\) and \(x \in {\varDelta }_n\), we have

$$\begin{aligned} \min _{j \in {{\mathcal {P}}}(x)} e_j^T Q x \le x^T Q x \le \max _{j \in {{\mathcal {P}}}(x)} e_j^T Q x, \end{aligned}$$
(28)

where \({{\mathcal {P}}}(x)\), given by (14), denotes the support of x. Furthermore, if \(x \in {\varDelta }_n\) is a KKT point of (StQP), then

$$\begin{aligned} \min _{j \in {{\mathcal {P}}}(x)} e_j^T Q x = x^T Q x = \max _{j \in {{\mathcal {P}}}(x)} e_j^T Q x. \end{aligned}$$
(29)

Proof

For any \(Q \in {{\mathcal {S}}}^n\) and \(x \in {\varDelta }_n\),

$$\begin{aligned} x^T Q x = \left( \sum _{j=1}^n x_j e_j^T\right) Qx = \sum _{j=1}^n x_j \left( e_j^T Q x \right) = \sum _{j \in {{\mathcal {P}}}(x)} x_j \left( e_j^T Q x \right) , \end{aligned}$$

i.e., \(x^T Q x\) is a convex combination of \(e_j^T Q x\) for \(j \in {{\mathcal {P}}}(x)\), from which (28) follows.

Let \(x \in {\varDelta }_n\) be a KKT point of (StQP). Then, by (8)–(12),

$$\begin{aligned} e_j^T Q x= & {} \lambda , \quad j \in {{\mathcal {P}}}(x), \\ e_j^T Q x\ge & {} \lambda , \quad j \in {{\mathcal {Z}}}(x). \end{aligned}$$

Furthermore \(x^T Q x = \sum \limits _{j \in {{\mathcal {P}}}(x)} x_j \left( e_j^T Q x \right) = \lambda \left( \sum \limits _{j \in {{\mathcal {P}}}(x)} x_j\right) = \lambda \), which establishes (29). \(\square \)
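The sandwich (28) and the equality (29) can be spot-checked numerically (a sketch with arbitrary data; the KKT point of the diagonal example is taken from Lemma 1(iii)):

```python
import numpy as np

# (28): x^T Q x is a convex combination of e_j^T Q x over the support P(x),
# hence sandwiched between their minimum and maximum.
rng = np.random.default_rng(2)
n = 6
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2
x = rng.random(n)
x[:2] = 0.0                              # force two indices out of the support
x /= x.sum()
vals = (Q @ x)[x > 0]                    # e_j^T Q x for j in P(x)
assert vals.min() - 1e-9 <= x @ Q @ x <= vals.max() + 1e-9

# (29): at a KKT point all supported values e_j^T Q x coincide with x^T Q x;
# for Q = diag(1, 2, 4) the unique KKT point is x = (4/7, 2/7, 1/7).
Q2 = np.diag([1.0, 2.0, 4.0])
x2 = np.array([4.0, 2.0, 1.0]) / 7.0
assert np.allclose(Q2 @ x2, x2 @ Q2 @ x2)
```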

Using Lemma 2, we next present an alternative characterization of \(\nu (Q)\).

Proposition 1

Given an instance of (StQP),

$$\begin{aligned} \nu (Q) = \min _{x \in {\varDelta }_n} \max _{j \in {{\mathcal {P}}}(x)} e_j^T Q x. \end{aligned}$$
(30)

Proof

For any \(x \in {\varDelta }_n\), we have \(x^T Q x \le \max \limits _{j \in {{\mathcal {P}}}(x)} e_j^T Q x\) by Lemma 2, which implies that \(\nu (Q) \le \min \limits _{x \in {\varDelta }_n} \max \limits _{j \in {{\mathcal {P}}}(x)} e_j^T Q x\). Conversely, let \(x^* \in {\varDelta }_n\) be an optimal solution of (StQP). Then, \(x^*\) is a KKT point, which implies that \(\nu (Q) = (x^*)^T Q x^* = \max \limits _{j \in {{\mathcal {P}}}(x^*)} e_j^T Q x^*\) by Lemma 2, which establishes the reverse inequality. \(\square \)

We are now in a position to propose an alternative MILP formulation based on the characterization (30) stated in Proposition 1.

$$\begin{aligned} \begin{array}{lllrcll} &{}\text {(MILP2)} &{}\quad \min &{} \alpha &{}&{}&{} \quad (31)\\ &{} &{} \quad \text {s.t.}&{}&{}&{}&{}\\ &{}&{}&{} e_j^T Q x &{}\le &{} \alpha + z_j, \quad j = 1,\ldots ,n, &{} \quad (32)\\ &{}&{} &{} e^T x &{} = &{} 1, &{} \quad (33)\\ &{}&{} &{} x_j &{} \le &{} y_j, \quad j = 1,\ldots ,n, &{} \quad (34)\\ &{}&{} &{} z_j &{} \le &{} U_j \left( 1 - y_j\right) , \quad j = 1,\ldots ,n, &{} \quad (35)\\ &{} &{}&{} x &{} \ge &{} 0, &{} \quad (36)\\ &{} &{}&{} z &{} \ge &{} 0, &{} \quad (37)\\ &{} &{}&{} y_j &{} \in &{} \{0,1\}, \quad j = 1,\ldots ,n, &{} \quad (38) \end{array} \end{aligned}$$

where \(U_j > 0,~j = 1,\ldots ,n\), are sufficiently large big-M parameters.

Note that the auxiliary variable \(\alpha \) is introduced in (31) and (32) to linearize the maximum function on the right-hand side of (30), while the binary variables \(y_j\) are employed in (35) to ensure that the maximum in (32) is restricted to the linear functions corresponding to the support of x. Indeed, if \(j \in {{\mathcal {P}}}(x)\), then \(x_j > 0\), which forces \(y_j = 1\) by (34) and \(z_j = 0\) by (35). Otherwise, \(z_j\) can take a positive value, rendering the corresponding constraint in (32) redundant. Note that we again rely on big-M parameters \(U_j\) in (35). The next proposition presents a valid bound for these parameters.

Proposition 2

Given an instance of (StQP), (MILP2) is an equivalent reformulation of (StQP) if

$$\begin{aligned} U_j\ge M_j, \quad j = 1,\ldots ,n, \end{aligned}$$

where \(M_j\) is defined as in (27) and \(\ell \) is any lower bound on \(\nu (Q)\).

Proof

Let \(x \in {\varDelta }_n\). Then, for each \(j \in {{\mathcal {P}}}(x)\), we have \(y_j = 1\) by (34), which implies that \(z_j = 0\) by (35) and \(\alpha \ge \max \limits _{j \in {{\mathcal {P}}}(x)} e_j^T Q x \ge x^T Q x\) by (32) and by Lemma 2. Since the objective function minimizes \(\alpha \), the best choice of \(\alpha \) would be given by \(\alpha = \max \limits _{j \in {{\mathcal {P}}}(x)} e_j^T Q x\).

Consider an index \(j \not \in {{\mathcal {P}}}(x)\). Let us define \(z_j = \max \{0, e_j^T Q x - \alpha \} \ge 0\). Then, if \(e_j^T Q x - \alpha < 0\), we have \(z_j = 0 \le M_j\), which satisfies the constraints of (MILP2). Otherwise,

$$\begin{aligned} z_j = e_j^T Q x - \alpha \le \max _{i = 1,\ldots ,n} Q_{ij} - \alpha \le \max _{i = 1,\ldots ,n} Q_{ij} - x^T Q x \le \max _{i = 1,\ldots ,n} Q_{ij} - \nu (Q), \end{aligned}$$

which implies that \(z_j \le \max \limits _{i = 1,\ldots ,n} Q_{ij} - \ell = M_j\), where \(\ell \) is any lower bound on \(\nu (Q)\). It follows that for each \(x \in {\varDelta }_n\), we can construct \(y \in {\mathbb {R}}^n\), \(z \in {\mathbb {R}}^n\), and \(\alpha \in {\mathbb {R}}\) such that \(\alpha = \max \limits _{j \in {{\mathcal {P}}}(x)} e_j^T Q x \ge x^T Q x\). By Lemma 2, if \(x \in {\varDelta }_n\) is a KKT point, then we can choose \(\alpha = x^T Q x\). The equivalence follows. \(\square \)
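The construction in the proof can be replayed numerically (a sketch with arbitrary data; here \(\ell \) is taken to be the entrywise minimum \(\gamma _0\) and \(U_j = M_j\)):

```python
import numpy as np

# For an arbitrary x on the simplex, build (y, z, alpha) as in the proof and
# check feasibility for (MILP2) with U_j = M_j and ell = gamma_0.
rng = np.random.default_rng(3)
n = 5
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2
ell = Q.min()                            # gamma_0 <= nu(Q)
M = Q.max(axis=0) - ell                  # M_j as in (27)

x = rng.random(n)
x[0] = 0.0                               # index 0 is outside the support
x /= x.sum()
y = (x > 0).astype(float)
qx = Q @ x                               # e_j^T Q x, j = 1,...,n
alpha = qx[x > 0].max()                  # max over the support P(x)
z = np.maximum(0.0, qx - alpha)

assert np.all(qx - alpha - z <= 1e-9)    # constraints (32) hold
assert np.all(z <= M * (1.0 - y) + 1e-9) # constraints (35) hold with U_j = M_j
assert alpha >= x @ Q @ x - 1e-9         # alpha overestimates the objective
```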

By Proposition 2, (MILP2) can be viewed as a majorization minimization approach for (StQP), where the majorizing function is exact at any KKT point.

We close this section by a brief comparison of (MILP1) and (MILP2). Note that the constraint set (32) in (MILP2) can be rewritten as

$$\begin{aligned} Qx - \alpha e - z \le 0. \end{aligned}$$

Identifying the variables z with s and \(\alpha \) with \(\lambda \), a comparison with (18) in (MILP1) reveals that (MILP2) is in fact a relaxation of (MILP1): for any feasible solution \((x,y,s,\lambda )\) of (MILP1), we can define \(z = s\) and \(\alpha = \lambda \) so that \((x,y,z,\alpha )\) is a feasible solution of (MILP2). On the other hand, while each feasible solution \((x,y,s,\lambda )\) of (MILP1) necessarily corresponds to a KKT point of (StQP), (MILP2) admits a feasible solution \((x,y,z,\alpha )\) for any \(x \in {\varDelta }_n\), with \(\alpha \ge x^T Q x\) and equality if x is a KKT point of (StQP). Therefore, (MILP2) can be viewed as an exact relaxation of (MILP1).

3.3 Valid inequalities

In this section, we present a set of inequalities that are valid for both formulations (MILP1) and (MILP2).

Given an instance of (StQP), Theorem 1 presents a relation between the support of an optimal solution and the convexity graph of Q. This relation gives rise to the following theorem.

Theorem 2

The following inequalities are valid for both formulations (MILP1) and (MILP2):

$$\begin{aligned} y_i + y_j \le 1, \quad \quad 1 \le i < j \le n ~~\text {s.t.}~~Q_{ii} + Q_ {jj} - 2 Q_{ij} \le 0. \end{aligned}$$
(39)

Proof

In both formulations (MILP1) and (MILP2), the constraints force \(y_j = 1\) whenever \(j \in {{\mathcal {P}}}(x)\) for a feasible solution \(x \in {\varDelta }_n\). By Theorem 1, there exists a global solution of (StQP) whose support forms a clique in the convexity graph \(G = (V,E)\) of Q, or, equivalently, a stable set in the complement of G. Hence, for every pair (i, j) with \(Q_{ii} + Q_{jj} - 2 Q_{ij} \le 0\), i.e., every non-edge of G, at most one of the indices i and j belongs to the support of this solution, so the inequalities (39) cut off no such global solution. The assertion follows. \(\square \)
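Generating the cuts (39) amounts to enumerating the non-edges of the convexity graph (a sketch; the function name is ours and indices are 0-based):

```python
import numpy as np

def pair_inequalities(Q):
    """Index pairs (i, j), i < j, with Q_ii + Q_jj - 2 Q_ij <= 0, i.e. the
    non-edges of the convexity graph; each yields a cut y_i + y_j <= 1
    of type (39)."""
    n = Q.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if Q[i, i] + Q[j, j] - 2 * Q[i, j] <= 0]
```

For the all-ones matrix every pair yields a cut, whereas the identity matrix yields none.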

4 Computational results

We report the results of our computational experiments in this section. We first describe the set of instances used in our experiments. Then, we explain our experimental setup in detail. Finally, we report performances of the proposed MILP formulations in comparison with several other global solution approaches from the literature.

4.1 Set of instances

In an attempt to accurately assess the performances of the MILP formulations (MILP1) and (MILP2), we conducted extensive experiments on the following set of instances from the literature:

  1. (i)

    BLST instances [10]: This set consists of 150 instances with \(n = 30\) (BLST30) and 150 instances with \(n = 50\) (BLST50). Each entry of Q is randomly generated from a triangular distribution with parameters \(a< c < b\), where a and b denote the minimum and maximum values and c is the mode of the distribution.

(ii) ST instances [29]: This set consists of 24 instances with \(n = 100\) (ST100), 18 instances with \(n = 200\) (ST200), 11 instances with \(n = 500\) (ST500), and 1 instance with \(n = 1000\) (ST1000). For each of these instances, the matrix Q is randomly generated so that its convexity graph \(G = (V,E)\) has a prespecified density \(\delta \in [0,1]\), where the density of an undirected graph is given by the ratio of the number of edges to the maximum possible number of edges. Note that these instances are generated by constructing a matrix \(Q \in {{\mathcal {S}}}^n\) such that \(Q_{ij} = 0.5(Q_{ii} + Q_{jj}) - R_{ij}\), for \(1 \le i < j \le n\), where \(R_{ij} > 0\) with probability \(\delta \) and \(R_{ij} < 0\) with probability \(1 - \delta \) (see [28]).

(iii) DIMACS instances: It is well known that the maximum stable set problem in graph theory can be formulated as an instance of (StQP) [27]. This set consists of (StQP) instances obtained from the complements of the 30 instances of the maximum clique problem from the Second DIMACS Implementation Challenge with \(n \in [28,300]\). These instances are divided into two groups based on the number of vertices. DIMACS1 consists of 8 instances with \(n \in [28,171]\) and DIMACS2 comprises 22 instances with \(n \in [200,300]\).

(iv) BSU instances [9]: This set consists of 20 “hard” instances with \(n \in [5,24]\). Each of these instances is specifically constructed to harbor an exponential number of strict local minimizers. In particular, the number of strict local minimizers varies between \(1.38^n\) and \(1.49^n\).
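The generation scheme for the ST instances in (ii) can be sketched as follows. The distributions of the diagonal entries and of the magnitudes \(|R_{ij}|\), as well as the seed parameter, are our own assumptions for illustration, since the exact choices of [28] are not reproduced here:

```python
import numpy as np

def generate_st_instance(n, delta, seed=None):
    """Sketch of the ST scheme: Q_ij = 0.5 (Q_ii + Q_jj) - R_ij, where
    R_ij > 0 with probability delta and R_ij < 0 otherwise, so that the
    edge test Q_ii + Q_jj - 2 Q_ij > 0 reduces to R_ij > 0 and the
    convexity graph has expected density delta."""
    rng = np.random.default_rng(seed)
    Q = np.diag(rng.uniform(0.0, 1.0, n))  # assumed diagonal distribution
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.uniform(0.1, 1.0)  # assumed magnitude of R_ij
            sign = 1.0 if rng.random() < delta else -1.0
            Q[i, j] = Q[j, i] = 0.5 * (Q[i, i] + Q[j, j]) - sign * r
    return Q
```

For \(\delta = 1\), every pair satisfies the edge test, so the convexity graph is complete; for \(\delta = 0\), it has no edges.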

Note that Theorem 1 establishes a relation between the support of a global minimizer of an instance of (StQP) and the set of cliques of the associated convexity graph \(G = (V,E)\). Denoting the density of the convexity graph by \(\delta \in [0,1]\), it follows that the number of cliques in G tends to increase as \(\delta \) increases. Therefore, instances of (StQP) with larger values of \(\delta \) contain a larger number of possible support sets for a global minimizer. Indeed, this difficulty is also reflected in earlier computational experiments (see, e.g. [25, 29]). It is also worth mentioning that the number of valid inequalities (39) is given by \((1 - \delta ) n (n-1)/2\). Therefore, for fixed n, the number of valid inequalities decreases as \(\delta \) increases. For each set of instances, we therefore report the range of the parameter \(\delta \).
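The density of the convexity graph and the resulting count of valid inequalities can be computed directly from Q; the sketch below (Python with NumPy assumed) makes the stated identity, namely that the number of inequalities (39) equals \((1 - \delta ) n (n-1)/2\), explicit:

```python
import numpy as np

def convexity_density_and_vi_count(Q):
    """Return (delta, m): delta is the density of the convexity graph of Q
    and m is the number of valid inequalities (39); by construction,
    m = (1 - delta) * n * (n - 1) / 2."""
    n = Q.shape[0]
    total = n * (n - 1) // 2
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if Q[i, i] + Q[j, j] - 2.0 * Q[i, j] > 0.0)
    return edges / total, total - edges
```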

Recall that, for an instance of (StQP), if the minimum entry of Q lies along the main diagonal, then \(\nu (Q)\) equals that entry by Lemma 1(iv). Therefore, we use this criterion as a preprocessing step in order to eliminate trivial instances. Apart from this, we do not use any other preprocessing procedure. After eliminating such trivial instances, we obtain a test bed that consists of a total of 376 instances. We summarize the statistics on the set of instances in Table 1.
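The preprocessing criterion from Lemma 1(iv) amounts to a single comparison; a minimal sketch (Python with NumPy assumed, function name ours):

```python
import numpy as np

def is_trivial(Q):
    """Lemma 1(iv) screen: if the minimum entry of Q lies on the main
    diagonal, then nu(Q) = min_k Q_kk, attained at a vertex of the unit
    simplex, and the instance can be discarded as trivial."""
    return np.min(np.diag(Q)) <= np.min(Q)
```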

Table 1 Summary of instances

As illustrated by Table 1, our test bed encompasses a large number of instances of (StQP) with varying sizes and characteristics. In particular, we note the higher density of the convexity graphs associated with the hard BSU instances. Indeed, 14 instances in this set have a density of 1; 5 instances have densities between 0.92 and 0.95; and only one instance has a density of 0.6, supporting our previous observation regarding the correlation between the difficulty of an instance and the density of the associated convexity graph.

4.2 Experimental setup

Note that we propose two MILP formulations (MILP1) and (MILP2). For each formulation, one can use the lower bound \(\ell _1(Q)\) or \(\ell _2(Q)\). Finally, for each choice of the lower bound, we have the option of adding the valid inequalities (39) or not, which implies that we have a total of 8 variants. For each variant with (MILP1), we add the following bound constraints on the variable \(\lambda \):

$$\begin{aligned} \ell _1(Q) \le \lambda \le \min _{k = 1,\ldots ,n} Q_{kk}, \end{aligned}$$
(40)

where the upper bound follows from Lemma 1(iv). Similarly, the corresponding constraints are added for each variant with (MILP2):

$$\begin{aligned} \ell _2(Q) \le \alpha \le \min _{k = 1,\ldots ,n} Q_{kk}. \end{aligned}$$
(41)

We compare the performances of our MILP formulations with three other global solution approaches, namely, the MILP formulation of [31], which is publicly available at https://github.com/xiawei918/quadprogIP, the quadratic programming (QP) solver of CPLEX, and the nonlinear programming (NLP) solver BARON.

We solved all MILP formulations in MATLAB (version R2017b) using CPLEX (version 12.8.0) with the CPLEX Class API provided in CPLEX for MATLAB Toolbox. Similarly, the QP solver of CPLEX was called in MATLAB using the same API. The NLP solver BARON (version 17.4.1) was called from GAMS (version 24.8.5) using the MATLAB interface. The computation of \(\ell _2(Q)\) requires a semidefinite programming solver. For that purpose, we employed MOSEK (version 8.1.0.49) using the MOSEK Optimization Toolbox for MATLAB (version 8.1.0.82).

We measured the running times in terms of wall clock time. In our experiments with CPLEX and BARON, we imposed a time limit of 3600 s for each MILP and QP problem. No time limit was imposed on MOSEK for the computation of \(\ell _2(Q)\). Our computational experiments were carried out on a 64-bit HP workstation with 24 threads (2 sockets, 6 cores per socket, 2 threads per core) running Ubuntu Linux with 48 GB of RAM and Intel Xeon CPU E5-2667 processors with a clock speed of 2.90 GHz. In our experiments with CPLEX, we chose the deterministic parallel mode by setting cplex.parallelmode = 1 in order to have reproducible results. We remark that both CPLEX and MOSEK can take advantage of multiple threads. Similarly, we employed option threads = 24 in GAMS. However, we noticed that the wall clock time and the CPU time reported by BARON were virtually identical on all instances, suggesting that BARON did not take advantage of the multiple threads. Therefore, we caution the reader about the interpretation of the run times in our experiments with BARON.

In CPLEX and BARON, we set the optimality gap tolerance to \(10^{-6}\), where the relative gap is given by

$$\begin{aligned} \frac{|\mathtt{bestbound} - \mathtt{bestsolution}|}{10^{-10} + |\mathtt{bestsolution}|}. \end{aligned}$$
(42)
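Spelled out in code, the stopping criterion compares (42) against the tolerance \(10^{-6}\); a minimal sketch (function name ours):

```python
def optimality_gap(best_bound, best_solution):
    """Relative optimality gap (42); the 1e-10 term guards against
    division by zero when the incumbent value is zero."""
    return abs(best_bound - best_solution) / (1e-10 + abs(best_solution))
```

A solver then declares optimality once this quantity drops below the tolerance.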

We employed the default settings for all the other parameters of CPLEX, BARON, and MOSEK.

We denote by MILP1-L1 and MILP1-L1-VI the MILP formulation (MILP1) using the lower bound \(\ell _1(Q)\) without and with the set of valid inequalities (39), respectively. We replace the suffix L1 by L2 for the lower bound \(\ell _2(Q)\). We use a similar convention for (MILP2). The MILP formulation of [31] is denoted by QP-IP, whereas CPLEX QP and BARON refer to the QP formulations solved by CPLEX and BARON, respectively. The doubly nonnegative relaxation is denoted by DNN. For our MILP formulations with the lower bound \(\ell _2(Q)\), the solution times exclude the computational effort for solving the DNN relaxation required to compute this lower bound, which is reported separately. Finally, for a solution \(x \in {\varDelta }_n\) reported by a solver, an index \(j \in \{1,\ldots ,n\}\) is considered to be in the support of x (i.e., \(j \in {{\mathcal {P}}}(x)\)) if \(x_j > 10^{-8}\).
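The support-recovery rule in the last sentence can be sketched as follows (the threshold \(10^{-8}\) is the one stated above; the function name is ours):

```python
def support(x, tol=1e-8):
    """Indices considered to belong to the support P(x) of a reported
    solution x, using the threshold x_j > tol."""
    return [j for j, xj in enumerate(x) if xj > tol]
```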

For each data set, we report our results using a table that presents various summary statistics and a performance profile. In each table, we have a row for each of the eight variants using (MILP1) and (MILP2), and a row for each of QP-IP, CPLEX QP, BARON, and DNN. The first set of columns reports the number of instances solved to optimality within the time limit of 3600 s, the average solution time over these instances, and the standard deviation. We present the number of instances on which the corresponding approach hits the time limit and the average optimality gap defined as in (42) over these instances in the second set of columns. Recall that we do not impose any time limit for solving the DNN relaxation. In an attempt to shed more light on the comparison of the performances of different approaches, we report performance profiles [15], which are frequently used for benchmarking purposes in optimization. For a given problem instance, the performance ratio of a particular approach is defined as the ratio of the solution time of that approach to the best solution time among all approaches on that particular instance, which is defined to be \(+\infty \) if the approach fails to solve the instance within the given time limit. Then, for each approach a, a cumulative distribution function \(P_a(\tau )\) is defined to be the percentage of the number of instances that can be solved by that approach within a factor of \(\tau \) of the solution time of the best approach, where \(\tau \in [1,\infty )\). Note that only the instances that can be solved by at least one approach within the time limit are included in the comparison. We remark that performance ratios are reported in logarithmic scale. Furthermore, the markers are included for every kth instance, where k is judiciously chosen according to the number of instances in each set in such a way that each figure has roughly the same number of markers.
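The construction of the performance profiles described above can be sketched as follows (Python with NumPy assumed); here times is a solvers-by-instances array with np.inf marking a time-limit failure:

```python
import numpy as np

def performance_profile(times, tau_grid):
    """For each approach a and each tau, compute P_a(tau): the fraction of
    instances solved within a factor tau of the best solution time.
    Instances solved by no approach are dropped, as in the text."""
    times = np.asarray(times, dtype=float)
    best = times.min(axis=0)
    keep = np.isfinite(best)          # solved by at least one approach
    ratios = times[:, keep] / best[keep]
    return np.array([[np.mean(r <= tau) for tau in tau_grid] for r in ratios])
```

Plotting each row against tau_grid on a logarithmic horizontal axis yields profiles in the style of Figs. 1–7.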

4.3 BLST instances

In this section, we report the performances on the BLST data set [10]. After the elimination of trivial instances in preprocessing, BLST30 consists of 134 instances with \(n = 30\) and BLST50 comprises 138 instances with \(n = 50\).

On each of the BLST30 and BLST50 instances, the lower bound \(\ell _2(Q)\) (i.e., the optimal value of (DNN)) is either equal to \(\nu (Q)\) or the difference between the two is at most \(10^{-6}\), whereas the simple lower bound \(\ell _1(Q)\) is always smaller than \(\nu (Q)\). Therefore, \(\ell _2(Q)\) is significantly tighter than \(\ell _1(Q)\) over all instances. The support sizes of optimal solutions are in the range [1, 10].

We report the summary of the results in Tables 2 and 3 for the BLST30 and BLST50 instances, respectively.

Table 2 Performance on BLST30 (134 instances; \(n = 30\))
Table 3 Performance on BLST50 (138 instances; \(n = 50\))

As illustrated by Tables 2 and 3, our MILP formulations as well as QP-IP can solve each instance to optimality in a fraction of a second on average. On the other hand, each of CPLEX QP and BARON hits the time limit on some of the instances, with CPLEX QP exhibiting a better performance than BARON. Note that the average gaps of BARON are usually considerably higher than those reported by CPLEX QP. On each set, the average time taken by each of CPLEX QP and BARON is significantly larger than that required for each MILP formulation. In terms of average solution times of MILP models, each of the eight variants of (MILP1) and (MILP2) slightly outperforms QP-IP, where the improvement is more pronounced on BLST50. We remark that the average time of each of our MILP formulations on BLST50 increases only modestly in comparison with that on BLST30. We particularly highlight the better and more robust performances of MILP2-L2 and MILP2-L2-VI on BLST50. Finally, while the average computational effort for solving the DNN relaxation is negligible on BLST30, it increases significantly on BLST50, even surpassing the average time required for solving each MILP formulation.

Fig. 1 Performance profile on BLST instances

Fig. 2 Performance profile on BLST instances (excluding CPLEX QP and BARON)

The performance profile on BLST instances is illustrated in Fig. 1. In an attempt to illustrate the total computational effort in the performance profile, we add the computation time of the DNN relaxation to the solution time of each variant of our model that uses the lower bound \(\ell _2(Q)\). As indicated by Fig. 1, each of the eight variants of our MILP formulations outperforms CPLEX QP and BARON, even with the inclusion of the computational effort for solving the DNN relaxation. On the other hand, while each of the four variants of our MILP formulations that uses the simple lower bound \(\ell _1(Q)\) outperforms QP-IP, the additional computational effort for the DNN relaxation outweighs the benefits of the MILP formulations that rely on the tighter lower bound \(\ell _2(Q)\). In an attempt to give a better comparison of the MILP based approaches, we also include the performance profile excluding CPLEX QP and BARON in Fig. 2, which clearly illustrates that the best performance ratios are achieved by MILP1-L1, MILP1-L1-VI, and MILP2-L1. While the valid inequalities seem to slightly improve the performance ratios of MILP1-L2, they do not seem to have a positive effect on the remaining variants.

4.4 ST instances

In this section, we report our computational results on the ST data set [29], which consists of 24 instances with \(n = 100\) (ST100), 18 instances with \(n = 200\) (ST200), 11 instances with \(n = 500\) (ST500), and one instance with \(n = 1000\) (ST1000).

We first focus on ST100 and ST200. On each instance in these data sets, the difference between the lower bound \(\ell _2(Q)\) and \(\nu (Q)\) is less than \(10^{-5}\). On the other hand, the simple lower bound \(\ell _1(Q)\) is considerably smaller than \(\nu (Q)\) on all instances. The support sizes of optimal solutions vary between 3 and 7.

Table 4 Performance on ST100 (24 instances; \(n = 100\))
Table 5 Performance on ST200 (18 instances; \(n = 200\))

We report our results in Tables 4 and 5, each of which is organized similarly to Table 2. A close examination of Tables 4 and 5 reveals that MILP formulations significantly outperform each of CPLEX QP and BARON on ST100 and ST200. In particular, all of the instances in these two sets can be solved to optimality by each of our MILP formulations and by QP-IP, whereas CPLEX QP can solve only 6 instances out of 24 in ST100 to optimality and BARON can solve none within the time limit. Furthermore, BARON generally reports considerably higher optimality gaps in comparison with CPLEX QP. Note again that the average computational effort required for each MILP formulation is significantly smaller than that for CPLEX QP and BARON. The average computational effort for solving the DNN relaxation significantly exceeds that required for each MILP formulation. In particular, the average solution time of DNN increases by more than a factor of 25 from ST100 to ST200, despite the fact that n only increases by a factor of 2.

In terms of average solution times of MILP models, each of the eight variants of our MILP formulations outperforms QP-IP on both ST100 and ST200. MILP2-L2 stands out as a clear winner among the other MILP formulations. Note, in particular, that MILP2-L2 is about 13 times faster than QP-IP on ST100 and more than 30 times faster on ST200 on average. In terms of the overall performance, MILP2-L2 is followed by MILP2-L2-VI, MILP2-L1, and MILP2-L1-VI. In particular, the lower bound \(\ell _2(Q)\) seems to have a remarkably positive effect on the performance of (MILP2), both with and without valid inequalities and, to a lesser extent, on the performance of (MILP1) on ST200. The inclusion of valid inequalities has a mixed effect on (MILP1) and (MILP2). It is worth mentioning that instances with sparse convexity graphs give rise to a larger number of valid inequalities, which may adversely affect the overall performance of the variants with valid inequalities.

Fig. 3 Performance profile on ST100 and ST200 instances

The performance profile on ST100 and ST200 instances, which is illustrated in Fig. 3 and includes the additional computation time of DNN relaxations in the variants of our formulations that rely on the lower bound \(\ell _2(Q)\) as in Figs. 1 and 2, reveals that each of the eight variants of our MILP formulations outperforms CPLEX QP and BARON, even with the inclusion of the computational effort for solving the DNN relaxation. On the other hand, this additional computational effort outweighs the benefits of the MILP formulations in comparison with QP-IP. In terms of the performance profiles, the ranking is given by MILP2-L1, MILP2-L1-VI, MILP1-L1, and MILP1-L1-VI, followed by QP-IP, which, in turn, is followed by the variants that rely on the lower bound \(\ell _2(Q)\). Once again, we note that the inclusion of valid inequalities does not seem to lead to an improved performance on this set. Recall that BARON is not included in the figure since it hits the time limit on all instances in this set.

We now focus on the larger instances ST500 and ST1000. On each of these instances, \(\ell _1(Q)\) again turns out to be a rather loose lower bound on \(\nu (Q)\). The sizes of the support of optimal solutions are between 4 and 6. Note that the DNN relaxation could not be solved for any instance in ST500 and ST1000. Therefore, the lower bound \(\ell _2(Q)\) is not available on this set.

Table 6 Performance on ST500 (11 instances; \(n = 500\))
Table 7 Performance on ST1000 (1 instance; \(n = 1000\))

Tables 6 and 7 report the results on ST500 and ST1000, respectively. Note that we omit the rows corresponding to the variants of our MILP formulations with the lower bound \(\ell _2(Q)\) and the DNN relaxation in Tables 6 and 7 accordingly. Therefore, we present the results only for the variants of our MILP formulations with the simple lower bound \(\ell _1(Q)\). On ST500, while each of our formulations can solve all instances to optimality, QP-IP hits the time limit on 5 of the 11 instances. Furthermore, both CPLEX QP and BARON also hit the time limit on all of the instances. A comparison of average computational effort indicates that our MILP formulations are far more effective than the other approaches, with MILP2-L1 exhibiting the best and most robust performance, followed by MILP2-L1-VI, MILP1-L1, and MILP1-L1-VI.

Fig. 4 Performance profile on ST500 and ST1000 instances

ST1000 consists of a single instance with \(n = 1000\). We include this instance in our experiments to assess the performances of different global approaches on a very large sample instance. This instance indeed turns out to be very challenging for each approach, with only MILP2-L1 being able to solve it to optimality within the time limit. We believe that this instance provides remarkable computational evidence of the effectiveness of (MILP2), even when used with the simple and fairly loose lower bound \(\ell _1(Q)\). We also point out the extremely large optimality gaps reported by the QP solvers, which illustrate that instances of this scale are well beyond the reach of current state-of-the-art QP and NLP solvers. On this instance, only MILP1-L1-VI reports a larger gap than that of QP-IP. Note that the valid inequalities do not seem to help on these two sets since the total number of such inequalities can be fairly large. For instance, the number of valid inequalities is about 375,000 on the single instance in ST1000. On both ST500 and ST1000, MILP2-L1 clearly outperforms all the other MILP formulations.

In terms of the performance ratios, the performance profile in Fig. 4 clearly illustrates the superior performance of MILP2-L1, supporting our previous observations, and the better performance of each of our MILP formulations in comparison with QP-IP. Recall again that CPLEX QP and BARON are not included in the figure since each of them hits the time limit on all instances in this set.

4.5 DIMACS instances

In this section, we report our computational results on the DIMACS data set, consisting of 8 instances in DIMACS1 with \(n \in [28,171]\), and 22 instances in DIMACS2 with \(n \in [200,300]\). Each of these instances is obtained from the reformulation of the maximum stable set problem as an instance of (StQP) [27]. We first review this formulation.

Let \(G = (V,E)\) be a simple, undirected graph, where \(V = \{1,\ldots ,n\}\). Recall that a set \(S \subseteq V\) is a stable set if no two nodes in S are connected by an edge. The maximum stable set problem is concerned with computing the largest stable set in G, whose size is denoted by \(\alpha (G)\).

By defining a binary variable \(y_j \in \{0,1\}\) for each \(j \in V\) to indicate whether node j belongs to a maximum stable set, the maximum stable set problem can be formulated as an integer linear programming problem as follows:

$$\begin{aligned} \text {(ILP)} \quad \alpha (G) = \max \left\{ \sum \limits _{j=1}^n y_j: y_i + y_j \le 1, \quad (i,j) \in E, \quad y_j \in \{0,1\}, \quad j = 1,\ldots ,n\right\} . \end{aligned}$$
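For small graphs, \(\alpha (G)\) can be checked by brute force against the constraints of (ILP); the sketch below is purely illustrative and is not the solution method used in our experiments:

```python
from itertools import combinations

def alpha(n, edges):
    """Brute-force alpha(G): the largest subset S of V = {0, ..., n-1}
    satisfying y_i + y_j <= 1 for every edge (i, j), i.e., containing
    no edge of G."""
    E = set(frozenset(e) for e in edges)
    for k in range(n, 0, -1):
        for S in combinations(range(n), k):
            if not any(frozenset(p) in E for p in combinations(S, 2)):
                return k
    return 0
```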

Motzkin and Straus [27] proposed the following formulation in the form of (StQP):

$$\begin{aligned} \text {(MS-StQP)} \quad \frac{1}{\alpha (G)} = \min \limits _{x \in {\varDelta }_n} x^T(I + A_G)x, \end{aligned}$$

where \(A_G \in {{\mathcal {S}}}^n\) denotes the adjacency matrix of G. Furthermore, if \(S^* \subseteq V\) denotes a maximum stable set of G, then an optimal solution of (MS-StQP) is given by \(x_j = 1/|S^*|\) if \(j \in S^*\), and \(x_j = 0\) otherwise.
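This correspondence is easy to verify numerically: at the uniform point on a stable set S, the cross terms \(x_i A_{ij} x_j\) vanish and the objective of (MS-StQP) evaluates to 1/|S|. A small sketch (Python with NumPy assumed, function name ours):

```python
import numpy as np

def ms_objective(A, S):
    """Evaluate x^T (I + A_G) x at the point with x_j = 1/|S| for j in S
    and x_j = 0 otherwise; this equals 1/|S| whenever S is a stable set
    of G, matching the Motzkin-Straus optimal value for a maximum S."""
    n = A.shape[0]
    x = np.zeros(n)
    x[list(S)] = 1.0 / len(S)
    return x @ (np.eye(n) + A) @ x
```

For the path on three nodes, the maximum stable set is {0, 2} and the objective at the uniform point equals 1/2, in agreement with \(\alpha (G) = 2\).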

While we think that reformulating a combinatorial optimization problem as an instance of (StQP) and then reformulating it as an MILP problem may not necessarily be the best solution approach, we still include this set of instances in our experiments due to the existence of a global optimal solution of (MS-StQP) with a support of size \(\alpha (G)\), which can be fairly large for certain classes of graphs. Therefore, these instances can be particularly challenging for global solution approaches for (StQP).

Given an instance of (MS-StQP) corresponding to a graph \(G = (V,E)\), it is easy to verify that the convexity graph of \(Q = I + A_G\) (see Sect. 2.3) is precisely given by the complement of G. Since the valid inequalities (39) are defined for each edge of the complement of the convexity graph, which coincides with G, it follows that the valid inequalities in our MILP formulations are given by

$$\begin{aligned} y_i + y_j \le 1, \quad (i,j) \in E, \end{aligned}$$

which are precisely the same constraints as in (ILP). Therefore, our MILP formulations with valid inequalities can in some sense be viewed as extended formulations of (ILP).

In contrast with the BLST and ST instances, the lower bound \(\ell _2(Q)\) is almost tight on only 5 out of 8 instances in DIMACS1 and on 6 out of 22 instances in DIMACS2. The lower bound \(\ell _1(Q)\) is quite loose across all instances. The support sizes of optimal solutions lie in the ranges [4, 44] and [8, 128] on DIMACS1 and DIMACS2, respectively.

We report our computational results on DIMACS1 and DIMACS2 in Tables 8 and 9, respectively. For comparison purposes, we also include the performance of the integer linear programming formulation (ILP) on these instances, denoted by ILP.

Table 8 Performance on DIMACS1 (8 instances; \(n \in [28,171]\))
Table 9 Performance on DIMACS2 (22 instances; \(n \in [200,300]\))

Tables 8 and 9 illustrate that these instances are indeed challenging for each of the global solution approaches. We discuss the computational results on DIMACS1 and DIMACS2 separately.

On DIMACS1, as illustrated by Table 8, each variant of our MILP formulations without valid inequalities can solve 7 of the 8 instances to optimality within the time limit. The addition of the valid inequalities not only helps to close the gap on the remaining instance on all variants but also improves the average solution time of (MILP1) regardless of the particular lower bound employed. Similarly, QP-IP can solve 7 out of 8 instances to optimality. Each of CPLEX QP and BARON hits the time limit on each of the 8 instances, with BARON reporting a higher average optimality gap compared to CPLEX QP. The average computational effort for solving the DNN relaxation is considerably larger than that required for each variant of our MILP formulations. Finally, ILP can solve all of these instances to optimality fairly quickly. We conjecture that this favorable outcome can be attributed to CPLEX’s capability of identifying the particular structure of this formulation and adding other well-known inequalities and cuts.

In terms of the average solution time of MILP problems, each of our eight MILP variants requires less computational effort than QP-IP, either achieving significantly smaller average solution times or terminating with smaller optimality gaps. While the performances of MILP1-L1-VI and MILP2-L1-VI are similar, MILP2-L2-VI exhibits a considerably better performance than MILP1-L2-VI.

On DIMACS2, which consists of larger instances, Table 9 reveals that each approach, including ILP, hits the time limit on various subsets of instances, illustrating the challenging structure of these instances. ILP achieves the best performance in terms of the number of instances solved to optimality. In particular, ILP is terminated due to the time limit only on one instance out of 22. MILP1-L1-VI ranks in the second place, with the second largest number of instances solved to optimality, followed by MILP2-L1-VI and MILP1-L2-VI, respectively. Once again, QP-IP is outperformed by each of our eight MILP variants in terms of both the number of instances solved to optimality and average optimality gap. Both CPLEX QP and BARON can solve only one instance to optimality and hit the time limit on each of the remaining instances, with CPLEX QP reporting a slightly lower average gap compared to BARON. The average computational effort required for solving the DNN relaxation is quite significant, exceeding the average solution time of each MILP formulation.

We again observe notable improvements due to the addition of valid inequalities for both (MILP1) and (MILP2). Furthermore, it is worth noting that both formulations (MILP1) and (MILP2) significantly benefit in terms of average optimality gaps when used with the better lower bound \(\ell _2(Q)\).

Fig. 5 Performance profile on DIMACS instances

The performance profile on DIMACS instances is illustrated in Fig. 5. In the figure, we once again recall that the computation time of the tighter lower bound \(\ell _2(Q)\) is added to the solution times of each variant that relies on this bound. As demonstrated by Fig. 5, we observe that each of the eight variants of our MILP formulations outperforms CPLEX QP and BARON, even with the inclusion of the additional effort for the computation of \(\ell _2(Q)\). MILP1-L1-VI achieves the best performance ratio, followed closely by MILP2-L1-VI. It is also worth noticing that each of MILP2-L1 and MILP1-L1 outperforms QP-IP on this set. Finally, even with the additional computational effort, we remark that each of MILP1-L2-VI and MILP2-L2-VI exhibits a similar performance to QP-IP, while solving a larger number of instances to optimality. On this set of instances, we can clearly see the performance improvement due to the inclusion of valid inequalities on each variant.

4.6 BSU instances

In this section, we report our computational results on the BSU data set, which consists of one instance for each value of n in the range [5, 24]. Each problem is specifically constructed to have an exponential number of strict local minimizers.

The lower bound \(\ell _2(Q)\) is tight only on one instance in this set, namely for \(n = 6\). On the other hand, the lower bound \(\ell _1(Q)\) is considerably weaker than \(\ell _2(Q)\) on all instances. The sizes of the support of an optimal solution range between 2 and 13.

Table 10 Performance on BSU (20 instances; \(n \in [5,24]\))
Fig. 6 Performance profile on BSU instances

We report our results in Table 10, which is organized similarly to Table 2. Table 10 reveals that each instance in this set can be solved by each MILP formulation in a fraction of a second on average, illustrating that the existence of an exponential number of strict local minimizers does not seem to hinder the performances of MILP formulations. In particular, each of our eight MILP variants slightly outperforms QP-IP on this set. On the other hand, each of CPLEX QP and BARON can solve only the five smallest instances to optimality within the time limit, each hitting the time limit on the remaining set of 15 instances. Therefore, these instances seem to be particularly challenging for QP and NLP solvers despite the small values of n. Note that DNN requires a negligible average computational effort on this set.

Fig. 7 Performance profile on BSU instances (excluding CPLEX QP and BARON)

The performance profiles, provided in Fig. 6 for all approaches and in Fig. 7 for MILP-based approaches only, support our previous observations. We again recall that the computation time of the DNN relaxation is added to the solution time of each variant that relies on the lower bound \(\ell _2(Q)\). In particular, as demonstrated by these two figures, it is worth noticing that each of our eight variants outperforms QP-IP, even with the inclusion of the computational effort for \(\ell _2(Q)\). In addition, this set of instances provides strong computational evidence that MILP formulations are less likely to be affected by the possibility of an exponential number of local minimizers, which otherwise poses a great challenge for general purpose QP and NLP solvers. These results illustrate that MILP formulations can be much more effective for solving standard quadratic programs in comparison with general purpose QP and NLP solvers.

4.7 Overall comparison

In this section, we present an overall comparison of the global solution approaches on all of the instances in our test bed. In an attempt to make a fair comparison, we first divide the set of instances into two subsets. IS1 consists of all instances on which every global solution approach was attempted, i.e., the set of all instances excluding ST500 and ST1000. We collect these two sets of larger instances in IS2. Recall that the DNN relaxation could not be solved on IS2.

Our first comparison is based on the summary of the performances of all of the global solution approaches on our test bed.

In Table 11, we summarize the results of our computational experiments. For each instance set and each global solution approach, we report the total number of instances solved to optimality and the total number of instances terminated due to the time limit. We organize the results similarly to Table 2, except that we have a separate set of columns for each of IS1 and IS2. Furthermore, we do not include the average solution times and average optimality gaps due to their high variability.

Table 11 Summary of results (IS1: 364 instances; \(n \in [5,300]\); IS2: 12 instances; \(n \in [500,1000]\))

We first focus on instances in IS1. Table 11 reveals that MILP1-L1-VI dominates all the other approaches in terms of the total number of instances solved to optimality within the time limit. MILP2-L1-VI ranks in the second place, followed closely by MILP1-L2-VI and MILP2-L2-VI, respectively. Each of our 8 MILP variants outperforms QP-IP on this performance metric. CPLEX QP and BARON fall behind significantly, with CPLEX QP exhibiting better performance than BARON. Note that the addition of valid inequalities improves this performance metric on both formulations regardless of the lower bound employed. On the other hand, this performance metric does not seem to be significantly affected by the choice of the lower bound \(\ell _1(Q)\) or \(\ell _2(Q)\).

Recall that IS2 consists of 12 larger instances. On this set, MILP2-L1 can solve all of the instances to optimality, outperforming all other approaches on this performance metric; it is followed, in turn, by MILP2-L1-VI, MILP1-L1, and MILP1-L1-VI, respectively. Once again, each of our MILP variants exhibits better performance than QP-IP on this metric. Each of CPLEX QP and BARON hits the time limit on each instance in this set. Note that valid inequalities and the choice of the lower bound do not seem to affect this metric on this set.

In an attempt to shed more light on the comparison of the different approaches, we now focus on the results reported in Tables 2, 3, 4, 5, 6, 7, 8, 9, and 10 and Figs. 1, 2, 3, 4, 5, 6, and 7. These results clearly illustrate that MILP formulations constitute an effective approach for solving standard quadratic programs in comparison with state-of-the-art QP and NLP solvers. In terms of average solution times, each of the eight variants of our MILP formulations outperforms QP-IP. On the other hand, once the additional computational effort for computing the tighter lower bound \(\ell _2(Q)\) is taken into account, this effort outweighs the benefits of our MILP formulations on larger instances. It is worth noting that each of the four variants that rely on the simple lower bound \(\ell _1(Q)\) consistently outperforms QP-IP on all instances, demonstrating the robustness of these variants. The valid inequalities do not seem to provide a notable computational advantage in general and may even worsen the performance, with the exception of maximum stable set instances. We therefore recommend the variants of our formulations that rely on the tighter lower bound \(\ell _2(Q)\) on smaller instances (e.g., \(n \le 50\)), for which this bound is not too costly to compute. For larger instances, we suggest using (MILP2) with the simple lower bound \(\ell _1(Q)\) due to its more robust performance. Finally, we do not recommend the addition of valid inequalities unless the instance is known to arise from a maximum stable set problem.
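To make the cost contrast between the two bounds concrete, suppose, as an illustrative assumption for this aside, that the simple bound is the elementwise minimum \(\ell _1(Q) = \min _{i,j} Q_{ij}\). Its validity then follows in one line from the simplex constraint and the nonnegativity of \(x\), which explains why this bound is essentially free to compute:

$$\begin{aligned} x^T Q x = \sum _{i,j} Q_{ij}\, x_i x_j \ge \Big (\min _{i,j} Q_{ij}\Big ) \sum _{i,j} x_i x_j = \ell _1(Q) \big (e^T x\big )^2 = \ell _1(Q), \quad \text {for each}~ x \in {\varDelta }_n. \end{aligned}$$

No such closed form is available for \(\ell _2(Q)\), which is obtained from a conic relaxation; this accounts for the tradeoff described above.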

5 Concluding remarks

In this paper, we propose solving standard quadratic programs by using two alternative MILP reformulations. The first MILP formulation arises from a simple manipulation of the KKT conditions; the second is obtained by exploiting the specific structure of a standard quadratic program and is in fact a relaxation of the first. Both of our MILP formulations involve big-M parameters. We derive bounds on these parameters by taking advantage of the particular structure of (StQP) and show that these bounds are functions of a lower bound on the optimal value. We consider two such lower bounds, which differ in terms of computational effort and tightness. Our extensive computational experiments illustrate that the proposed MILP formulations outperform the recently proposed MILP approach of [31] in terms of average solution times and are much more effective than general-purpose quadratic programming and nonlinear programming solvers.
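To illustrate the first construction, the sketch below encodes the KKT system of (StQP) as an MILP in the spirit described above: binary variables \(y_i\) switch the complementarity between \(x_i\) and its multiplier, and minimizing the multiplier \(\lambda \) of the constraint \(e^T x = 1\) recovers \(\nu (Q)\), since \(\lambda = x^T Q x\) at every KKT point. This is a hedged sketch rather than our exact formulation: the big-M values derived from the elementwise bound \(\min _{i,j} Q_{ij}\), the variable layout, and the use of SciPy's HiGHS-based `milp` solver are assumptions made here for the sake of a self-contained example.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def stqp_milp(Q):
    """Global value of min x^T Q x over the unit simplex via a KKT-based MILP.

    Variables z = [x (n), lam (1), s (n), y (n)]: lam is the multiplier of
    e^T x = 1, s collects the multipliers of x >= 0, and binary y_i = 1
    permits x_i > 0. At every KKT point, lam = x^T Q x, so minimizing lam
    over all KKT points yields nu(Q).
    """
    n = Q.shape[0]
    ell = Q.min()              # simple lower bound: nu(Q) >= min_ij Q_ij
    u = np.diag(Q).min()       # nu(Q) <= min_i Q_ii (simplex vertices are feasible)
    M = Q.max(axis=1) - ell    # big-M: s_i = (Qx)_i - lam <= max_j Q_ij - ell

    nv = 3 * n + 1
    c = np.zeros(nv)
    c[n] = 1.0                 # objective: minimize lam

    cons = [
        # Stationarity: Q x - lam e - s = 0
        LinearConstraint(np.hstack([Q, -np.ones((n, 1)), -np.eye(n),
                                    np.zeros((n, n))]), 0.0, 0.0),
        # Simplex constraint: e^T x = 1
        LinearConstraint(np.hstack([np.ones((1, n)),
                                    np.zeros((1, 2 * n + 1))]), 1.0, 1.0),
        # x_i <= y_i  (so x_i > 0 forces y_i = 1)
        LinearConstraint(np.hstack([np.eye(n), np.zeros((n, n + 1)),
                                    -np.eye(n)]), -np.inf, 0.0),
        # s_i <= M_i (1 - y_i), written as s_i + M_i y_i <= M_i
        LinearConstraint(np.hstack([np.zeros((n, n + 1)), np.eye(n),
                                    np.diag(M)]), -np.inf, M),
    ]
    lb = np.concatenate([np.zeros(n), [ell], np.zeros(n), np.zeros(n)])
    ub = np.concatenate([np.ones(n), [u], M, np.ones(n)])
    integrality = np.concatenate([np.zeros(2 * n + 1), np.ones(n)])

    res = milp(c, constraints=cons, integrality=integrality,
               bounds=Bounds(lb, ub))
    return res.fun, res.x[:n]  # nu(Q) and a global minimizer


# Nonconvex instance: zero diagonal, so nu(Q) = 0, attained at a simplex vertex
Q = np.array([[0.0, 1.0], [1.0, 0.0]])
val, x = stqp_milp(Q)
```

On this instance the quadratic form \(2 x_1 x_2\) is indefinite on the simplex, yet the MILP recovers the global value \(0\) at a vertex; the convex instance \(Q = I\) yields \(\nu (Q) = 1/2\) at the barycenter.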

Since the tighter bound \(\ell _2(Q)\) requires considerable computational effort on large instances, cheaper methods for computing or approximating this bound are of interest. For instance, the dual problem of (DNN) can be solved using a combination of a proximal gradient method and binary search (see, e.g., [23]), which may be cheaper than using a semidefinite programming solver. Alternatively, based on the encouraging computational results reported in [25], a tight linear programming based lower bound can be employed in lieu of \(\ell _2(Q)\). We leave these questions for future work.

Another interesting research direction is the investigation of decomposition and cutting plane approaches for the proposed MILP formulations arising from large-scale standard quadratic programs. In addition, encouraged by our computational results, we intend to identify other classes of nonconvex quadratic programs that are amenable to similarly effective MILP formulations.