1 Introduction

We consider the quadratic programming problem with bound constraints

$$\begin{aligned} \begin{array}{lll} &{}\min &{}c^T x + \frac{1}{2} x^TGx\\ &{}\text {s.t. } &{}x\in \mathbf {x}, \end{array} \end{aligned}$$
(1)

where \(c \in {\mathbb {R}}^n\), G is a symmetric \(n\times n\) matrix, not necessarily semidefinite,

$$\begin{aligned} \mathbf {x} := [\underline{x},\overline{x}] = \{x \in {\mathbb {R}}^n\mid \underline{x}_i \le x_i \le \overline{x}_i,\ i=1,\dots ,n\} \end{aligned}$$
(2)

is a box in \({\mathbb {R}}^n\), and infinite bounds are permitted. Quadratic programming problems with simple bounds form an important class of nonlinear optimization problems; they include, for positive definite G, linear least squares problems with nonnegativity constraints and the dual problems of linearly constrained least distance problems, and, for indefinite G, a number of combinatorial problems such as the maximum clique problem or the maximum cut problem (see, e.g., Floudas and Pardalos [4]). If G is positive semidefinite, every local minimizer is a global minimizer. On the other hand, the global optimization problem is quite difficult in the indefinite case, where a reliable solution requires quite different techniques (relaxation and branch and bound); see, e.g., several articles in the collection by Lee and Leyffer [17]. Here we only discuss the local solution of indefinite quadratic programs, which is needed repeatedly in branch and bound frameworks for finding good starting points for a global solver.

There are many optimization algorithms designed for finding local optima of quadratic programs under various conditions; a fairly comprehensive list is maintained by Gould and Toint [9]. Many algorithms are available for strictly convex bound constrained quadratic programs; for a comparison of such solvers see Voglis and Lagaris [24]. We are interested in two more general situations: bound constrained quadratic programs without further restriction (they may be strictly convex, rank-deficient or indefinite), and strictly convex quadratic programs with general linear constraints. Because of duality, the second problem class may be viewed as a special case of the first, hence is still significantly simpler than a general quadratic program.

MINQ5 [20] is a publicly available Matlab program for bound constrained quadratic programming and strictly convex general quadratic programming, based on rank 1 modifications. It finds a local minimizer of the problem (1). If G is positive semidefinite, any local minimizer is global, so it finds (in theory, assuming exact arithmetic) the global optimum. The method combines coordinate searches and subspace minimization steps. The latter solve safeguarded equality constrained QPs whenever the coordinate searches no longer change the active set. Rank 1 updates, in a format suited for both the dense and sparse case, are used to keep the linear algebra reasonably cheap.

Applications of MINQ5 to bound constrained quadratic programs include the calculation of the formation constants for copper(I)-chloride complexes [19], subpixel land cover mapping [3], an optimal control model of redundant muscles in step-tracking wrist movements [10], entropy estimation for high-dimensional finite-accuracy data [16], finding the best code to represent a speech signal [23], the calculation of a nonnegative sparsity induced similarity measure for cluster analysis of spam images [7], a method for calibrating the scores of biased reviewers [22], a model for the braking behavior of industrial robots [2], and finding the filter coefficients for a filtering technique to denoise the far-field pattern in the presence of noise [21]. Moreover, many nonlinear optimization techniques are based on solving auxiliary quadratic problems [6, 14, 18, 24, 25]. Kanzow and Petra [15] used MINQ5 to solve a trust-region subproblem in a scaled filter trust region method. Finally, MINQ5 is an integral part of the global optimization algorithm MCS [12] and the noisy optimization algorithm SNOBFIT [13].

In some of our applications we encountered bound constrained quadratic programs where MINQ5 turned out to be very slow. This provided the impetus for developing a new quadratic programming algorithm. Like MINQ5 it employs coordinate searches and subspace steps, but the latter use smaller subspaces than in MINQ5. In contrast to MINQ5, we do not use rank 1 updates but instead compute direct factorizations. In analogy to the MINQ5 package [20] (named after Matlab 5), we call our new algorithm MINQ8. Like MINQ5, our Matlab 8 implementation of MINQ8 solves both the general bound constrained quadratic program and the related problems discussed in Sect. 4. The software is freely available at http://www.mat.univie.ac.at/~neum/software/minq/, together with the drivers for the problem classes from Sect. 5.3.

In Sect. 2 we start with a motivation of the ingredients of MINQ8, introduce some notation, describe the main subprograms of the algorithm in detail in the order in which they are called in MINQ8, discuss the modifications needed if some bound is infinite and finally present MINQ8 as a whole. We examine properties of the points obtained by the algorithm and prove convergence under the assumption of strict convexity and special settings of the parameters in Sect. 3. In Sect. 4 we describe how MINQ8 can be applied to general positive definite quadratic programs. Finally, in Sect. 5 we make an extensive comparison of MINQ8, MINQ5, the algorithms contained in the quadprog function of Matlab, and NewtonKKTqp [1] on different test problems.

2 The MINQ8 algorithm

MINQ8 is designed to solve the problem of finding a local minimizer of a bound constrained (definite or indefinite) quadratic problem of the form

$$\begin{aligned} \begin{array}{ll} \min &{}f(x):=\gamma + c^Tx + \rho (Ax-b)\\ \text {s.t. } &{} x \in \mathbf {x}, \end{array}\end{aligned}$$
(3)

where \(\gamma \in {\mathbb {R}}\), \(c \in {\mathbb {R}}^n\), \(b \in {\mathbb {R}}^m\), \(A \in {\mathbb {R}}^{m\times n}\), \(\mathbf {x}\) is as in (2), and \(\rho : {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} \rho (z):= \frac{1}{2} z^TDz, \end{aligned}$$

where \(D\in {\mathbb {R}}^{m\times m}\) is a diagonal matrix. Thus there are n variables and the residual vector \(Ax-b\) has m components. Note that this form of writing a quadratic function \(\gamma + c^Tx+\frac{1}{2} x^TGx\) is not a restriction, since every symmetric matrix \(G \in {\mathbb {R}}^{n\times n}\) can be written as \(G = A^TDA\) with \(A, D \in {\mathbb {R}}^{n\times n}\), D a diagonal matrix, using a spectral factorization or an \(LDL^T\) factorization of G.
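The following minimal Matlab sketch (ours, not part of the MINQ8 package) illustrates this conversion via a spectral factorization:

```matlab
% bring gamma + c'*x + 0.5*x'*G*x into the form (3) with f(x) = gamma + c'*x + rho(A*x-b):
% a spectral factorization of the symmetric matrix G gives G = V*E*V' with
% orthogonal V and diagonal E, so A = V', D = E, b = 0 does the job
n = 5;
B = randn(n);  G = (B + B')/2;      % some symmetric (possibly indefinite) test matrix
[V, E] = eig(G);                    % G = V*E*V'
A = V';  D = E;  b = zeros(n, 1);
err = norm(A'*D*A - G, 'fro')       % should be of the order of eps*norm(G)
```

For large sparse G an \(LDL^T\) factorization is usually cheaper, but note that Matlab's ldl may return \(2\times 2\) blocks in D, which then have to be diagonalized separately.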

By a well-known theorem by Frank and Wolfe [5], (3) has a finite solution if and only if \(\{f(x) \mid x\in \mathbf {x}\} \) is bounded below. In this case, the necessary conditions for local optimality are given by the Kuhn–Tucker conditions

$$\begin{aligned} g_i= & {} 0 \quad \hbox {if}~\underline{x}_i< x_i< \overline{x}_i,\\ g_i\ge & {} 0 \quad \hbox {if}~\underline{x}_i = x_i< \overline{x}_i,\\ g_i\le & {} 0 \quad \hbox {if}~\underline{x}_i < x_i= \overline{x}_i, \end{aligned}$$

where

$$\begin{aligned} g:= c+A^TD(Ax-b) \end{aligned}$$
(4)

denotes the gradient of f at x. Therefore the active set at a point x, defined as

$$\begin{aligned} K(x):=\{i \mid x_i=\underline{x}_i ~\hbox {or}~ x_i=\overline{x}_i\}, \end{aligned}$$

plays an important role. The variables \(x_i\) with \(i\in K(x)\) are called active (at the respective bound); the variables that are not active are called inactive (or free) variables. We say that two points \(x, y \in \mathbf {x}\) have the same activities iff \(K(x) = K(y)\) and \(x_i = y_i\ \forall i \in K(x) = K(y)\).

The easiest case is that of an unconstrained convex quadratic optimization problem, where the global minimizer can be computed easily since the Kuhn–Tucker conditions reduce to a linear system: if \(x \in {\mathbb {R}}^n\) is any point with gradient g and Hessian matrix G, the global minimum is attained at \(x+p\), where \(Gp = -g\). On the other hand, in the non-convex case even the problem of checking the existence of a Kuhn–Tucker point is NP-hard; cf. Horst et al. [11]. If the unconstrained minimizer of a convex quadratic optimization problem is not contained in \(\mathbf {x}\), and in all other cases where the objective function is bounded on the box \(\mathbf {x}\), minimizers occur on the boundary. Therefore it is important to find the correct activities.

For any putative active set, the corresponding variables can be fixed, and the remaining part of the gradient can be set to zero by solving a linear system. If the Hessian is positive definite on the subspace, this gives the optimal point in the corresponding subspace and defines part of the subspacestep (with modifications described below). The best feasible point on the entire line through the current best point and this point (if these are distinct) defines the result of this step. Thus a two-sided line search is performed; a directional line search would not always be sufficient, since the Hessian on the subspace might be indefinite or negative definite.

The solution of these linear systems is the most time-consuming part of the algorithm; hence it pays to try to make the subspace dimension small before performing the subspace step. This is done using three different techniques.

To increase the active set we use (as in MINQ5) a greedy method fixbounds that fixes single variables as long as the objective function decreases.

A further dimension reduction is possible by noting that some coordinates are unlikely to contribute much to the subspace step. These coordinates are determined by a simple heuristic and form together with the active coordinates an extended active set. The subspace direction is then computed only for the lower-dimensional subspace spanned by the complementary reduced inactive set, and is augmented to a direction in the full inactive set by components pointing towards the more favorable bound of the respective coordinate.

A more expensive special reductionstep is done in cases where this still leaves more reduced inactive variables than the number of rows of A. Indeed, in this case we can fix more bounds by an approximate minimization on the subspace where Ax is fixed. Compared to MINQ5, this is another theoretical innovation of the present algorithm.

These steps are alternated until subspacestep does not change the active set any more. Then we try to free some of the bounds, which is done in freebounds, where a two-sided line search is performed along a direction p that would be optimal in the case of a separable problem.

Notation. We first introduce some notation, which will be used in the remainder of this section. \(e^i\) denotes the ith standard unit vector of \({\mathbb {R}}^n\). We write \(A_{i,j}\) for the (i, j) entry of A. If I is a subset of \(\{1,\dots ,m\}\) and J a subset of \(\{1,\dots ,n\}\), \(A_{I,J}\) denotes the submatrix of A with the row indices taken from I and the column indices from J, i.e., \(A_{I,J} := (A_{i,j})_{i\in I,j\in J}\), \(A_{I,:}\) the submatrix \(A_{I,:}:=(A_{i,j})_{i\in I,1\le j \le n}\), and similarly \(A_{:,J}:= (A_{i,j})_{1\le i \le m,j \in J}\). If \(J = \{l\}\), we also write \(A_{:,l}\) instead of \(A_{:,J}\), analogously for the other cases of one-element index sets. \(x_J\) is the subvector of x with the indices taken from the set J, and the vector \(|x|:= (|x_1|,\dots ,|x_n|)^T\) is obtained by taking the absolute values of all components of x.

We also introduce the quantities

$$\begin{aligned} d_i := \frac{1}{2} A_{:,i}^TDA_{:,i},\quad i=1,\dots ,n. \end{aligned}$$
(5)

They are independent of the point x and stay constant during the algorithm. On the other hand, the gradient (4) must be computed at each point x. The minimum of the difference

$$\begin{aligned} \varDelta f:=f(x+\alpha e^i) - f(x) = \alpha g_i + \alpha ^2d_i \end{aligned}$$

in function value when changing the ith coordinate within the box, so that \(x+\alpha e^i \in \mathbf {x}\), is attained at one of

$$\begin{aligned} \alpha= & {} \underline{\alpha }_i := \underline{x}_i-x_i, \quad \varDelta f = \underline{\varDelta }f_i := \underline{\alpha }_i(g_i+\underline{\alpha }_id_i), \end{aligned}$$
(6)
$$\begin{aligned} \alpha= & {} \overline{\alpha }_i := \overline{x}_i-x_i,\quad \varDelta f = \overline{\varDelta }f_i := \overline{\alpha }_i(g_i+\overline{\alpha }_id_i), \end{aligned}$$
(7)
$$\begin{aligned} \alpha= & {} \hat{\alpha }_i := -\frac{g_i}{2d_i},\quad \varDelta f = \hat{\varDelta }f_i := \frac{g_i}{2} \hat{\alpha }_i = - \frac{g_i^2}{4d_i}, \end{aligned}$$
(8)

where the minimizer can be different from \(\underline{\alpha }_i\) and \(\overline{\alpha }_i\) only if \(d_i > 0\) and \(\underline{\alpha }_i< \hat{\alpha }_i <\overline{\alpha }_i\).
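For illustration, these quantities can be computed for all coordinates at once. The following vectorized Matlab sketch uses our own variable names (xl, xu for the bound vectors, D stored as a diagonal matrix) and assumes finite bounds:

```matlab
% d_i = 0.5*A(:,i)'*D*A(:,i) as in (5); A is m x n, D diagonal (possibly sparse)
d = 0.5*full(sum(A .* (D*A), 1))';
g = c + A'*(D*(A*x - b));            % gradient (4) at the current point x
alo = xl - x;   aup = xu - x;        % underline{alpha}_i and overline{alpha}_i
dflo = alo .* (g + alo.*d);          % underline{Delta f}_i, see (6)
dfup = aup .* (g + aup.*d);          % overline{Delta f}_i, see (7)
ahat  = zeros(size(x));  dfhat = zeros(size(x));
pos = d > 0;                         % interior minimizer exists only for d_i > 0
ahat(pos)  = -g(pos)./(2*d(pos));    % hat{alpha}_i, see (8)
dfhat(pos) = -g(pos).^2./(4*d(pos)); % hat{Delta f}_i, see (8)
```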

In the first five subsections of this section, we describe the ingredients of the algorithm in detail under the condition that all bounds are finite. Subsequently we deal with the possibility of infinite bounds, and in the final subsection the MINQ8 algorithm is presented as a combination of its ingredients.

2.1 fixbounds

The subprogram fixbounds cyclically tries to fix as many variables at bounds as possible in order to be left with a not too large set of inactive variables to be optimized in the subsequent subspace step.

Let ind be an index vector that sorts the components of d in ascending order. Then the application of fixbounds to a point x is described in Algorithm 1 for the case that the function is bounded below, where \(\underline{\varDelta }f_i\) and \(\overline{\varDelta }f_i\) are defined by (6) and (7). Note that we use the equality sign for assignment as well as comparison in our pseudocodes.

[Algorithm 1: pseudocode of fixbounds]

After fixbounds has been completed it is not possible to improve the function value by fixing an additional bound. Let \(\overline{I}\) be the inactive set of the point x obtained by fixbounds. Then, for each \(i\in \overline{I}\), the minimum of \(f(x+\alpha e^i)\) subject to \(x+\alpha e^i \in \mathbf {x}\) is attained at \(\alpha = \hat{\alpha }_i \in (\underline{\alpha }_i,\overline{\alpha }_i)\), and thus \(d_i > 0\), \(\underline{\varDelta }f_i \ge 0\), \(\overline{\varDelta }f_i \ge 0\), and \(\hat{\varDelta }f_i \le 0\), where these quantities are defined by (5)–(8).
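To convey the flavor of this greedy pass, the following is a heavily simplified Matlab sketch of one variant of fixbounds (our own reconstruction from the description above, not the code of the package; infinite bounds and several details of Algorithm 1 are omitted):

```matlab
function [x, g] = fixbounds_sketch(x, g, d, A, D, xl, xu, ind)
% Greedily fix single variables at bounds as long as f strictly decreases.
% g is the gradient (4) at x, d the vector (5); ind sorts d in ascending order.
improved = true;
while improved
  improved = false;
  for i = ind(:)'
    alo = xl(i) - x(i);  aup = xu(i) - x(i);
    dflo = alo*(g(i) + alo*d(i));          % gain (6) of fixing x_i at its lower bound
    dfup = aup*(g(i) + aup*d(i));          % gain (7) of fixing x_i at its upper bound
    [df, k] = min([dflo, dfup]);
    if df < 0                              % fixing this bound decreases f
      if k == 1, alpha = alo; x(i) = xl(i); else alpha = aup; x(i) = xu(i); end
      g = g + alpha*(A'*(D*A(:,i)));       % update the gradient after the move
      improved = true;
    end
  end
end
```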

2.2 The reduced inactive set

We now describe the heuristic screening step used for choosing a sensible reduced inactive set (a subset of the inactive set \(\overline{I}\) of x) and the complementary extended active set (a superset of the active set of x).

The idea is to incorporate some information from the bounds (which, in a full subspace step, would not influence the search direction). We compare for each inactive variable i of the point x obtained by fixbounds the (nonnegative) gains \(\underline{\varDelta }f_i\) and \(\overline{\varDelta }f_i\) of moving this variable to a bound with the (nonpositive) gain \(\hat{\varDelta }f_i\) when moving to the optimal interior value. The absolute value of the ratio is taken to be a heuristic indicator of how important the coordinate is for the subspace. A tunable threshold \(\kappa \) for this ratio is used to determine which inactive variables are assigned to the reduced inactive set.

The reduced inactive set is thus defined to be

$$\begin{aligned} I := \{i\in \overline{I}\mid \kappa |\hat{\varDelta }f_i| <\min (\underline{\varDelta }f_i,\overline{\varDelta }f_i)\} \end{aligned}$$

for some fixed \(\kappa \ge 0\), and the extended active set is its complement \(K := \{1,\dots ,n\} \setminus I\). In the special case that \(\kappa = 0\), the extended active and reduced inactive sets coincide with the active and inactive sets, respectively. Another special case is \(\kappa = 1\), where \(\underline{\varDelta }f_i\), \(\overline{\varDelta }f_i\), and \(|\hat{\varDelta }f_i| = -\hat{\varDelta }f_i\) are compared. For very large \(\kappa \), only the indices i with \(\hat{\varDelta } f_i = 0\) (i.e., \(x_i\) has its optimal value along coordinate i in the interior of \([\underline{x}_i,\overline{x}_i]\)) belong to the reduced inactive set and thus very large values of \(\kappa \) do not make sense.

A non-vectorized version of the algorithm of computing the reduced inactive and extended active sets is given in Algorithm 2.

[Algorithm 2: pseudocode for computing the reduced inactive and extended active sets]
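In vectorized form, the selection amounts to a few Matlab lines (a sketch in our notation, where Ibar is the inactive set of x and dflo, dfup, dfhat are the vectors of gains (6)–(8)):

```matlab
% reduced inactive set I and extended active set K for a given kappa >= 0
sel = kappa*abs(dfhat(Ibar)) < min(dflo(Ibar), dfup(Ibar));
I   = Ibar(sel);               % indices kept for the subspace step
K   = setdiff(1:n, I);         % extended active set (complement of I)
```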

2.3 reductionstep

Let I be the reduced inactive set of the current point x. If \(|I| > m\), \(G_{I,I}\) is singular and we reduce the number of elements of I by the following procedure to obtain \(|I| \le m\).

The idea is to perform an approximate minimization on the subspace \(Ax = Ax^0\), where \(x^0\) is the current best point. On this subspace, the objective function simplifies to \(c^Tx\) plus a constant, and all currently active variables are fixed. Thus proceeding along a descent direction (if there is one) of the linear program

$$\begin{aligned} \begin{array}{lll} \min &{} c^Tx\\ \hbox {s.t.} &{} Ax=Ax^0,\quad x\in \mathbf {x}\end{array} \end{aligned}$$

we are guaranteed to fix a bound, and this can be iterated in strongly polynomial time until \(|I| \le m\).

Noting that the case \(|I|>m\) is only possible for \(n > m\), we use LU factorizations of \(A_{:,I}^T\) and submatrices of \(A_{:,I}^T\) to determine maximal subsets \(I'\) of I and \(J'\) of \(\{1,\dots ,m\}\), \( m \ge r := |I'|=|J'|\), such that \(A_{J',I'}\) is invertible, and compute its inverse \(C := A_{J',I'}^{-1}\). The set \(J'':= \{1,\dots ,m\}\setminus J'\) might be empty, but \(I'' := I\setminus I' \ne \emptyset \) since \(|I|>m\).

We now define a vector \(u \in {\mathbb {R}}^n\) with \(Au = 0\) by assigning the subvectors \(u_{I'}\), \(u_{I''}\) and \(u_K\) (pertaining to the index sets \(I'\), \(I''\) and K, which form a disjoint partition of \(\{1,\dots ,n\}\)) as follows. Let \(u_{I''}=e^k\) for some \(k \in \{1,\dots ,|I''|\}\), and let \(u_{I'} := -CA_{J',I''}u_{I''}\) and \(u_{K} = 0\). Then we have \(A_{:,I''}u_{I''} = A_{:,l}\) for some \(l \in I''\) and

$$\begin{aligned} Au = A_{:,I'}u_{I'} + A_{:,I''}u_{I''} = -A_{:,I'}CA_{J',I''}u_{I''}+A_{:,l}=-A_{:,I'}CA_{J',l}+A_{:,l}. \end{aligned}$$

Since r is the (numerical) rank of \(A_{:,I}\) and \(A_{J',I'}\) is nonsingular, the columns of \(A_{:,I'}\) form a basis of the column space of \(A_{:,I}\). Hence \(A_{:,l}\) can be written as a linear combination

$$\begin{aligned} A_{:,l} = \sum _{j=1}^r \lambda _j A_{:,i_j} \end{aligned}$$

for appropriate \(\lambda _j\), where \(I'=\{i_1,\dots ,i_r\}\). Then

$$\begin{aligned} Au= & {} -A_{:,I'}CA_{J',l} + A_{:,l} = -\sum \nolimits _{j=1}^r \lambda _j A_{:,I'}CA_{J',i_j}+ A_{:,l}\\= & {} - \sum \nolimits _{j=1}^r \lambda _j A_{:,I'}e^{j} + A_{:,l} = - \sum \nolimits _{j=1}^r \lambda _j A_{:,i_j}+ A_{:,l} = 0, \end{aligned}$$

where we have used the definition of C.

Since \(A(x+\alpha u) = Ax\), minimizing \(c^T(x+\alpha u) + \rho (A(x+\alpha u))\) over the box \(\mathbf {x}\) amounts to minimizing \(\alpha c^Tu\) subject to \(x+\alpha u\in \mathbf {x}\), which is easy to do.
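A sketch of this one-dimensional minimization in Matlab (ours; finite bounds and \(u \ne 0\) assumed) reads:

```matlab
% minimize alpha*c'*u subject to x + alpha*u in [xl, xu]
free = u ~= 0;
lo = (xl(free) - x(free)) ./ u(free);     % per-coordinate step limits
hi = (xu(free) - x(free)) ./ u(free);
amin = max(min(lo, hi));                  % feasible interval [amin, amax] contains 0
amax = min(max(lo, hi));
cu = c'*u;
if cu > 0,      alpha = amin;             % move to the boundary that decreases c'*x
elseif cu < 0,  alpha = amax;
else            alpha = 0;                % f is constant along u (cf. Sect. 2.6)
end
x = x + alpha*u;                          % fixes at least one additional bound if cu ~= 0
```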

[Algorithm 3: pseudocode of reductionstep]

Putting things together results in Algorithm 3, where it is assumed that subsets \(J'\) of \(\{1,\dots ,m\}\) and \(I'\) of I such that \(A_{J',I'}\) is invertible have already been determined. \(|I''|\) choices for \(u_{I''}\) and thus for u are possible, but in the first for loop the algorithm just looks for one u that manages to fix at least one bound, and that will usually be the one with \(u_{I''} = e^1 \in {\mathbb {R}}^{|I''|}\) since the standard basis vectors are tried out in the natural order. Let \(L\subset I\) be the set of bounds that are fixed. In the second for loop the elements \(i \in L\) are transferred from I to K. If \(i \in I''\), this can be done directly, thus reducing \(I''\). In the case where \(i \in I'\), \(|I'|\) should stay the same. We therefore replace i by a \(j \in I''\) for which \(A_{J',\bar{I'}}\) is still invertible, where \(\bar{I'}:=(I'\cup \{j\})\setminus \{i\}\), and \(\bar{C}:=A_{J',\bar{I'}}^{-1}\) is computed from C with the aid of the Sherman–Morrison formula. To improve numerical stability, the index j is chosen such that the denominator in the Sherman–Morrison formula is maximal. This procedure is repeated until \(I = I'\), \(I''=\emptyset \). The indices that are transferred from I to K result in active indices of the new current point. The matrix \(A_{:,I}\) obtained at the end of the algorithm has full rank by construction, which is necessary (but not sufficient) for \(G_{I,I} = A_{:,I}^TDA_{:,I}\) to be nonsingular.

2.4 subspacestep

Now we have a reduced inactive set I with \(|I| \le m\) and an extended active set K, which is a superset of the active set \(K'\) of the current point x (the output of reductionstep if it has been called, otherwise the output of fixbounds) with the gradient g and the Hessian matrix G. I and K (or rather \(K_l\) and \(K_u\)) have been computed for \(x^\text {fb}\) (the point obtained by the last call to fixbounds) according to Algorithm 2 and reduced and augmented, respectively, by Algorithm 3 (if it is called). We define a search direction p as follows. For \(k\in K'\), we set \(p_k = 0\), and for \(k \in K \setminus K'\) we set \(p_k = \underline{x}_k - x_k\) or \(p_k = \overline{x}_k - x_k\) (the value that yields the smaller function value, cf. Sect. 2.2). The optimal step \(p_I\) for the reduced inactive variables is determined by \(G_{I,:}p = -g_I\), which results in solving

$$\begin{aligned} G_{I,I}p_I = -g_I -G_{I,K}p_K, \end{aligned}$$
(9)

where \(p_K\) denotes the already determined part of p pertaining to the extended active set. To improve numerical stability, we regularize the matrix G by slightly increasing its diagonal entries: we set \(G_{i,i} = (1+\delta _1)G_{i,i}\) if \(G_{i,i}\ne 0\) and \(G_{i,i} = \delta _1\) if \(G_{i,i}=0\), where \(\delta _1\) is a fixed small nonnegative number. In the case \(\delta _1 = 0\), G remains unchanged, but choosing \(\delta _1 > 0\) might be advantageous for rank-deficient problems; cf. Sect. 5.1. The vector \(p_I\) is computed from (9) using the backslash operator in Matlab.

If the linear system does not have a well-defined solution, the remainder of the subspace step is skipped. Otherwise an exact two-sided line search is made to find the best \(\alpha \) with \(x+\alpha p \in \mathbf {x}\), and a new current point \(x+\alpha p\) is obtained. Note that \(\alpha \) can even be negative, since in the case of an indefinite Hessian nothing guarantees that p is a descent direction. It may even point to a local maximum, from which we must move away!

Since \(p_i=0\) for active indices i, we always have \(x+\alpha p \in \mathbf {x}\) at least for small \(|\alpha |\). \(p=0\) is only possible if \(g_I=0\) and all indices in K are active. The pseudocode of the subspacestep algorithm is given in Algorithm  4, where the sets \(K_l\) and \(K_u\) are the extended lower and upper active sets, respectively, \(K = K_l \cup K_u\), and I is the reduced inactive set.

[Algorithm 4: pseudocode of subspacestep]
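The core of the step can be sketched in Matlab as follows (our own simplification; finite bounds and \(p \ne 0\) are assumed, and pK denotes the already fixed part of p for the extended active set):

```matlab
% diagonal regularization of G as described above
Greg = G;  dg = diag(G);
Greg(1:n+1:end) = (1+delta1)*dg + delta1*(dg == 0);
% search direction: p_K already chosen, p_I from the reduced system (9)
p = zeros(n,1);  p(K) = pK;
p(I) = -Greg(I,I) \ (g(I) + Greg(I,K)*pK);
% exact two-sided line search for x + alpha*p in [xl, xu]
free = p ~= 0;
lo = (xl(free)-x(free))./p(free);  hi = (xu(free)-x(free))./p(free);
amin = max(min(lo,hi));  amax = min(max(lo,hi));      % feasible alpha interval
gp = g'*p;  pGp = p'*(G*p);                           % Delta f = alpha*gp + 0.5*alpha^2*pGp
cand = [amin, amax];
if pGp > 0, cand(end+1) = max(amin, min(amax, -gp/pGp)); end
[~, j] = min(gp*cand + 0.5*pGp*cand.^2);
x = x + cand(j)*p;
```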

2.5 freebounds

In this subprogram \(f(x+\alpha p)\) is minimized subject to \(x+\alpha p \in \mathbf {x}\), where the direction p is defined by \(p_i = \alpha _i\) with \(\alpha _i \in \{ \underline{\alpha }_i,\overline{\alpha }_i,\hat{\alpha }_i\}\) chosen such that the corresponding \(\varDelta f\) from (6)–(8) is smallest. This choice of p would be optimal for separable quadratic problems. A non-vectorized version of this subprogram is given in Algorithm 5.

[Algorithm 5: pseudocode of freebounds]
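A compact Matlab sketch of the construction of this direction (ours; finite bounds assumed, with the exact two-sided line search along p done as in Sect. 2.4):

```matlab
% separable search direction for freebounds: per coordinate the best of (6)-(8)
p = zeros(n,1);
for i = 1:n
  cand = [xl(i)-x(i), xu(i)-x(i)];            % underline{alpha}_i, overline{alpha}_i
  gain = cand .* (g(i) + cand*d(i));          % gains (6) and (7)
  if d(i) > 0
    ah = -g(i)/(2*d(i));                      % interior candidate (8)
    if ah > cand(1) && ah < cand(2)
      cand(end+1) = ah;  gain(end+1) = -g(i)^2/(4*d(i));
    end
  end
  [~, k] = min(gain);
  p(i) = cand(k);
end
```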

2.6 Infinite bounds

Infinite bounds need a special treatment. Let \(I_l\) be the set of i such that \(\underline{x}_i = -\infty \), let \(I_u\) be the set of i such that \(\overline{x}_i = \infty \), and assume that \(I_{\inf } := I_l \cup I_u \ne \emptyset \). If there is an \(i \in I_{\inf }\) such that \(d_i < 0\), the function is unbounded below on \(\mathbf {x}\) and the program returns an error flag and a finite vector x with huge norm and \(f(x) \ll 0\). Such a case can already be detected at the beginning of the algorithm.

For the case that \(d_i = 0\) for some \(i\in I_{\inf }\), the boundedness of the quadratic function on \(\mathbf {x}\) depends on the ith component \(g_i\) of the gradient, a quantity that varies with the point x. The function is unbounded below on \(\mathbf {x}\) if \(i \in I_l\) and \(g_i > 0\), or \(i \in I_u\) and \(g_i < 0\). Such a case can be detected in freebounds and is handled similarly to the unbounded case \(d_i < 0\). The same is done in the case that an unbounded direction is found in an exact line search or in reductionstep, where setting \(\alpha \) to the boundary that minimizes \(c^T(x+\alpha u)\) in Algorithm 3 might identify an unbounded problem.

Since \(|I| > m\) in reductionstep can only occur in the case \(n > m\), where the Hessian matrix is singular, Algorithm 3 either finds a direction of infinite descent or manages to reduce |I|. The only exception is the case where \(|c^Tu|\) is so small that the search direction leaves \(c^Tx\) almost constant. In the case that \(|c^Tu|\le \delta _2 |c|^T|u|\) (where \(\delta _2\) is a fixed small positive number) and the minimization would yield a direction of infinite descent, we do not minimize \(\alpha c^Tu\) but set \(\alpha = 0\); if \(c^Tu = 0\) (the function is constant along the direction u) we also set \(\alpha = 0\). If that is the case for all possible choices of u, it is not possible to reduce I and subspacestep is carried out with this I.

For the changes needed in Algorithms 1–5 due to possible infinite bounds see the actual Matlab code at http://www.mat.univie.ac.at/~neum/software/minq/.

2.7 The MINQ8 algorithm

Now we are ready to describe the MINQ8 algorithm as a whole. MINQ8 performs four different subtasks, namely fixbounds, reductionstep, subspacestep, and freebounds. Each of these subprograms yields, starting from the point x produced by the previously called subprogram (or the initial point) with function value f, a point \(x^{\text {new}}\) with function value \(f_{\text {new}} < f\) or stays at \(x^{\text {new}} = x\), and \(x^{\text {new}}\) is used as input for the subsequent subtask. This is indicated by writing \(x = {\mathtt{subprogram}}(x)\) in the pseudocode for the subprograms of MINQ8 (we do not care about the other input and output parameters of the subprograms in the algorithm as a whole). The subtasks are carried out in an appropriate sequence according to Algorithm 6. Here \(\delta _3\) is a fixed small positive number to handle the stopping criterion for function values close to zero. This number and all other parameters in the algorithm will be specified in Sect. 5.

To avoid inefficient zigzagging, i.e., repeatedly freeing and fixing the same set of active variables in consecutive iterations, the subprogram freebounds is only called if the activities of the current points after the application of fixbounds and after the application of subspacestep are the same. This is equivalent to reductionstep (if applied) and subspacestep not generating any additional active variables (since these subprograms leave already active variables unchanged).

In addition to the possibility to identify the problem as unbounded at the beginning of the algorithm (before entering the loop in Algorithm 6) as described in Sect. 2.6, the algorithm checks at the beginning whether the function is constant along some coordinate axis. If \(c_i = 0\) and \(A_{:,i} = 0\) for some i, f does not depend on \(x_i\). In that case, \(x_i\) is fixed at its initial value and only the remaining variables are optimized.

[Algorithm 6: pseudocode of MINQ8]

The program stops either due to finding an unbounded direction (before entering the loop or in the loop), or by reaching the maximum number maxit of iterations, or due to an insufficient change in the function value given by the input parameter tol. Since a very small step is rarely followed by a large step, it is typical for optimization algorithms to terminate when the gain in function value becomes too small. The algorithm returns the best point (i.e., the current point at termination), its function value, the number of times the loop in Algorithm 6 has been carried out, and an error flag indicating whether the problem was detected to be unbounded.

Due to the fact that the subtasks of the algorithm only accept a new point if it has a strictly smaller function value, a local minimizer returned might belong to a locally optimal ray. Moreover, if the function value has not been changed by MINQ8, the current point has not changed. Therefore, in the case that tol is set to zero and maxit to \(\infty \), the algorithm stops at a Kuhn–Tucker point; cf. Theorem 1.

3 Termination and convergence

In this section we prove several results concerning the termination of our algorithm. Theorem 1 deals with the case where a whole iteration of the main loop of MINQ8 does not change the point any more and the algorithm stops due to not making any progress in function value. To this end, in Lemma 1 we consider the case where reductionstep (if applied) and subspacestep have not changed the point, i.e., where some of the ingredients do not make any progress. In that case freebounds is called (since no change in the point implies no change in the activities) and is applied to the output of fixbounds.

Lemma 1

If freebounds is applied to the output of fixbounds, then the search direction p in freebounds is a descent direction or \(p = 0\).

Proof

Let x be the output of fixbounds. If \(\underline{x}_i< x_i < \overline{x}_i\) for some i, then for \(\underline{\alpha }_i\), \(\overline{\alpha }_i\), \(\underline{\varDelta }f_i\), and \(\overline{\varDelta }f_i\) defined by (6) and (7) we have \(\underline{\alpha }_i < 0\), \(\overline{\alpha }_i > 0\), \(\underline{\varDelta }f_i \ge 0\) and \(\overline{\varDelta }f_i \ge 0\). This implies \(-\,\overline{\alpha }_i d_i \le g_i \le -\underline{\alpha }_i d_i\) and thus \(d_i \ge 0\). If \(d_i=0\), we also have \(g_i = 0\) and \(p_i = 0\). For \(d_i>0\) we obtain \(\frac{1}{2} \underline{\alpha }_i \le -\frac{g_i}{2d_i}\le \frac{1}{2}\overline{\alpha }_i\). Then for \(\hat{\alpha }_i\) and \(\hat{\varDelta } f_i\) defined by (8) we have \( \underline{\alpha }_i< \hat{\alpha }_i < \overline{\alpha }_i\) and \(\hat{\varDelta }f_i < 0\), which implies \(p_i = \hat{\alpha }_i = -\frac{g_i}{2d_i}\).

If \(x_i = \underline{x}_i\), then \(\underline{\alpha }_i = 0\) and \(\overline{\alpha }_i > 0\). Then \(d_i \le 0\) implies \(p_i = \underline{\alpha }_i = 0\). \(p_i = -\frac{g_i}{2d_i}\) is only possible in the case that \(d_i > 0\) and \(0< -\frac{g_i}{2d_i} < \overline{\alpha }_i\), which yields \(g_i < 0\). Similarly, if \(x_i = \overline{x}_i\), we either have \(p_i = 0\) or \(p_i = -\frac{g_i}{2d_i}\), \(d_i > 0\), and \(g_i > 0\).

This shows that \(x + \alpha p \in [\underline{x},\overline{x}]\) for sufficiently small positive \(\alpha \). If \(p \ne 0\) then \(M := \{i\mid p_i \ne 0\}\) is nonempty, hence \(p^Tg = -\sum _{i \in M} \frac{g_i^2}{2d_i} < 0\). \(\square \)

If the algorithm stops with no change in the function value and thus the point, the following holds.

Theorem 1

If a point x is not changed by fixbounds, reductionstep, subspacestep, and freebounds, then x is a Kuhn–Tucker point, i.e., the reduced gradient \(g^{\text {red}}\) at x, defined by

$$\begin{aligned} g_i^{\text {red}} := {\left\{ \begin{array}{ll} g_i &{}\text {if }\underline{x}_i< x_i < \overline{x}_i,\\ \max (0,-g_i) &{}\text {if } x_i = \underline{x}_i,\\ \max (0,g_i) &{}\text {if } x_i = \overline{x}_i, \end{array}\right. } \end{aligned}$$
(10)

is zero. Here g is the gradient of f at x.

Proof

Let x be a point that fixbounds, reductionstep, subspacestep, and freebounds leave unchanged. Then, by Lemma 1, the vector p defined by freebounds must be 0. Let \(i \in \{1,\dots ,n\}\). If \(\underline{x}_i< x_i <\overline{x}_i\), \(p_i = 0\) implies \(g_i = 0\). Let now \(x_i = \underline{x}_i\), i.e., \(0 = \underline{\alpha }_i < \overline{\alpha }_i\) with the definitions from (6) and (7). In the case \(d_i \le 0\), \(\overline{\varDelta }f_i =\overline{\alpha }_i(g_i + \overline{\alpha }_id_i) \ge 0\) yields \(g_i \ge -\overline{\alpha }_id_i\ge 0\). If \(d_i > 0\), we have \(-\,\frac{g_i^2}{4d_i} \le 0\), and \(p_i = 0\) implies \(\hat{\alpha }_i = -\frac{g_i}{2d_i} \le \underline{\alpha }_i = 0\), again yielding \(g_i \ge 0\). Similarly \(x_i = \overline{x}_i\) implies \(g_i \le 0\). \(\square \)

Under certain conditions, the result of subspacestep only depends on the activities.

Lemma 2

Assume that the parameters \(\kappa \) in Sect. 2.2 and \(\delta _1\) in Sect. 2.4 are set to 0 and \(x \in \mathbf {x}:=[\underline{x},\overline{x}]\) is the input of subspacestep. Let I be the inactive set of x, suppose that \(G_{I,I}\) is positive definite and \(K = \{1,\dots ,n\} \setminus I = K_l \cup K_u\) is the active set of x, where

$$\begin{aligned} K_l := \{i\in \{1,\dots ,n\}\mid x_i = \underline{x}_i\} \text { and } K_u := \{i\in \{1,\dots ,n\}\mid x_i = \overline{x}_i\}. \end{aligned}$$

Moreover, assume that

$$\begin{aligned} G_{I,I}^{-1}\left( A_{:,I}^TDb-c_I-G_{I,K}x_K\right) \in [\underline{x}_I,\overline{x}_I]. \end{aligned}$$

Then the point obtained by applying subspacestep to x only depends on \(K_l\) and \(K_u\).

Proof

Since \(K_l\) and \(K_u\) are given, \(x_K\) is uniquely determined. \(\kappa = 0\) implies that the reduced inactive set is equal to the inactive set I of x and thus the search direction p defined by subspacestep fulfils \(p_K=0\) for the active set \(K=K_l \cup K_u\) and \(G_{I,I}p_I = -g_I\), which by assumption yields the unique solution \(p_I = -G_{I,I}^{-1}g_I\). Then \(x_I+p_I = x_I - G_{I,I}^{-1}(c_I + G_{I,:}x-A_{:,I}^TDb) = -G_{I,I}^{-1}(c_I+G_{I,K}x_K-A_{:,I}^TDb)\), which is independent of \(x_I\) and in \([\underline{x}_I,\overline{x}_I]\) by assumption. The line search along \(x+\alpha p\) yields \(\alpha ^* = 1\) since \(x+p\) is the minimizer of f(y) s.t. \(y_K=x_K\), \(y_I \in [\underline{x}_I,\overline{x}_I]\). \(\square \)

Corollary 1

Assume that G is positive definite and the parameters \(\kappa \) in Sect. 2.2 and \(\delta _1\) in Sect. 2.4 are set to 0. If a point x with the correct activities has been found by fixbounds, then subspacestep yields the unique minimizer.

Since fixbounds, reductionstep and subspacestep cannot free any bounds and the number of active variables cannot increase indefinitely, it has to happen from time to time that reductionstep and subspacestep do not change the activities and freebounds is called.

Theorem 2

If G is positive definite and the parameters \(\kappa \) in Sect. 2.2 and \(\delta _1\) in Sect. 2.4 are set to 0, the algorithm stops after finitely many iterations, even if \(\mathtt{tol}=0\) and \(\mathtt{maxit}=\infty \), and the global minimizer of the problem is obtained.

Proof

Let x be the current point before the application of freebounds, I its inactive set, K its active set and g the corresponding gradient. If subspacestep has not changed the active set, two cases are possible. In the first case we have \(p=0\), i.e., the search direction of subspacestep is zero. Then \(p_I = 0\), \(p_K = 0\) and \(G_{I,I}\) nonsingular by assumption imply \(g_I = c_I + G_{I,:}x-A_{:,I}^TDb=0\), and we obtain \(x_I = G_{I,I}^{-1}(A_{:,I}^TDb-c_I-G_{I,K}x_K)\), i.e., \(x_I\) is uniquely determined by the activities \(x_K\).

In the second case, the line search in the direction of the minimizer of f(y) s.t. \(y_K = x_K\) yields a point x with \(x_I\) in the interior of \([\underline{x}_I,\overline{x}_I]\), which implies that \(x_I\) is the optimal point in the subspace, i.e., again \(x_I = G_{I,I}^{-1}(A_{:,I}^TDb-c_I-G_{I,K}x_K)\).

Therefore, in both cases, only a single choice of \(x_I\) is possible for every choice of \(x_K\). The algorithm cannot cycle because the objective function is strictly decreasing from one iteration of Algorithm 6 to the next. Since there are only finitely many possible active sets, freebounds is called only finitely often. Therefore the algorithm stops, the assumptions of Theorem 1 are satisfied and we obtain a Kuhn–Tucker point, which is the global minimizer of the problem by assumption. \(\square \)

If \(\kappa > 0\), the reduced inactive set may be smaller than the full inactive set, while if \(\delta _1 > 0\), the Hessian is modified. In these two cases, as well as in the case that G is not positive definite, it is conceivable in exact arithmetic that we may perform infinitely many iterations involving the same sets \(K_u\) and \(K_l\). However, in floating-point arithmetic this is not possible since the objective function is strictly decreasing in each iteration. In practice, a properly chosen tolerance \(\kappa \) (and in some cases also a properly chosen \(\delta _1 > 0\), cf. the end of Sect. 5.1) improves the numerical stability of the algorithm and leads to a significant improvement in speed, without impairing the convergence properties to a local minimizer.

4 Solving strictly convex quadratic programs

We consider the strictly convex quadratic program subject to linear constraints

$$\begin{aligned} \begin{array}{ll} \min &{} f(x):=c^T x + \frac{1}{2} x^TGx\\ \text {s.t. } &{} (Ax)_I \ge b_I, \quad (Ax)_E = b_E, \end{array}\end{aligned}$$
(11)

where \(c \in {\mathbb {R}}^n\), G is a symmetric, positive definite \(n \times n\) matrix, \(A \in {\mathbb {R}}^{m\times n}\), \(b \in {\mathbb {R}}^m\) and (IE) is a partition of \(\{1,\dots ,m\}\) with \(|E| \le n\).

Theorem 3

Let x be a feasible point of (11) and let y be a feasible point of the bound constrained quadratic program

$$\begin{aligned} \begin{array}{ll} \min &{} d(y):=\frac{1}{2}(A^Ty-c)^TG^{-1}(A^Ty-c)-b^Ty\\ \text {s.t. } &{} y_{I} \ge 0. \end{array}\end{aligned}$$
(12)

Then

$$\begin{aligned} f(x)+d(y) \ge 0, \end{aligned}$$

with equality iff x and y are optimal solutions of (11) and (12), respectively. In this case,

$$\begin{aligned} Gx+c=A^Ty, \end{aligned}$$
(13)

and the complementarity conditions

$$\begin{aligned} (Ax-b)_iy_i=0\quad \hbox {for all } i\in I \end{aligned}$$
(14)

hold.

Proof

Write \(r:=A^Ty-c-Gx\). Then

$$\begin{aligned} f(x)+d(y)= & {} c^Tx+ \frac{1}{2} x^TGx+\frac{1}{2} (r+Gx)^TG^{-1}(r+Gx)-b^Ty\\= & {} (c+Gx)^Tx+\frac{1}{2} r^TG^{-1}r+r^Tx-b^Ty =\frac{1}{2} r^TG^{-1}r+y^T(Ax-b)\\\ge & {} y^T(Ax-b)=\displaystyle \sum _{i\in I}(Ax-b)_iy_i\ge 0. \end{aligned}$$

Equality is possible only if \(r=0\) and \((Ax-b)_iy_i=0\) for all \(i\in I\). \(\square \)

Together with the feasibility constraints, (13) and (14) are the common primal-dual optimality conditions for (11) and (12). Thus these problems are dual to each other, and we may find the solution of (11) by first solving (12) with a bound constrained solver and then computing

$$\begin{aligned} x = G^{-1}(A^Ty-c). \end{aligned}$$

This is how MINQ8 solves strictly convex problems of the form (11).
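To make the reduction concrete, the following Matlab sketch (ours; variable names are assumptions, and for simplicity we use a Cholesky factorization of G) builds the data of (12) in the form (3) and recovers the primal solution from a dual solution y:

```matlab
% dual data in the notation of (3): d(y) = cd'*y + 0.5*(Ad*y - bd)'*Dd*(Ad*y - bd)
R  = chol(G);                 % G = R'*R with R upper triangular
Ad = R' \ A';                 % n x m, since (A'*y-c)'*inv(G)*(A'*y-c) = norm(Ad*y - bd)^2
bd = R' \ c;
cd = -b;
Dd = eye(n);
yl = -inf(m,1);  yl(I) = 0;   % bounds of (12): y_I >= 0, y_E free
yu =  inf(m,1);
% ... minimize d(y) over [yl, yu] with MINQ8 (see the package documentation
% for the exact calling sequence), yielding y; then recover the primal solution:
% x = G \ (A'*y - c);
```

In the special case of a diagonal G discussed below, R reduces to diag(sqrt(diag(G))), so no factorization work is needed.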

Since G is positive definite, (11) has a unique solution if it is feasible. However, (12) is not necessarily strictly convex, and may have a direction p of infinite descent. Since

$$\begin{aligned} d(y+\alpha p) =d(y)+\alpha \bigl ((A^Ty-c)^TG^{-1}A^T-b^T\bigr )p +\frac{1}{2} \alpha ^2 (A^Tp)^TG^{-1}A^Tp, \end{aligned}$$

this is possible only if

$$\begin{aligned} A^Tp=0,\quad b^Tp>0,\quad p_i\ge 0\quad \hbox {for all } i\in I. \end{aligned}$$
(15)

But in this case, any feasible x satisfies

$$\begin{aligned} 0>(Ax-b)^Tp=\sum _{i\in I}(Ax-b)_ip_i\ge 0, \end{aligned}$$

contradiction. Thus p defines a certificate of infeasibility for (11). (Numerically, due to rounding errors, one may find in place of a direction of infinite descent just a huge dual solution y with a very large negative objective function value. In this case, y is itself an approximate certificate of infeasibility, and one may check whether \(p=y\) satisfies (15) to sufficient accuracy to report infeasibility.)

In general, to obtain the form (3) required by MINQ8 one has to make a spectral factorization or an \(LDL^T\) factorization of G (cf. Sect. 2). If the sparsity pattern of G is not unfavorable, this can be done in very large dimensions.

However, there are two special cases that deserve separate handling.

(i) In the special case where G is a diagonal matrix, the problem (12) already has the form required for MINQ8.

(ii) Considering the even more specialized case where \(c=0\) and \(G=I_n\) in (11), we see that bound constrained optimization can also be used to find either the point with minimal Euclidean norm satisfying a system of linear equations and inequalities, or a certificate of infeasibility of this system.

5 Numerical examples

This section discusses extensive computer experiments evaluating the performance of the new algorithm. The numerical examples were carried out with MATLAB 8.3.0.532 (R2014a) for Linux on an Intel Core i5-4670S processor with 3.1 GHz. The default value for tol is \(\mathtt{tol} = 10^{-8}\). The small parameters \(\delta _1\), \(\delta _2\), and \(\delta _3\) from Sects. 2.4, 2.6, and 2.7 were set to \(\delta _1 = 0\), \(\delta _2 = 10^{-12}\), and \(\delta _3 = 10^{-4}\), respectively. Moreover, \(\kappa \) in Sect. 2.2 was set to \(\kappa = 0.5\), which turned out to perform much better in practice than \(\kappa = 0\) (the extended active set coincides with the active set) and slightly better than \(\kappa = 1\), although the convergence proofs in Sect. 3 assume \(\kappa = 0\).

In Sects. 5.1 and 5.2 we consider problems of the form (3), and Sect. 5.3 is devoted to problems of the form (11). On the first two test sets we compare MINQ8 with the default value for tol with MINQ5, NewtonKKTqp, and the Matlab function quadprog. For all problems and algorithms (except for the interior-point-convex quadprog algorithm, which does not take a starting point), the zero vector is used as the starting point. If MINQ5 returns a nonzero error flag ier (if the maximal number of iterations \({\mathtt{maxit}} = 10n\) is exceeded or the algorithm incorrectly assumes that the problem is unbounded below), another call to MINQ5 is made with the result from the previous call as the starting point.

The Matlab version NewtonKKTqp of the Newton-KKT interior method for indefinite quadratic programming by Absil and Tits [1] is designed to return a local minimizer of the indefinite quadratic function \(f(x) = \frac{1}{2} x^THx+c^Tx\) subject to the linear inequality constraints \(Ax \le b\), starting with a feasible initial point. We implemented the bound constraints as \(Ax \le b\), and in the case of an unconstrained problem, we set A to the \(2\times n\) zero matrix and \(b := (1,1)^T\) since the program exited with an error message if A was chosen to be empty or a one-row matrix.
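For finite bounds this encoding amounts to stacking the two one-sided inequalities (variable names ours):

```matlab
% box constraints xl <= x <= xu written as Aineq*x <= bineq for NewtonKKTqp
Aineq = [ eye(n); -eye(n) ];
bineq = [ xu;     -xl     ];
```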

The Matlab function quadprog contains three algorithms for quadratic programming: interior-point-convex, trust-region-reflective, and active-set. The interior-point-convex algorithm is the default algorithm in Matlab 8, but it handles only convex problems and does not allow the choice of an initial point. The trust-region-reflective algorithm is not applicable to unconstrained problems. We used an increased limit of 10,000 (instead of the default value 200) on the number of iterations for all examples.

Table 1 Problem characteristics of Test Set 1 (random sparse problems)

5.1 Test Set 1: random sparse problems

We consider \(A \in {\mathbb {R}}^{m\times n}\) sparse with 6 nonzero integer entries in \([-5,5]\) per row, \(c_i, b_j \in \{0,1,2,3,4,5\}\), \(i=1,\dots ,n\), \(j=1,\dots ,m\), \(\gamma = 0\), and \(|D_{i,i}| \in \{1,2,3,4,5\}\), \(i=1,\dots ,n\), where all entries are chosen randomly from uniform distributions. A was always chosen to have full rank. The same problem is solved without bounds (only in the positive definite case) or on the box \([-10,10]^n\), and the problems are modified by taking \(D = D_1\), \(D_1\) a positive definite diagonal matrix (i.e., \(D_{i,i} \in \{1,2,3,4,5\}\)), \(D = -D_1\) and \(D = ED_1\), where E is a diagonal matrix with diagonal elements randomly chosen from \(\{-1,1\}\). The problem characteristics n, m, the signs of \(D_{i,i}\) and the bounds are displayed in Table 1. An empty entry means that the quantity is the same as in the previous line. This table also contains the number of activities nact of the best solution found among the algorithms. If the lowest \(f_{\text {best}}\) (up to 5 significant digits) was also achieved by MINQ8, nact is the number of exact activities (since MINQ8 easily finds exact activities due to fixbounds), and MINQ5 also finds exact activities for Problem 1c. In contrast, NewtonKKT very rarely yields exact activities. If the lowest function value was achieved by NewtonKKT, we applied MINQ8 to the point obtained by NewtonKKT, which resulted in a point with a slightly lower function value, and we display the number of activities of that point. Note that in the case of concave problems with a singular Hessian (7–10b) it is possible for minimizers to occur also at points that are not vertices.

The results are presented in Table 2. The best function value \(f_{\text {best}}\) and the CPU time needed for executing the algorithm (excluding the time for preparing the input data) are given for all algorithms. In addition, we present the number nit of iterations for MINQ8 (the number of times the loop in Algorithm 6 has been executed), for NewtonKKTqp (where it is called nb_iter) and for quadprog, and the number nsub of subspace steps for MINQ5. An asterisk after the time indicates that NewtonKKTqp terminated with the warning “Computed a non-KKT stationary point”. For quadprog, we report the results obtained by the interior-point-convex algorithm for the convex (i.e., positive semidefinite) problems and the ones for the trust-region-reflective algorithm for the non-convex problems. The trust-region-reflective algorithm does not accept unconstrained problems and also does not find the global minimum for six of the ten bound constrained convex problems. The active-set quadprog algorithm yields lower \(f_{\text {best}}\) values than the trust-region-reflective algorithm for some of the non-convex problems, but it is very slow and sometimes does not even finish within a limit of 10,000 iterations.

Table 2 Results for Test Set 1 (random sparse problems)

Problem 1a is the only problem where the first call to MINQ5 returns the not yet optimal function value \(-\,1.5510\times 10^5\) and the error flag for a problem that is unbounded below, but the optimal function value and a zero error flag are obtained after the second call to MINQ5. If two calls to MINQ5 were made, the nsub column of Table 2 contains two numbers separated by a plus sign.

All algorithms yield the unique optimal function value \(f_{\text {best}}\) (up to the number of digits presented here) for the six convex unconstrained problems (1–6a). Even though the bound constrained convex problems (1–6b and 7–10a) also possess a unique local and therefore also global minimum \(f_{\text {best}}\), MINQ8 and interior-point-convex quadprog are the only algorithms that always manage to find the optimal \(f_{\text {best}}\). MINQ5 finds the correct \(f_{\text {best}}\) only in the cases where the minima coincide with the unconstrained minima (3b, 5b, 6b), i.e., no bounds are active at the best point, and in a case where one bound is active (4b). NewtonKKTqp never achieves the correct \(f_{\text {best}}\) and sometimes, as explained at the beginning of this section, claims that it has found a non-KKT stationary point (which is actually a non-stationary point). So, in spite of using different parameter values than the ones in the convergence proof, MINQ8 generally does well on convex problems. For all unconstrained convex problems and for the convex bound constrained problems where the minima coincide with the unconstrained minima, MINQ8 only needs two iterations of the main loop, i.e., it already finds the minimizer after the first iteration (as supported by theory for \(\kappa = 0\)) and then makes a second one in order to be able to fulfil the stopping criterion that the function value does not change any more. However, for the rank-deficient positive semidefinite problems 7–10a, MINQ8 needs a high number of iterations compared to all other problems (and longer time than the interior-point-convex quadprog algorithm) since it seems to be hard to find the correct activities in such a case (a larger number of activities than for Problems 1–6b).

Non-convex problems usually have several local minimizers, and MINQ8 also finds good \(f_{\text {best}}\) values for many non-convex problems. Even though NewtonKKTqp frequently achieves the lowest \(f_{\text {best}}\) for non-convex problems, the points returned are not even local minimizers and the algorithm somehow gets stuck.

Since the performance of each algorithm is basically the same on each problem class of the problems reported here (and several more that are not reported), the random element in the problems does not seem to be crucial and we did not generate several problems with the same n and m.

We also want to show the behavior of MINQ8 for unbounded problems. Applying MINQ8 to the Problem 7a modified by minimizing the function on \({\mathbb {R}}^n\) instead of \([-10,10]^n\) yielded a warning “The problem is unbounded below”, a corresponding error flag and a finite point with function value \(f \approx -5\times 10^{307}\) before entering the loop in Algorithm 6 (\(\mathtt{nit}=0\)).

For some of the rank-deficient problems 7–10, in particular for the indefinite problems 7–10c, Matlab produced several warnings “Matrix is singular to working precision” when solving (9) with the backslash operator, and the same occurred twice for Problem 9a. This is a situation where setting \(\delta _1\) to a nonzero value makes sense since there are no Matlab warnings any more and thus the stability is improved. However, if \(\delta _1\) is set to a nonzero value, some of the unconstrained convex problems in this test set and in the following subsection might take three iterations to finish instead of two. In Table 3 we present the results for MINQ8 with \(\delta _1 = 10^{-10}\) for Problems 7–10; for all other problems the results do not change significantly. Different local minima are obtained for Problems 7–10c, but for the other problems \(f_\text {best}\) stays the same up to the number of digits displayed here.

Table 3 Results for MINQ8 with \(\delta _1 = 10^{-10}\) on some problems of Test Set 1

5.2 Test Set 2: quadratic problems from CUTEr

To complement the test set with random functions we coded some quadratic problems from the CUTEr test set [8]. We use the default bounds and the default starting point for all problems. The names, dimensions, bounds, starting points, and numbers of activities at the global minimizers of the problems are listed in Table 4. In order to save space in the presentation of the results, we number the problems from C1 to C18. For all problems we have \(m \ge n\).

Table 5 contains the results. A failure of NewtonKKTqp is indicated by a dash: DQDRTIC terminated with an error message, the 10,000-dimensional problems could not be solved with NewtonKKTqp due to limitations in memory, and CHENHARK produced a warning (large k!!!) and never finished. NCVXBQP1–3 and QUDLIN are the only non-convex problems in this test set. Since on this test set interior-point-convex quadprog occasionally fails to find the global minimum and most of the problems are convex, we report the results of interior-point-convex quadprog for all convex problems as well as the results of trust-region-reflective quadprog for all constrained problems.

Table 4 Problem characteristics of Test Set 2
Table 5 Results of Test Set 2

The active-set quadprog algorithm was also tested, but it is inferior to the quadprog results presented in Table 5. It does not solve the 10,000-dimensional problems due to memory limitations, and it takes hours to solve PENTDI and the 5000-dimensional QUDLIN problem.

5.3 Test Set 3: separable quadratic programs

To test the behavior of MINQ8 on problems of the form discussed in Sect. 4, we apply MINQ8 and MINQ5 to separable quadratic problems with linear constraints by solving the dual problems. For comparison we only apply quadprog to the problems since NewtonKKTqp requires a feasible starting point.

Example 1

We consider the minimization of a separable quadratic form subject to linear constraints

$$\begin{aligned} \begin{array}{lll} &{}\min &{}c^T x + \frac{1}{2} x^TDx\\ &{}\text {s.t. } &{}(Ax)_I \ge b_I, \quad (Ax)_E = b_E, \end{array} \end{aligned}$$
(16)

where \(c \in {\mathbb {R}}^n\), D is a positive definite \(n \times n\) diagonal matrix, \(A \in {\mathbb {R}}^{m\times n}\), \(b \in {\mathbb {R}}^m\) and (IE) is a partition of \(\{1,\dots ,m\}\) with \(|E| \le n\). This problem is a special case of (11) and we proceed as described in Sect. 4.

We construct the instances in the following way. We first choose the elements of \(A \in {\mathbb {R}}^{m\times n}\), \(c \in {\mathbb {R}}^n\), \(y_{\text {des}}, s \in {\mathbb {R}}^m\) and the diagonal elements of D from a uniform distribution on [0, 1]. Let \(I = N\cup N'\), where N is the set of nonactive and \(N'\) the set of active inequalities, and \(|N'|+|E| \le \min (m,n)\). Then we set \(y_{\text {des},N} = 0\), \(s_{E\cup N'}=0\), \(x_{\text {des}}:= D^{-1}(A^Ty_{\text {des}}-c)\), and \(b := Ax_{\text {des}}-s\). In the case \(N = \emptyset \) (which can only happen if \(m \le n\)), all indices are active or equalities and \(b = Ax_\text {des}\).
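A short Matlab sketch of this construction (ours; nA and nE denote the assumed numbers of active inequalities and equalities, with nA + nE <= min(m,n)) is:

```matlab
% random instance of (16) with known solution x_des (cf. the construction above)
A = rand(m,n);  c = rand(n,1);  D = diag(rand(n,1));
E  = 1:nE;                       % equality constraints
Np = nE+1:nE+nA;                 % N' : active inequalities
N  = nE+nA+1:m;                  % N  : inactive inequalities
y_des = rand(m,1);  y_des(N) = 0;
s     = rand(m,1);  s([E Np]) = 0;
x_des = D \ (A'*y_des - c);
b     = A*x_des - s;             % (A*x_des - b)_N = s_N > 0, the rest is active
```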

For the solution vector x we define the vector \(r \in {\mathbb {R}}^m\) by \(r_E := (Ax-b)_E\) and \(r_{I} := \min ((Ax-b)_{I},0)\), i.e., r measures the violation of the constraints. Moreover, we use the abbreviation \(\varDelta x := x-x_{\text {des}}\). For MINQ5, we use the program minqsep.m included in the MINQ5 package. It applies MINQ5 to the dual problem, and one step of iterative refinement is applied if \(|r| \le {\text {nnz}}(A)\varepsilon (|A||x|+|b|)\) (with componentwise absolute value and inequalities) is violated, where nnz(A) denotes the number of nonzeros of the matrix A and \(\varepsilon \) the machine precision. For MINQ8, we solve the dual problem with \({\mathtt{tol}} = \varepsilon = 2.2204\times 10^{-16}\) (the machine precision) instead of the default value since the default value does not yet yield the optimal \(f_{\text {best}}\) for the original problems for the test cases with \(I \ne \emptyset \). (Note that MINQ8 and MINQ5 are not applied to the original problems but to the dual ones.) We also apply one step of iterative refinement in the above-mentioned case.

Table 6 contains the problem characteristics of the problems of Example 1. For each value of (mn), we consider one instance with empty \(N'\) and one with nonempty \(N'\). In Table 7 the test results for MINQ8, MINQ5 and the interior-point-convex quadprog algorithm are presented; two problems cannot be solved with quadprog due to memory problems.

Table 6 Problem characteristics of Example 1
Table 7 Results for Example 1

All three algorithms find approximately feasible points with the same function value up to at least 5 significant digits (not displayed here). Most of the points are very close to \(x_{\text {des}}\), with the exception of a few solutions produced by MINQ8 and quadprog (usually the problems that converge most slowly). The active-set quadprog algorithm is faster than the interior-point-convex algorithm (but not faster than MINQ8) for 9 of the 12 problems that are solved by quadprog, but it takes a long time to solve S9 and also S3 and therefore we do not display the results. The trust-region-reflective quadprog algorithm handles problems with only bounds, or only linear equality constraints, but not both and is therefore not applicable.

Example 2

We consider the problem of finding a point of minimal norm satisfying a given system of linear inequalities,

$$\begin{aligned} \begin{array}{lll} &{}\min &{}\frac{1}{2} \Vert x\Vert _2^2\\ &{}\text {s.t. } &{}Ax \ge b. \end{array} \end{aligned}$$
(17)

This problem is a special case of (16) with \(c = 0\), \(D = I_n\), and \(E = \emptyset \), and we consider the three cases \(m<n\), \(m=n\) and \(m>n\). The entries of A and b are chosen from a uniform distribution on [0, 1]. We applied MINQ8, MINQ5, and the active-set quadprog algorithm and present the results in Table 8. (The interior-point-convex algorithm yielded suboptimal feasible points for all problems.) The three algorithms produce essentially the same solutions. We display the maximal violation \(\Vert r\Vert _{\infty }\) of the constraints and the CPU time needed for the computation for all three algorithms and the number of iterations for MINQ8. The last block of Table 8 contains the minimal function value \(\frac{1}{2}\Vert x\Vert _2^2\) of the solution and the number of constraints that are numerically active (\(|(Ax)_i-b_i|\le 10^{-8}\)); these quantities are the same (up to the number of digits presented here) for the successful algorithms. quadprog does not solve the three largest problems due to memory problems. All solutions satisfy the constraints very well.

Table 8 Results for Example 2

5.4 Discussion summary

The numerical examples show that MINQ8 performs well and is competitive with other quadratic programming algorithms implemented in Matlab. MINQ8 is good at solving convex problems and usually finds good local minima for non-convex problems. The CPU times needed for the computations are competitive as well. MINQ8 and MINQ5 are the only ones of the tested algorithms that are applicable to all problems of our test sets; the other algorithms have restrictions concerning the problem class and more severe memory restrictions in Matlab. MINQ8 solves bound constrained quadratic programs well with the default value of tol, but for the separable quadratic programs from Sect. 5.3 a lower value of tol than the default is needed. The default value of n (the dimension of the problem) for maxit turned out to be sufficient for our test problems as well. The fact that MINQ8 easily produces points with exact activities seems to be an advantage for bound constrained problems.