# On fast trust region methods for quadratic models with linear constraints

## Abstract

Quadratic models \(Q_k ( \underline{x}), \underline{x}\in \mathcal{R}^n\), of the objective function \(F ( \underline{x}), \underline{x}\in \mathcal{R}^n\), are used by many successful iterative algorithms for minimization, where *k* is the iteration number. Given the vector of variables \(\underline{x}_k \in \mathcal{R}^n\), a new vector \(\underline{x}_{k+1}\) may be calculated that satisfies \(Q_k ( \underline{x}_{k+1} ) < Q_k ( \underline{x}_k )\), in the hope that it provides the reduction \(F ( \underline{x}_{k+1} ) < F ( \underline{x}_k )\). Trust region methods include a bound of the form \(\Vert \underline{x}_{k+1} - \underline{x}_k \Vert \le \Delta _k\). Also we allow general linear constraints on the variables that have to hold at \(\underline{x}_k\) and at \(\underline{x}_{k+1}\). We consider the construction of \(\underline{x}_{k+1}\), using only of magnitude \(n^2\) operations on a typical iteration when *n* is large. The linear constraints are treated by active sets, which may be updated during an iteration, and which decrease the number of degrees of freedom in the variables temporarily, by restricting \(\underline{x}\) to an affine subset of \(\mathcal{R}^n\). Conjugate gradient and Krylov subspace methods are addressed for adjusting the reduced variables, but the resultant steps are expressed in terms of the original variables. Termination conditions are given that are intended to combine suitable reductions in \(Q_k ( \cdot )\) with a sufficiently small number of steps. The reason for our work is that \(\underline{x}_{k+1}\) is required in the LINCOA software of the author, which is designed for linearly constrained optimization without derivatives when there are hundreds of variables. Our studies go beyond the final version of LINCOA, however, which employs conjugate gradients with termination at the trust region boundary. 
In particular, we find that, if an active set is updated at a point that is not the trust region centre, then the Krylov method may terminate too early due to a degeneracy. An extension to the conjugate gradient method for searching round the trust region boundary receives attention too, but it was also rejected from LINCOA, because of poor numerical results. The given techniques of LINCOA seem to be adequate in practice.

## Keywords

Conjugate gradients · Krylov subspaces · Linear constraints · Quadratic models · Trust region methods

## Mathematics Subject Classification

65K05 · 90C20 · 90C26 · 90C56 · 49M37

## 1 Introduction

For each iteration number *k*, we define \(\underline{x}_k\) to be the point that, at the start of the *k*-th iteration, has supplied the least calculated value of the objective function so far, subject to \(\underline{x}_k\) being feasible, which means that the linear constraints (1.1), namely \(\underline{a}_j^T \underline{x}\le b_j, j = 1,2, \ldots , m\), hold at \(\underline{x}= \underline{x}_k\). If this point is not unique, we choose the candidate that occurs first, so \(\underline{x}_{k+1} \ne \underline{x}_k\) implies the strict reduction \(F ( \underline{x}_{k+1} ) < F ( \underline{x}_k )\).

The main task of the *k*-th iteration is to pick a new vector of variables, \(\underline{x}_k^+\) say, or to decide that the sequence of iterations is complete. On some iterations, \(\underline{x}_k^+\) may be infeasible, in order to investigate changes to the objective function when moving away from a constraint boundary, an extreme case being when the boundary is the set of points that satisfy a linear equality constraint that is expressed as two linear inequalities. We restrict attention, however, to an iteration that makes \(\underline{x}_k^+\) feasible, and that tries to achieve the reduction \(F ( \underline{x}_k^+ ) < F ( \underline{x}_k )\). We also restrict attention to algorithms that employ a quadratic model \(Q_k ( \underline{x}) \approx F ( \underline{x}), \underline{x}\in \mathcal{R}^n\), and a trust region radius \(\Delta _k > 0\).

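Let the model function of the *k*-th iteration be the quadratic

\[ Q_k ( \underline{x}) = Q_k ( \underline{x}_k ) + ( \underline{x}- \underline{x}_k )^T \underline{g}_k + \tfrac{1}{2} ( \underline{x}- \underline{x}_k )^T H_k ( \underline{x}- \underline{x}_k ), \quad \underline{x}\in \mathcal{R}^n, \qquad (1.2) \]

where \(\underline{g}_k = \underline{\nabla }Q_k ( \underline{x}_k )\) and \(H_k = \nabla ^2 Q_k\). The trust region constraint (1.3) is the bound \(\Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\).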

When first derivatives of *F* are calculated, the choice \(\underline{g}_k = \underline{\nabla }F ( \underline{x}_k )\) is usual for the model function (1.2). Furthermore, after choosing the second derivative matrix \(H_1\) for the first iteration, the *k*-th iteration may construct \(H_{k+1}\) from \(H_k\) by the symmetric Broyden formula [see equation (3.6.5) of [3], for instance]. There is also a version of that formula for derivative-free optimization ([9]). It generates both \(\underline{g}_{k+1}\) and \(H_{k+1}\) from \(\underline{g}_k\) and \(H_k\) by minimizing \(\Vert H_{k+1} - H_k \Vert _F\) subject to some interpolation conditions, where the subscript “*F*” denotes the Frobenius matrix norm. The techniques that provide \(\underline{g}_k, H_k\) and \(\Delta _k\) are separate from our work, however, except for one feature. It is that, instead of requiring \(H_k\) to be available explicitly, we assume that, for any \(\underline{v}\in \mathcal{R}^n\), the form of \(H_k\) allows the matrix vector product \(H_{k\,} \underline{v}\) to be calculated in \(\mathcal{O}( n^2 )\) operations. Thus our work is relevant to the LINCOA Fortran software, where the expression for \(H_k\) includes a linear combination of about \(2n + 1\) outer products of the form \(\underline{y}_{\,} \underline{y}^T, \underline{y}\in \mathcal{R}^n\).
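The claim that \(H_{k\,} \underline{v}\) costs \(\mathcal{O}( n^2 )\) operations can be illustrated by a minimal Python sketch. It assumes, purely for illustration, that \(H_k\) is stored as an explicit symmetric matrix plus a weighted sum of outer products \(\underline{y}_{\,} \underline{y}^T\). The function name, the argument names and this storage layout are hypothetical and are not taken from the LINCOA Fortran code.

```python
def hessian_times_vector(explicit_part, weights, ys, v):
    """Return H v for H = explicit_part + sum_i weights[i] * y_i y_i^T.

    explicit_part is an n-by-n list of lists, ys is a list of n-vectors,
    and weights holds one scalar per vector (all names are assumptions).
    """
    n = len(v)
    # Explicit symmetric part: an ordinary dense product, O(n^2) operations.
    result = [sum(explicit_part[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Each outer product y y^T contributes (y^T v) y, which costs only O(n).
    for mu, y in zip(weights, ys):
        scale = mu * sum(yi * vi for yi, vi in zip(y, v))
        for i in range(n):
            result[i] += scale * y[i]
    return result
```

With about \(2n + 1\) outer products, the loop over them costs \(\mathcal{O}( n^2 )\) operations in total, because each term needs only one scalar product and one vector update, and no \(n \times n\) outer product matrix is ever formed.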

[…] *F* is a strictly convex quadratic function. In all of these tests, the optimal vector of variables is calculated to high accuracy, using only about \(\mathcal{O}( n )\) values of *F* for large *n*, which shows that the updating formula is successful at capturing enough second derivative information to provide fast convergence. There is no need for \(\Vert H_k - \nabla ^{2\!} F ( \underline{x}_k ) \Vert \) to become small, as explained by [1] when \(\underline{\nabla }F\) is available. In the tests of [11] with a quadratic *F*, every initial matrix \(H_1\) satisfies \(\Vert H_1 - \nabla ^{2\!} F \Vert _F > \frac{1}{2}_{\,} \Vert \nabla ^{2\!} F \Vert _F\). Further, although the sequence \(\Vert H_k - \nabla ^{2\!} F \Vert _F, k = 1,2, \ldots ,K\), decreases monotonically, where *K* is the final value of *k*, the property […] *n* is large.

[…] if *F* is quadratic, then \(\Vert H_{k+1} - H_k \Vert \) tends to zero as *k* becomes large, so it is usual on the later iterations for the error \(| Q_k ( \underline{x}_k^+ ) - F ( \underline{x}_k^+ ) |\) to be much less than a typical error \(| Q_k ( \underline{x}) - F ( \underline{x}) |, \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\). Thus the reduction \(F ( \underline{x}_k^+ ) < F ( \underline{x}_k )\) is inherited from \(Q_k ( \underline{x}_k^+ ) < Q_k ( \underline{x}_k )\) much more often than would be predicted by theoretical analysis, if the theory employed a bound on \(| Q_k ( \underline{x}_k^+ ) - F ( \underline{x}_k^+ ) |\) that is derived only from the errors \(\Vert \underline{g}_k - \nabla F ( \underline{x}_k ) \Vert \) and \(\Vert H_k - \nabla ^{2\!} F ( \underline{x}_k ) \Vert \), assuming that *F* is twice differentiable.

Suitable ways of choosing \(\underline{x}_k^+\) in the unconstrained case (\(m = 0\)) are described by [2] and by [10], for instance. They are addressed in Sect. 2, where both truncated conjugate gradient and Krylov subspace methods receive attention. They are pertinent to our techniques for linear constraints (\(m > 0\)), because, as explained in Sect. 3, active set methods are employed. Those methods generate sequences of unconstrained problems, some of the constraints (1.1) being satisfied as equations, which can reduce the number of variables, while the other constraints (1.1) are ignored temporarily. The idea of combining active set methods with Krylov subspaces is considered in Sect. 4. It was attractive to the author during the early development of the LINCOA software, but two severe disadvantages of this approach are exposed in Sect. 4. The present version of LINCOA combines active set methods with truncated conjugate gradients, which is the subject of Sect. 5. The conjugate gradient calculations are complete if they generate a vector \(\underline{x}\) on the boundary of the trust region constraint (1.3), but moves round this boundary that preserve feasibility may provide a useful reduction in the value of \(Q_k ( \underline{x})\). This possibility is studied in Sect. 6. Some further remarks, including a technique for preserving feasibility when applying the Krylov method, are given in Sect. 7.

## 2 The unconstrained case

There are no linear constraints on the variables throughout this section, *m* being zero in expression (1.1). We consider truncated conjugate gradient and Krylov subspace methods for constructing a point \(\underline{x}_k^+\) at which the quadratic model \(Q_k ( \underline{x}), \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\), is relatively small. These methods are iterative, a sequence of points \(\underline{p}_{\ell }, \ell = 1,2, \ldots , L\), being generated, with \(\underline{p}_1 = \underline{x}_k\), with \(Q_k (\underline{p}_{\ell +1} ) < Q_k ( \underline{p}_{\ell } ), \ell = 1,2, \ldots , L - 1\), and with \(\underline{x}_k^+ = \underline{p}_L\). The main difference between them is that the conjugate gradient iterations are terminated if the \(\ell \)-th iteration generates a point \(\underline{p}_{\ell +1}\) that satisfies \(\Vert \underline{p}_{\ell +1} - \underline{x}_k \Vert = \Delta _k\), but both \(\underline{p}_{\ell }\) and \(\underline{p}_{\ell +1}\) may be on the trust region boundary in a Krylov subspace iteration. The second derivative matrix \(H_k\) of the quadratic model (1.2) is allowed to be indefinite. We keep in mind that we would like the total amount of computation for each *k* to be \(\mathcal{O}( n^2 )\).

If the gradient \(\underline{g}_k\) of the model (1.2) were zero, then, instead of investigating whether \(H_k\) has any negative eigenvalues, we would abandon the search for a vector \(\underline{x}_k^+\) that provides \(Q_k ( \underline{x}_k^+ ) < Q_k ( \underline{x}_k )\). In this case LINCOA would try to improve the model by changing one of its interpolation conditions. We assume throughout this section, however, that \(\underline{g}_k\) is nonzero.

[…] *n* iterations (see [3], for instance).

The only task of a conjugate gradient iteration that requires \(\mathcal{O}( n^2 )\) operations is the calculation of the product \(H_{k\,} \underline{d}_{\ell }\). All of the other tasks of the \(\ell \)-th iteration can be done in only \(\mathcal{O}(n)\) operations, by taking advantage of the availability of \(H_{k\,} \underline{d}_{\ell -1}\) when \(\underline{\nabla }Q_k ( \underline{p}_{\ell } ), \beta _{\ell }\) and \(\underline{d}_{\ell }\) are formed. Therefore the target of \(\mathcal{O}( n^2 )\) work for each *k* is maintained if *L*, the final value of \(\ell \), is bounded above by a number that is independent of both *k* and *n*. Further, because the total work of the optimization algorithm depends on the average value of *L* over *k*, a few large values of *L* may be tolerable.
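The operation count above can be made concrete by a short Python sketch of a standard truncated conjugate gradient iteration (the Steihaug–Toint form). It is not the LINCOA code. The names `truncated_cg` and `hess_vec` are hypothetical, the stopping rule on the relative gain is an assumption modelled on the role of \(\eta _1\) in this paper, and any further termination tests of LINCOA are omitted. The iteration works with \(\underline{s}= \underline{p}_{\ell } - \underline{x}_k\).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def truncated_cg(hess_vec, g, delta, eta1=0.01, max_iter=50):
    """Approximately minimize g^T s + 0.5 s^T H s subject to ||s|| <= delta."""
    s = [0.0] * len(g)         # s = p_l - x_k, starting at the centre
    grad = list(g)             # gradient of the model at p_l
    d = [-gi for gi in g]      # first direction: steepest descent
    reduction = 0.0            # Q_k(x_k) - Q_k(p_l)
    for _ in range(max_iter):  # max_iter bounds L independently of n
        gg = dot(grad, grad)
        if gg == 0.0:
            break
        hd = hess_vec(d)       # the only O(n^2) task of the iteration
        dd, sd, dhd = dot(d, d), dot(s, d), dot(d, hd)
        # Nonnegative step along d to the trust region boundary.
        alpha_hat = (-sd + math.sqrt(sd * sd + dd * (delta * delta - dot(s, s)))) / dd
        # Negative or zero curvature: go to the boundary; otherwise take the
        # shorter of the boundary step and the exact line minimizer.
        alpha = alpha_hat if dhd <= 0.0 else min(alpha_hat, -dot(grad, d) / dhd)
        step_gain = -alpha * dot(grad, d) - 0.5 * alpha * alpha * dhd
        s = [si + alpha * di for si, di in zip(s, d)]
        reduction += step_gain
        if alpha == alpha_hat or step_gain <= eta1 * reduction:
            break              # boundary reached, or the last gain is relatively small
        grad = [gi + alpha * hi for gi, hi in zip(grad, hd)]  # reuses H d, O(n)
        beta = dot(grad, grad) / gg
        d = [-gi + beta * di for gi, di in zip(grad, d)]
    return s, reduction
```

Only the call `hess_vec(d)` touches the second derivative matrix, so every other line of the loop is \(\mathcal{O}(n)\) work, and the cap `max_iter` plays the part of a bound on *L* that is independent of *n*.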

The two termination conditions that have been mentioned are not suitable for keeping *L* small. Indeed, if \(H_k\) is positive definite, then, for every \(\ell \ge 1\), the property \(Q_k ( \underline{p}_{\ell +1} ) < Q_k ( \underline{x}_k )\) implies \(\Vert \underline{p}_{\ell +1} - \underline{x}_k \Vert < \Delta _k\) for sufficiently large \(\Delta _k\). Furthermore, if \(\underline{\nabla }Q_k ( \underline{p}_{\ell +1} ) = 0\) is achieved in exact arithmetic, then \(\ell \) may have to be close to *n*, but it is usual for computer rounding errors to prevent \(\underline{\nabla }Q_k ( \underline{p}_{\ell +1} )\) from becoming zero. Instead, the termination conditions of LINCOA, which are described below, are recommended for use in practice. They require a positive constant \(\eta _1 < 1\) to be prescribed, the LINCOA value being \(\eta _1 = 0.01\).

[…] *n* is small. In all other cases, \(\ell \) is increased by one for the next conjugate gradient iteration.

In the remainder of this section, we consider a Krylov subspace method for calculating \(\underline{x}_k^+\) from \(\underline{x}_k\), where *k* is any positive integer such that the gradient \(\underline{g}_k\) of the model (1.2) is nonzero. For \(\ell \ge 1\), the Krylov subspace \(\mathcal{K}_{\ell }\) is defined to be the span of the vectors \(H_k^{j-1} \underline{g}_k \in \mathcal{R}^n, j = 1,2, \ldots , \ell \), the matrix \(H_k^{j-1}\) being the \((j - 1)\)-th power of \(H_k = \nabla ^2 Q_k\), and \(H_k^0\) being the unit matrix even if \(H_k\) is zero. Let \(\ell ^*\) be the greatest integer \(\ell \) such that \(\mathcal{K}_{\ell }\) has dimension \(\ell \), which implies \(\mathcal{K}_{\ell } = \mathcal{K}_{\ell ^*}, \ell \ge \ell ^*\). We retain \(\underline{p}_1 = \underline{x}_k\). The \(\ell \)-th iteration of our method constructs \(\underline{p}_{\ell +1}\), and the number of iterations is at most \(\ell ^*\). Our choice of \(\underline{p}_{\ell +1}\) is a highly accurate estimate of the vector \(\underline{x}\) that minimizes \(Q_k ( \underline{x})\), subject to \(\Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\) and subject to the condition that \(\underline{x}- \underline{x}_k\) is in \(\mathcal{K}_{\ell }\). We find later in this section that the calculation of \(\underline{p}_{\ell +1}\) for each \(\ell \) requires only \(\mathcal{O}( n^2 )\) operations. An introduction to Krylov subspace methods is given by [2], for instance.
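A small worked example, with data chosen here only for illustration, may help to fix the definition of \(\ell ^*\). Let \(n = 3\), \(H_k = \mathrm{diag}(1,2,2)\) and \(\underline{g}_k = (1,1,0)^T\). Then

\[ H_k \underline{g}_k = (1,2,0)^T, \qquad H_k^2 \underline{g}_k = (1,4,0)^T = 3_{\,} H_k \underline{g}_k - 2_{\,} \underline{g}_k . \]

Thus \(\mathcal{K}_1\) has dimension one, \(\mathcal{K}_2\) has dimension two, and \(\mathcal{K}_3 = \mathcal{K}_2\), so \(\ell ^* = 2\). In this hypothetical case the Krylov method would make at most two iterations, although \(n = 3\).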

It is well known that each \(\underline{p}_{\ell +1}\) of the conjugate gradient method has the property that \(\underline{p}_{\ell +1} - \underline{x}_k\) is in the Krylov subspace \(\mathcal{K}_{\ell }\), which can be deduced by induction from Eqs. (2.2) and (2.3). It is also well known that the search directions of the conjugate gradient method satisfy \(\underline{d}_{\ell }^{\,T\!} H_{k\,} \underline{d}_j = 0, 1 \le j < \ell \). It follows that the points \(\underline{p}_{\ell }, \ell = 1,2,3, \ldots \), of the conjugate gradient and Krylov subspace methods are the same while they are strictly inside the trust region \(\{ \underline{x}: \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k \}\). We recall that the conjugate gradient method is terminated when its \(\underline{p}_{\ell +1}\) is on the trust region boundary, but the iterations of the Krylov subspace method may continue, in order to provide a smaller value of \(Q_k ( \underline{x}_k^+ )\).

## 3 Active sets

Linear constraints on the variables are present in this section, *m* being positive in expression (1.1). We apply an active set method, this technique being more than 40 years old (see [4], for instance). The active set \(\mathcal{A}\), say, is a subset of the indices \(\{ 1,2, \ldots , m \}\) such that the constraint gradients \(\underline{a}_j, j \in \mathcal{A}\), are linearly independent. Usually, until \(\mathcal{A}\) is updated, the variables \(\underline{x}\in \mathcal{R}^n\) are restricted by the equations

\[ \underline{a}_j^T \underline{x}= b_j, \quad j \in \mathcal{A}. \qquad (3.1) \]

If *j* is not in \(\mathcal{A}\), the constraint \(\underline{a}_j^T \underline{x}\le b_j\) is ignored, until updating of the active set is needed to prevent a constraint violation. This updating may occur several times during the search on the *k*-th iteration for a vector \(\underline{x}_k^+ = \underline{x}\) that provides a relatively small value of \(Q_k ( \underline{x})\), subject to the inequality constraints (1.1) and the bound \(\Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\). As in Sect. 2, the search generates a sequence of points \(\underline{p}_{\ell }, \ell = 1,2, \ldots , L\), in the trust region with \(\underline{p}_1 = \underline{x}_k, \underline{x}_k^+ = \underline{p}_L\), and \(Q_k ( \underline{p}_{\ell +1} ) < Q_k ( \underline{p}_{\ell } ), \ell = 1,2, \ldots , L - 1\). Also, every \(\underline{p}_{\ell }\) is feasible. Let \(\mathcal{A}\) be the current active set when \(\underline{p}_{\ell +1}\) is constructed. We replace Eqs. (3.1) by the conditions

\[ \underline{a}_j^T \underline{p}_{\ell +1} = \underline{a}_j^T \underline{p}_{\ell }, \quad j \in \mathcal{A}. \qquad (3.2) \]

This replacement brings a strong advantage if one (or more) of the residuals \(b_j - \underline{a}_j^T \underline{x}, j = 1,2, \ldots , m\), is very small and positive, because it allows the indices of those constraints to be in \(\mathcal{A}\). Equations (3.1), however, require the choice of \(\mathcal{A}\) at \(\underline{x}= \underline{x}_k\) to be a subset of \(\{ j : b_j - \underline{a}_j^T \underline{x}_k = 0 \}\). Let \(b_i - \underline{a}_i^T \underline{x}_k\) be tiny and positive, where *i* is one of the constraint indices. If *i* is not in the set \(\mathcal{A}\), then the condition \(b_i - \underline{a}_i^T \underline{p}_2 \ge 0\) is ignored in the first attempt to construct \(\underline{p}_2\), so it is likely that the \(\underline{p}_2\) of this attempt violates the *i*-th constraint. Then \(\underline{p}_2\) would be shifted somehow to a feasible position in \(\mathcal{R}^n\), and usually the new \(\underline{p}_2\) would satisfy \(\underline{a}_i^T \underline{p}_2 = b_i\), with *i* being included in a new active set for the construction of \(\underline{p}_3\). Thus the active set of the *k*-th iteration may be updated after a tiny step from \(\underline{p}_1 = \underline{x}_k\) to \(\underline{p}_2\), which tends to be expensive if there are many tiny steps, the work of a typical updating being \(\mathcal{O}( n^2 )\). Therefore the conditions (3.2) are employed instead of Eqs. (3.1) in the LINCOA software, the actual choice of \(\mathcal{A}\) at \(\underline{x}_k\) being as follows.

[…] *j* is in \(\mathcal{J}( \underline{x})\) if and only if the distance from \(\underline{x}\) to the boundary of the *j*-th constraint is at most \(\eta _{2\,} \Delta _k\). The choice of \(\mathcal{A}\) at \(\underline{x}_k\) is a subset of \(\mathcal{J}( \underline{x}_k )\). As in Sect. 2, the step from \(\underline{p}_1 = \underline{x}_k\) to \(\underline{p}_2\) is along a search direction \(\underline{d}_1\), which is defined below. This direction has the property \(\underline{a}_j^{T} \underline{d}_1 \le 0, j \in \mathcal{J}( \underline{x}_k )\), in order that every positive step along \(\underline{d}_1\) goes no closer to the boundary of the *j*-th constraint for every \(j \in \mathcal{J}( \underline{x}_k )\). It follows that, if the length of the step from \(\underline{p}_1\) to \(\underline{p}_2\) is governed by the need for \(\underline{p}_2\) to be feasible, then the length \(\Vert \underline{p}_2 - \underline{p}_1 \Vert \) is greater than \(\eta _{2\,} \Delta _k\), which prevents the tiny step that is addressed in the previous paragraph. This device is taken from the TOLMIN algorithm of [7].
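The membership rule for \(\mathcal{J}( \underline{x})\) can be sketched in a few lines of Python. The sketch assumes that \(\underline{x}\) is feasible, so that the distance from \(\underline{x}\) to the boundary of the *j*-th constraint is \(( b_j - \underline{a}_j^T \underline{x}) / \Vert \underline{a}_j \Vert \). The function name is hypothetical, indices are counted from zero, and \(\eta _2\) is passed as a parameter because its LINCOA value is not stated in this section.

```python
import math

def near_active_indices(a_rows, b, x, eta2, delta):
    """Indices j whose boundary a_j^T x = b_j lies within eta2*delta of x.

    Assumes x is feasible, so every residual b_j - a_j^T x is nonnegative.
    """
    indices = []
    for j, (a, bj) in enumerate(zip(a_rows, b)):
        residual = bj - sum(ai * xi for ai, xi in zip(a, x))
        distance = residual / math.sqrt(sum(ai * ai for ai in a))
        if distance <= eta2 * delta:
            indices.append(j)
    return indices
```

For the two constraints \(x_1 \le 1\) and \(x_2 \le 1\), the point \((0.95, 0)^T\) and the illustrative values \(\eta _2 = 0.2\) and \(\Delta _k = 0.5\), only the first constraint qualifies, because its boundary is at distance 0.05, which is below the threshold 0.1.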

[…] if *j* is in \(\mathcal{J}( \underline{x}_k )\) but not in \(\mathcal{I}( \underline{x}_k )\), then a sufficiently small perturbation to the *j*-th constraint does not alter \(\underline{d}_1\). It follows that \(\underline{d}_1\) is also the vector \(\underline{d}\) that minimizes \(\Vert \underline{g}_k + \underline{d}\Vert _2\) subject to \(\underline{a}_j^{T} \underline{d}\le 0, j \in \mathcal{I}( \underline{x}_k )\). Furthermore, the definition of \(\mathcal{I}( \underline{x}_k )\) implies that \(\underline{d}_1\) is the \(\underline{d}\) that minimizes \(\Vert \underline{g}_k + \underline{d}\Vert _2\) subject to the equations \(\underline{a}_j^{T} \underline{d}= 0, j \in \mathcal{I}( \underline{x}_k )\).

In the LINCOA software, \(\underline{d}_1\) is calculated by the Goldfarb-Idnani algorithm [5] for quadratic programming. A subset \(\mathcal{A}\) of \(\mathcal{J}( \underline{x}_k )\) is updated until it becomes the required active set. Let \(\underline{d}( \mathcal{A})\) be the vector \(\underline{d}\) that minimizes \(\Vert \underline{g}_k + \underline{d}\Vert _2\) subject to \(\underline{a}_j^ {T\!} \underline{d}= 0, j \in \mathcal{A}\). The vectors \(\underline{a}_j, j \in \mathcal{A}\), are linearly independent for every \(\mathcal{A}\) that occurs. Also, by employing the signs of some Lagrange multipliers, every \(\mathcal{A}\) is given the property that \(\underline{d}( \mathcal{A})\) is the vector \(\underline{d}\) that minimizes \(\Vert \underline{g}_k + \underline{d}\Vert _2\) subject to \(\underline{a}_j^{T} \underline{d}\le 0, j \in \mathcal{A}\). It follows that the Goldfarb–Idnani calculation is complete when \(\underline{d}= \underline{d}( \mathcal{A})\) satisfies all the inequalities (3.4). Otherwise, a strict increase in \(\Vert \underline{g}_k + \underline{d}( \mathcal{A}) \Vert _2\) is obtained by picking an index \(j \in \mathcal{J}( \underline{x}_k )\) with \(\underline{a}_j^{T} \underline{d}( \mathcal{A}) > 0\) and adding it to \(\mathcal{A}\), combined with deletions from \(\mathcal{A}\) if necessary to achieve the stated conditions on \(\mathcal{A}\). For \(k \ge 2\), the initial \(\mathcal{A}\) of this procedure is derived from the previous active set, which is called a “warm start”. Usually, when the final \(\mathcal{A}\) is different from the initial \(\mathcal{A}\), the amount of work of this part of LINCOA is within our target, namely \(\mathcal{O}( n^2)\) operations for each *k*.

Let *A* be the \(n \times | \mathcal{A}|\) matrix that has the columns \(\underline{a}_j, j \in \mathcal{A}\), where \(\mathcal{A}\) is any of the sets that occur in the previous paragraph. When \(\mathcal{A}\) is updated in the Goldfarb–Idnani algorithm, the QR factorization of *A* is updated too. We let \(\widehat{Q}R\) be this factorization, where \(\widehat{Q}\) is \(n \times | \mathcal{A}|\) with orthonormal columns and where *R* is square, upper triangular and nonsingular. Furthermore, an \(n \times ( n - | \mathcal{A}| )\) matrix \(\check{Q}\) is calculated and updated such that the \(n \times n\) matrix \(( \widehat{Q}_{\,} |_{\,} \check{Q})\) is orthogonal. We employ the matrices \(\widehat{Q}, R\) and \(\check{Q}\) that are available when the choice of the active set at \(\underline{x}_k\) is complete.

[…] *k*, due to the first order conditions at the solution of a smooth optimization problem, and this cancellation occurs in the second term of the product \(\underline{d}_1 = -\check{Q}( \check{Q}^{T} \underline{g}_k )\). Fortunately, because the definition of \(\check{Q}\) provides \(\underline{a}_j^T \check{Q}= 0, j \in \mathcal{A}\), the vector (3.7) satisfies the constraints \(\underline{a}_j^{T} \underline{d}_1 = 0, j \in \mathcal{A}\), for every product \(\check{Q}^{T} \underline{g}_k\).

The important point of this example is that, when \(\mathcal{A}\) is updated at \(\underline{p}_2\), it is necessary not only to add a constraint index to \(\mathcal{A}\) but also to make a deletion from \(\mathcal{A}\). Furthermore, in all cases when \(\mathcal{A}\) is updated at \(\underline{x}= \underline{p}_{\ell }\), say, we want the length of the step from \(\underline{p}_{\ell }\) to \(\underline{p}_{\ell +1}\) to be at least \(\eta _{2\,} \Delta _k\) if it is restricted by the linear constraints. Therefore, if a new \(\mathcal{A}\) is required at the feasible point \(\underline{p}_{\ell }\), it is generated by the procedure that is described earlier in this section, after replacing \(\underline{g}_k\) by \(\underline{\nabla }Q_k ( \underline{p}_{\ell } )\) and \(\underline{x}_k\) by \(\underline{p}_{\ell }\). We are reluctant to update the active set at \(\underline{p}_{\ell }\), however, when \(\underline{p}_{\ell }\) is close to the boundary of the trust region. Indeed, if a new active set at \(\underline{p}_{\ell }\) is under consideration in the LINCOA software, then the change to \(\mathcal{A}\) is made if and only if the distance from \(\underline{p}_{\ell }\) to the trust region boundary is at least \(\eta _2 \Delta _k\), which is the condition \(\Vert \underline{p}_{\ell } - \underline{x}_k \Vert \le ( 1 - \eta _2 )_{\,} \Delta _k\). Otherwise, the calculation of \(\underline{x}_k^+\) is terminated with \(L = \ell \) and \(\underline{x}_k^+ = \underline{p}_{\ell }\).

## 4 Krylov subspace methods

[…] *i* and *j*. Furthermore, Eqs. (4.2) and (4.3) show that \(Q_k^{{\, \mathrm{red}}}( \underline{s}_{\ell +1} )\) is the same as \(Q_k ( \underline{p}_{\ell +1} )\), where \(\underline{p}_{\ell +1}\) is the point […]

The Krylov subspace method with the active set \(\mathcal{A}\) is now very close to the method described in the two paragraphs that include Eqs. (2.11)–(2.14). The first change to the description in Sect. 2 is that, instead of the form (2.1), \(\underline{p}_2\) is now the point \(\underline{p}_1 - \alpha _1 \check{Q}\check{Q}^{T} \underline{g}_k\), where \(\alpha _1\) is the value of \(\alpha \) that minimizes \(Q_k ( \underline{p}_1 - \alpha \check{Q}\check{Q}^{T} \underline{g}_k ), \alpha \in \mathcal{R}\), subject to \(\Vert \underline{p}_2 - \underline{x}_k \Vert \le \Delta _k\). Equation (2.10) is a consequence of the properties (4.5) and (4.12), so \(\nabla ^2 \Phi _{k \ell } ( \cdot )\) is still tridiagonal. The last \(\ell - 1\) components of the gradient \(\underline{\nabla }\Phi _{k \ell } (0) \in \mathcal{R}^{\ell }\) are still zero, but its first component is now \(\pm \Vert \check{Q}\check{Q}^{T} \underline{g}_k \Vert = \pm \Vert \check{Q}^{T} \underline{g}_k \Vert \). Finally, the Arnoldi formula (2.14) is replaced by expression (4.13). We retain termination if inequality (2.6) holds.

The Krylov method was employed in an early version of the LINCOA software. If the calculated point (4.7) violates an inactive linear constraint, then \(\underline{p}_{\ell +1}\) has to be replaced by a feasible point, and often the active set is changed. Some difficulties may occur, however, which are addressed in the remainder of this section. Because of them, the current version of LINCOA applies the version of conjugate gradients given in Sect. 5, instead of the Krylov method.

## 5 Conjugate gradient methods

The conditions of LINCOA for terminating the conjugate gradient steps for the current active set \(\mathcal{A}\) are close to the conditions in the paragraph that includes expressions (2.4)–(2.6). Again there is termination with \(\underline{x}_k^+ = \underline{p}_{\ell }\) if inequality (2.4) or (2.5) holds, where \(\hat{\alpha }_{\ell }\) is still the nonnegative value of \(\alpha \) such that \(\underline{p}_{\ell } + \alpha _{\,} \underline{d}_{\ell }\) is on the trust region boundary. Alternatively, the new point \(\underline{p}_{\ell +1} = \underline{p}_{\ell } + \alpha _{\ell \,} \underline{d}_{\ell }\) is calculated. If the test (2.6) is satisfied, or if \(\underline{p}_{\ell +1}\) is a feasible point on the trust region boundary, or if \(\underline{p}_{\ell +1}\) is any feasible point with \(\ell = n - | \mathcal{A}|\), then the conjugate gradient steps for the current iteration number *k* are complete, the value \(\underline{x}_k^+ = \underline{p}_{\ell +1}\) being chosen, except that \(\underline{x}_k^+ = \underline{p}_{\mathrm{new}}\) is preferred if \(\underline{p}_{\ell +1}\) is infeasible, as suggested in the first paragraph of this section. Another possibility is that \(\underline{p}_{\ell +1}\) is a feasible point that is strictly inside the trust region with \(\ell < n - | \mathcal{A}|\). Then \(\ell \) is increased by one in order to continue the conjugate gradient steps for the current \(\mathcal{A}\). In all other cases, \(\underline{p}_{\ell +1}\) is infeasible, and we let \(\underline{p}_{\mathrm{new}}\) be the point (5.1).

In these other cases, a choice is made between ending the conjugate gradient steps with \(\underline{x}_k^+ = \underline{p}_{\mathrm{new}}\), or generating a new active set at \(\underline{p}_{\mathrm{new}}\). We recall from the last paragraph of Sect. 3 that the LINCOA choice is to update \(\mathcal{A}\) if and only if the distance from \(\underline{p}_{\mathrm{new}}\) to the trust region boundary is at least \(\eta _2 \Delta _k\). Furthermore, after using the notation \(\underline{p}_1 = \underline{x}_k\) at the beginning of the *k*-th iteration as in Sect. 2, we now revise the meaning of \(\underline{p}_1\) to \(\underline{p}_1 = \underline{p}_{\mathrm{new}}\) whenever a new active set is constructed at \(\underline{p}_{\mathrm{new}}\). Thus the description in this section of the truncated conjugate gradient method is valid for every \(\mathcal{A}\).
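The branching described in the last two paragraphs can be summarized by a small Python decision function. It is only a reading of the prose above, not an extract from LINCOA: the function name and the returned labels are hypothetical, the outcome of test (2.6) is passed in as a boolean because the inequality itself is not reproduced here, and the earlier termination tests (2.4) and (2.5), which apply before \(\underline{p}_{\ell +1}\) is calculated, are left out.

```python
def next_action(test_26_holds, feasible, on_boundary, ell, n, n_active,
                dist_to_boundary, eta2, delta):
    """Decide what follows the calculation of p_{l+1} for the current active set.

    feasible and on_boundary describe p_{l+1}; dist_to_boundary is the
    distance from p_new to the trust region boundary and is used only
    when p_{l+1} is infeasible.
    """
    if test_26_holds or (feasible and (on_boundary or ell == n - n_active)):
        # The conjugate gradient steps are complete; p_new is preferred
        # to an infeasible p_{l+1}.
        return "stop at p_{l+1}" if feasible else "stop at p_new"
    if feasible:
        # Strictly inside the trust region with l < n - |A|.
        return "continue"
    # p_{l+1} is infeasible, so the choice is made at p_new.
    if dist_to_boundary >= eta2 * delta:
        return "new active set at p_new"
    return "stop at p_new"
```

When the function returns `"new active set at p_new"`, the text above resets \(\underline{p}_1 = \underline{p}_{\mathrm{new}}\) and restarts the conjugate gradient steps with the new \(\mathcal{A}\).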

[…] *k*, which can be proved in the following way.

[…] *k*.

Let the *j*-th constraint be in the current \(\mathcal{A}\), which demands the condition […] *j*-th constraint active for the remainder of the calculation. The *j*-th constraint cannot be in the final active set, however, unless changes to \(\underline{p}_1\) cause \(( b_j - \underline{a}_j^T \underline{p}_1 ) / ( \eta _2 \Vert \underline{a}_j \Vert )\) to be at most the final value of \(\Delta _k\). On the other hand, while *j* is in \(\mathcal{A}\), condition (3.2) shows that all the conjugate gradient steps give \(b_j - \underline{a}_j^T \underline{p}_{\ell +1} = b_j - \underline{a}_j^T \underline{p}_{\ell }\). Thus the index *j* may remain in \(\mathcal{A}\) until \(\Delta _k\) becomes less than \(( b_j - \underline{a}_j^T \underline{p}_{\mathrm{curr}}) / ( \eta _2 \Vert \underline{a}_j \Vert )\). Then *j* has to be removed from \(\mathcal{A}\), which allows the conjugate gradient method to generate a new \(\underline{p}_1\) that supplies the reduction \(b_j - \underline{a}_j^T \underline{p}_1 < b_j - \underline{a}_j^T \underline{p}_{\mathrm{curr}}\). If *j* is the index of a linear constraint that is important to the final vector of variables, however, then *j* will be reinstated in \(\mathcal{A}\) by yet another change to the active set.

[…] *A* that satisfies \(A^{T} \underline{d}_{{\, \mathrm{perp}}} = \underline{r}\), where *A* has the columns \(\underline{a}_j, j \in \mathcal{A}\), and where \(\underline{r}\) has the components \(b_j - \underline{a}_j^T \underline{p}_1, j \in \mathcal{A}\). We recall from near the middle of Sect. 3 that the QR factorization \(A = \widehat{Q}R\) is available, which assists the construction of \(\underline{d}_{{\, \mathrm{perp}}}\). Indeed, it has the form \(\underline{d}_{{\, \mathrm{perp}}} = \widehat{Q}\underline{s}\), and \(\underline{s}\) is defined by the equations \(R^{T} \underline{s}= \underline{r}\) […] *R*. The new search direction \(\underline{d}_1\) of the extension is a nonnegative linear combination of \(\underline{d}_{{\, \mathrm{old}}}\) and \(\underline{d}_{{\, \mathrm{perp}}}\), as explained below.
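A short check shows why the triangular factor *R* is all that is needed to find \(\underline{s}\). Only the relations \(A = \widehat{Q}R\), \(\underline{d}_{{\, \mathrm{perp}}} = \widehat{Q}\underline{s}\) and \(A^{T} \underline{d}_{{\, \mathrm{perp}}} = \underline{r}\) are taken from the text; the derivation itself is our reading of the construction. Because the columns of \(\widehat{Q}\) are orthonormal, we have \(\widehat{Q}^T \widehat{Q}= I\), so

\[ A^{T} \underline{d}_{{\, \mathrm{perp}}} = R^{T} \widehat{Q}^{T} \widehat{Q}_{\,} \underline{s}= R^{T} \underline{s}= \underline{r}. \]

The matrix \(R^T\) is lower triangular and nonsingular, so \(\underline{s}\) can be obtained by forward substitution in \(\mathcal{O}( | \mathcal{A}|^2 )\) operations, and the product \(\widehat{Q}\underline{s}\) then costs \(\mathcal{O}( n_{\,} | \mathcal{A}| )\) operations, which is within the target of \(\mathcal{O}( n^2 )\).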

## 6 Moves round the trust region boundary

The conjugate gradient method of Sect. 5 terminates at \(\underline{x}_k^+ = \underline{p}_{\ell +1}\) if \(\underline{p}_{\ell +1}\) is a feasible point on the boundary of the trust region, but usually a move round the boundary can generate another feasible point, \(\underline{x}_k^{++}\) say, that provides the strict reduction \(Q_k ( \underline{x}_k^{++} ) < Q_k ( \underline{x}_k^+ )\). In the case (4.19)–(4.20), for example, the conjugate gradient method yields \(\underline{x}_k^+ = (3,4,1)^T\) with \(Q_k ( \underline{x}_k^+ ) = -19.4\), although the point \(\underline{x}_k^{++} = (5,0,1)^T\) is feasible with \(Q_k ( \underline{x}_k^{++} ) = -25\), as mentioned at the end of Sect. 4. The early versions of the LINCOA software included an extension to the conjugate gradient method that seeks reductions in \(Q_k ( \cdot )\) by searching round the trust region boundary, which is described briefly below. Now, however, LINCOA has been made simpler by the removal of the extension, because of some numerical results that are also given in this section.

Equations (1.2) and (6.5) show that the function (6.6) is a trigonometric polynomial of degree two. The coefficients of this polynomial are generated, the amount of work when *n* is large being dominated by the need for the product \(H_k \underline{v}\), the product \(H_k \check{Q}\check{Q}^T ( \underline{x}_k^+ - \underline{x}_k )\) being available. We calculate an estimate, \(\theta ^*\) say, of the least positive value of \(\theta \) that satisfies \(\phi ^{\prime } ( \theta ) = 0\), the relative error of the estimate being at most 0.01. By considering every inactive constraint whose boundary is within distance \(\Delta _k\) of \(\underline{x}_k\), the value of \(\theta ^*\) is reduced if necessary so that all the points \(\underline{x}_k^+ + \underline{s}( \theta ), 0 \le \theta \le \theta ^*\), are feasible. Then we make the choice \(\underline{x}_k^{++} = \underline{x}_k^+ + \underline{s}( \theta ^* )\).

We take the view that \(\underline{x}_k^{++}\) is a new vector \(\underline{x}_k^+\). There are no more searches round the trust region boundary for the current *k* if an inactive constraint causes a decrease in \(\theta ^*\) in the previous paragraph, or if the change above to \(\underline{x}_k^+\) reduces \(Q_k ( \underline{x}_k^+ )\) by an amount that is at most the new value of \(\eta _1 \{ Q_k ( \underline{x}_k ) - Q_k ( \underline{x}_k^+ ) \}\), which corresponds to the test (2.6), or if condition (6.4) fails for the new \(\underline{x}_k^+\). Otherwise, another move is made from \(\underline{x}_k^+\) to a new \(\underline{x}_k^{++}\) in the way that has been described already. This procedure is continued until termination occurs.

*n* be even, and, for each \(\underline{x}\in \mathcal{R}^n\), we set the points

*n*/2 linear constraints, namely that every \(\underline{p}_i\) is in the triangle with the vertices (0, 0), (2, 0) and (0, 2). The initial positions of the points are chosen randomly within the triangle. For example, the left hand and middle parts of Fig. 2 show the initial random positions and the final calculated positions of the points in a case with \(n = 80\), while the right hand part of Fig. 2 shows calculated positions for a different random start. Both sets of final points satisfy to high accuracy the first order conditions for the solution of the test problem, but the numbers of final points that are strictly inside the triangle are different, the two final values of the objective function being \(F = 0.15626737\) and \(F = 0.15603890\). This test problem has several local minima, and LINCOA tries to find only one of them.
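The triangle constraints and a random feasible start can be written down explicitly. The sketch below is an illustration under two assumptions that are not confirmed by the text above: that consecutive pairs of variables form the points \(\underline{p}_i\), and that the random start is drawn uniformly. The three sides of the triangle give three inequalities per point. The objective function is not included, and all names are hypothetical.

```python
import random

def triangle_constraints(n):
    """Return (A, b) such that A x <= b holds exactly when every point
    (x[2i], x[2i+1]) lies in the triangle (0, 0), (2, 0), (0, 2).
    The pairing of variables into points is an assumption."""
    assert n % 2 == 0
    sides = (((-1.0, 0.0), 0.0),   # first coordinate  >= 0
             ((0.0, -1.0), 0.0),   # second coordinate >= 0
             ((1.0, 1.0), 2.0))    # sum of coordinates <= 2
    A, b = [], []
    for i in range(n // 2):
        for coeffs, rhs in sides:
            row = [0.0] * n
            row[2 * i], row[2 * i + 1] = coeffs
            A.append(row)
            b.append(rhs)
    return A, b

def random_start(n, rng):
    """Uniform random points in the triangle: sample the square [0, 2]^2
    and reflect the samples that fall beyond the hypotenuse."""
    x = []
    for _ in range(n // 2):
        u, w = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)
        if u + w > 2.0:
            u, w = 2.0 - u, 2.0 - w
        x += [u, w]
    return x

def is_feasible(A, b, x, tol=1e-12):
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi + tol
               for row, bi in zip(A, b))

A, b = triangle_constraints(80)
x0 = random_start(80, random.Random(0))
```

Dense rows are used only for readability; each row has at most two nonzero entries, so a sparse representation would be the natural choice for large *n*.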

LINCOA requires not only an initial vector of variables but also the initial and final values of \(\Delta _k\), which are set to 0.1 and \(10^{-6}\) in these calculations. The value of NPT is also required, which is the number of interpolation conditions satisfied by each approximation \(Q_k ( \underline{x}) \approx F ( \underline{x}), \underline{x}\in \mathcal{R}^n\). The amount of routine work for each *k* is of magnitude NPT squared, due to a linear system of equations that supplies each new quadratic model, so it is helpful for NPT to be of magnitude *n* when *n* is large. The values \(\text{ NPT } = n + 6\) and \(\text{ NPT } = 2n + 1\) are compared in the numerical results of this section.

*n*, the cases being distinguished by different random choices of the initial vector of variables. For each application of LINCOA, we let #*F* be the number of calculations of the objective function, and we let #TRI (Trust Region Iterations) be the number of iterations that construct \(\underline{x}_k^+\) by the truncated conjugate gradient method of Sect. 5, with or without searches round the trust region boundary. The second and third columns of the tables show the averages to the nearest integer of #*F* and #TRI over the five cases for each *n*. We recall that every step in the construction of \(\underline{x}_k^+\) requires a vector to be multiplied by the matrix \(H_k = \nabla ^2 Q_k\), which is the most expensive part of every step. For each *k*, the number of multiplications is in the range [1, 3], [4, 10] or \([11, \infty )\). The number of occurrences of each range is counted for each test problem, the sum of these numbers being #TRI. These counts are also averaged to the nearest integer over the five cases for each *n*, the results being shown in the last three columns of the tables. Two versions of LINCOA are employed, the first one being the current version that is without searches round the trust region boundary, and the second one being the extension of the current version that includes the boundary searches and termination conditions of this section. The main entries in the table have the form *p*/*q*, where *p* and *q* are the averages of the version of LINCOA that is without and with boundary searches, respectively. Good accuracy is achieved throughout the experiments of Tables 1 and 2, the greatest residual of a KKT first order condition at a final vector of variables being about \(3 \times 10^{-5}\).

Table 1. The points in triangle problem with \(\text{ NPT } = n + 6\)

| *n* | #*F* | #TRI | [1, 3] | [4, 10] | \([ 11, \infty )\) |
|---|---|---|---|---|---|
| 10 | 144/127 | 100/91 | 96/68 | 5/19 | 0/3 |
| 20 | 483/451 | 324/318 | 243/101 | 80/198 | 0/19 |
| 40 | 1472/2270 | 983/1554 | 822/430 | 161/881 | 0/243 |
| 80 | 7324/5873 | 4861/3927 | 4555/2183 | 304/1433 | 2/311 |
| 160 | 34815/33396 | 22641/22192 | 22327/17702 | 313/3790 | 2/700 |
| 320 | 226065/287538 | 146717/189130 | 146042/173010 | 667/13758 | 9/2362 |

Table 2. The points in triangle problem with \(\text{ NPT } = 2n + 1\)

| *n* | #*F* | #TRI | [1, 3] | [4, 10] | \([ 11, \infty )\) |
|---|---|---|---|---|---|
| 10 | 179/193 | 121/129 | 117/82 | 4/43 | 0/4 |
| 20 | 584/457 | 354/292 | 250/54 | 104/207 | 0/31 |
| 40 | 1472/1733 | 898/1078 | 608/115 | 287/667 | 3/296 |
| 80 | 5462/5838 | 3459/3741 | 2583/292 | 837/1716 | 39/1733 |
| 160 | 19645/26569 | 12588/18010 | 9766/2145 | 2726/6801 | 96/9064 |
| 320 | 73560/73061 | 48483/49428 | 39116/7878 | 9092/16559 | 275/24991 |

The results in the last rows of both tables are highly unfavourable to searches round the trust region boundary. We find in the \(n = 320\) row of Table 1 that the extra work of the searches causes #*F* to become worse by 27 %, while the 0.7 % improvement in #*F* in the \(n = 320\) row of Table 2 is very expensive. Indeed, although the conjugate gradient method is truncated after at most 3 steps in 39,116 out of 48,483 applications, the method with boundary searches requires more than 10 multiplications of a vector by \(H_k = \nabla ^2 Q_k\) in about half of its 49,428 applications. Furthermore, the boundary searches in the \(n = 80\) and \(n = 160\) rows of Table 2 also take much more effort, and they cause #*F* to increase by about 7 % and 35 %, respectively. Table 2 is more relevant than Table 1 for \(n \ge 80\) because its values of #*F* are smaller. Moreover, the #*F* entries in the \(n = 40\) rows of both tables suggest strongly that boundary searches are unhelpful. We give less attention to smaller values of *n*, because LINCOA is designed to be particularly useful for minimization without derivatives when there are hundreds of variables, by taking advantage of the discovery that the symmetric Broyden updating method makes such calculations possible. Thus Tables 1 and 2 provide excellent support for the decision, taken in 2013, to terminate the calculation of \(\underline{x}_k^+\) by LINCOA when the steps of the conjugate gradient method reach the trust region boundary.
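The percentages quoted in this paragraph can be checked directly from the table entries. The short script below is only a reader's check; the numbers are copied from Tables 1 and 2, and the variable names are hypothetical.

```python
def percent_change(without, with_searches):
    """Relative change in percent when boundary searches are switched on."""
    return 100.0 * (with_searches - without) / without

# Averages of #F as (without, with) boundary searches, copied from the tables.
f_table1_n320 = (226065, 287538)
f_table2 = {80: (5462, 5838), 160: (19645, 26569), 320: (73560, 73061)}

change_t1_320 = percent_change(*f_table1_n320)
change_t2 = {n: percent_change(*pair) for n, pair in f_table2.items()}

# n = 320 row of Table 2: shares of #TRI by number of multiplications by H_k.
share_cg_at_most_3 = 39116 / 48483          # version without boundary searches
share_search_more_than_10 = 24991 / 49428   # version with boundary searches

print(f"Table 1, n = 320: {change_t1_320:+.1f} %")
for n, change in change_t2.items():
    print(f"Table 2, n = {n}: {change:+.1f} %")
print(f"CG truncated after at most 3 steps: {100 * share_cg_at_most_3:.1f} %")
print(f"Searches with more than 10 products: {100 * share_search_more_than_10:.1f} %")
```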

## 7 Further remarks and discussion

We begin our conclusions by noting that, when there are no linear constraints on the variables, the Krylov subspace method provides searches round the boundary of the trust region that compare favourably with the searches of Sect. 6. Let \(\widehat{\underline{p}}_{\ell +1} \in \mathcal{R}^n\) and \(\check{\underline{p}}_{\ell +1} \in \mathcal{R}^n\) be the points that are generated by the \(\ell \)-th step of the Krylov method and by the \(\ell \)-th step of the conjugate gradient method augmented by boundary searches, respectively, starting at the trust region centre \(\widehat{\underline{p}}_1 = \check{\underline{p}}_1 = \underline{x}_k\), but the greatest values of \(\ell \) for the two methods may be different. Because \(\check{\underline{p}}_{\ell +1} - \underline{x}_k\) is in the linear space spanned by the gradients \(\underline{\nabla }Q_k ( \check{\underline{p}}_j), j = 1,2, \ldots , \ell \), it is also in the Krylov subspace \(\mathcal{K}_{\ell }\), defined in the complete paragraph between expressions (2.8) and (2.9), even if the dimension of \(\mathcal{K}_{\ell }\) is less than \(\ell \). Also the choice of \(\widehat{\underline{p}}_{\ell +1}\) by the \(\ell \)-th step of the Krylov method satisfies \(\widehat{\underline{p}}_{\ell +1} - \underline{x}_k \in \mathcal{K}_{\ell }\), with the additional property that \(Q_k ( \widehat{\underline{p}}_{\ell +1} )\) is the least value of \(Q_k ( \underline{x}), \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\), subject to \(\underline{x}- \underline{x}_k \in \mathcal{K}_{\ell }\). Thus the Krylov method enjoys the advantage \(Q_k ( \widehat{\underline{p}}_{\ell +1} ) \le Q_k ( \check{\underline{p}}_{\ell +1} )\) for every \(\ell \) that occurs for both methods. Moreover, the Krylov method terminates when \(\mathcal{K}_{\ell +1} = \mathcal{K}_{\ell }\) holds, which gives \(\ell \le n\) even if the other conditions for termination are ignored. Usually, however, the number of boundary searches in Sect. 6 can be made arbitrarily large by letting the parameter \(\eta _1\) in the test (6.4) be sufficiently small. These remarks suggest that the very poor results for searches in the last three columns of Tables 1 and 2 may be avoided if the Krylov method is applied.
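The termination property \(\mathcal{K}_{\ell +1} = \mathcal{K}_{\ell }\) can be illustrated numerically. The sketch below assumes the usual unconstrained definition \(\mathcal{K}_{\ell } = \mathrm{span} \{ \underline{g}, H \underline{g}, \ldots , H^{\ell -1} \underline{g} \}\) with \(\underline{g} = \underline{\nabla }Q_k ( \underline{x}_k )\) and \(H = \nabla ^2 Q_k\), because the defining paragraph lies outside this section. It counts the steps until the subspace stops growing, using Gram-Schmidt orthogonalization. The function name and the tolerance are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(H, u):
    return [dot(row, u) for row in H]

def krylov_dimension(H, g, tol=1e-10):
    """Largest l for which K_l = span{g, H g, ..., H^(l-1) g} still grows.
    The loop cannot run for more than n = len(g) steps."""
    n = len(g)
    basis = []
    w = list(g)
    while len(basis) < n:
        norm_before = math.sqrt(dot(w, w))
        for q in basis:                       # remove components in K_l
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm_after = math.sqrt(dot(w, w))
        if norm_after <= tol * norm_before:   # nothing new: K_(l+1) = K_l
            break
        basis.append([wi / norm_after for wi in w])
        w = matvec(H, basis[-1])              # one product with H per step
    return len(basis)

# Illustrative data: H has three distinct eigenvalues, each excited by g.
diag = [1.0, 1.0, 2.0, 2.0, 5.0]
H = [[diag[i] if i == j else 0.0 for j in range(5)] for i in range(5)]
g = [1.0] * 5
dim = krylov_dimension(H, g)
```

For this diagonal example the subspace stops growing after three steps, well before \(\ell = n = 5\), because only three distinct eigenvalues of \(H\) contribute to \(\underline{g}\).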

Much of the effort in the development of LINCOA was spent on attempts to include the Krylov subspace method of Sect. 4 when there are linear constraints on the variables. Careful attention was given to the situation when, having chosen an active set at the point \(\underline{p}_1 \in \mathcal{R}^n\), which need not be the trust region centre \(\underline{x}_k\), the method generates the sequence \(\underline{p}_{j+1}, j = 1,2, \ldots , \ell \), in the trust region, and \(\underline{p}_{\ell +1}\) is the first point in the sequence that violates a linear constraint. If the function \(\phi ( \alpha ) = Q_k ( \underline{p}_{\ell } + \alpha \{ \underline{p}_{\ell +1} - \underline{p}_{\ell } \} ), 0 \le \alpha \le 1\), decreases monotonically, then the method in the first paragraph of Sect. 5 is recommended, which allows either termination with \(\underline{x}_k^+ = \underline{p}_{\mathrm{new}}\), or a change to \(\mathcal{A}\) with \(\underline{p}_{\mathrm{new}}\) becoming the starting point \(\underline{p}_1\) of the Krylov method for the new active set, \(\underline{p}_{\mathrm{new}}\) being the vector (5.1).

There are three excellent reasons for starting the Krylov method for the new active set \(\mathcal{A}\) at \(\underline{p}_1\) instead of at \(\widehat{\underline{x}}_k\) when \(\underline{p}_1\) is not a trust region centre. The point \(\widehat{\underline{x}}_k\) may be infeasible, the value \(Q_k ( \underline{p}_1 )\) is the least known value of \(Q_k (\underline{x}), \underline{x}\in \mathcal{R}^n\), so far subject to the linear constraints and the trust region bound, and the new \(\mathcal{A}\) has been chosen carefully so that a move from \(\underline{p}_1\) along the direction \(-\check{Q}\check{Q}^T \underline{\nabla }Q_k ( \underline{p}_1 )\) does not violate a constraint until the length of the move is greater than \(\eta _2 \Delta _k\). Moreover, while the sequence of Krylov steps is strictly inside the trust region, the steps are suitable, because they are the same as the conjugate gradient steps in Sect. 5. When the Krylov steps move round the boundary of the trust region, however, there is the very strong objection that the definition (7.5) of the Krylov subspaces is without any attention to the actual trust region boundary, this deficiency being shown clearly by the property \(\mathcal{K}_1 = \mathcal{K}_2\) in the example of the previous paragraph. Therefore, although it is argued in the first paragraph of this section that boundary searches by the Krylov method are superior to those of Sect. 6 in the case \(\underline{p}_1 = \underline{x}_k\), and although this argument is valid too in the unusual situation \(\underline{p}_1 = \widehat{\underline{x}}_k \ne \underline{x}_k\), we expect the crude searches of Sect. 6 to be better in the cases \(\underline{p}_1 \ne \widehat{\underline{x}}_k\). 
We recall also that, when \(\underline{p}_{\ell +1}\) is generated by the Krylov method, the uphill property \(( \underline{p}_{\ell +1} - \underline{p}_{\ell } )^T \underline{\nabla }Q_k ( \underline{p}_{\ell } ) > 0\) is possible, which causes difficulties if \(\underline{p}_{\ell +1}\) is infeasible. These disadvantages led to the rejection of the Krylov method from the LINCOA software, as mentioned earlier. Nevertheless, the description of the Krylov method with linear constraints in Sect. 4 may be useful, because, in many applications of LINCOA, most of the changes to \(\mathcal{A}\) occur at the beginning of an iteration, and then \(\underline{p}_1\) is at the trust region centre \(\underline{x}_k\).

The choice of the quadratic model \(Q_k ( \underline{x}), \underline{x}\in \mathcal{R}^n\), for each iteration number *k* is important, but it is outside the range of our work. Nevertheless, because Tables 1 and 2 in Sect. 6 compare \(\text{ NPT } = n + 6\) with \(\text{ NPT } = 2n + 1\), we comment briefly on the number of interpolation conditions. When the author began to investigate the symmetric Broyden method for minimization without derivatives, as reported in the last three paragraphs of [8], NPT was chosen to be \(\mathcal{O}( n )\) for large *n*, in order to allow the routine work of each iteration to be only \(\mathcal{O}( n^2 )\). Comparisons were made with \(\text{ NPT } = \frac{1}{2} ( n + 1 ) ( n + 2 )\), which is the number of degrees of freedom in a quadratic function. The finding that smaller values of NPT often provide much lower values of #*F* was a welcome surprise. For the smaller values, the second derivative matrix \(\nabla ^2 Q_k\) is usually very different from \(\nabla ^{2\!} F ( \underline{x}_k )\) at the end of the calculation, even when \(F ( \cdot )\) is quadratic. It seems, therefore, that quadratic models without good accuracy can be helpful to the choice of \(\underline{x}_{k+1}\). This view is supported by the following advantage of \(Q_k ( \cdot )\) over a linear approximation to \(F ( \cdot )\).

*k*-th iteration. If a linear approximation \(\Lambda _k ( \cdot ) \approx F ( \cdot )\), say, is employed for the prediction, and if \(\underline{\nabla }\Lambda _k ( \underline{x}_k )\) is nonzero, then the answer is affirmative for all vectors \(\underline{x}\) in the set \(\{ \underline{x}: \Lambda _k ( \underline{x}) < \Lambda _k ( \underline{x}_k ) \} \cap \{ \underline{x}: \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k \}\), which is half of the trust region on one side of the plane \(\{ \underline{x}: ( \underline{x}- \underline{x}_k )^T \underline{\nabla }\Lambda _k ( \underline{x}_k ) = 0 \}\). On the other hand, a typical quadratic model \(Q_k ( \cdot )\) is subject to the interpolation conditions

*t* is an integer in \([1, \text{ NPT } ]\) that satisfies \(F ( \underline{y}_t ) \le F ( \underline{y}_i ), i = 1,2, \ldots , \text{ NPT }\). Thus the set \(\{ \underline{x}: Q_k ( \underline{x}) < Q_k ( \underline{x}_k ) \}\) is usually very different from a half plane, especially if \(\underline{x}_k\) is a strictly interior point of the convex hull of the interpolation points. Indeed, the set excludes a neighbourhood of every \(\underline{y}_i\) with \(F ( \underline{y}_i ) > F ( \underline{x}_k )\), and searches for relatively small values of \(Q_k ( \cdot )\) stay away automatically from the current interpolation points. Quadratic models with NPT of magnitude *n* are obvious candidates for providing this useful feature. Furthermore, because symmetric Broyden updating takes up the freedom in each new model by minimizing the change to the model in a particular way, some helpful properties of the old model can be inherited by the new one, although \(\nabla ^2 Q_k\) may be a very bad estimate of \(\nabla ^2 F ( \underline{x}_k )\). The author is enthusiastic about such models, because of their success in his software for optimization without derivatives when there are hundreds of variables.

Updating the quadratic model is an example of a subject that is fundamental to the development of algorithms such as LINCOA, but the subject is separate from our present work. Another fundamental subject that has not received our attention is the choice of vectors \(\underline{x}\) for the calculation of new values of \(F ( \underline{x})\) on iterations that are designed to improve the quadratic model, instead of trying to achieve the reduction \(F ( \underline{x}) < F ( \underline{x}_k )\). The number of these “model iterations” is about \(( \#F - \#\text{ TRI } - \text{ NPT } )\) in Tables 1 and 2. Therefore our paper is definitely not intended to be a description of LINCOA, although it may be welcomed by users of LINCOA, because that description has not been written yet. Instead, we have studied the investigations for LINCOA into the construction of feasible vectors \(\underline{x}\) that provide a sufficiently small value of \(Q_k ( \underline{x}), \Vert \underline{x}- \underline{x}_k \Vert \le \Delta _k\), subject to linear constraints. Most of the efforts of those investigations, which have taken about 2 years, were spent on promising techniques that have not been included in the software. The prime example is the Krylov subspace method, which was expected to perform better than truncated conjugate gradients, due to its attractive way of taking steps round the trust region boundary in the unconstrained case. The reason for giving so much attention to failures of the Krylov method is that our findings may be helpful to future research.
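As a small worked example of the estimate \(( \#F - \#\text{ TRI } - \text{ NPT } )\), the script below applies it to the \(n = 320\) rows of Tables 1 and 2 for the version of LINCOA without boundary searches. The table entries are averages rounded to the nearest integer, so the results are only approximate, and the function name is hypothetical.

```python
def model_iterations(num_f, num_tri, npt):
    """Approximate number of model iterations: #F - #TRI - NPT."""
    return num_f - num_tri - npt

n = 320
# First entries (without boundary searches) of the n = 320 rows.
table1 = model_iterations(226065, 146717, n + 6)       # NPT = n + 6
table2 = model_iterations(73560, 48483, 2 * n + 1)     # NPT = 2n + 1

print("Table 1, n = 320:", table1)
print("Table 2, n = 320:", table2)
```

In both cases roughly a third of the function evaluations are attributed to model iterations.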

## Notes

### Acknowledgments

The author is very grateful to the Liu Bie Ju Centre and the Department of Mathematics at the City University of Hong Kong for excellent facilities and support, while he was investigating versions of the Krylov subspace method for the LINCOA software.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. Broyden, C.G., Dennis, J.E., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. **12**, 223–245 (1973)
2. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. MPS/SIAM Series on Optimization, SIAM (Philadelphia) (2000)
3. Fletcher, R.: Practical Methods of Optimization. Wiley, Chichester (1987)
4. Gill, P.E., Murray, W.: Numerical Methods for Constrained Optimization. Academic Press, London (1974)
5. Goldfarb, D., Idnani, A.: A numerically stable dual method for solving strictly quadratic programs. Math. Program. **27**, 1–33 (1983)
6. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. **4**, 553–572 (1983)
7. Powell, M.J.D.: A tolerant algorithm for linearly constrained optimization calculations. Math. Program. B **45**, 547–566 (1989)
8. Powell, M.J.D.: On trust region methods for unconstrained minimization without derivatives. Math. Program. B **97**, 605–623 (2003)
9. Powell, M.J.D.: Least Frobenius norm updating of quadratic models that satisfy interpolation conditions. Math. Program. B **100**, 183–215 (2004)
10. Powell, M.J.D.: The NEWUOA software for unconstrained optimization without derivatives. In: Di Pillo, G., Roma, M. (eds.) Large-Scale Optimization, pp. 255–297. Springer, New York (2006)
11. Powell, M.J.D.: Beyond symmetric Broyden for updating quadratic models in minimization without derivatives. Math. Program. B **138**, 475–500 (2013)
12. Powell, M.J.D.: LINCOA. http://en.wikipedia.org/wiki/LINCOA (2013). Accessed 22 May 2015