Quasi-Newton approaches to Interior Point Methods for quadratic problems

Interior Point Methods (IPMs) rely on the Newton method for solving systems of nonlinear equations. Solving the linear systems which arise from this approach is the most computationally expensive task of an interior point iteration. If, due to the problem's inner structure, special techniques for efficiently solving the linear systems are available, IPMs enjoy fast convergence and are able to solve large-scale optimization problems. It is tempting to try to replace the Newton method with quasi-Newton methods. Quasi-Newton approaches to IPMs are either built to approximate the Lagrangian function for nonlinear programming problems or to provide an inexpensive preconditioner. In this work we study the impact of using quasi-Newton methods applied directly to the nonlinear system of equations for general quadratic programming problems. The cost of each iteration is comparable to the cost of computing correctors in a usual interior point iteration. Numerical experiments show that the new approach is able to reduce the overall number of matrix factorizations and is suitable for a matrix-free implementation.


Introduction
Let us consider the following general quadratic programming problem

minimize (1/2) xᵀQx + cᵀx  subject to  Ax = b,  x ≥ 0,   (1)

where x, c ∈ Rⁿ, b ∈ Rᵐ, Q ∈ Rⁿˣⁿ and A ∈ Rᵐˣⁿ. We will suppose that the rows of A are linearly independent. Define the function F : R^{2n+m} → R^{2n+m} by

F(x, λ, z) = [ Aᵀλ + z − Qx − c ; Ax − b ; XZe ],   (2)

where X, Z ∈ Rⁿˣⁿ are diagonal matrices defined by X = diag(x) and Z = diag(z), respectively, and e is the vector of ones of appropriate size. First order necessary conditions for (1) state that, if x* ≥ 0 is a minimizer, then there exist z* ∈ Rⁿ, z* ≥ 0, and λ* ∈ Rᵐ such that F(x*, λ*, z*) = 0. Primal-dual IPMs try to solve (1) by solving a sequence of relaxed constrained nonlinear equations of the form

F(x, λ, z) = [ 0 ; 0 ; µe ],  x, z ≥ 0,   (3)

where µ ∈ R is called the barrier parameter, which is associated with the logarithmic barrier applied to the inequalities x ≥ 0 used to derive the method [1,2]. As µ → 0, more importance is given to optimality over feasibility. Systems of type (3) are not easy to solve. When µ = 0, they can be solved by general algorithms for bounded nonlinear systems [3,4]. In this case, a suitable merit function, usually ‖F(x)‖, has to be used to select the step-sizes. IPMs try to stay near the solutions of (3), which form the central path, and reduce µ at each iteration. Instead of solving (3) exactly, one step of the Newton method is applied. Thus, given an iterate (x_k, λ_k, z_k) in the interior of the bound constraints, i.e.
x_k, z_k > 0, the next point is given by

(x_{k+1}, λ_{k+1}, z_{k+1}) = (x_k, λ_k, z_k) + (α_P Δx_k, α_D Δλ_k, α_D Δz_k),   (4)

where (Δx_k, Δλ_k, Δz_k) is computed by solving Newton-like systems

J(x_k, λ_k, z_k) (Δx_k, Δλ_k, Δz_k) = v,   (5)

where v ∈ R^{2n+m} and J : R^{2n+m} → R^{(2n+m)×(2n+m)} is the Jacobian of F, defined by

J(x, λ, z) = [ −Q  Aᵀ  I ; A  0  0 ; Z  0  X ].   (6)

Standard predictor-corrector algorithms solve (5) twice: first the affine scaling predictor is computed for v = −F(x_k, λ_k, z_k) and then the corrector step is computed using v = (0, 0, σ_k µ_k e), with σ_k ∈ (0, 1) and µ_k = x_kᵀz_k / n. Additional correctors can be computed in one iteration to further accelerate convergence, such as second order correctors [5] or multiple centrality correctors [6]. Scalars α_P and α_D are selected such that x_{k+1} > 0 and z_{k+1} > 0, respectively.

The most expensive task during an interior point (IP) iteration is to solve (5). The coefficient matrix J(x, λ, z) is known as the unreduced matrix and has dimension (2n+m)×(2n+m), but its nice structure allows efficient solution techniques to be used. The most common approaches for solving the linear system in IPMs are to work with the augmented system or the normal equations. If we eliminate Δz in (5), we obtain the augmented system

[ −(Q + X⁻¹Z)  Aᵀ ; A  0 ] [ Δx ; Δλ ] = v̂,

for a right-hand side v̂ obtained from v, which can be solved directly using matrix factorizations, or iteratively by Krylov subspace methods with adequate preconditioners. If matrix Q is easily invertible, or Q = 0 (linear programming problems), it is possible to further eliminate Δx and solve the normal equations by Cholesky factorization or by Conjugate Gradients, depending on the size of the problem. For both approaches it is known that computing good preconditioners or computing the factorization can be the most expensive part of the process. Therefore (5) can be solved several times with the same J(x_k, λ_k, z_k) and different right-hand sides, as in a classical predictor-corrector approach [5] or in the multiple centrality correctors framework [1,7].
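The elimination that leads from the unreduced matrix to the augmented system can be checked numerically. The following sketch (all data randomly generated; names are illustrative, not taken from the paper's implementation) builds the unreduced matrix, with block rows ordered as dual feasibility, primal feasibility and complementarity, solves the full system directly, and verifies that eliminating Δz yields the same direction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((m, n))            # full row rank (generically)
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                    # convex quadratic objective
x = rng.random(n) + 0.1                    # strictly positive iterate: x, z > 0
z = rng.random(n) + 0.1

# Unreduced matrix, block rows: dual feasibility, primal feasibility, complementarity
J = np.block([
    [-Q,         A.T,              np.eye(n)       ],
    [A,          np.zeros((m, m)), np.zeros((m, n))],
    [np.diag(z), np.zeros((n, m)), np.diag(x)      ],
])

v = rng.standard_normal(2 * n + m)
r_d, r_p, r_c = v[:n], v[n:n + m], v[n + m:]

# Direct solve of the unreduced Newton system
d = np.linalg.solve(J, v)
dx, dlam, dz = d[:n], d[n:n + m], d[n + m:]

# Augmented system: eliminate dz = X^{-1}(r_c - Z dx)
K = np.block([[-(Q + np.diag(z / x)), A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r_d - r_c / x, r_p]))
dx2, dlam2 = sol[:n], sol[n:]
dz2 = (r_c - z * dx2) / x

assert np.allclose(dx, dx2) and np.allclose(dlam, dlam2) and np.allclose(dz, dz2)
```

The same construction extends to the normal equations when Q is easy to invert.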
In this work we will extensively use the fact that the backsolves in (5) are less expensive than computing a good preconditioner or factorization.
Although J(x, λ, z) is unsymmetric, under reasonable assumptions Greif, Moulding and Orban [8] showed that it has only real eigenvalues. Based on those results, Morini, Simoncini and Tani [9] developed preconditioners for the unreduced matrix and compared the performance of interior point methods using the unreduced matrix and the augmented system. The unreduced matrix also has two further advantages over the augmented system and the normal equations. First, small changes in the variables x or z result in small changes in J(x, λ, z). Second, J is the Jacobian of F, so it is possible to approximate it by building models or evaluating F at some extra points. These two characteristics are explored in this work.
Since J is the Jacobian of F, it is natural to ask if it can be approximated by evaluating F at some points. Function F is composed of two linear functions and one nonlinear function. Therefore, the only part of J which may change during the iterations is the third block row. Moreover, J can be stored efficiently by storing only A, Q, x and z. Since computing and storing J is inexpensive, the only reason to use an approximation B of J is if system (5), with B_k in place of J(x_k, λ_k, z_k), becomes easier to solve. That is where quasi-Newton methods and low rank updates become an interesting tool in interior point methods.
Quasi-Newton methods are well known techniques for solving large-scale nonlinear systems or nonlinear optimization problems. The main motivation is to replace the Jacobian used by the traditional Newton method with a good and inexpensive approximation. Originally, they were useful to avoid computing the derivatives of F, but they have become popular as a large-scale tool, since they usually do not need to build matrices explicitly and enjoy superlinear convergence. Classical references for quasi-Newton methods are [10,11] for nonlinear equations and [12] for unconstrained optimization.
In his review [11] of practical quasi-Newton methods for solving nonlinear equations, Martínez suggests that there is room for studying such techniques in the interior point context. The author points to the work of Dennis Jr., Morshedi and Turner [13], which applies quasi-Newton techniques to make the projections in Karmarkar's algorithm cheaper. The authors write the interpolation equations associated with the linear system in interior point iterations and describe a fast algorithm to compute the updates, as well as to update an already existing Cholesky factorization. When solving general nonlinear programming problems by IPMs, a well known approach is to replace the Hessian of the Lagrangian function by low rank approximations [12]. In 2000, Morales and Nocedal [14] used quasi-Newton arguments to show that the directions calculated by the Conjugate Gradient algorithm can be used to build an automatic preconditioner for the matrix under consideration. The preconditioner is a sequence of rank-one updates of an initial diagonal matrix. Such an approach is efficient when solving a sequence of linear systems with the same (or a slowly varying) coefficient matrix. Based on those ideas, a limited-memory BFGS-like preconditioner for positive definite matrices was developed in [15] and was specialized for symmetric indefinite matrices in [16]. Recently, Bergamaschi et al. [17] developed limited-memory BFGS-like preconditioners for KKT systems arising from IP iterations and described their spectral properties. The approach was able to reduce the number of iterations in the Conjugate Gradient algorithm, but the approximation deteriorates as the number of interior point iterations increases. Also, extra linear algebra has to be performed to ensure orthogonality of the vectors used to build the updates.
In all these works, with the exception of [13], the main focus was to use low rank updates of an already computed preconditioner, such that new preconditioners are constructed in an inexpensive way and the overall number of linear algebra iterations is reduced. In the present work, our main objective is to work directly with the nonlinear equations and use low rank secant updates for computing the directions in the IP iterations. We use least change secant updates, in particular Broyden updates, and replace the Newton system (5) by an equivalent one. Some properties of the method are presented and extensive numerical experiments are performed. The main features of the proposed approach are:
• Low rank approximations are matrix-free and use only vector multiplications and additions;
• The quasi-Newton method for solving (5) can be easily inserted into an existing IPM;
• The number of factorizations is reduced for small and large instances of linear and quadratic problems;
• When the cost of the factorization is considerably higher than the cost of the backsolves, the total CPU time is also decreased.
In Section 2 we discuss the basic ideas of quasi-Newton methods, in particular the Broyden method, which is extensively used in this work. In Section 3 we show that, if the initial approximation is good enough, least change secant updates preserve most of the structure of the true coefficient matrix and a traditional IP iteration can be performed at the cost of computing correctors only. New low rank secant updates, which are able to exploit the sparsity of J, are also discussed. In Section 4 we describe the aspects of a successful implementation of a quasi-Newton interior point method. In Section 5 we compare our approach with a research implementation of the primal-dual IPM for solving small- and medium-sized linear and quadratic problems. Finally, in Section 6 we draw the conclusions and mention possible extensions of the method.
Notation. Throughout this work we use F k and J k as short versions of vector F (x k , λ k , z k ) and matrix J(x k , λ k , z k ), respectively. The vector e denotes the vector of ones of appropriate dimension.

Background for quasi-Newton methods
Quasi-Newton methods can be described as algorithms which use approximations to the Jacobian in the Newton method in order to solve nonlinear systems. The approximations are generated using information from previous iterations. Suppose that we want to find x̄ ∈ R^N such that F(x̄) = 0, where F : R^N → R^N is continuously differentiable. Given the current point x̄_k at iteration k, the Newton method builds a linear model of F around x̄_k in order to find x̄_{k+1}. Now, suppose that x̄_k and x̄_{k+1} have already been calculated and let us create a linear model for F around x̄_{k+1}:

M_{k+1}(x̄) = F(x̄_{k+1}) + B_{k+1}(x̄ − x̄_{k+1}).   (7)

The choice B_{k+1} = J_{k+1} results in the Newton method for iteration k + 1. In secant methods, B_{k+1} is constructed such that M_{k+1} interpolates F at x̄_k and x̄_{k+1}, which gives us the secant equation

B_{k+1} s_k = y_k,   (8)

where s_k = x̄_{k+1} − x̄_k and y_k = F(x̄_{k+1}) − F(x̄_k). When s_k ≠ 0 and N > 1 there are more unknowns than equations and several choices for B_{k+1} exist [11,18]. Let B_k be the current approximation to J_k, the Jacobian of F at x̄_k (it can be J_k itself, for example). One of the most often used simple secant approximations for unsymmetric Jacobians is given by the Broyden "good" method. Given B_k, a new approximation B_{k+1} to J_{k+1} is given by

B_{k+1} = B_k + (y_k − B_k s_k) s_kᵀ / (s_kᵀ s_k).   (9)

Matrix B_{k+1} is the closest matrix to B_k, in the Frobenius norm, which satisfies (8).
The update of the Broyden method belongs to the class of least change secant updates, and B_{k+1} is a rank-one update of B_k. As we are interested in solving a linear system, it is interesting to analyze the matrix B_{k+1}⁻¹ = H_{k+1}, which is obtained by the well known Sherman-Morrison-Woodbury formula:

H_{k+1} = H_k + u_k s_kᵀ H_k / ρ_k,   (10)

where u_k = s_k − H_k y_k and ρ_k = s_kᵀ H_k y_k. We can see that H_{k+1} is also a least change secant update of H_k. To store H_{k+1}, one needs first to compute H_k y_k and then store one scalar and two vectors. Storing u_k is more efficient than storing H_k y_k when H_{k+1} is going to be used more than once. According to (10), the cost of computing H_{k+1} v is the cost of computing H_k v plus one scalar product and one addition of a vector times a scalar. After ℓ updates of an initial approximation B_{k−ℓ}, the current approximation H_k is given by

H_k = (I + u_{k−1} s_{k−1}ᵀ / ρ_{k−1}) ··· (I + u_{k−ℓ} s_{k−ℓ}ᵀ / ρ_{k−ℓ}) H_{k−ℓ}.

Instead of updating B_k and then computing its inverse, the Broyden "bad" method directly computes a least change secant update of the inverse:

H_{k+1} = H_k V_k + s_k y_kᵀ / ρ_k,   (11)

where V_k = I − y_k y_kᵀ / ρ_k and ρ_k = y_kᵀ y_k. Similarly to (10), H_{k+1} given by (11) is the closest matrix to H_k, in the Frobenius norm, such that H_{k+1}⁻¹ satisfies (8). The cost of storing H_{k+1} is lower than in (10), since vectors s_k and y_k have already been computed. The cost of calculating H_{k+1} v is slightly higher: it involves one scalar product, two additions of a vector times a scalar, and the computation of H_k v. After ℓ updates of an initial approximation H_{k−ℓ}, the current approximation H_k is given by

H_k = H_{k−ℓ} ∏_{j=k−ℓ}^{k−1} V_j + Σ_{i=1}^{ℓ} (s_{k−i} y_{k−i}ᵀ / ρ_{k−i}) ∏_{j=k−i+1}^{k−1} V_j.   (12)

Approach (11) has some advantages over (10). First, it does not need to compute H_k y_k for constructing the update. When H_k is a complicated matrix, this is a costly operation. Second, unlike (10), matrices V_j depend solely on y_j and s_j for all j = k−ℓ, ..., k−1, so it is possible to replace H_{k−ℓ} by a different matrix without updating the whole structure. This makes it suitable for a limited-memory scheme [16].
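The two inverse updates can be illustrated with a small numerical sketch (all data randomly generated; names are illustrative). Both constructions satisfy the inverse secant equation H_{k+1} y_k = s_k, and the "good" update agrees with directly inverting the rank-one-updated B_k:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
B = rng.standard_normal((N, N)) + 3 * np.eye(N)   # current approximation B_k
H = np.linalg.inv(B)                              # H_k = B_k^{-1}
s = rng.standard_normal(N)
y = B @ s + 0.1 * rng.standard_normal(N)          # secant pair with y close to B s

# Broyden "good": B+ = B + (y - B s) s^T / (s^T s); inverse via Sherman-Morrison
u = s - H @ y                     # u_k = s_k - H_k y_k
rho_good = s @ (H @ y)            # rho_k = s_k^T H_k y_k
H_good = H + np.outer(u, s @ H) / rho_good

# Broyden "bad": H+ = H V + s y^T / rho, with V = I - y y^T / rho, rho = y^T y
rho_bad = y @ y
V = np.eye(N) - np.outer(y, y) / rho_bad
H_bad = H @ V + np.outer(s, y) / rho_bad

# Both satisfy the inverse secant equation H_{k+1} y_k = s_k
assert np.allclose(H_good @ y, s)
assert np.allclose(H_bad @ y, s)

# The "good" inverse matches direct inversion of the rank-one-updated B_k
B_good = B + np.outer(y - B @ s, s) / (s @ s)
assert np.allclose(H_good, np.linalg.inv(B_good))
```

Note that only the "good" update requires the product H_k y_k, which is the costly operation the "bad" variant avoids.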
Third, the computation of H_k v can be efficiently implemented in a scheme similar to the BFGS update described in [12], as we show in Algorithm 1. Unfortunately, the Broyden "bad" method is known to behave worse in practice than the "good" method [10]. To avoid the extra cost of computing H_k y_k in (10) it is common to compute a Cholesky or LU factorization of B_{k−ℓ} and work directly with (9), performing rank-one updates of the factorization, which can be efficiently implemented [19].
The class of rank-one least change secant updates can be generically represented by updates of the form

B_{k+1} = B_k + (y_k − B_k s_k) w_kᵀ / (w_kᵀ s_k),   (13)

where w_kᵀ s_k ≠ 0. Setting w_k = s_k defines the Broyden "good" method and w_k = B_kᵀ y_k defines the Broyden "bad" method. Several other well known quasi-Newton methods fit update (13), such as the Symmetric Rank-1 update used in nonlinear optimization, which takes w_k = y_k − B_k s_k. See [10,18] for details on least change secant updates.
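Whatever the choice of w_k, the generic rank-one update satisfies the secant equation by construction. A minimal sketch (illustrative data) checks this for the three choices mentioned above:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
B = rng.standard_normal((N, N)) + 4 * np.eye(N)
s = rng.standard_normal(N)
y = B @ s + 0.1 * rng.standard_normal(N)      # secant pair

def least_change_update(B, s, y, w):
    # Generic rank-one secant update: B+ = B + (y - B s) w^T / (w^T s)
    return B + np.outer(y - B @ s, w) / (w @ s)

for w in (s,            # Broyden "good"
          B.T @ y,      # Broyden "bad"
          y - B @ s):   # SR1
    B_new = least_change_update(B, s, y, w)
    assert np.allclose(B_new @ s, y)          # secant equation holds
```

The choices differ only in which matrix norm is minimized and in how the corresponding inverse update is computed.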

A quasi-Newton approach for IP iterations
According to the general description of primal-dual IPMs in Section 1, we can see that, at each iteration, they perform one Newton step associated with the nonlinear system (3), for decreasing values of µ. Each step involves the computation of the Jacobian of F and the solution of a linear system (5).
Our proposal for this work is to perform one quasi-Newton step to solve (3), replacing the true Jacobian J(x, λ, z) by a low rank approximation B. The idea might seem surprising at first glance, since, for quadratic problems, J(x, λ, z) is very cheap to evaluate. In this section we further develop the quasi-Newton ideas applied to interior point methods and show that they might help to reduce the cost of the linear algebra when solving (1).
It is important to note that F and J discussed in Section 2 will be given by (2) and (6), respectively, in the interior point context, which highlights the importance of using the unreduced matrix in our analysis. Therefore, variable x̄ in Section 2 is given by (x, λ, z) and, consequently, N = 2n + m.

Initial approximation and update
Suppose that k ≥ 0 is an interior point iteration for which system (5) was solved and (x_{k+1}, λ_{k+1}, z_{k+1}) was calculated, using any available technique. Usually, solving (5) involves an expensive factorization or the computation of a good preconditioner associated with J_k. According to Section 2, an initial approximation of J_k is necessary in order to generate the approximation B_{k+1} of J_{k+1} by low rank updates. Most traditional quasi-Newton methods for general nonlinear systems compute B_k by finite differences or use a diagonal matrix as the initial approximation. Since J_k has already been computed, we will define it as B_k, i.e., the perfect approximation to J_k. It is clear that, in this case, H_k = J_k⁻¹ is the approximation to J_k⁻¹. In order to compute B_{k+1}, vectors s_k and y_k in the secant equation (8) have to be built:

s_k = (x_{k+1} − x_k, λ_{k+1} − λ_k, z_{k+1} − z_k) = (s_{k,x}, s_{k,λ}, s_{k,z})  and  y_k = F_{k+1} − F_k.   (14)

The use of J_k as the initial approximation ensures that the first two block elements of B_k s_k − y_k are zero. This is a well known property of low rank updates given by (13) when applied to linear functions (see [10, Ch. 8]). In Lemma 1 we show that rank-one secant updates maintain most of the good sparsity structure of approximation B_k when its structure is similar to that of the true Jacobian of F.

Lemma 1. Let J be the Jacobian of F given by (2) and suppose that the first two block rows of B_k coincide with those of J. If the least change secant update B_{k+1} for approximating J_{k+1} is computed by (13) with w_kᵀ = (w_{k,x}ᵀ, w_{k,λ}ᵀ, w_{k,z}ᵀ), then the first two block rows of B_{k+1} also coincide with those of J and only the third block row, whose blocks we denote by (M¹_{k+1}, M²_{k+1}, M³_{k+1}), is modified:

M¹_{k+1} = M¹_k + r_k w_{k,x}ᵀ / (w_kᵀ s_k),  M²_{k+1} = M²_k + r_k w_{k,λ}ᵀ / (w_kᵀ s_k),  M³_{k+1} = M³_k + r_k w_{k,z}ᵀ / (w_kᵀ s_k),

where r_k is the third block component of y_k − B_k s_k.

Proof. By the definition of s_k and y_k in (14), and since the first two block rows of F are linear and exactly represented in B_k, it is easy to see that

y_k − B_k s_k = (0, 0, r_k).

Using the secant update (13), the first two block rows of B_k are kept the same and the third block row is modified as stated.

By Section 2 we know that the Broyden "good" and "bad" updates are represented by specific choices of w_k and, therefore, enjoy the consequences of Lemma 1. Unfortunately, not much can be said about the structure of the "third row" of B_{k+1}.
When B_k = J_k, the diagonal structure of the blocks Z_k and X_k, as well as the zero block in the middle, are likely to be lost. However, if we select w_kᵀ = (s_{k,x}ᵀ, 0, s_{k,z}ᵀ), then, by Lemma 1, the zero block is kept in B_{k+1}.
The update given by this choice of w_k is a particular case of Schubert's quasi-Newton update for structured and sparse problems [20]. This update minimizes the distance to B_k on the space of the matrices that satisfy (8) and have the same block sparsity pattern as B_k [18]. Using the Sherman-Morrison-Woodbury formula, we also have the corresponding update for H_k:

H_{k+1} = H_k + (s_k − H_k y_k) w_kᵀ H_k / (w_kᵀ H_k y_k),   (15)

which only needs an extra computation of H_k y_k to be stored. There is no need to store w_k, since it is composed of components of s_k. We can say that this approach is inspired by the Broyden "good" update.
On the other hand, if we use w_kᵀ = (0, y_{k,b}ᵀ, y_{k,µ}ᵀ) B_k, where y_{k,b} and y_{k,µ} denote the second and third block components of y_k, then we still have M²_{k+1} = 0 by Lemma 1 and, in addition, we are able to remove the calculation of H_k y_k in the inverse. This approach is inspired by the Broyden "bad" update and results in the following update

H_{k+1} = H_k V̂_k + s_k ŷ_kᵀ / ρ̂_k,  where ŷ_kᵀ = (0, y_{k,b}ᵀ, y_{k,µ}ᵀ),  V̂_k = I − y_k ŷ_kᵀ / ρ̂_k  and  ρ̂_k = ŷ_kᵀ y_k.

To the best of the authors' knowledge, this update has not been theoretically studied in the literature. Lemma 1 also justifies our choice to work with approximations of J⁻¹ rather than J. After ℓ > 0 rank-one updates, if B_k u = v were solved by factorizations and backsolves, it would be necessary to perform ℓ updates on the factorization of the initial matrix B_{k−ℓ}, which could introduce many nonzero elements. A clear benefit of defining B_{k−ℓ} = J_{k−ℓ} is that computing H_k v uses the already calculated factorization/preconditioner for B_{k−ℓ}, which was originally used to solve (5) at iteration k − ℓ. Step 3 of Algorithm 1 is an example of the low rank update (12). Clearly, we do not explicitly compute H_{k−ℓ} v, but instead solve the system B_{k−ℓ} u = v.
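The structure preservation described by Lemma 1 can be verified numerically. The sketch below (illustrative data; block rows of F ordered as dual feasibility, primal feasibility, complementarity) builds a secant pair from two interior iterates with B_k = J_k, checks that the first two blocks of y_k − B_k s_k vanish, and confirms that the structured choice w_kᵀ = (s_{k,x}ᵀ, 0, s_{k,z}ᵀ) keeps the middle zero block of the third row:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((m, n))
Q = np.eye(n)
b = rng.standard_normal(m)
c = rng.standard_normal(n)

def F(x, lam, z):
    # Block order: dual feasibility, primal feasibility, complementarity
    return np.concatenate([A.T @ lam + z - Q @ x - c, A @ x - b, x * z])

x0, z0 = rng.random(n) + 0.5, rng.random(n) + 0.5
lam0 = rng.standard_normal(m)
x1, z1 = x0 + 0.1 * rng.random(n), z0 + 0.1 * rng.random(n)
lam1 = lam0 + 0.1 * rng.standard_normal(m)

# B_k = J_k: only the last block row depends on the iterate
B = np.block([
    [-Q, A.T, np.eye(n)],
    [A, np.zeros((m, m)), np.zeros((m, n))],
    [np.diag(z0), np.zeros((n, m)), np.diag(x0)],
])
s = np.concatenate([x1 - x0, lam1 - lam0, z1 - z0])
y = F(x1, lam1, z1) - F(x0, lam0, z0)

r = y - B @ s
assert np.allclose(r[:n + m], 0)          # linear rows of F: first two blocks vanish

# Structured choice w = (s_x, 0, s_z): middle block of the third row stays zero
w = s.copy()
w[n:n + m] = 0
B_struct = B + np.outer(r, w) / (w @ s)
assert np.allclose(B_struct[n + m:, n:n + m], 0)   # the zero block is preserved
assert np.allclose(B_struct @ s, y)                # secant equation holds
```

With the plain Broyden "good" choice w = s, the same experiment fills the middle block of the third row, which is exactly the structure loss the sparse update avoids.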

Computation of quasi-Newton steps
Having defined how the quasi-Newton updates are initialized and constructed, we now have to insert the approximations into an interior point framework. Denoting by (x_0, λ_0, z_0) the starting point of the algorithm, at the end of any iteration k it is possible to build a rank-one secant approximation of the unreduced matrix to be used at iteration k + 1. Let us consider iteration k, where k ≥ ℓ ≥ 0. If ℓ = 0, then, by the previous subsection, B_{k−ℓ} = B_k = J_k and the step in the interior point iteration is the usual Newton step, given by (5). If ℓ > 0, we have a quasi-Newton step, which can be viewed as a generalization of (5), and is computed by solving

B_k (Δx_k, Δλ_k, Δz_k) = v,   (16)

or, equivalently, by performing H_k v. All the other steps of the IPM remain exactly the same. When ℓ > 0, the cost of solving (16) depends on the type of update that is used. In general, it is the cost of solving a system J_{k−ℓ} r = q (or, equivalently, computing J_{k−ℓ}⁻¹ q) plus some vector multiplications and additions. However, since J_{k−ℓ} has already been the coefficient matrix of a linear system at iteration k − ℓ, this is usually less expensive than solving with it for the first time. That is one of the main improvements that a quasi-Newton approach brings to interior point methods.
When the Broyden "bad" update (12) is used together with defining B k− = J k− as the initial approximation, it is possible to derive an alternative interpretation of (16). Although this update is known to have worse numerical behavior when compared with the "good" update (10), this interpretation can result in a more precise implementation, which is described in Lemma 2.
Lemma 2. Assume that k ≥ ℓ ≥ 0 and that H_k is the approximation of J_k⁻¹ constructed by ℓ updates (12) using the initial approximation H_{k−ℓ} = J_{k−ℓ}⁻¹. Then the quasi-Newton step H_k v is the solution of

J_{k−ℓ} u = v + Σ_{i=1}^{ℓ} α_i (J_{k−ℓ} s_{k−i} − y_{k−i}),  where  α_i = y_{k−i}ᵀ (∏_{j=k−i+1}^{k−1} V_j) v / ρ_{k−i},   (17)

and the first two block components of the right hand side coincide with those of v.

Proof. Using the expansion (12) of the Broyden "bad" update, the definition of α_i and the fact that H_{k−ℓ} = J_{k−ℓ}⁻¹, we have that

H_k v = J_{k−ℓ}⁻¹ (∏_{j=k−ℓ}^{k−1} V_j) v + Σ_{i=1}^{ℓ} α_i s_{k−i} = J_{k−ℓ}⁻¹ (v − Σ_{i=1}^{ℓ} α_i y_{k−i}) + Σ_{i=1}^{ℓ} α_i s_{k−i},

where the last equality comes from the definition of V_k in (11), applied recursively. When i = 1, we assume that ∏_{j=k−i+1}^{k−1} V_j results in the identity matrix, therefore α_1 = y_{k−1}ᵀ v / ρ_{k−1}. Multiplying both sides by J_{k−ℓ} on the left, we obtain

J_{k−ℓ} (H_k v) = v + Σ_{i=1}^{ℓ} α_i (J_{k−ℓ} s_{k−i} − y_{k−i}).

By Lemma 1 and definition (14), the first two block components of J_{k−ℓ} s_{k−i} − y_{k−i} are zero for all i, which demonstrates the lemma.
Lemma 2 states that only the third block component of the right hand side actually needs to be changed in order to compute Broyden "bad" quasi-Newton steps at iteration k. This structure is very similar to that of correctors or multiple centrality correctors in IPMs and reinforces the argument that the cost of computing a quasi-Newton step is lower than that of a Newton step. It is important to note that the scalars α_i are the same as the ones computed at step 2 of Algorithm 1.
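The equivalence stated in Lemma 2 can be checked on a small generic example (illustrative data, memory ℓ = 2): the step obtained by the recursive Broyden "bad" updates coincides with the solution of a single system with the old matrix and a modified right-hand side.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 7
J0 = rng.standard_normal((N, N)) + 4 * np.eye(N)   # stands for J_{k-l}
H = np.linalg.inv(J0)                              # H_{k-l} = J_{k-l}^{-1}
S, Y, RHO = [], [], []

# Two Broyden "bad" updates (l = 2)
for _ in range(2):
    s, y = rng.standard_normal(N), rng.standard_normal(N)
    rho = y @ y
    H = H + np.outer(s - H @ y, y) / rho
    S.append(s); Y.append(y); RHO.append(rho)

v = rng.standard_normal(N)
direct = H @ v                         # the quasi-Newton step H_k v

# Lemma-2 style: alpha_i from products of V_j applied to v, then one backsolve
w = v.copy()
alphas = []
for s, y, rho in zip(reversed(S), reversed(Y), reversed(RHO)):
    a = (y @ w) / rho                  # alpha_i = y^T (prod V_j) v / rho
    alphas.append(a)
    w = w - a * y                      # apply V_j to the running vector
rhs = v + sum(a * (J0 @ s - y)
              for a, s, y in zip(alphas, reversed(S), reversed(Y)))
assert np.allclose(np.linalg.solve(J0, rhs), direct)
```

In the interior point setting, the terms J_{k−ℓ} s_{k−i} − y_{k−i} vanish in their first two blocks, so only the third block of the right-hand side changes, exactly as in a corrector computation.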

Dealing with regularization
Rank-deficiency of A, near singularity of Q or the lack of strict complementarity at the solution may cause matrix J, the augmented system or the normal equations to become singular near the solution of (1). As the iterations advance, it becomes harder to solve the linear systems. Regularization techniques address this issue by adding small perturbations to J in order to increase numerical accuracy and convergence speed, without losing theoretical properties. A common approach is to interpret the perturbation as the addition of weighted proximal terms to the primal and dual formulations of (1). Saunders and Tomlin [21] consider fixed perturbations while Altman and Gondzio [22] consider dynamic ones, computed at each iteration. Friedlander and Orban [23] add extra variables to the problem, expand the unreduced system and, after an initial reduction, arrive at a regularized system similar to [22]. In all these approaches, given reference points x̄ and λ̄, the regularized matrix

J(x, λ, z) = [ −(Q + R_p)  Aᵀ  I ; A  R_d  0 ; Z  0  X ],   (18)

where the diagonal matrices R_p ∈ Rⁿˣⁿ and R_d ∈ Rᵐˣᵐ represent primal and dual regularization, respectively, can be viewed as the Jacobian of the following function F̄:

F̄(x, λ, z) = [ Aᵀλ + z − Qx − c − R_p(x − x̄) ; Ax − b + R_d(λ − λ̄) ; XZe ].

Any choice is possible for the reference points x̄ and λ̄. However, in order to solve the original Newton system (5) and, at the same time, make use of the good properties of the regularization (18), they are usually set to the current iterates x_k and λ_k, respectively, which annihilates the terms R_p(x − x̄) and R_d(λ − λ̄) on the right hand side of (5) during affine scaling steps. Matrix J given by (18) now depends on R_p and R_d in addition to x and z. The regularization terms R_p and R_d do not need to be considered as variables, but if new regularization parameters are used, a new factorization or preconditioner needs to be computed.
Since this is one of the most expensive tasks of the IP iteration, during a quasi-Newton step at iteration k the regularization parameters are not allowed to change from those selected at iteration k − ℓ, where the initial approximation was selected. That is a reasonable decision, as the system that is actually being solved in practice has the coefficient matrix from iteration k − ℓ. The fact that the regularization terms are linear in F̄ implies, by Lemma 1, that the structure of (18) is maintained during least change secant updates.
The reference points have no influence on J, but they do influence the function F̄. Suppose, as an example, that ℓ = k, i.e., the initial approximation for quasi-Newton is the Jacobian at the starting point (x_0, λ_0, z_0), and only quasi-Newton steps are taken in the interior point algorithm. If we use x_0 and λ_0 as the reference points and the algorithm converges, the limit point could be very different from the true solution, as initial points are usually far away from the solution, especially for infeasible IPMs. If we update the reference points at each quasi-Newton iteration, as is usually done in the literature [22,23], we eliminate their effect on the right hand side of (16) during affine scaling steps. By (7), B_{k+1} is the Jacobian of a linear approximation of F̄ which interpolates (x_k, λ_k, z_k) and (x_{k+1}, λ_{k+1}, z_{k+1}). As the regularization parameters are fixed during quasi-Newton iterations, the reference points can be seen as simple constant shifts of F̄, with no effect on the Jacobian. Therefore, the only requirement is that F̄ has to be evaluated at the points (x_k, λ_k, z_k) and (x_{k+1}, λ_{k+1}, z_{k+1}) using the same reference points when calculating y_k by (14). In practice, the effect of changing the reference points at each iteration is one extra evaluation of F̄ at the beginning of iteration k.
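A small finite-difference check illustrates the two facts used above, assuming the regularization enters as R_p added to the Q block and R_d in the dual block of (18): the regularized matrix is indeed the Jacobian of the regularized function, and the reference points (here arbitrary illustrative values) shift F̄ without affecting its derivative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 2
A = rng.standard_normal((m, n))
Q = np.eye(n)
b = rng.standard_normal(m)
c = rng.standard_normal(n)
Rp = 1e-4 * np.eye(n)                 # primal regularization
Rd = 1e-4 * np.eye(m)                 # dual regularization
x_ref = rng.random(n)                 # reference points (illustrative)
lam_ref = rng.standard_normal(m)

def F_reg(x, lam, z):
    return np.concatenate([
        A.T @ lam + z - Q @ x - c - Rp @ (x - x_ref),
        A @ x - b + Rd @ (lam - lam_ref),
        x * z,
    ])

def J_reg(x, z):
    return np.block([
        [-(Q + Rp), A.T, np.eye(n)],
        [A, Rd, np.zeros((m, n))],
        [np.diag(z), np.zeros((n, m)), np.diag(x)],
    ])

# Finite differences confirm J_reg is the Jacobian of F_reg (independent of x_ref, lam_ref)
p = np.concatenate([rng.random(n) + 0.5, rng.standard_normal(m), rng.random(n) + 0.5])
eps = 1e-6
N = 2 * n + m
Jfd = np.empty((N, N))
for j in range(N):
    q = p.copy()
    q[j] += eps
    Jfd[:, j] = (F_reg(q[:n], q[n:n + m], q[n + m:])
                 - F_reg(p[:n], p[n:n + m], p[n + m:])) / eps
assert np.allclose(Jfd, J_reg(p[:n], p[n + m:]), atol=1e-5)
```

Since x_ref and lam_ref never appear in J_reg, changing them between iterations only requires re-evaluating F_reg, which matches the observation about the extra function evaluation.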

Implementation
The quasi-Newton approach can easily be inserted into an existing interior point method implementation. In this work, the primal-dual interior point algorithm HOPDM [24] was modified to implement the quasi-Newton approach. Algorithm 2 describes the steps of a conceptual quasi-Newton primal-dual interior point algorithm.

Algorithm 2: Quasi-Newton Interior Point algorithm
Initialization: F, J and (x_0, λ_0, z_0). Set k ← 0 and ℓ ← 0.
1. Compute the step (Δx_k, Δλ_k, Δz_k) by solving systems of type (16)
2. Calculate α_P^k and α_D^k such that (x_{k+1}, λ_{k+1}, z_{k+1}) given by (4) satisfies x_{k+1}, z_{k+1} > 0
3. Compute s_k and y_k by (14)
   if iteration k + 1 will use quasi-Newton steps, then
      Store appropriate quasi-Newton information
      ℓ ← ℓ + 1
   else
      ℓ ← 0
4. k ← k + 1 and go back to step 1

The most important element of Algorithm 2 is ℓ, the memory size of the low rank update, which controls whether the iteration involves Newton or quasi-Newton steps. At step 1, several systems (16) might be solved, depending on the IPM used. HOPDM implements the strategy of multiple centrality correctors [7], which tries to maximize the step-size at each iteration. HOPDM also implements the regularization strategy (18). Note in (16) that we do not have to care how the systems are solved, only how to implement the matrix-vector multiplication H_k v efficiently.
Step 3 is the most important step in a quasi-Newton IP algorithm, since it decides whether or not quasi-Newton steps will be used in the next iteration. Several possible strategies are discussed in this section, as well as some implementation details.
Bound constraints l ≤ x ≤ u, l, u ∈ Rⁿ, can be considered in the general definition (1) of a quadratic programming problem by using slack variables. HOPDM explicitly deals with bound constraints and increases the number of variables to 4n + m. When bound constraints are considered, function F is given by

F(x, t, λ, z, w) = [ Aᵀλ + z − w − Qx − c ; Ax − b ; x + t − u ; XZe ; TWe ],

where t, w ∈ Rⁿ are the slacks of the upper bounds and their associated dual variables, respectively, and T = diag(t), W = diag(w). Note that, in this case, l is eliminated by proper shifts, u represents the shifted upper bounds and t represents the slacks. All the results and discussions considered so far can be easily adapted to the bound-constrained case. Therefore, in order to keep the notation simple, we will refer to the more general and simpler formulation (1) and work in the (2n + m)-dimensional space.

Storage of H k and computation of H k v
When solving quadratic problems, the Jacobian of the function F used in a primal-dual interior point method is not expensive to compute and has an excellent structure, which can be efficiently exploited by traditional approaches. Therefore, there is no point in explicitly building the approximation matrices B_k (or H_k) since, by Lemma 1, they would be denser. For an efficient implementation of the algorithm, only the computation of H_k v in (16) has to be performed. To accomplish this task, we store
• the initial approximation J_{k−ℓ} and its factorization or preconditioner;
• the ℓ triples (u_{k−i}, s_{k−i}, ρ_{k−i}) or (s_{k−i}, y_{k−i}, ρ_{k−i}), i = 1, . . . , ℓ, if the updates are based on the Broyden "good" or "bad" method, respectively.
In order to store J_{k−ℓ} we only have to store the vectors x_{k−ℓ} and z_{k−ℓ}, since all other blocks of J are constant. If regularization is being used, the vectors R_p and R_d used at iteration k − ℓ are also stored. The reference points are not stored. The most important structure to store is the factorization or the preconditioner computed when solving (16) at iteration k − ℓ for the first time. Without this information, the computation of H_k v would have the same computational cost as using the true matrix J_k. Data is stored at step 3 of Algorithm 2, whenever the algorithm has decided to store quasi-Newton information and ℓ = 0.
Regarding the triples, each is composed of two (2n + m)-dimensional vectors and one scalar. Storing y_{k−i} is the most expensive part in Broyden "bad" updates, since function F has to be evaluated twice. In Broyden "good" updates the computation of u_{k−i} is the most expensive, due to the computation of H_{k−i} y_{k−i}.
The implementation of an algorithm to compute H_k v depends on the selected type of low rank update. Algorithm 1 is an efficient implementation of the general Broyden "bad" update (12). If the structure described by Lemma 1 is being used, then all vector multiplications are performed before the solution of the linear system, as described by Algorithm 3. Both algorithms can be easily modified to use updates of the form w_kᵀ = (a_kᵀ, b_kᵀ, c_kᵀ) B_k in the generic update (13). The only changes are the storage of an extra vector and the computation of the scalars α_i at step 2. The implementation of the sparse update (15) is straightforward and there is no need to store extra information. Algorithm 3 uses a little extra computation, since vector q is discarded after the computation of all α_i. On the other hand, there is no need to store the blocks s_{k−i,λ}, i = 1, . . . , ℓ.
Algorithm 3: Algorithm for matrix-vector multiplications in the Broyden "bad" update using structural information

Algorithm 4 describes the steps to compute H_k v when the Broyden "good" update (10) is considered. Note that a linear system is solved first, and then a sequence of vector multiplications and additions is applied. The algorithm is simpler and more general than Algorithm 1, but it has to be called more often in an interior point algorithm: to compute the steps (step 1 in Algorithm 2) and to compute H_k y_k, needed to build u_k (step 3 in Algorithm 2). Algorithm 4 is very general and can be easily modified to consider any least change secant update of the form (13) without extra storage requirements, although not necessarily in an efficient way.
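The "solve first, then correct" pattern of Algorithm 4 can be sketched as follows (illustrative data; in an actual IPM the dense solve would be replaced by backsolves with the stored factorization of J_{k−ℓ}). The stored triples (u, s, ρ) are applied oldest-first, and the same routine is reused to build each new u:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 8
J0 = rng.standard_normal((N, N)) + 5 * np.eye(N)   # stands for J_{k-l}

def H_apply(v, triples):
    # One solve with J_{k-l} (in practice: backsolves with its stored factorization),
    # followed by rank-one corrections applied oldest-first: r <- r + u (s^T r)/rho
    r = np.linalg.solve(J0, v)
    for u, s, rho in triples:
        r = r + u * (s @ r) / rho
    return r

triples = []
H_ref = np.linalg.inv(J0)                  # dense reference, for checking only
for _ in range(3):                         # three Broyden "good" updates
    s = rng.standard_normal(N)
    y = J0 @ s + 0.1 * rng.standard_normal(N)
    Hy = H_apply(y, triples)               # H y via the same matrix-free product
    u, rho = s - Hy, s @ Hy                # u = s - H y, rho = s^T H y
    triples.append((u, s, rho))
    H_ref = H_ref + np.outer(u, s @ H_ref) / rho

v = rng.standard_normal(N)
assert np.allclose(H_apply(v, triples), H_ref @ v)   # matches the dense H_k v
```

Each extra update adds only one scalar product and one vector addition to the cost of the product, consistent with the cost analysis of Section 2.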

Size of ℓ
The cost of computing H_k v increases as the quasi-Newton memory ℓ increases. In addition, it has been observed that the quality of the approximation decreases when the quasi-Newton memory is large [17]. In our implementation of Algorithm 2, we also observed a decrease in the quality of the steps when ℓ is too large. The decrease of the barrier parameter µ_k = x_kᵀz_k / n for different bounds on ℓ is shown in Figure 1 for problem afiro, the smallest example in the Netlib test collection. In this experiment, Newton steps were allowed after ℓ_max quasi-Newton iterations, where ℓ_max ∈ {0, 5, 20, 100, 200}. A maximum of 200 iterations was allowed.
We can see that if the Jacobian is evaluated only once (ℓ_max = 200) then the method is unable to converge in 200 iterations. As the maximum memory is reduced, the number of iterations to convergence is also reduced. On the other hand, the number of (possibly expensive) Newton steps increases. When ℓ_max = 0, i.e., no quasi-Newton steps are taken, the algorithm converges in 7 iterations. We take the same approach as [17] and define an upper bound ℓ_max on ℓ in the implementation of Algorithm 2. When this upper bound is reached, we set ℓ to 0, which, by (16), results in the computation of a Newton step. The verification is performed at step 3 of Algorithm 2. This approach is also known as quasi-Newton with restarts [25] and differs from the usual limited-memory quasi-Newton approach [12], where only the oldest information is dropped.
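The restart control described above can be sketched as follows (a minimal illustration, not the HOPDM implementation; function and variable names are placeholders). A Newton step, with its factorization, is taken whenever the memory counter ℓ is zero, and the memory is dropped once it reaches ℓ_max:

```python
# Hedged sketch of quasi-Newton with restarts around step 3 of Algorithm 2:
# after l_max consecutive quasi-Newton steps the memory is dropped (l = 0)
# and a full Newton step, with a fresh factorization, is taken.

def run_with_restarts(n_iters, l_max):
    l = 0                 # current quasi-Newton memory size
    factorizations = 0    # each Newton step refactorizes the Jacobian
    schedule = []
    for k in range(n_iters):
        if l == 0:
            factorizations += 1      # Newton step: factorize and solve
            schedule.append("N")
        else:
            schedule.append("QN")    # quasi-Newton step: backsolves only
        if l >= l_max:
            l = 0                    # restart: drop all stored pairs
        else:
            l += 1
    return schedule, factorizations
```

With ℓ_max = 0 every iteration is a Newton step, matching the behavior described for Figure 1; larger ℓ_max trades factorizations for (cheaper but weaker) quasi-Newton iterations.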

The quasi-Newton steps
The behavior of consecutive quasi-Newton steps depicted in Figure 1 reminds us that it is important to use the true Jacobian in order to improve convergence of the method. However, we would like to minimize the number of times the Jacobian is evaluated, since it involves expensive factorizations and computations. Unfortunately, using only the memory bound as a criterion to compute quasi-Newton steps is not a reasonable choice. When ℓ_max = 100, for example, the algorithm converges in 110 iterations, but it spends around 60 iterations without any improvement. As the dimension of the problem increases, this behavior becomes even worse. We can also see that the choice ℓ_max = 20 is better for this problem, as the algorithm converges in 31 iterations, computing the Cholesky factorization of the Jacobian only twice.
The lack of reduction is related to small step-sizes α_P^k and α_D^k. Our numerical experience with quasi-Newton IP methods indicates that quasi-Newton steps are often strongly attracted to the boundaries. The step-sizes calculated for directions originating from a quasi-Newton predictor-corrector strategy are almost always small and need to be fixed. Several strategies have been tried to increase the step-sizes of those steps: (i) perturb complementarity pairs x_i z_i for which the relative component-wise direction magnitude is high and then recompute the quasi-Newton direction; (ii) use multiple centrality correctors [7]; (iii) reduce µ gently on quasi-Newton iterations, selecting σ_k close to 1 in the predictor and corrector steps.
Note that the terms in (i) are the inverses of the maximum step-sizes allowed by each component. The motivation of strategy (i) is the strong relation observed between components of the quasi-Newton direction which are too large with respect to their associated variable and components which differ too much from the respective component of the Newton direction for the same iteration, as measured by (19) and (20). We display this relation in Figure 2(a) for one iteration on linear problem GE. Positive spikes represent the component-wise relative magnitude (19) of the quasi-Newton steps for each component of variables x and z. The higher the spikes, the smaller the step-sizes. Negative spikes represent the component-wise relative error (20) between the Newton and quasi-Newton directions. The lower the spikes, the larger the relative difference between Newton and quasi-Newton components. To generate this figure, the problem was solved twice and, at the selected iteration, the Newton step and the quasi-Newton step were saved. Only negative quasi-Newton directions were considered in the figure. It is possible to see in Figure 2(a) that very few components are responsible for the small step-sizes. Interestingly, most of those blocking components are associated with components of the quasi-Newton direction which differ considerably from the Newton direction. Unfortunately, numerical experiments show that perturbing the variables or setting the problematic components to zero has the drawback of increasing the infeasibility and cannot be performed at every iteration.
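The component-wise magnitudes behind strategy (i) can be sketched as follows (an illustrative fragment, not the paper's code; names and the threshold are assumptions). For a direction dx on a positive variable x, the largest feasible step keeping x + α dx > 0 is α_max = min over dx_i < 0 of −x_i/dx_i, so the ratios −dx_i/x_i are exactly the reciprocals mentioned above, and large ratios mark blocking components:

```python
# Sketch: maximum feasible step-size and blocking-component detection for
# a positive variable x and direction dx. The ratios -dx_i/x_i (for dx_i < 0)
# are the reciprocals of the per-component maximum step-sizes.

def max_stepsize(x, dx):
    ratios = [-dxi / xi for xi, dxi in zip(x, dx) if dxi < 0.0]
    return 1.0 if not ratios else min(1.0, 1.0 / max(ratios))

def blocking_components(x, dx, threshold):
    # indices whose relative magnitude -dx_i/x_i exceeds a given threshold
    return [i for i, (xi, dxi) in enumerate(zip(x, dx))
            if dxi < 0.0 and -dxi / xi > threshold]
```

A single component close to its bound with a large negative direction component dominates the maximum and forces a small step, which is the phenomenon visible in Figure 2(a).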
To test the impact of each strategy on the quality of the steps, four linear programming problems were selected: afiro, GE, stocfor3 and finnis. The tests were performed as follows. Given an iteration k of a problem, we run algorithm HOPDM allowing only Newton steps up to iteration k − 1. At iteration k exactly one approach is applied: a Newton step, a quasi-Newton step, or one of the discussed strategies (i), (ii) or (iii). Only one affine-scaling predictor and one corrector were allowed, except for strategy (ii), where multiple centrality correctors were used at iteration k. We repeated this procedure for k from 2 up to the total number of iterations that the original version of HOPDM needed to declare convergence.
The average of the sum of the step-sizes for each problem and for each approach is shown in Table 1. We can see that quasi-Newton steps are considerably smaller than Newton steps. All improvement strategies are able to increase, on average, the sum of the step-sizes. Strategy (i) has the drawback of increasing the infeasibility and has a huge impact on the convergence of the algorithm. Strategy (iii) is simple and efficient to implement but has worse results when compared to strategy (ii), based on multiple centrality correctors. Strategy (ii) has the ability to improve quasi-Newton directions in almost all iterations and has the drawback of extra backsolves. Similar behavior was observed in [7]. The effect of strategy (ii) is shown in Figure 2(b).
Figure 2: Relation between small step-sizes for quasi-Newton steps (positive spikes) and large relative errors when compared with the Newton step (negative spikes) for one iteration on linear problem GE. High positive spikes represent blocking components of the quasi-Newton direction. The errors when only a simple predictor-corrector direction is used are displayed in (a). The effect of using strategy (ii) to improve step-sizes is shown in (b).

Step-sizes are increased, but the new quasi-Newton direction is slightly different from the Newton direction for the same step. Strategy (ii) was selected as the default one in our implementation.
In order to perform as few Newton steps as possible, step 3 of Algorithm 2 has to be carefully implemented. Clearly, the first basic condition to try a quasi-Newton step at iteration k + 1, k ≥ 0, is to check whether there is available memory to store it at iteration k (Criterion 1).
Our experience shows that quasi-Newton steps should always be tried, since they are cheaper than Newton steps. This means that a quasi-Newton step is always tried (but not necessarily accepted) after a Newton step in the present implementation. As shown in Figure 1, using only Criterion 1 can lead to slow convergence and slow convergence is closely related to small step-sizes. Therefore, in addition to Criterion 1 we tested two criteria, which cannot be used together. In Section 5 we compare those different acceptance criteria.
Criterion 2 (α criterion). If iteration k is a quasi-Newton iteration and α_P^k + α_D^k ≥ ε_α.

Table 1: Average of the sum α_P^k + α_D^k for different improvement strategies on selected linear programming problems. The use of multiple centrality correctors (strategy (ii)) resulted in values similar to the Newton step.

Criterion 3 (Centrality criterion). If iteration k is a quasi-Newton iteration and

x_{k+1}^T z_{k+1} ≤ ε_c x_k^T z_k.

Numerical results
Algorithm 2 was implemented in Fortran 77 as a modification of the primal-dual interior point algorithm HOPDM [24], release 2.45. The code was compiled using gfortran 4.8.5 and run on a Dell PowerEdge R830 with Red Hat Enterprise Linux, 4 Intel Xeon E7-4660 v4 2.2GHz processors and 512GB RAM. The modifications discussed in Sections 3 and 4 have been performed in order to accommodate the quasi-Newton strategy. The main stopping criteria have been set to Mehrotra and Li's stopping criteria [7,26], where µ = x^T z/n. By default, the HOPDM parameters are ε_opt = 10^{-10}, ε_P = 10^{-8}, and ε_D = 10^{-8} for linear problems and 10^{-6} for quadratic problems. In addition to (21), successful convergence is also declared when lack of improvement is detected and µ/(1 + |c^T x|) ≤ 10^3 ε_opt. Besides several performance heuristics, HOPDM implements the regularization technique [22] and the multiple centrality correctors strategy [7]. When solving systems with the unreduced matrix, sparse Cholesky factorization of the normal equations or LDL^T factorization of the augmented system is automatically selected on initialization.
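A hedged sketch of relative stopping tests in the spirit of (21) follows; the exact Mehrotra-Li scaling used by HOPDM may differ, and all names here are illustrative. Using the optimality conditions from the introduction (primal residual Ax − b, dual residual A^T λ + z − Qx − c, and µ = x^T z/n):

```python
# Hedged sketch of relative convergence tests in the spirit of (21).
# r_p = Ax - b, r_d = A^T lam + z - Qx - c, mu = x^T z / n, ctx = c^T x.
# The exact form and scaling used by HOPDM may differ from this sketch.
import math

def norm(v):
    # Euclidean norm
    return math.sqrt(sum(vi * vi for vi in v))

def converged(r_p, r_d, mu, b, c, ctx,
              eps_p=1e-8, eps_d=1e-8, eps_opt=1e-10):
    ok_primal = norm(r_p) <= eps_p * (1.0 + norm(b))
    ok_dual = norm(r_d) <= eps_d * (1.0 + norm(c))
    ok_gap = mu <= eps_opt * (1.0 + abs(ctx))
    return ok_primal and ok_dual and ok_gap
```

The relative scalings 1 + ||b||, 1 + ||c|| and 1 + |c^T x| make the tests insensitive to the magnitude of the problem data, which is why multiplying the tolerances by 10^2, as done for the unsolved problems below, is a meaningful relaxation.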
HOPDM also has a matrix-free implementation [27] with which the present approach is fully compatible. According to Algorithm 2, once a quasi-Newton step is computed, it is used to build the point (x_{k+1}, λ_{k+1}, z_{k+1}). However, in practice, if such a step is considered "bad", it is also possible to discard it, set ℓ = 0, compute the exact Jacobian and perform a Newton step at this iteration. The idea is to avoid quasi-Newton steps which might degrade the quality of the current point. Preliminary experiments using linear programming problems from the Netlib collection were performed in order to test several possibilities for ℓ_max in Criterion 1 and to select between Criteria 2 and 3. In addition, we also verified the possibility of rejecting quasi-Newton steps, instead of always accepting them. The selected combination uses ℓ_max = 5 and Criterion 3 with ε_c = 0.99. Rejecting quasi-Newton steps did not lead to reductions in the number of factorizations and has the drawback of more expensive iterations; therefore, the steps are always taken. As mentioned in Section 4, the multiple centrality correctors strategy (ii) is used to improve quasi-Newton directions.
A key comparison concerns the type of low rank update to be used. Three implementations were tested: U1, the general Broyden "bad" algorithm, described by Algorithm 1; U2, the sparse Broyden "bad" algorithm, described by Algorithm 3 using update (15), inspired by Schubert's update [20]; and U3, the general Broyden "good" algorithm, described by Algorithm 4.
Four test sets were used in the comparison: 96 linear problems from Netlib 1, 10 medium-sized linear problems from the Maros-Mészáros misc library 2, 39 linear problems from the linear relaxation of Quadratic Assignment Problems (QAP) 3 and 138 convex quadratic programming problems from the Maros-Mészáros qpdata library 4. In order to compare the algorithms on large test sets, performance profiles were used [28]. A problem is declared solved by an algorithm if the obtained solution (x*, λ*, z*) satisfies (21). The number of factorizations and the total CPU time are used as performance measures.
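For readers unfamiliar with performance profiles [28], the construction can be sketched as follows (an illustrative fragment with assumed names): for each problem p and solver s with cost t[p][s], the ratio r = t[p][s] / min_s t[p][s] is formed, and ρ_s(τ) is the fraction of problems on which solver s is within a factor τ of the best. ρ_s(1) is the efficiency and ρ_s(∞) the robustness mentioned below.

```python
# Sketch of a performance profile: rho_s(tau) is the fraction of problems
# for which solver s has cost within a factor tau of the best solver.
# Failures can be encoded as float('inf').

def profile(times, tau):
    # times: list of per-problem lists, one cost per solver
    n_solvers = len(times[0])
    counts = [0] * n_solvers
    for row in times:
        best = min(row)
        for s, t in enumerate(row):
            if best > 0 and t / best <= tau:
                counts[s] += 1
    return [c / len(times) for c in counts]
```

At τ = 1 the profile counts wins (ties count for all tied solvers), and as τ grows it approaches the fraction of problems each solver solves at all.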
Using the default HOPDM values for (21), implementations U1, U2 and U3 are able to solve 269, 275 and 271 problems, respectively, out of 283. There were 19 problems which at least one implementation failed to solve. We relaxed the parameters in (21), multiplying them by a factor of 10^2, and solved these 19 problems again. The resulting performance profiles are summarized in Table 2, using the number of factorizations and the CPU time as performance measures. The efficiency of an algorithm is the number of solved problems in which the algorithm spent the smallest number of factorizations (or the smallest amount of CPU time) among the compared algorithms. The robustness is the total number of problems solved.
We can see that update U2 solves 210 problems using the smallest number of factorizations and 137 problems using the least CPU time, while U1 solves 177 and 126 and U3 solves 123 and 85, respectively. In addition, updates U2 and U3 are the most robust implementations, being able to solve 281 out of 283 problems. Therefore, U2 was used as the default update in this work. Update U2 performed particularly well on quadratic problems, which explains the difference in efficiency between the updates.
Based on the preliminary results, the default implementation of Algorithm 2, denoted qnHOPDM from now on, uses update U2 for solving (16) and computing the step, strategy (ii) to improve quasi-Newton directions, and Criteria 1 and 3 to decide when to use quasi-Newton steps at step 3. By default, HOPDM uses multiple centrality correctors, which were shown to improve convergence of the algorithm [7]. We implemented two versions of Algorithm 2: with (qnHOPDM-mc) and without (qnHOPDM) multiple centrality correctors for computing Newton steps. Since we are using strategy (ii), multiple correctors are always used for quasi-Newton steps. Each implementation was compared against its respective original version: HOPDM-mc and HOPDM.
In the first round of tests only the QAP collection was excluded from the comparison, which leaves 244 problems from the Netlib and Maros-Mészáros linear and quadratic programming test collections. The performance profiles using the number of factorizations and the CPU time as performance measures are shown in Figure 3. Comparisons between the implementation of HOPDM without multiple centrality correctors and qnHOPDM are given in Figures 3(a) and 3(b). The comparison of implementations HOPDM-mc and qnHOPDM-mc is displayed in Figures 3(c) and 3(d).
Similarly to the previous comparison, using default parameters, 5 problems were not solved by qnHOPDM or by HOPDM without multiple centrality correctors, while 7 problems were not solved by qnHOPDM-mc or HOPDM-mc. Criteria (21) were relaxed in the same way for these problems. Using this approach, HOPDM is able to solve all 244 problems, qnHOPDM solves 242, HOPDM-mc solves 243 and qnHOPDM-mc solves 242. The quasi-Newton implementations are able to successfully reduce the number of factorizations, as shown in Figures 3(a) and 3(c). We can see in Figure 3(a) that, of the 242 problems considered solved by qnHOPDM, in 237 it uses fewer factorizations than HOPDM without multiple centrality correctors. On the other hand, for about 150 problems qnHOPDM uses at least twice as much CPU time as HOPDM (Figure 3(b)). The behavior of the implementations using multiple centrality correctors in the Newton step is similar, but HOPDM-mc has improved efficiency results. The problems where qnHOPDM reduces both the number of factorizations and the CPU time when compared to HOPDM without centrality correctors are highlighted in Table 3. The only problem in which qnHOPDM-mc uses strictly less CPU time than HOPDM-mc is the quadratic programming problem cont-101.
Our last comparison considers 39 medium-sized problems from the QAP collection. These problems are challenging: they are sparse, but their Cholesky factorizations are very dense. Performance profiles were once more used for comparing the implementations. As the algorithm approaches the solution, the linear systems become harder to solve. Therefore, using the default HOPDM values for the parameters in (21), the number of problems solved is 21 (HOPDM), 31 (qnHOPDM), 25 (HOPDM-mc) and 35 (qnHOPDM-mc). Clearly the quasi-Newton approach benefits from using matrices that are not too close to the solution. Of the 39 problems, 19 were solved again using relaxed parameters for the comparison between HOPDM and qnHOPDM, and 14 were solved again for the comparison between HOPDM-mc and qnHOPDM-mc. The results are shown in Figure 4. The quasi-Newton IPM is the most efficient and robust algorithm in terms of CPU time for both implementations, solving all 39 problems. Without multiple centrality correctors (Figure 4(a)), HOPDM has poor performance and is not able to solve any problem using less CPU time than qnHOPDM. When multiple centrality correctors are allowed (Figure 4(b)), HOPDM-mc is able to solve only 10 problems using less than or equal CPU time compared to qnHOPDM-mc. Clearly, the efficiency of qnHOPDM is due to the decrease in the number of factorizations, as shown in Table 4. In this table we display the number of factorizations (F) and the CPU time (CPUt) for each problem and each algorithm on all QAP test problems considered. When no multiple centrality correctors are allowed at Newton steps, qnHOPDM displays the biggest improvements, being the fastest solver on all problems. The results are more competitive when multiple centrality correctors are allowed, but qnHOPDM-mc was the most efficient in 29 problems, while HOPDM-mc was the most efficient in 10.

Conclusions
In this work we discussed a new approach to IPMs based on rank-one secant updates for solving quadratic programming problems. The approach was motivated by the multiple centrality correctors, which provide many possible points where the function F can be evaluated in order to build a good approximation of J. Instead of using several points, the present approach uses only the newly computed point in order to build a low rank approximation to the unreduced matrix at the next iteration. The computational cost of solving the quasi-Newton linear system can be compared with the cost of computing one corrector, as all the factorizations and preconditioners have already been calculated.
Figure 4: Performance profiles for the comparison between quasi-Newton IPM and HOPDM on the QAP test collection. The CPU time was used as performance measure.

It was shown that rank-one secant updates maintain the main structure of the unreduced matrix. Also, several aspects of an efficient implementation were discussed. The proposed algorithm was implemented as a modification of algorithm HOPDM using the Broyden "bad" update, modified to preserve the sparsity structure of the unreduced matrix. The implementation was compared with the original version of HOPDM and was able to reduce the overall number of factorizations in most of the problems. However, only in the test set containing linear relaxations of quadratic assignment problems was the reduction in the number of factorizations systematically translated into a reduction of the CPU time of the algorithm. This suggests that the proposed algorithm is suitable for problems where the computational cost of the factorizations is much higher than the cost of the backsolves.