Exact linesearch limited-memory quasi-Newton methods for minimizing a quadratic function

The main focus of this paper is exact linesearch methods for minimizing a quadratic function whose Hessian is positive definite. We give a class of limited-memory quasi-Newton Hessian approximations which generate search directions parallel to those of the method of preconditioned conjugate gradients, and hence give finite termination on quadratic optimization problems in exact arithmetic. Combined with a reduced-Hessian formulation, this class provides a dynamic framework for the construction of limited-memory quasi-Newton methods. We give an indication of the performance of the methods within this framework by showing numerical simulations on sequences of related systems of linear equations, which originate from the CUTEst test collection. In addition, we give a compact representation of the Hessian approximations in the full Broyden class for the general unconstrained optimization problem. This representation consists of explicit matrices, with gradients as the only vector components.


Introduction
In this work we mainly study the behavior of limited-memory quasi-Newton methods on unconstrained quadratic optimization problems of the form
$$\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^T H x + c^T x, \qquad (QP)$$
where $H = H^T$ and $H \succ 0$. (Throughout, "$\succ$" is used to denote positive definite and we denote the corresponding set of matrices by $S^n_+$, i.e., $S^n_+ = \{B \in \mathbb{R}^{n \times n} : B = B^T \text{ and } B \succ 0\}$.) In particular, we study exact linesearch limited-memory quasi-Newton methods that generate search directions parallel to those of the method of preconditioned conjugate gradients (PCG). Under exact linesearch, parallel search directions imply identical iterates.
The motivation for this work originates from applications in which it is desired to solve, or approximately solve, a sequence of related systems of linear equations, in particular systems where the matrix is symmetric positive definite. Such sequences occur when solving unconstrained nonlinear optimization problems with Newton's method and in interior-point methods, which constitute some of the most widely used methods in numerical optimization. As the problems become larger, the arising systems of linear equations typically become increasingly expensive to solve, and iterative methods may be considered. In exact arithmetic our model method is the method of preconditioned conjugate gradients, but this method may be too inaccurate in finite precision. Quasi-Newton methods may be expected to be significantly more accurate, but their computational cost is typically too high. In consequence, we aim for less computationally expensive limited-memory versions of quasi-Newton methods that are more accurate than the method of preconditioned conjugate gradients. The goal is to provide a better understanding of whether it is viable and/or efficient to use such methods to solve, or approximately solve, systems of linear equations that arise as Newton's method or interior-point methods converge.
We envisage the use of the limited-memory quasi-Newton methods as an accelerator for a direct solver when solving a sequence of systems of linear equations, e.g., when the direct solver and the iterative solver can be run in parallel, and where the preconditioner is updated whenever the direct solver is faster for some system of linear equations in the sequence.
We mainly propose a framework for the construction of reduced-Hessian limited-memory quasi-Newton methods. To give an indication of their potential, we construct two examples within this framework for which we give numerical results.
Limited-memory quasi-Newton methods have previously been studied by various authors, e.g., as memory-less quasi-Newton methods by Shanno [23], limited-memory BFGS (L-BFGS) by Nocedal [18] and more recently as limited-memory reduced-Hessian methods by Gill and Leonard [13]. In contrast, we specialize to exact linesearch methods for problems of the form (QP). The model method is PCG, which is interpreted as a particular quasi-Newton method, as is done by, e.g., Shanno [23] and Forsgren and Odland [11]. We start from a result by Forsgren and Odland [11], which provides necessary and sufficient conditions on the Hessian approximation for exact linesearch methods on (QP) to generate search directions that are parallel to those of PCG. The focus is henceforth directly on Hessian approximations with this property. The approximations are described by a novel compact representation which contains explicit matrices together with gradients and search directions as vector components. The framework for the compact representation is first given for the full Broyden class, where we consider unconstrained optimization problems of the form
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (P)$$
where the function $f : \mathbb{R}^n \to \mathbb{R}$ is assumed to be smooth. Compact representations of quasi-Newton matrices have previously been used by various authors but were first introduced by Byrd, Nocedal and Schnabel [1]. They were thereafter extended to the convex Broyden class by Erway and Marcia [6,7], and to the full Broyden class by DeGuchy, Erway and Marcia [3]. In contrast, we give an alternative compact representation of the Hessian approximations in the full Broyden class which contains only explicit matrices and gradients as vector components. In addition, we discuss how exact linesearch is reflected in this representation. Compact representations of limited-memory Hessian approximations in the Broyden class are also discussed by Byrd, Nocedal and Schnabel [1] and Erway and Marcia [7]. In contrast, our discussion is on limited-memory representations of Hessian approximations intended for exact linesearch methods for problems of the form (QP), and the approximations are not restricted to the Broyden class. In addition, our alternative representation provides a dynamic framework for the construction of limited-memory approximations for the mentioned purpose.
In Section 2 we provide a brief background to quasi-Newton methods, to unconstrained quadratic optimization problems (QP), and to the groundwork that provides the basis for this study. Section 3 contains the alternative compact representation for the full Broyden class. In Section 4 we present results which include a class of limited-memory Hessian approximations, together with a discussion of how to solve the arising systems of linear equations using reduced-Hessian methods. Section 5 contains numerical results on randomly generated problems of the form (QP) and on systems of linear equations which originate from the CUTEst test collection [5]. Finally, in Section 6 we give some concluding remarks.

Notation
Throughout, $\mathcal{R}(M)$ and $\mathcal{N}(M)$ denote the range and the nullspace of a matrix $M$, respectively. Moreover, $e_i$ denotes the $i$th unit vector of the appropriate dimension and $|S|$ denotes the cardinality of a set $S$.

Background
In this section we give a short introduction to quasi-Newton methods for unconstrained optimization problems of the form (P). Thereafter, we give a background to unconstrained quadratic optimization problems (QP) and to the groundwork that provides the basis for this study.

Background on quasi-Newton methods
Quasi-Newton methods were first introduced as variable metric methods by Davidon [2] and later formalized by Fletcher and Powell [10]. For a thorough introduction to quasi-Newton methods see, e.g., [8, Chapter 3] and [19, Chapter 6]. In quasi-Newton methods the search direction $p_k$ at iteration $k$ is generated by
$$B_k p_k = -g_k, \qquad (1)$$
where $B_k$ is an approximation of the true Hessian $\nabla^2 f(x_k)$ and $g_k$ is the gradient $\nabla f(x_k)$. The symmetric two-parameter class of Huang [17] satisfies the scaled secant condition
$$B_k s_{k-1} = \sigma_k y_{k-1}, \qquad (2)$$
where $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} p_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, and $\sigma_k$ is one of the free parameters. The most well-known quasi-Newton class is obtained if $\sigma_k = 1$ in (2), namely the one-parameter Broyden class, which updates $B_{k-1}$ to
$$B_k = B_{k-1} - \frac{B_{k-1} s_{k-1} s_{k-1}^T B_{k-1}}{s_{k-1}^T B_{k-1} s_{k-1}} + \frac{y_{k-1} y_{k-1}^T}{y_{k-1}^T s_{k-1}} + \phi_{k-1} \left( s_{k-1}^T B_{k-1} s_{k-1} \right) v_{k-1} v_{k-1}^T, \qquad (3)$$
where
$$v_{k-1} = \frac{y_{k-1}}{y_{k-1}^T s_{k-1}} - \frac{B_{k-1} s_{k-1}}{s_{k-1}^T B_{k-1} s_{k-1}},$$
with $\phi_{k-1}$ as the free parameter [9]. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update scheme is obtained if $\phi_{k-1} = 0$ and Davidon-Fletcher-Powell (DFP) if $\phi_{k-1} = 1$. In this work we study Hessian approximations described by compact representations with gradients and search directions as vector components. We will therefore throughout this work explicitly use the quantities $g$, $p$ and the steplength $\alpha$ in all equations. In this notation, using $s_{k-1} = \alpha_{k-1} p_{k-1}$ and $B_{k-1} p_{k-1} = -g_{k-1}$ from (1), the Broyden class Hessian approximations in (3) may be written as
$$B_k = B_{k-1} + \frac{g_{k-1} g_{k-1}^T}{g_{k-1}^T p_{k-1}} + \frac{y_{k-1} y_{k-1}^T}{\alpha_{k-1} y_{k-1}^T p_{k-1}} - \phi_{k-1} \left( g_{k-1}^T p_{k-1} \right) u_{k-1} u_{k-1}^T, \qquad (4)$$
where
$$u_{k-1} = \frac{y_{k-1}}{y_{k-1}^T p_{k-1}} - \frac{g_{k-1}}{g_{k-1}^T p_{k-1}}.$$
As shown in (4), the previous Hessian approximation is in general updated by a rank-two matrix with range equal to the space spanned by the current and the previous gradient. Furthermore, it is well known that under exact linesearch all Broyden class updates generate identical iterates, as shown by Dixon [4]. The case $\phi_{k-1} = 0$ in (4), i.e., the BFGS update, will have a particular role in part of our analysis. We will refer to the quantities $B_k$, $p_k$ and $\alpha_k$ corresponding to this case as $B_k^{BFGS}$, $p_k^{BFGS}$ and $\alpha_k^{BFGS}$.
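As a concrete illustration, the following is a minimal sketch of the Broyden class update (3) in Julia; the function name is ours, and the sketch assumes the curvature condition $y_{k-1}^T s_{k-1} > 0$ holds.

```julia
using LinearAlgebra

# A minimal sketch of the one-parameter Broyden class update (3), assuming
# s = x_k - x_{k-1}, y = g_k - g_{k-1} and the curvature condition y's > 0.
# phi = 0 gives the BFGS update and phi = 1 the DFP update.
function broyden_update(B::AbstractMatrix, s::Vector, y::Vector, phi::Real)
    Bs  = B * s
    sBs = dot(s, Bs)             # s' B s
    ys  = dot(y, s)              # y' s
    v   = y ./ ys .- Bs ./ sBs   # the vector v_{k-1} in (3)
    return B .- (Bs * Bs') ./ sBs .+ (y * y') ./ ys .+ phi .* sBs .* (v * v')
end
```

Independently of `phi`, the returned matrix satisfies the secant condition, i.e., `broyden_update(B, s, y, phi) * s ≈ y`.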

Background on quadratic problems
Solving (QP) is equivalent to solving the linear system
$$Hx = -c, \qquad (5)$$
which has a unique solution if $H \succ 0$. The problem (QP), and hence (5), is in this work solved by an exact linesearch method of the following form. The steplength, iterate and gradient at iteration $k$ are updated as
$$\alpha_k = -\frac{g_k^T p_k}{p_k^T H p_k}, \qquad x_{k+1} = x_k + \alpha_k p_k, \qquad g_{k+1} = g_k + \alpha_k H p_k,$$
which together with a specific formula for $p_k$ constitute the particular exact linesearch method. The model method is summarized in Algorithm 1 below.
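As an illustration, a minimal sketch of the model method on (QP) may look as follows, assuming the search direction is obtained from a nonsingular Hessian approximation via (1); the function name and the stopping rule are ours.

```julia
using LinearAlgebra

# A minimal sketch of the exact linesearch model method (Algorithm 1) on (QP),
# with the search direction obtained from B p = -g as in (1).
function exact_linesearch_qn(H, c, B, x0; tol = 1e-10, maxit = length(c))
    x = copy(x0)
    g = H * x + c                    # gradient of f(x) = x'Hx/2 + c'x
    for k in 0:maxit-1
        norm(g) <= tol && break
        p = -(B \ g)                 # quasi-Newton direction from (1)
        Hp = H * p
        α = -dot(g, p) / dot(p, Hp)  # exact linesearch steplength
        x .+= α .* p                 # x_{k+1} = x_k + α_k p_k
        g .+= α .* Hp                # g_{k+1} = g_k + α_k H p_k
        # a full method would also update B here, e.g., by (3)
    end
    return x
end
```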
The search direction in Algorithm 1 may be calculated using PCG with a symmetric positive definite preconditioner $M$. The corresponding algorithm for solving (5) can be formulated using the Cholesky factor $L$ defined by $M = LL^T$. This is equivalent to applying the method of conjugate gradients (CG) to the preconditioned linear system
$$L^{-1} H L^{-T} \hat{x} = -L^{-1} c, \qquad (6)$$
with $\hat{x} = L^T x$, see, e.g., Saad [22, Chapter 9.2]. If all quantities generated by CG on (6) are denoted by "$\hat{\phantom{x}}$", then these quantities relate to those from PCG on (5) as $\hat{g} = L^{-1} g$ and $\hat{p} = L^T p$. The iteration spaces when $M = I$ and when $M$ is an arbitrary symmetric positive definite matrix are thus related through a linear transformation. In this work the following PCG update is considered,
$$p_k = -B_0^{-1} g_k + \frac{g_k^T B_0^{-1} g_k}{g_{k-1}^T B_0^{-1} g_{k-1}} \, p_{k-1}, \qquad (7)$$
with $p_0 = -B_0^{-1} g_0$. The discussion in this work is mainly on Hessian approximations $B_k$ that generate $p_k$ parallel to $p_k^{PCG}$. We will therefore hereinafter only consider the preconditioner $M = B_0$, where $B_0 \in S^n_+$. If no preconditioner is used, i.e., $B_0 = I$, then (7) is the update referred to as Fletcher-Reeves, which together with the exact linesearch method of Algorithm 1 is equivalent to the method of conjugate gradients by Hestenes and Stiefel [16]. If the search direction (7) is used in Algorithm 1, the method terminates when $g_r = 0$ for some $r \le n$, and $x_r$ solves (QP). The search directions generated by the method are mutually conjugate with respect to $H$ and satisfy $p_i \in \mathrm{span}\{B_0^{-1} g_0, \dots, B_0^{-1} g_i\}$, $i = 0, \dots, r$. In addition, the generated gradients are mutually conjugate with respect to $B_0^{-1}$, i.e., $g_i^T B_0^{-1} g_j = 0$, $i \ne j$. By expanding (7), the search direction of PCG may be expressed as
$$p_k^{PCG} = -g_k^T B_0^{-1} g_k \sum_{i=0}^{k} \frac{B_0^{-1} g_i}{g_i^T B_0^{-1} g_i}. \qquad (8)$$
Forsgren and Odland [11] have provided necessary and sufficient conditions on $B_k$ for an exact linesearch method to generate $p_k$ parallel to $p_k^{PCG}$. This result provides the basis of our work and is therefore reviewed. Under exact linesearch and $B_0 \in S^n_+$, (7) can be written as
$$p_k^{PCG} = -C_k^{-1} B_0^{-1} g_k, \qquad (9)$$
where
$$C_k = I - \frac{p_{k-1} g_k^T}{g_{k-1}^T p_{k-1}}. \qquad (10)$$
Insertion of $p_k = \delta_k p_k^{PCG}$ with $p_k^{PCG}$ of (9) into (1) gives
$$\delta_k B_k C_k^{-1} B_0^{-1} g_k = g_k. \qquad (11)$$
Premultiplication by $C_k^{-T}$, while noting that $C_k^{-T} g_k = g_k$, and letting $W_k = \delta_k C_k^{-T} B_k C_k^{-1}$ shows that $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k \ne 0$ if and only if
$$B_k = \frac{1}{\delta_k} C_k^T W_k C_k \quad \text{with} \quad W_k B_0^{-1} g_k = g_k, \qquad (12)$$
for $W_k$ nonsingular. Finally, it holds that $B_k \succ 0$ if and only if $W_k \succ 0$. For further details, see [11, Proposition 4].
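For reference, the following is a sketch of PCG with exact linesearch for $Hx = -c$, using the Fletcher-Reeves form (7) of the direction update with preconditioner $B_0$; names and tolerances are ours.

```julia
using LinearAlgebra

# A sketch of PCG with exact linesearch for Hx = -c, with the
# Fletcher-Reeves direction update (7) and preconditioner B0.
function pcg(H, c, B0, x0; tol = 1e-10, maxit = length(c))
    x = copy(x0)
    g = H * x + c
    z = B0 \ g                        # z_k = B0^{-1} g_k
    gz = dot(g, z)
    p = -z
    for k in 0:maxit-1
        norm(g) <= tol && break
        Hp = H * p
        α = -dot(g, p) / dot(p, Hp)   # exact linesearch steplength
        x .+= α .* p
        g .+= α .* Hp
        z = B0 \ g
        gz_new = dot(g, z)
        p = -z .+ (gz_new / gz) .* p  # direction update (7), β = gz_new/gz
        gz = gz_new
    end
    return x
end
```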
With the exact linesearch method of Algorithm 1 for solving (QP), parallel search directions imply identical iterates, and therefore search directions parallel to those of PCG imply finite termination. Huang has shown that the quasi-Newton Huang class, the Broyden class and PCG generate parallel search directions [17].
Finally, we review a result which is related to the conjugacy of the search directions. Part of the result is similar to results given by Fletcher in [8]. The result will have a central part in the analysis to come.

Lemma 2.1. Consider iteration $k$, $1 \le k < r$, of the exact linesearch method of Algorithm 1 for solving (QP). Assume that $p_i = \delta_i p_i^{PCG}$ with $\delta_i \ne 0$ for $i = 0, \dots, k-1$, where $p_i^{PCG}$ is the search direction of PCG with associated $B_0 \in S^n_+$, as stated in (7). Then $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k \ne 0$ if and only if $p_k \in \mathrm{span}\{B_0^{-1} g_0, \dots, B_0^{-1} g_k\}$ and $g_i^T p_k = -\delta_k \, g_k^T B_0^{-1} g_k$, $i = 0, \dots, k$.
Proof. Note that, by the assumptions, $g_i$, $i = 0, \dots, k$, are identical to those generated by PCG. We first show the only-if direction. Premultiplication of $p_k^{PCG}$ in (8) by $g_i^T$, while taking into account the conjugacy of the $g_j$'s with respect to $B_0^{-1}$, gives
$$g_i^T p_k^{PCG} = -g_k^T B_0^{-1} g_k, \quad i = 0, \dots, k.$$
To show the other direction, let
$$p_k = \sum_{j=0}^{k} \gamma_j B_0^{-1} g_j. \qquad (14)$$
Premultiplication of (14) by $g_i^T$, while taking into account the conjugacy of the $g_j$'s with respect to $B_0^{-1}$, gives
$$\gamma_i = -\delta_k \, \frac{g_k^T B_0^{-1} g_k}{g_i^T B_0^{-1} g_i}, \quad i = 0, \dots, k. \qquad (15)$$
Insertion of (15) into (14) gives $p_k = \delta_k p_k^{PCG}$, with $p_k^{PCG}$ given by (8).
For further details on methods of conjugate gradients and related analyses, see e.g. [21].

A compact representation of Broyden class Hessian approximations
In this section we consider unconstrained optimization problems of the form (P) and give a compact representation of the Hessian approximations in the full Broyden class. The representation contains only explicit matrices and gradients as vector components. In this section, the gradient of $f$ in (P) will be denoted by $g$, in contrast to all other sections where $g$ refers to the gradient of the objective function of (QP). We first give the general representation without exact linesearch and then discuss how exact linesearch is reflected in the representation.
Lemma 3.1. Consider iteration $k$ of solving (P) by a quasi-Newton method. Let $B_0$ be a given nonsingular matrix. Assume that $p_i$, $i = 0, \dots, k-1$, has been given by (1), with each $B_i$ a nonsingular matrix of the form (4). Any Hessian approximation in the Broyden class can then be written on a compact form in which, apart from explicit matrices, the gradients $g_0, \dots, g_k$ are the only vector components.
Proof. The result follows directly from telescoping (4) and writing it in outer-product form.
One of the most commonly used quasi-Newton update schemes is the BFGS update. For $\phi_i = 0$, $i = 1, \dots, k-1$, the corresponding compact representation of the BFGS approximation follows from Lemma 3.1. Under exact linesearch the steplength is chosen such that $g_i^T p_{i-1} = 0$, $i = 1, \dots, k$. The choice of Broyden member is thus only reflected in the diagonal of $T_k$ in Lemma 3.1. This can be observed directly in (17e)-(17h) by making use of the exact linesearch condition $g_i^T p_{i-1} = 0$, $i = 1, \dots, k$: all non-diagonal terms of $T_k^{\phi}$ become zero and the diagonal terms may be simplified. Under exact linesearch, any Hessian approximation in the Broyden class may in fact be written on a form in which only the most recent parameter $\phi_{k-1}$ appears. In consequence, $B_k$ is independent of $\phi_i$ for $i = 0, \dots, k-2$ and the choice of Broyden member only affects the scaling of the search direction. This property follows solely from exact linesearch, in comparison to the properties that stem from exact linesearch on quadratic optimization problems (QP), which are discussed in Section 4.

Quadratic problems
In this section we consider quadratic problems of the form (QP) and start from the requirement that $p_k$ generated by the exact linesearch method of Algorithm 1 shall be parallel to $p_k^{PCG}$. Motivated by the performance of the Broyden class, we start by considering Hessian approximations of the form $B_k = B_{k-1} + U_k$, where $U_k$ is a symmetric rank-two matrix with $\mathcal{R}(U_k) = \mathrm{span}\{g_{k-1}, g_k\}$, and thereafter look at generalizations. A characterization of all such update matrices $U_k$ is provided, as well as a multi-parameter Hessian approximation that generates $p_k = \delta_k p_k^{PCG}$ for a nonzero scalar $\delta_k$. Thereafter, we consider limited-memory Hessian approximations with this property, discuss potential extensions and how to solve the arising systems with a reduced-Hessian method.
The characterization, stated in Proposition 4.1, contains $\rho_{k-1}$ as a free parameter.
Proof. As stated in [11, Proposition 5], the assumptions in the proposition together with (11) and $B_k = B_{k-1} + U_k$ give a necessary and sufficient condition (18) on $U_k$ such that $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k \ne 0$. Any symmetric rank-two matrix $U_k$ with $\mathcal{R}(U_k) = \mathrm{span}\{g_{k-1}, g_k\}$ can be written as
$$U_k = \varphi_k g_k g_k^T + \rho_{k-1} \left( g_k g_{k-1}^T + g_{k-1} g_k^T \right) + \eta_{k-1} g_{k-1} g_{k-1}^T, \qquad (19)$$
for parameters $\eta_{k-1}$, $\rho_{k-1}$ and $\varphi_k$. Insertion of (19) into (18), taking into account that $g_k^T p_{k-1} = 0$ and the conjugacy of the gradients with respect to $B_0^{-1}$, gives a condition which is independent of $\rho_{k-1}$. Identification of coefficients for $g_k$ and $g_{k-1}$, respectively, determines $\varphi_k$ and $\eta_{k-1}$. Insertion of $\varphi_k$ and $\eta_{k-1}$ into (19) gives $U_k$ as stated in the proposition, with $\rho_{k-1}$ free.
The result in Proposition 4.1 provides a two-parameter update matrix $U_k$. If the conditions of Proposition 4.1 apply, then it follows directly from $U_k$ that the iterates satisfy the scaled secant condition (2). This can be seen by applying $B_k = B_{k-1} + U_k$ to $s_{k-1}$ and making use of exact linesearch. Consequently, the characterization in Proposition 4.1 provides a class which under exact linesearch is equivalent to the symmetric Huang class. The scaling in the secant condition affects neither the search direction nor its scaling. Utilizing the secant condition fixes one of the parameters and gives the exact linesearch form of the Broyden class matrices in (4); hence, as expected, the secant condition reduces the class to the Broyden class. The result of Proposition 4.1 motivates further study of the structure of the corresponding Hessian approximations.
Lemma 4.2. Consider iteration $k$, $1 \le k < r$, of the exact linesearch method of Algorithm 1 for solving (QP). Assume that $B_i$, $i = 1, \dots, k$, is given by (21), i.e., updated by the matrix $U_i$ of Proposition 4.1, with $\rho_{i-1}$ and $\varphi_i$ chosen such that $B_i$ is nonsingular. Then $B_k$ takes the form (22), i.e., $B_k = B_0 + V_k^{\varphi} + V_k^{\rho}$, where $V_k^{\varphi}$ contains the parameter $\varphi_k$ and $V_k^{\rho}$ the parameters $\rho_i^{(k)}$, $i = 0, \dots, k-1$.

Proof. With the assumptions in the lemma, the update of (21) satisfies the requirements of Proposition 4.1, and hence $p_i = \delta_i p_i^{PCG}$ with $\delta_i \ne 0$ for each $i$, as in (23). Inverting (23) and taking into account that $g_i^T p_i^{PCG} = -g_i^T B_0^{-1} g_i$ gives (24). By telescoping (21) at iteration $k$ we obtain (25). Insertion of (24) into (25) gives (22).

Lemma 4.2 and (22) show that if $p_k$ is given by (1) with $B_k$ as in (22), then $p_k$ is independent of all $\rho_i$, $i = 0, \dots, k-1$, as long as $B_k$ is nonsingular. This result is formalized in the following proposition.

Proposition 4.3. Consider iteration $k$, $1 \le k < r$, of the exact linesearch method of Algorithm 1 for solving (QP). Assume that $p_k$ is given by (1) with $B_k$ as in (26), with $\rho_i^{(k)}$, $i = 0, \dots, k-1$, and $\varphi_k$ chosen such that $B_k$ is nonsingular. Then $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k \ne 0$, where $p_k^{PCG}$ is the search direction of PCG with associated $B_0 \in S^n_+$, as stated in (7).

Proof. From Proposition 4.1 and Lemma 4.2 it follows that $B_k$ given by (22) generates $p_k = \delta_k p_k^{PCG}$ with $\delta_k \ne 0$. The direction is determined solely by $B_0 + V_k^{\varphi}$, compare with (8). The parameter $\varphi_k$ gives a scaling and the parameters $\rho_i^{(k)}$, $i = 0, \dots, k-1$, have no effect on the direction. Certain choices of these parameters merely guarantee nonsingularity and may provide numerical stability.

Limited-memory Hessian approximations
In this section we extend the above discussion to limited-memory Hessian approximations. The goal is to obtain approximations such that $p_k$ given by (1) satisfies $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k \ne 0$. As long as $B_k$ remains nonsingular, gradient information can be discarded from $V_k^{\rho}$, but not from the part essential for the direction, namely $V_k^{\varphi}$. However, parallel directions can be generated if a specific correction term is added to the right-hand side of (1), as is shown in the Appendix, Theorem A.3. It can also be done, as e.g. in [1], by at each iteration $k$ recalculating the basis vectors from the $m$ latest vector pairs $(s_i, y_i)$. In light of the above, the discussion will now be extended to Hessian approximations of the form (28), i.e., $B_k = B_0 + V_k + V_k^{\rho}$. The following theorem characterizes the matrices $V_k$ in (28) for which the exact linesearch method of Algorithm 1 generates $p_k = \delta_k p_k^{PCG}$, $\delta_k \ne 0$, where $p_k^{PCG}$ is the search direction of PCG with associated $B_0 \in S^n_+$, as stated in (7).
In particular, (29) is satisfied with $\delta_k = 1$ by $V_k$ of (30) and by $V_k$ of (31).

Proof. The parameters $\rho_i^{(k)}$, $i = 0, \dots, k-1$, have no effect on the direction, and (11) gives the if and only if condition for $p_k = \delta_k p_k^{PCG}$, $\delta_k \ne 0$. Using the identity $g_i^T p_k^{PCG} = -g_k^T B_0^{-1} g_k$, $i = 0, \dots, k$, which follows from Lemma 2.1, in (32), and moving the terms corresponding to $B_0$ to the right-hand side, gives the if and only if condition (29) on $V_k$. In particular, with $B_0 + V_k = C_k^T B_0 C_k$, as given by (30), letting $W_k = B_0$ in (12) gives $p_k = p_k^{PCG}$. In addition, since $C_k$ is nonsingular and $B_0 \succ 0$, it follows that $C_k^T B_0 C_k \succ 0$; the matrix $C_k$ is well defined since $g_{k-1}^T p_{k-1}^{PCG} < 0$ and $p_{k-1} = \delta_{k-1} p_{k-1}^{PCG}$ with $\delta_{k-1} \ne 0$ by assumption. Therefore, $B_0 + V_k \succ 0$. Finally, $V_k$ of (31) satisfies (29) for $\delta_k = 1$, since $g_k^T p_{k-1} = 0$, as required.

The result in Theorem 4.4 provides a class of multi-parameter limited-memory Hessian approximations where the memory usage can be changed between iterations. The choices of $V_k$ in (30) and (31) are merely two examples of members of the class. The matrix $V_k$ of (30) is of rank two, and if used in $B_k = B_0 + V_k + V_k^{\rho}$ with $\rho_i^{(k)} = 0$, $i = 0, \dots, k-1$, then $B_k p_k = -g_k$ may be viewed as a symmetric PCG update, compare with (9). The matrix $V_k$ of (31) is the matrix of least rank that satisfies (29) with $\delta_k = 1$. In general, there is little restriction on which iterations to include information from in $V_k^{\rho}$. Information can thus be included from iterations that are believed to be of importance. All information may also be expressed in terms of search directions and the current gradient $g_k$. This provides the ability to reduce the amount of storage when the arising systems are solved by reduced-Hessian methods, described in Section 4.2, with search directions in the basis.

Solving the systems
In this section we discuss solving systems of linear equations using reduced-Hessian methods. These methods provide an alternative procedure for solving systems arising in quasi-Newton methods. We follow Gill and Leonard [12,13] and refer to their work for a thorough introduction.
Assume that a Hessian approximation of Theorem 4.4 is given and used together with the exact linesearch method of Algorithm 1 for solving (QP). The search direction at iteration $k$ then satisfies $p_k = \delta_k p_k^{PCG}$ for a scalar $\delta_k$, and hence, by (7), $p_k \in \mathrm{span}\{B_0^{-1} g_k, p_{k-1}\}$. Denote this subspace by $\Psi_k^{\min}$ and let $\Psi_k$ be a subspace such that $\Psi_k^{\min} \subseteq \Psi_k$. Furthermore, let $Q_k$ be a matrix whose columns span $\Psi_k$ and let $Z_k$ be the matrix whose columns are the vectors obtained from the Gram-Schmidt process on the columns of $Q_k$. The search direction can then be written as $p_k = Z_k u_k$ for some vector $u_k$. Premultiplication of (1) by $Z_k^T$ together with $p_k = Z_k u_k$ gives
$$Z_k^T B_k Z_k u_k = -Z_k^T g_k, \qquad (33)$$
which has a unique solution if $B_k$ is positive definite. Hence $p_k = Z_k u_k$, where $u_k$ satisfies (33). Note that the analogous procedure is also applicable for the result in the Appendix, Theorem A.3, where the Hessian approximation is given by (A.6a) and $p_k$ is generated by $B_k p_k = -N_k g_k$.
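A minimal sketch of the resulting computation of the search direction is given below; the function name is ours, and a practical implementation would update $Z_k$ and the factorization between iterations rather than recompute them, as discussed in Section 4.3.

```julia
using LinearAlgebra

# A sketch of the reduced-Hessian step of Section 4.2: orthonormalize a basis
# Q_k for the subspace Ψ_k, solve the reduced system (33), expand p_k = Z_k u_k.
function reduced_hessian_direction(B::AbstractMatrix, g::Vector, Q::AbstractMatrix)
    Z = Matrix(qr(Q).Q)             # orthonormal basis for range(Q)
    u = (Z' * B * Z) \ (-(Z' * g))  # reduced system (33): Z'B Z u = -Z'g
    return Z * u                    # p_k = Z_k u_k
end
```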

Construction of methods and complexity
The Hessian approximations proposed in Theorem 4.4, combined with the reduced-Hessian framework of Section 4.2, provide freedom in the construction of limited-memory quasi-Newton methods. A summary of the quantities that can be chosen is shown in Table 1 below.
Table 1. Summary of quantities to be chosen in the framework.

  $Q_k$ : provides the space for $p_k$; its columns must span $\Psi_k^{\min}$.
  $m_k$ : the number of columns of $Q_k$, and the dimension of the reduced Hessian.
The complexity of the method is essentially determined by the construction and solution of (33). Each iteration $k$ requires a Gram-Schmidt process as well as the construction and factorization of the matrix $Z_k^T (B_0 + V_k + V_k^{\rho}) Z_k$. If no information is re-used, these operations determine the worst-case complexity per iteration. However, the overall complexity can be reduced with particular choices of the quantities in Table 1. We will demonstrate this by constructing two relatively simple quasi-Newton methods with fixed limited memory $m = \bar{m} > 3$, for which numerical results are given in Section 5. The methods will be denoted by symPCGs and Vsr1, for which $V_k$ is given by (30) and (31) of Theorem 4.4, respectively. Both use Hessian approximations of the form (28) with parameters $\rho_i^{(k)} = \rho_i^B$, $i = 0, \dots, k-1$, where $\rho_i^B$ are the quantities which correspond to the secant condition. The basis is computed from information from the $(m-3)$ first and the $3$ latest iterations for $k > m$; hence systems of size at most $m$ have to be solved at every iteration. In consequence, part of the matrices $Z_k$ and $V_k^{\rho}$ stays constant and the reduced Hessian may be updated instead of recomputed at every iteration. The computational complexity is then dominated by $\max\{n^2, m^3\}$, see the Appendix for motivation. The choice $m = n^{2/3}$ gives complexity $n^2$, which is the same as that of PCG with exact linesearch.

Numerical results
In this section we first show the convergence behavior of quasi-Newton-like methods and PCG on randomly generated quadratic optimization problems. Thereafter we give performance profiles [5] for the two limited-memory methods described in Section 4.3. As mentioned in Section 4.3, these methods are referred to as symPCGs and Vsr1, respectively. They are also compared to our own Matlab implementations of PCG, with search direction given by (7); BFGS, with search direction given by (3) with $\phi_{k-1} = 0$ for all $k$; and L-BFGS, with search direction as proposed by Nocedal in [18]. All methods use exact linesearch, as defined by Algorithm 1, with their particular search direction update. We refer to our implementations of these methods as PCG, BFGS and L-BFGS, respectively. The performance profiles are in terms of number of iterations for solving systems of linear equations which originate from the CUTEst test collection [15]. The benchmark problems were initially processed using the Julia packages CUTEst.jl and NLPmodels.jl by Orban and Siqueira [20].
The purpose of the first part is partly to illustrate the difference between quasi-Newton-like methods and PCG, to give an indication of the round-off error effects. Convergence for a member of the class of quasi-Newton Hessian approximations in (26) of Proposition 4.3, here denoted by MuP, is shown in Figure 1. The figure also contains the convergence of the BFGS method and of PCG in both finite and exact arithmetic, all with $B_0 = I$. In this study we consider exact arithmetic PCG as the original method but computed with 512-digit precision. The parameters of (26) were chosen as $\varphi_k = 0$ for all $k$, with the parameters $\rho_i^{(k)}$ chosen within a certain interval. With the exact linesearch method of Algorithm 1, the three methods compared in Figure 1 are equivalent in exact arithmetic on (QP). However, in finite arithmetic this is not the case. As can be seen in the figure, PCG suffers from round-off errors while BFGS behaves like exact arithmetic PCG. The maximum error over all simulations between the iterates of BFGS and exact PCG was $5.1 \cdot 10^{-14}$.
Consequently, the BFGS method does not suffer from round-off errors on these randomly generated quadratic problems. By the result of Proposition 4.3 it is not required to fix the parameters $\rho_i^{(k)}$, $i = 0, \dots, k-1$, and, as Figure 1 shows, there is an interval in which this result also holds in finite arithmetic. The secant condition is expected to provide an appropriate scaling of the quantities since it gives the true Hessian after $n$ iterations. Our results indicate that, for the quadratic case, there is no particular benefit in choosing the values given by the secant condition. This freedom may be useful, since values of $\rho_i^{(k)}$ close to zero for some $i$ may make the Hessian approximation close to singular, and such values could potentially be avoided.
The purpose of the next part is to give a first indication of the performance of the proposed class of limited-memory quasi-Newton methods for solving a sequence of related systems of linear equations. The sequences of systems were generated by considering unconstrained nonlinear CUTEst problems of order $10^3$. For such a problem, a sequence of matrix and right-hand side pairs $\nabla^2 f(x_j)$ and $-\nabla f(x_j)$, $j = 1, \dots, J$, was accepted if there was an initial point $x_0$ from which Newton's method with unit step converged to a point $x_J$ such that $\|\nabla f(x_J)\| \le 10^{-6}$ and $\lambda_{\min}(\nabla^2 f(x_J)) > 10^{-6}$. In addition, it was required for each $j$ that $\nabla^2 f(x_j) d_j = -\nabla f(x_j)$ was solvable with accuracy at least $10^{-7}$ by the built-in solver in Julia, and that $\nabla^2 f(x_j)$ was positive definite. These conditions reduced the test set to 21 problems, giving 21 sequences corresponding to 220 systems of linear equations in total. The performance measure in terms of number of iterations is defined as follows. Let $N_{p,s}$ be the number of iterations required by method $s \in S$ on problem $p \in P$. If the method failed to converge within a maximum number of iterations, this value is defined as infinity. The measure $P_s(\tau)$ is defined as
$$P_s(\tau) = \frac{1}{|P|} \left| \left\{ p \in P : \frac{N_{p,s}}{\min_{s' \in S} N_{p,s'}} \le \tau \right\} \right|.$$
Performance profiles are shown in Figure 2 for different limited-memory values $m$ and two different tolerances in the stopping criterion. The first corresponds to the criterion $\|g_k\| \le 10^{-7}$, namely when an approximate solution is desired. The other corresponds to when a more accurate solution is desired, with the criterion $\|g_k\| \le \max\{\epsilon_{DS}, 10^{-13}\}$, where $\epsilon_{DS}$ is the solution accuracy of the built-in direct solver on the preconditioned Newton system with preconditioner $B_0$. The two stopping criteria will be referred to as low and high accuracy, respectively. In Table 3 we show a measure of the mean computational work done for solving systems of the form (33) per iteration. The measure is in terms of the average size of the reduced Hessian relative to $m$. Moreover, the average numbers of iterations are shown in Table 2. The maximum number of iterations was set to $5n$ in all simulations.
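The measure $P_s(\tau)$ can be computed as in the following sketch, assuming the matrix `N` holds $N_{p,s}$ with `Inf` marking failures; the helper is ours.

```julia
# A sketch of the performance-profile measure P_s(τ); N[p, s] is the number
# of iterations for problem p and method s, with Inf marking a failure.
function performance_profile(N::Matrix{Float64}, s::Int, τ::Real)
    best = vec(minimum(N, dims = 2))  # best iteration count per problem
    ratios = N[:, s] ./ best          # performance ratios N_{p,s} / best_p
    return count(<=(τ), ratios) / size(N, 1)
end
```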
We also give performance profiles when no regard is taken to the fact that the systems of linear equations within a sequence are related, namely when $B_0 = I$. The corresponding performance profiles and averaged quantities are shown in Figure 3 and Tables 2-3, respectively. Figures 2-3 and Table 2 show that the BFGS method has the best overall performance and PCG the worst. With preconditioner, symPCGs and Vsr1 outperform L-BFGS for the two larger limited-memory values $m$. This difference is not as distinct in the non-preconditioned case, as can be seen in Figure 3, even though symPCGs and Vsr1 are able to solve more problems within the maximum number of iterations compared to L-BFGS. Numerical experiments have shown that, depending on the particular instance, small $m$-values can lead to loss of convergence rate close to the solution. Tendencies to this can be seen in Figures 2-3 and Table 2 for the limited-memory methods with $m = n^{1/2}$, which is around 30 for problems of size $10^3$. Further decreasing $m$ also increases this tendency. The memory parameter $m$ might be considered large; however, as discussed in the Appendix in regard to complexity, this value does not have a significant impact on the complexity when $m \le n^{2/3}$. As Table 3 also shows, most systems solved in the preconditioned case are of size less than $m$. The slight difference between the low and high tolerances in Tables 2-3 indicates that the bulk of the work is done prior to reaching the low tolerance level. Numerical experiments have further shown that the behavior on a specific system of linear equations depends not only on the choices made when designing the method but also on the individual properties of the linear system. We choose to give the results for the methods denoted by symPCGs and Vsr1 since we believe that these are representative for the proposed limited-memory quasi-Newton class, given the complexity restrictions, on the test set. However, we have observed that the results can be improved and worsened for different choices of the quantities in Table 1. We conclude that there is potential for accelerating the process of solving sequences of related systems of linear equations with iterative methods in the proposed framework.

Conclusion
In this work we have given a class of limited-memory quasi-Newton Hessian approximations which, on (QP) with the exact linesearch method of Algorithm 1, generate $p_k$ parallel to $p_k^{PCG}$. Combined with the framework of reduced Hessians, this provides a dynamic framework for the construction of limited-memory quasi-Newton methods. In addition, we have characterized all symmetric rank-two update matrices $U_k$ with $\mathcal{R}(U_k) = \mathrm{span}\{g_{k-1}, g_k\}$ which give $p_k$ parallel to $p_k^{PCG}$ in this setting. The Hessian approximations were described by a novel compact representation whose framework was first presented in Section 3 for the full Broyden class on unconstrained optimization problems (P). The representation of the full Broyden class consists of explicit matrices, with gradients as the only vector components.
Numerical simulations on randomly generated unconstrained quadratic optimization problems have shown that for these problems our suggested multi-parameter class, with parameters within a certain range, is equivalent to the BFGS method in finite arithmetic. Simulations on sequences of related systems of linear equations which originate from the CUTEst test collection have given an indication of the potential of the proposed class of limited-memory methods. The results indicate that there is potential for accelerating the process of solving sequences of related systems of linear equations with iterative methods in the proposed framework.
The results of this work are meant to contribute to the theoretical and numerical understanding of limited-memory quasi-Newton methods for minimizing a quadratic function. We envisage that they can lead to further research on limited-memory methods for unconstrained optimization problems, in particular limited-memory methods for minimizing a near-quadratic function and for systems arising as interior-point methods converge.
Appendix

The following lemma relates a symmetric rank-one update of the matrix to a scaling of the solution when the rank-one vector is a multiple of the right-hand side.
Proof. Insertion of $x = A^{-1} b$ into (A.2) and rearranging gives (A.3). Insertion of $y = \alpha x$ into (A.3) and solving for $\alpha$ yields $\alpha = \frac{1}{1 + \gamma b^T x}$. The result in (A.1) follows by premultiplication of $y = \frac{1}{1 + \gamma b^T x} x$ by $b^T$ and rearranging. For the final result, note that $b^T x = x^T A x > 0$ since $A \succ 0$, and that $A + \gamma b b^T = A^{1/2} \left( I + \gamma A^{-1/2} b b^T A^{-1/2} \right) A^{1/2}$, which is a congruence transformation, and hence $I + \gamma A^{-1/2} b b^T A^{-1/2} \succ 0$ if and only if $A + \gamma b b^T \succ 0$. Then consider the similarity transformation of $I + \gamma A^{-1/2} b b^T A^{-1/2}$, for which the only eigenvalue not equal to unity is $1 + \gamma b^T A^{-1} b$.
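The scaling in Lemma A.1 is easily verified numerically; the following check uses arbitrary test data of our own choosing.

```julia
using LinearAlgebra

# A quick numerical check of Lemma A.1: with A ≻ 0 and (A + γ b b') y = b,
# the solution is the scaled vector y = x / (1 + γ b'x), where x = A \ b.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
γ = 0.5
x = A \ b
y = (A + γ * (b * b')) \ b
@assert y ≈ x / (1 + γ * dot(b, x))
```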
Next, we give a nonsingularity condition on a class of quasi-Newton matrices that we consider.

Proof. Any vector $p \in \mathbb{R}^n$ can be written as in (A.4). Insertion of (A.4) into $p^T B_k p$ gives (A.5). For the remainder of the proof, suppose that $p^T B_k p = 0$. From the positive definiteness of $B_0$, (A.5a) gives $\alpha_k = 0$, which in combination with (A.5c) gives $\alpha_i = 0$, $i = 0, \dots, k$. In addition, (A.5b) gives $u = 0$. Therefore, $p^T B_k p = 0$ only if $p = 0$, proving that $B_k$ is positive definite.
In Theorem 4.4, it has been shown how search directions parallel to those of PCG may be generated in a limited-memory setting if search directions are included in the basis in addition to gradients. The following theorem gives a way of using gradients only, by modifying the right-hand side.

Theorem A.3. Consider iteration $k$, $1 \le k < r$, of the exact linesearch method of Algorithm 1 for solving (QP). Assume that $p_i = \delta_i p_i^{PCG}$ with $\delta_i \ne 0$ for $i = 0, \dots, k-1$, where $p_i^{PCG}$ is the search direction of PCG with associated $B_0 \in S^n_+$, as stated in (7). Let $A_k = \{j_1, \dots, j_{m_k}\} \subseteq \{0, \dots, k\}$ with $j_1 < j_2 < \dots < j_{m_k}$ such that $j_{m_k} = k$, and let $I_k = \{0, \dots, k\} \setminus A_k$. Furthermore, let $p_k$ satisfy $B_k p_k = -N_k g_k$, with $B_k$ given by (A.6a) and $N_k$ by (A.6b), where $\delta_k$ and $\rho_{j_i}^{(k)}$ are parameters chosen such that $B_k$ is nonsingular. Then $p_k = \delta_k p_k^{PCG}$.

Proof. We will show that $p_k = \delta_k p_k^{PCG}$, $\delta_k \ne 0$, satisfies $B_k p_k = -N_k g_k$. Note that by Lemma 2.1 it follows that $g_i^T p_k^{PCG} = -g_k^T B_0^{-1} g_k$, $i = 0, \dots, k$, and hence (A.7) holds. Insertion of $p_k^{PCG}$ as in (8) with $M = B_0$ into (A.7) gives $B_k p_k = -N_k g_k$, with $N_k$ given by (A.6b). Thus $p_k = \delta_k p_k^{PCG}$, $\delta_k \ne 0$, is a solution to $B_k p_k = -N_k g_k$; since $B_k$ is nonsingular, this is also the unique solution. The matrix $B_k$ is positive definite with $\delta_k = 1$ and $\rho_{j_i}^{(k)} > 0$, $i = 1, \dots, m_k - 1$, by Lemma A.2. The steps to show that $B_k \succ 0$ if in addition $\delta_k > 0$ are analogous to the last steps of the proof of Lemma A.1.
If all indices are chosen to be active in the Hessian approximation of (A.6a), then it is equivalent to (26) of Proposition 4.3, where $\varphi_k$ relates to $\delta_k$ as in (20). Conversely, if the indices corresponding to all previous gradients, i.e., $i = 0, \dots, k-1$, are inactive and $\delta_k = 1$, then (A.6) with $B_k p_k = -N_k g_k$ is equivalent to the PCG update (8). The update scheme of Theorem A.3 contains only gradients as vector components, and with the exact linesearch method of Algorithm 1 the finite termination property is maintained. However, this is at the expense of adding a correction term on the right-hand side. Moreover, the update in Theorem A.3 relies heavily on the result in Lemma 2.1, which is exact on quadratic problems. For non-quadratic problems, other more accurate approximations and modifications may be considered to improve the method.

Complexity of symPCGs and Vsr1
It is sufficient to consider the complexity for iteration $k \ge m$. Note that the first $(m-3)$ columns of $Z_k$ remain constant since the first $(m-3)$ columns of $Q_k$ remain the same. The basis matrix may hence be written as $Z_k = (Z_0 \;\; \bar{Z}_k)$, where $Z_0 \in \mathbb{R}^{n \times (m-3)}$ and $\bar{Z}_k \in \mathbb{R}^{n \times 3}$. This reduces the complexity of the Gram-Schmidt process to $O(mn)$. Moreover, for both symPCGs and Vsr1 the matrix $B_k$ may be written as $B_k = B_0 + V_0^{\rho} + F_k$, where $B_0 + V_0^{\rho}$ remains constant and $F_k$ collects the terms that change between iterations. Note that $F_k$ has a compact representation of rank five in symPCGs and of rank four in Vsr1, respectively. Moreover, note that the matrix multiplication $\bar{M} F \bar{N}$, where $\bar{M} \in \mathbb{R}^{q_1 \times n}$, $\bar{N} \in \mathbb{R}^{n \times q_2}$ and $F \in \mathbb{R}^{n \times n}$ is of rank $\bar{r}$ with a known compact representation, has complexity $O(n q_1 \bar{r} + q_1 q_2 \bar{r} + n q_2 \bar{r})$. The reduced Hessian at iteration $k$ can then be written in block form in terms of $Z_0$ and $\bar{Z}_k$. The matrices $Z_0^T (B_0 + V_0^{\rho}) Z_0 \in \mathbb{R}^{(m-3) \times (m-3)}$ and $(B_0 + V_0^{\rho}) Z_0 \in \mathbb{R}^{n \times (m-3)}$ have been successively built for iterations $k < m - 3$ and can be stored. The remaining blocks are computed at each iteration with complexity, taking into account that $\bar{r} \le 5$, dominated by $n^2$. It remains to add everything together, which has complexity $O(m^2)$, and to factorize, which has complexity $O(m^3)$. The overall asymptotic complexity is thus dominated by $\max\{n^2, m^3\}$.