Abstract
The problem of solving a strictly convex, quadratic programming problem is studied. The idea of conjugate directions is used. First we assume that the set of directions conjugate with respect to the hessian of the goal function is known. We apply n simultaneous directional minimizations along these conjugate directions, all starting from the same point, followed by the addition of the directional corrections. A theorem justifying that the algorithm finds the global minimum of the quadratic goal function is proved. An effective construction of the required set of conjugate directions is presented. We start with a vector whose entries are all zero except the first one. At each step a new vector, conjugate to the previously generated ones, is constructed whose number of nonzero entries is larger by one than in its predecessor. Conjugate directions obtained by means of this construction procedure with appropriately selected parameters form an upper triangular matrix which, in exact computations, is the Cholesky factor of the inverse of the hessian matrix. The computational cost of calculating this inverse factorization is comparable with the cost of the Cholesky factorization of the original second derivative matrix. Calculation of those vectors involves exclusively matrix/vector multiplication and finding the inverse of a diagonal matrix. Some preliminary computational results on test problems are reported. In the test problems all symmetric, positive definite matrices with dimensions from \(14\times 14\) to \(2000\times 2000\) from the University of Florida repository were used as the hessians.
1 Introduction
In this paper we consider the problem of solving the following unconstrained optimization problem

$$\begin{aligned} \min _{\mathbf{x}\in R^n} \; f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x} - \mathbf{b}^T\mathbf{x}, \end{aligned}$$(1)

where matrix \(\mathbf{G}\) is symmetric and positive definite. The problem is equivalent to the problem of finding the solution of the set of linear equations

$$\begin{aligned} \mathbf{G}\mathbf{x} = \mathbf{b}. \end{aligned}$$(3)

Equation (3) corresponds to the first order necessary optimality conditions for problem (1). The equivalence of (3) and (1) is easily verified.
There are two main approaches to solve the quadratic problem (1) or the set of linear equations (3). In the first, a sequence of consecutive directional minimizations along conjugate directions (first of all conjugate gradients) is carried out. In the second approach, one first finds the triangular Cholesky decomposition of matrix \(\mathbf{G}\) and next two sets of linear equations with triangular matrices are solved.
The notion of conjugate directions plays a very important role in solution methods for the QP problem (1) and for general unconstrained nonlinear minimization problems. Powell (1964) proposed a conjugate directions method that does not calculate derivatives; each consecutive direction was generated as the difference between the minima found on two parallel straight lines. The most popular are the conjugate gradient methods. The first one was developed by Hestenes and Stiefel (1952), who proposed it for solving symmetric, positive-definite linear algebraic systems. In 1964 the method was extended to nonlinear problems by Fletcher and Reeves (1964). The procedure is recursive
where \(\mathbf{d}^0=-\nabla f(\mathbf{x}^0)\) and every consecutive search direction is the linear combination of the previous one and the Cauchy steepest descent direction
It means that starting from an initial guess \(\mathbf{x}^0\) (with corresponding residual \(\mathbf{r}^0 = \nabla f(\mathbf{x}^0) = \mathbf{G}\mathbf{x}^0 -\mathbf{b}\)), this method generates a sequence of iterates \(\left\{ \mathbf{x}^k\right\} \) that minimize f over the Krylov subspaces \(\mathbf{x}^0 + span(\mathbf{r}^0,\mathbf{G}\mathbf{r}^0,\ldots ,\mathbf{G}^{k-1}\mathbf{r}^0)\). Since the publication of Hestenes and Stiefel (1952), conjugate gradient methods have become an area of intensive research and a large number of variants of conjugate gradient algorithms have been designed. Some well-known choices for \(\beta ^k\) are Fletcher and Reeves (1964), Polak and Ribiére (1969), Polyak (1969), Dai and Yuan (1999) and Hestenes and Stiefel (1952):
respectively, where \(\mathbf{g}^k = \nabla f(\mathbf{x}^k)\), \(\mathbf{r}^k = \mathbf{g}^{k+1} - \mathbf{g}^k\) and \(\Vert \cdot \Vert \) denotes the Euclidean norm. Surveys of the developments in that area may be found, for instance, in Hager and Zhang (2005), Stachurski (2012) and Dai (2011).
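The recursion above can be sketched, for the quadratic case, in a few lines of NumPy. This is a minimal illustrative implementation, not the paper's method; it uses the Fletcher–Reeves choice of \(\beta^k\), which on an exact quadratic with exact line searches coincides with the other listed choices:

```python
import numpy as np

def conjugate_gradient(G, b, x0, tol=1e-10, max_iter=None):
    """Linear conjugate gradient for min f(x) = 0.5 x^T G x - b^T x,
    with the Fletcher-Reeves update beta^k = ||g^{k+1}||^2 / ||g^k||^2."""
    n = len(b)
    max_iter = max_iter or n
    x = x0.astype(float).copy()
    g = G @ x - b                         # gradient = residual r^0
    d = -g                                # d^0: steepest descent direction
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        Gd = G @ d
        alpha = (g @ g) / (d @ Gd)        # exact line search on a quadratic
        x = x + alpha * d
        g_new = g + alpha * Gd            # updated gradient
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d             # new search direction
        g = g_new
    return x
```

In exact arithmetic the iterates minimize f over the growing Krylov subspaces, so at most n iterations are needed.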
The standard algebraic approach to solve either the set of linear equations (3) or the quadratic problem (1) is two-phase. In phase one, the triangular Cholesky decomposition (published posthumously by Benoit (1924)) of the hessian matrix is found, i.e., the lower triangular matrix \(\mathbf{L}\) such that \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\). In phase two, two sets of linear equations with triangular matrices are consecutively solved, using forward substitution followed by backward substitution, in order to find the solution of (1)
A closely related approach is to express the hessian as \(\mathbf{G} = \mathbf{L}\mathbf{D}\mathbf{L}^T\), where \(\mathbf{L}\) is lower triangular with ones on the main diagonal and \(\mathbf{D}\) is a diagonal matrix. Similarly as in formula (4), after finding the decomposition, two sets with triangular matrices and one with a diagonal matrix are solved
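The two-phase scheme can be sketched as follows; the substitution loops are spelled out explicitly for clarity, assuming dense storage (a real implementation would exploit sparsity):

```python
import numpy as np

def solve_via_cholesky(G, b):
    """Two-phase solve of G x = b for symmetric positive definite G."""
    # Phase 1: G = L L^T with L lower triangular.
    L = np.linalg.cholesky(G)
    n = len(b)
    # Phase 2a: forward substitution, solve L y = b.
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    # Phase 2b: backward substitution, solve L^T x = y.
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x
```

The \(\mathbf{L}\mathbf{D}\mathbf{L}^T\) variant differs only in that an additional diagonal system is solved between the two triangular ones.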
A survey of direct methods for sparse linear systems may be found in the Davis et al. (2016) technical report. Problems of type (1) and (3) are a kind of inverse problem and have many applications. An inverse problem in science is the process of calculating unknown input values from a set of observations: for instance calculating an image in computer tomography, deblurring a digital picture, reconstructing the source in acoustics or calculating the density of the Earth from measurements of its gravity field (for more information see Wikipedia (2017)). Many of them involve large-scale problems with symmetric, positive definite matrices and multiple right-hand sides.
Recently we have observed a renewed interest in large, sparse, unconstrained quadratic optimization problems in that context. Currently the research goes in many directions. There are many papers devoted to the search for the best preconditioner for the problem, see for instance Gratton et al. (2011), Andrea et al. (2017), Fasano and Roma (2016) and many others. Feng et al. (1995) consider block conjugate gradient methods for solving linear systems with multiple right-hand sides. A block algorithm is also applied to problems with multiple right-hand sides in El Guennouni et al. (2003) (this time the considered matrices are not symmetric). Some researchers concentrate on effectively finding the solution of the unconstrained quadratic optimization problem or the associated system of linear equations. For instance, in Alléon et al. (2003) and Simoncini and Szyld (2007) one may find information on algorithms combining direct methods with the iterative approach. Tasks of deblurring noisy pictures are frequently formulated as quadratic programming problems with nonnegativity conditions on the variables and regularization, see for instance Chen and Zhou (2010) and references therein. The problem in Chen and Zhou (2010) is essentially quadratic and nonsmooth due to the regularity term formulated in the \(l_1\)-norm. Another trend is to find directly a factorization of the inverse matrix. Let us mention in that context the papers of Yunlan et al. (1997) and Zhu et al. (2011), where the structure of the hessian matrix is deeply exploited. Let us mention also SelInv – the algorithm for the selected inversion of a sparse symmetric matrix developed by Lin et al. (2011). In the first stage of SelInv the \(\mathbf{L}\mathbf{L}^T\) decomposition of \(\mathbf{G}\) is calculated and in the second stage selected subsets of entries are found. SelInv requires a special structure of the matrix to do this effectively.
Electronic structure calculations are among the possible application areas. In our result we start from the notion of conjugate vectors and do not exploit the matrix structure. It is worth mentioning here also the iterative refinement technique introduced by Ogita and Oishi (2012).
The content of the paper may be treated as a result on the junction of two disciplines: optimization and linear algebra. Currently a close relationship between optimization and numerical linear algebra algorithms is observed. As O’Leary (2000) stated
“The efficiency and effectiveness of most optimization algorithms hinges on the numerical linear algebra algorithms that they utilize. Effective linear algebra is crucial to their success, and because of this, optimization applications have motivated fundamental advances in numerical linear algebra.”
Our paper follows that direction. In our opinion it may be of interest to researchers working on optimization algorithms and in linear algebra as well. We use the notion of conjugacy, frequently exploited in optimization, and propose a nonconventional way of using a given set of conjugate directions. In that approach n independent directional minimizations starting from the same point \(\mathbf{x}^0\) are executed simultaneously, and finally the solution is obtained by adding to \(\mathbf{x}^0\) the linear combination of the search directions with combination coefficients equal to the calculated directional step sizes. Section 2 contains some basic definitions and properties of conjugate directions, and in Sect. 3 the main theorem justifying the correctness of the proposed procedure using n independent, simultaneous directional searches is formulated and proved. In Sect. 4 the algorithm for the recursive generation of the conjugate directions is presented. In Sect. 5 it is shown that the generated conjugate directions form the \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) decomposition of the inverse of the second order derivative matrix (hessian) \(\mathbf{G}^{-1}\). The particular values on the main diagonal of matrix \(\mathbf{R}\) and, what is strongly related, the main diagonal entries of the \(\mathbf{D}\) matrix depend on the choice of the parameters controlling the length of the generated vectors. Let us stress that the recursive presentation of the Cholesky factorization in the book by Stoer (1976) (see also Stoer and Bulirsch (1993)) was to some extent an inspiration for us (the same representation was used by Wilkinson (1963) to prove the numerical stability of the Cholesky decomposition). Our method also works recursively, passing from a subspace of a given dimension to the subspace of dimension larger by one.
The resulting algorithm is two-phase. Its details are discussed in Sect. 6. In the first phase, we generate the set of conjugate directions. The second phase consists of performing n independent directional minimizations, which in our opinion may be useful in distributed calculations and cloud computing. The second phase may also be executed by simply applying three consecutive matrix/vector multiplications. Both approaches are compared numerically on a set of 128 different strictly convex, quadratic programming problems of different sizes (the space dimension varies from 14 to 2000). They were created with the aid of 64 symmetric, positive definite matrices taken from the University of Florida repository Davis and Hu (2016) and two different versions of the \(\mathbf{b}\) vector (see Sect. 7). All those matrices come from real life applications and are stored in the sparse matrix format.
The computational effort of the second phase in our approach would be much lower than in the standard way of decomposing the matrix \(\mathbf{G}\) followed by the solution of two sets of linear equations with triangular matrices, plus possibly a set with a diagonal matrix. The computational cost of the first phase is identical with the cost of calculating the standard \(\mathbf{L}\mathbf{D}\mathbf{L}^T\) factorization of matrix \(\mathbf{G}\). In the final Sect. 8 some conclusions and corollaries are presented.
2 Some definitions and properties of the conjugate directions
To obtain the factorization of the inverse \(\mathbf{G}^{-1}\) of the considered symmetric, positive definite matrix \(\mathbf{G}\) we shall construct directions which are conjugate with respect to \(\mathbf{G}\). The formal definition of \(\mathbf{G}\)–conjugacy is recalled below. The definition of conjugate vectors and the proof of their linear independence are taken from Bertsekas (1995).
Definition 1
Nonzero vectors \(\mathbf{d}_1\), \(\mathbf{d}_2\), \(\ldots \), \(\mathbf{d}_k\) are conjugate with respect to symmetric, positive definite matrix \(\mathbf{G}\) if and only if
Let us recall that \(\mathbf{G}\)–conjugate directions are linearly independent.
Lemma 1
If \(\mathbf{G}\) is a symmetric, positive definite, \(n\times n\) matrix and vectors \(\mathbf{d}_i\) (for \(i=1, \ldots ,k\le n\)) are nonzero and mutually conjugate with respect to \(\mathbf{G}\), then they are linearly independent.
Proof
Let us assume that the thesis is not valid. Then there exists a nontrivial linear combination of vectors \(\mathbf{d}_i\) equal to \(\mathbf{0}\)
Let us multiply the equation by \(\mathbf{d}_j^T\mathbf{G}\) from the left. Then,
due to the \(\mathbf{G}\)–conjugacy property. From the last equality we deduce that \(\alpha ^j = 0\), because \(\mathbf{d}_j^T\mathbf{G}\mathbf{d}_j> 0\) by the positive definiteness assumption. Since j was arbitrary, all coefficients vanish and we obtain a contradiction. That concludes the proof.
3 Solving QP problem by means of simultaneous directional minimizations
The key property used in this section is formulated as the theorem.
Theorem 1
Let vectors \(\mathbf{d}^i\) (for \(i=1, \ldots ,n\)) be a set of directions mutually conjugate with respect to the symmetric, positive definite matrix \(\mathbf{G}\), where \(\mathbf{G}\) is the second order derivative matrix of problem (1). Let us take any point \(\mathbf{x}^0\in R^n\) and execute n exact directional minimization searches along any vector \(\mathbf{d}^i\), i.e., solve the following one-dimensional minimization problems
and denote their solutions by \(\alpha ^i\), respectively.
Then point \(\bar{\mathbf{x}}\) defined as
is the optimal solution of the quadratic programming problem (1).
Proof
We shall show that \(\mathbf{G}\bar{\mathbf{x}} - \mathbf{b} = \mathbf{0}\). To prove the thesis it is sufficient to show that the gradient \(\bar{\mathbf{g}} = \nabla f(\bar{\mathbf{x}})\) (the first derivative of function f calculated at the point \(\bar{\mathbf{x}}\)) is orthogonal to all vectors \(\mathbf{d}^i\), i.e.,
Let us select any of the conjugate vectors \(\mathbf{d}^i\) (let it be \(\mathbf{d}^k\)). Then
The conjugacy property ensures that inside the sum there is only one term with nonzero value, namely the one for \(i=k\). Therefore, after simple calculations we obtain
Let us notice that
by the rule of differentiating composite functions. Hence
due to the exact directional minimization on the straight line defined by the set of points \(\left\{ \mathbf{x}(\alpha )\in R^n\; | \; \mathbf{x}(\alpha ) = \mathbf{x}^0 + \alpha \mathbf{d}^k, \alpha \in R^1 \right\} \).
From Lemma 1 we deduce that the vectors \(\mathbf{d}^i\) (\(i=1, \ldots ,n\)) form a basis of \(R^n\). We have proved that the derivative \(\nabla f(\bar{\mathbf{x}})\) is orthogonal to all vectors from that basis. Therefore it must be equal to \(\mathbf{0}\).
A similar theorem may be found in Fletcher (1987). However, he considered it in an essentially sequential way – a directional search along one direction, followed by the move to the new point, then another directional minimization from the second point followed by the move to the third point, and so on. Fletcher's theorem is the basis for the conjugate directions methods in optimization. Here, we consider n independent directional minimizations, all starting from the same point \(\mathbf{x}^0\).
Theorem 1 suggests an obvious numerical procedure to solve a quadratic programming problem (1) with a symmetric, positive definite matrix \(\mathbf{G}\) in a computational cloud or on a machine with many parallel processors. We may simply run simultaneously n independent directional minimizations starting from the same point \(\mathbf{x}^0\). The crucial problem is the effective generation of the conjugate directions. A recursive procedure of their generation is proposed in the consecutive section.
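As an illustration of Theorem 1, the sketch below uses the eigenvectors of \(\mathbf{G}\) as an example set of mutually conjugate directions (orthogonal eigenvectors of a symmetric matrix are \(\mathbf{G}\)-conjugate); the paper's own construction of the directions is the subject of the next section. All n line searches start from the same point \(\mathbf{x}^0\) and are independent of one another, and the directional corrections are added at the end:

```python
import numpy as np

def solve_by_simultaneous_searches(G, b, x0):
    """Sketch of Theorem 1: n independent exact line searches from the
    same point x0 along G-conjugate directions (here: eigenvectors of G),
    followed by adding all directional corrections."""
    _, V = np.linalg.eigh(G)             # columns of V are G-conjugate
    g0 = G @ x0 - b                      # gradient at the common start point
    x = x0.astype(float).copy()
    for i in range(G.shape[0]):          # each pass is independent
        d = V[:, i]
        alpha = -(d @ g0) / (d @ G @ d)  # exact minimizer along x0 + alpha d
        x = x + alpha * d                # add the directional correction
    return x
```

Because the loop iterations share no state beyond \(\mathbf{x}^0\) and its gradient, they could be dispatched to n workers and the corrections summed afterwards.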
4 Recursive generation of conjugate directions
We shall start with a vector having only one nonzero entry, namely the first one (for instance equal to 1); every consecutive vector will have one more nonzero entry than the previous one, i.e., the k-th vector will have its first k entries possibly not equal to zero, while the rest assume the zero value. Furthermore, every \((k+1)\)-st vector should be conjugate to all previous vectors with respect to matrix \(\mathbf{G}\).
So, let us assume now that the starting vector is \(\mathbf{d}^1=\left[ c, 0, \ldots ,0\right] ^T\), where \(c>0\) is a positive constant, and try to find the consecutive vector \(\mathbf{d}^2\) verifying the following requirements:
- \(\mathbf{d}^2\) is conjugate to \(\mathbf{d}^1\) with respect to \(\mathbf{G}\),
- only the first two entries of \(\mathbf{d}^2\) may have nonzero values, all others are equal to 0, i.e., \(\mathbf{d}^2 = \left[ d_1, d_2, 0, \ldots , 0\right] ^T\).
Then
Defining vector \(\mathbf{d}^2\) as a vector with two nonzero entries as follows
we shall have the following situation
Vector \(\mathbf{d}^2\) should be \(\mathbf{G}\) conjugate with \(\mathbf{d}^1\), i.e.,
Let us calculate appropriate vector \(\mathbf{d}^2\). Due to the choice of \(\mathbf{d}^1\) and \(\mathbf{d}^2\) we have
Now, simple matrix calculations lead to
In order to preserve the conjugacy property this quantity should be equal to 0
and therefore coefficients of vector \(\mathbf{d}^2\) should verify the equality
We have one degree of freedom in Eq. (8). We may take for instance \(d_2^2 = 1\) and solve Eq. (8) with respect to \(d_1^2\).
In what follows we shall derive a similar representation for every k. In step number \(k>1\) we permit only the first k elements of the new direction \(\mathbf{d}^k\) to assume nonzero values; all others are set to 0. Furthermore, we preserve conjugacy with respect to \(\mathbf{G}\) to all previous directions \(\mathbf{d}^1, \ldots ,\mathbf{d}^{k-1}\), where in every vector \(\mathbf{d}^i\) (\(i<k\)) only the first i coefficients may assume nonzero values.
Let us assume the following block representation of matrix \(\mathbf{G}\)
and let \(\mathbf{d}^k\) have the following structure
with \(\mathbf{d}\) being a column vector of size \(k-1\) and c is a scalar.
Similarly \(\mathbf{d}^i\) (\(i<k\)) may be represented as
where \(\bar{\mathbf{d}}^i\) belongs to the space \(R^{k-1}\).
The conjugacy requirement due to the splitting formulated in Eqs. (9–11) may be written as
Hence, the considerations may be restricted to the k-dimensional space and conjugacy with respect to the \(k\times k\) principal submatrix \(\mathbf{G}_k\). Applying (9) again we arrive at the following equivalent form of (12)
Vector \(\mathbf{d}\) belongs to the space spanned by the vectors \(\bar{\mathbf{d}}^1, \ldots ,\bar{\mathbf{d}}^{k-1}\). Therefore it may be expressed as their linear combination
Let us substitute the above expression for \(\mathbf{d}\) into (13) and calculate the coefficients \(\tau _j\) which ensure that vector \(\mathbf{d}^k\) verifies the conjugacy condition
where \(D_i = \left[ \bar{\mathbf{d}}^i\right] ^T\mathbf{G}_{k-1}\bar{\mathbf{d}}^i\) for each \(i<k\). Now, formula (14) results in the following values of coefficients
and finally we obtain the formula specifying the new vector
Hence, we may formulate the following algorithm for generation of vectors mutually conjugate with respect to matrix \(\mathbf{G}\).
Algorithm I
[Initialization] Set \(k=1\). Select c, for instance \(c=1\). Build vector
Calculate \(D_1 = \left( \mathbf{d}^1\right) ^TG_{11}\mathbf{d}^1\).
[Loop] Increase the dimension, set \(k = k+1\).
while (\(k \le n\))
\(\{\)
1. Calculate vector \(\mathbf{d} = -\sum _{j=1}^{k-1} c \bar{\mathbf{d}}^j D_j^{-1}\left( \bar{\mathbf{d}}^j\right) ^T\mathbf{g}\).
2. Build vector \(\mathbf{d}^k\) as follows \(\left( \mathbf{d}^k\right) ^T = \left[ \mathbf{d}^T,\;c, \; \mathbf{0}\right] \).
3. Calculate \(D_k = \left( \mathbf{d}^k\right) ^TG_{kk}\mathbf{d}^k\).
4. Set \(k = k+1\).
\(\}\)
The induction analysis presented above proves the following lemma
Lemma 2
Vectors \(\mathbf{d}^1\), \(\mathbf{d}^2\), \(\ldots \), \(\mathbf{d}^n\) generated by Algorithm I are mutually conjugate with respect to matrix \(\mathbf{G}\).
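A sketch of Algorithm I in NumPy follows. The helper names are ours; column k of the array holds \(\mathbf{d}^{k+1}\) in 0-based indexing, and the vector \(\mathbf{g}\) of step 1 is taken as the off-diagonal part of column k of \(\mathbf{G}\), as in the block splitting (9):

```python
import numpy as np

def conjugate_directions(G, c=1.0):
    """Sketch of Algorithm I: build n mutually G-conjugate vectors.
    d^1 has a single nonzero entry; each d^k gains one nonzero entry,
    chosen so that d^k is G-conjugate to d^1, ..., d^{k-1}."""
    n = G.shape[0]
    R = np.zeros((n, n))                 # column j holds d^{j+1}
    Dk = np.zeros(n)                     # Dk[j] = (d^{j+1})^T G d^{j+1}
    R[0, 0] = c
    Dk[0] = c * G[0, 0] * c
    for k in range(1, n):
        g = G[:k, k]                     # off-diagonal part of column k
        d = np.zeros(k)
        for j in range(k):
            bar_dj = R[:k, j]            # leading k entries of d^{j+1}
            d += -c * bar_dj * ((bar_dj @ g) / Dk[j])   # formula (15)-(16)
        R[:k, k] = d                     # first k-1 entries of d^{k+1}...
        R[k, k] = c                      # ...followed by the constant c
        Dk[k] = R[:, k] @ G @ R[:, k]
    return R, Dk
```

As the algorithm promises, the only operations are matrix/vector (and scalar) products and divisions by the already-computed quantities \(D_j\).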
4.1 Matrix form of the algorithm for generating conjugate directions
The algorithm presented above may be written in matrix form.
Let us denote by \(\mathbf{R}^i\) (\(1 \le i \le k-1\)) the rectangular matrix with columns equal to consecutive \(\mathbf{d}^j\) (\(1\le j\le i\)) vectors and \(\bar{\mathbf{R}}^i\) (\(1 \le i \le k-1\)) the rectangular matrix with columns equal to consecutive \(\bar{\mathbf{d}}^j\) (\(1\le j\le i\)), i.e.,
Due to their construction
and
Taking the above relation into account we may express formula (16) in the following matrix form
Let us introduce the following notation
Then formula (17) may be expressed as
Below is the formal pseudo code of the matrix form of Algorithm I.
Algorithm II
[Initialization] Set \(k=1\). Set \(\mathbf{L}^1 = 1\), select c, for instance \(c=1\).
[Loop]
while (\(k < n\))
\(\{\)
1. Calculate \(\mathbf{D} = \mathbf{L}^k\mathbf{G}_k\left( \mathbf{L}^k\right) ^T\).
2. Increase the dimension by 1; set \(k = k+1\). Calculate vector \(\mathbf{d}\) using formula (18).
3. Build the new matrix \(\mathbf{L}^k\) as follows
$$\begin{aligned}\mathbf{L}^k = \left[ \begin{array}{ll} \mathbf{L}^{k-1} & \mathbf{0} \\ \mathbf{d}^T & c \end{array} \right] . \end{aligned}$$
\(\}\)
To calculate vector \(\mathbf{d}\) we need to perform a matrix/vector multiplication, followed by a possible multiplication of each component by the scalar c and a division by the corresponding main-diagonal entry of the diagonal matrix \(\mathbf{D}\). The last two operations can be realized simultaneously.
The new lower triangular matrix \(\mathbf{L}^k\) is obtained at each step by appending a new row, representing the transposed vector \(\bar{\mathbf{d}}^k\), and a new column containing exclusively entries equal to 0 above the main diagonal.
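The matrix form may be sketched as follows (with \(\mathbf{G}_k\) the leading \(k\times k\) principal submatrix, as above). For simplicity this sketch recomputes the whole diagonal matrix \(\mathbf{D}\) in each pass, although, as noted later in Sect. 6, only its last entry actually changes from one iteration to the next:

```python
import numpy as np

def conjugate_directions_matrix(G, c=1.0):
    """Sketch of Algorithm II: grow the lower triangular matrix L row by
    row; row k of L holds the leading entries of the direction d^k."""
    n = G.shape[0]
    L = np.array([[c]])                       # L^1
    for k in range(1, n):
        Gk = G[:k, :k]                        # principal submatrix G_k
        D = np.diag(np.diag(L @ Gk @ L.T))    # diagonal by conjugacy
        g = G[:k, k]                          # new off-diagonal column
        d = -c * L.T @ np.linalg.solve(D, L @ g)   # formula (18)
        L = np.block([[L, np.zeros((k, 1))],       # append zero column...
                      [d[None, :], np.array([[c]])]])  # ...and row [d^T, c]
    return L
```

A usage check: for any symmetric positive definite `G`, `L @ G @ L.T` should come out (numerically) diagonal, which is exactly relation (19).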
5 Generated directions and their relation with the factorization of the inverse matrix
The last matrix \(\mathbf{L}^n\) contains n rows which are the transposes of the n mutually \(\mathbf{G}\)-conjugate vectors, and it verifies the following matrix relation
where \(\mathbf{L}^n\) is lower triangular and \(\mathbf{D}^n\) is the diagonal matrix.
Below we shall show that Algorithm II produces the \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) factorization of the inverse of \(\mathbf{G}\).
Lemma 3
The following equality holds
Proof
Taking the inverses in Eq. (19) we obtain
Multiplying (21) from the left-hand side by \(\left[ \mathbf{L}^n\right] ^T\) and by \(\mathbf{L}^n\) from the right-hand side we shall obtain the desired expression (20) of the inverse of \(\mathbf{G}\).
It means that the proposed algorithm produces directly an \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) factorization of the inverse matrix \(\mathbf{G}^{-1}\). It is easily observed that the same statement is valid for the intermediate set of generated vectors.
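Lemma 3 is easy to check numerically. In the sketch below the lower triangular matrix verifying (19) is obtained, for the purposes of the check only, as the inverse of the Cholesky factor of \(\mathbf{G}\) (so that \(\mathbf{D}\) is the identity); any matrix produced by Algorithm II would serve equally well:

```python
import numpy as np

# Numerical check of Lemma 3: any lower triangular L with L G L^T = D
# (diagonal) yields G^{-1} = L^T D^{-1} L.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
G = A @ A.T + 5 * np.eye(5)          # symmetric positive definite
C = np.linalg.cholesky(G)            # G = C C^T
L = np.linalg.inv(C)                 # lower triangular, L G L^T = I
D = L @ G @ L.T                      # numerically the identity matrix
G_inv = L.T @ np.linalg.inv(D) @ L   # Lemma 3: G^{-1} = L^T D^{-1} L
assert np.allclose(G_inv, np.linalg.inv(G))
```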
Lemma 4
The following equality holds for each \(1\le k\le n\)
Deeper analysis and stability issues of this approach to obtain directly the factorization of the inverse matrix will be pursued elsewhere.
6 Two-phase algorithm
A two-phase algorithm is proposed on the basis of the above theoretical considerations. In the first phase we calculate the conjugate directions. For their calculation we may apply either Algorithm I or Algorithm II formulated above. As follows from the above discussion the proposed algorithm is recursive. We start with the space of dimension equal to 1 and increase it by one at each step. It is also important in this procedure that the only algebraic operations involved are the matrix/vector multiplication and division or multiplication of the matrix entries by an appropriate number.
It is possible to reduce further the computational effort, if we restrict calculation of the entries of the diagonal matrix \(\mathbf{D}\) in step 1 to the last entry on the main diagonal, i.e., \(D_{kk}\). All other entries on the main diagonal are the same as in the previous iteration.
In the second phase the optimal solution of the QP (quadratic programming) problem (1) is found. After completing the calculation of the conjugate directions it is clear whether the QP problem has an optimal solution or is unbounded. So, let us now assume that the QP problem is bounded from below.
The algorithm executing n independent simultaneous directional minimizations is presented in Sect. 6.1. It is based on Theorem 1.
The second possibility, presented in Sect. 6.2, is to solve the stationarity condition. By Lemma 4 in Sect. 5 we know that as a byproduct of the conjugate direction generation we obtain the factorization of the inverse hessian. Therefore we can find the solution of the stationarity condition (3) by simple matrix/vector multiplications.
6.1 Finding the solution of the QP problem
The key property used in this subsection is the result of Theorem 1.
As already noted, Theorem 1 makes it possible to run the n independent directional minimizations, all starting from the same point \(\mathbf{x}^0\), simultaneously in a computational cloud or on a machine with many parallel processors.
Algorithm III
A0. Select the starting point \(\mathbf{x}^0\). \(\mathbf{x}^0=\mathbf{0}\) may be a suitable choice, because then \(\nabla f(\mathbf{x}^0) = -\mathbf{b}\).
A1. Execute n independent directional minimizations, i.e., find solutions to all problems
$$\begin{aligned} \min _\alpha \;\; \left\{ \bar{f}^i(\alpha ) = f(\mathbf{x}^0 + \alpha \mathbf{d}^i)\right\} , \;\; \forall \; i=1, \ldots ,n. \end{aligned}$$(23)
A2. Calculate the solution of the problem using the formula
$$\begin{aligned} \bar{\mathbf{x}} = \mathbf{x}^0 + \sum _{i=1}^n \alpha ^i \mathbf{d}^i. \end{aligned}$$(24)
Function f in formula (23) is quadratic (see formula (1)). Therefore, each directional step size \(\alpha ^i\) is the point at which the derivative of the function \(\bar{f}^i\) minimized in Step A1 vanishes. Hence the step size verifies the following equation
Simple arithmetic operations on Eq. (25) lead to the following expression for \(\alpha ^i\)
Importantly, the derivative \(\nabla f(\mathbf{x}^0)\) may be calculated once; the same value is used for every direction i. To calculate \(\alpha ^i\) we need only the data \(\nabla f(\mathbf{x}^0)\), \(\mathbf{d}^i\) and \(\mathbf{G}\).
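Since formula (26) makes the n step sizes independent of each other, they can be computed at once (or distributed across workers). A vectorized sketch, where the columns of `R` are assumed to be the conjugate directions \(\mathbf{d}^i\):

```python
import numpy as np

def all_step_sizes(G, b, x0, R):
    """Vectorized sketch of Steps A1-A2: compute all n step sizes from
    formula (26) at once and assemble the solution via formula (24).
    Columns of R are assumed mutually G-conjugate."""
    g0 = G @ x0 - b                       # gradient at x0, computed once
    num = -(R.T @ g0)                     # -(d^i)^T grad f(x0), all i
    den = np.sum(R * (G @ R), axis=0)     # (d^i)^T G d^i, all i
    alpha = num / den                     # formula (26)
    return x0 + R @ alpha                 # formula (24)
```

For a quick check one may pass, e.g., the eigenvector matrix of \(\mathbf{G}\) as `R`, since its columns are \(\mathbf{G}\)-conjugate.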
6.2 Finding the solution of the set of equations
The conjugacy condition may be equivalently written as follows
Multiplying Eq. (27) from the left-hand side by the inverse of \(\mathbf{R}^T\) and from the right-hand side by \(\mathbf{R}^{-1}\) we shall obtain
Now application of the commonly known rule of calculating the inverse of the product of square nonsingular matrices yields
Let us apply formula (29) to equation
This results in
In view of equality (30) it is easy to distribute the calculation of \(\mathbf{x}\) verifying Eq. (3). It involves exclusively matrix/vector multiplications and the application of the matrix \(\mathbf{D}^{-1}\) to an appropriate vector. Matrix \(\mathbf{D}\) is diagonal, hence multiplication by \(\mathbf{D}^{-1}\) requires only n arithmetic divisions. There are efficient ways to realize matrix/vector multiplication, see for instance Bertsekas and Tsitsiklis (1989).
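The second-phase computation of (30) can be sketched in three lines; the columns of `R` and the diagonal `D_diag` are assumed to come from the first phase:

```python
import numpy as np

def solve_by_inverse_factorization(R, D_diag, b):
    """Sketch of Sect. 6.2: given G^{-1} = R D^{-1} R^T, the solution of
    G x = b needs two matrix/vector products and n divisions."""
    y = R.T @ b          # first matrix/vector product
    y = y / D_diag       # apply D^{-1}: n arithmetic divisions
    return R @ y         # second matrix/vector product
```

Any \(\mathbf{G}\)-conjugate set works here, since \(\mathbf{R}^T\mathbf{G}\mathbf{R} = \mathbf{D}\) immediately gives \(\mathbf{G}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\).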
7 Numerical experiments
Sample matrix test problems from the repository of the University of Florida were used for the numerical verification of the proposed algorithm (a description may be found in Davis and Hu (2011a)). The repository is available on-line and maintained by Davis and Hu (2016). We restricted the calculations to the subset of symmetric, positive definite matrices from the repository. They are stored in three different formats: the Rutherford-Boeing format (see Duff et al. (1989a, b, 1997)), the Matrix Market format (Boisvert et al. (1997a, b)) and the MATLAB format. In the MATLAB format, used by us, the problem set is a single MATLAB variable of struct type, stored in a single MAT-file.
Tables 1 and 2 contain information about the test problems. We selected from the repository all problems with symmetric, positive definite matrices of size between 0 and 2000. Many of them are structural problems, where matrix \(\mathbf{G}\) is the so-called stiffness matrix of the structure and is usually symmetric and positive definite. It arises as the result of the finite element discretization of the problem. The details may be found for instance in Felippa (2001) or McGuire et al. (2000).
The computational results are presented in Tables 3–6. The quantities appearing in the particular columns are defined below:
![figure a](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00186-017-0607-z/MediaObjects/186_2017_607_Figa_HTML.gif)
![figure b](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00186-017-0607-z/MediaObjects/186_2017_607_Figb_HTML.gif)
Tables 3 and 4 contain the results for problems with the right-hand side vector \(\mathbf{b}=[1,1, \ldots ,1]^T\). Unfortunately, almost all problems in the repository do not contain a vector \(\mathbf{b}\).
Tables 5 and 6 contain results for problems with the same matrices \(\mathbf{G}\) as above. The only difference is that the solution vector of the linear system is assumed to have all components equal to one (\(\hat{\mathbf{x}}_i = 1 \;\; \forall i\)) and \(\mathbf{b} = \mathbf{G}\hat{\mathbf{x}}\). We omitted the factorization accuracy and the decomposition times; they are the same as in Tables 3 and 4 presented above.
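A sketch of how this second family of test problems can be generated and assessed. The all-ones exact solution and \(\mathbf{b} = \mathbf{G}\hat{\mathbf{x}}\) are from the text; the forward-error and normwise backward-error formulas below are common choices, and the exact definitions used in the paper's tables are an assumption here:

```python
import numpy as np

def make_test_problem(G):
    """Fix the exact solution to the all-ones vector and set b = G x_hat."""
    x_hat = np.ones(G.shape[0])
    b = G @ x_hat
    return b, x_hat

def forward_error(x_bar, x_hat):
    """Relative error of the computed solution (assumed definition)."""
    return np.linalg.norm(x_bar - x_hat) / np.linalg.norm(x_hat)

def backward_error(G, x_bar, b):
    """Normwise relative backward error (assumed definition)."""
    return (np.linalg.norm(G @ x_bar - b)
            / (np.linalg.norm(G, 2) * np.linalg.norm(x_bar)
               + np.linalg.norm(b)))
```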
Below, in Tables 7 and 8, we present the results obtained by means of the standard Cholesky decomposition of matrix \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\). Having this factorization available, two sets of linear equations with triangular matrices are solved: the first by means of forward substitution, the second by means of backward substitution. These calculations were suggested by one of the reviewers for comparison purposes. Some quantities stored in the columns of Tables 7 and 8 have a different meaning than in the previous tables; therefore, they are redefined below.
![figure c](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00186-017-0607-z/MediaObjects/186_2017_607_Figc_HTML.gif)
![figure d](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00186-017-0607-z/MediaObjects/186_2017_607_Figd_HTML.gif)
The following observations can be drawn on the basis of the results presented in the paper:
- Factorization of the inverse \(\mathbf{G}^{-1}\) is calculated without decomposition of \(\mathbf{G}\).
- Decomposition of the inverse matrix involves exclusively matrix/vector multiplication, calculation of some scalar products and calculation of the image of an inverse of a diagonal matrix on a vector (equivalent to the division of vector components by the main diagonal entries of the matrix).
- The second stage is easily distributed, which may be useful in cloud computing.
- The presented results of sequential calculations seem encouraging.
- The calculations failed only on the problems bcsstk20, bcsstk19, bcsstk11, bcsstk12, ex3, ex33 and plat1919. Let us stress that we have not used any scaling. Further numerical experimentation is necessary.
- The variant with the matrix/vector operations was usually faster than the n simultaneous directional minimizations. The accuracy was exactly the same (all digits of the solution and of the forward and backward errors were equal).
- The standard approach (Cholesky decomposition of matrix \(\mathbf{G}\) followed by solving two sets of linear equations with triangular matrices) was usually more robust. In the cases where the method proposed in the paper succeeded, the results were similar. The accuracy was sometimes slightly better in our approach, sometimes in the standard one. In the cases where our approach failed, the standard approach generated solutions with better quality indicators; however, they were not fully satisfactory. This comparison is not fully fair, because we compare the results of the factorization \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) of \(\mathbf{G}^{-1}\) with the \(\mathbf{L}\mathbf{L}^T\) decomposition of \(\mathbf{G}\).
8 Conclusions
In the paper we investigated a two-stage approach to solving strictly convex quadratic programming problems. In the first stage we generate a set of vectors mutually conjugate with respect to the hessian of the QP problem. This is followed in the second stage by n independent directional minimizations executed concurrently from the same starting point \(\mathbf{x}^0\). We think this may be useful for distributed calculations in cloud computing.
Additionally, in that way we find a factorization of the inverse matrix \(\mathbf{G}^{-1}\) without calculating a factorization of \(\mathbf{G}\) itself. It has the form \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\), where the columns of matrix \(\mathbf{R}\) are the conjugate vectors found by the first-stage algorithm. Knowledge of such an inverse factorization may be very useful both for optimization problems (when we have a family of QP problems with different linear terms) and for linear algebra problems (with multiple right-hand sides). Our approach may also be used in conjunction with the one-sweep procedure of Bientinesi et al. (2008) to calculate the inverse of a symmetric, positive definite matrix; it may be applied to find the factorization of the particular Schur block components.
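The shape of such an inverse factorization can be illustrated with a small dense sketch. The construction below uses plain Gram–Schmidt \(\mathbf{G}\)-conjugation of the unit vectors (it does not reproduce the paper's recursion or its parameter c; the function name is ours): it yields an upper triangular \(\mathbf{R}\) with \(\mathbf{R}^T\mathbf{G}\mathbf{R} = \mathbf{D}\) diagonal, hence \(\mathbf{G}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\), and a system \(\mathbf{G}\mathbf{x} = \mathbf{b}\) is then solved by two matrix/vector products and one diagonal scaling.

```python
import numpy as np

def inverse_cholesky_factors(G):
    """Gram-Schmidt G-conjugation of the unit vectors e_1, ..., e_n.
    Returns upper triangular R and the diagonal d of D = R^T G R,
    so that inv(G) = R @ diag(1/d) @ R.T."""
    n = G.shape[0]
    R = np.zeros((n, n))
    d = np.zeros(n)
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0                              # start from the j-th unit vector
        Ge = G @ e                              # = G[:, j]
        coeffs = (R[:, :j].T @ Ge) / d[:j]      # conjugation coefficients
        R[:, j] = e - R[:, :j] @ coeffs         # remove components along previous directions
        d[j] = R[:, j] @ G @ R[:, j]            # diagonal entry of D (positive for SPD G)
    return R, d
```

Given this factorization, `x = R @ ((R.T @ b) / d)` solves \(\mathbf{G}\mathbf{x} = \mathbf{b}\), and the same factors can be reused for any number of right-hand sides.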
The first stage is essentially sequential. However, our second idea of applying n independent directional minimizations of the associated quadratic programming problem permits distribution of the calculations in the second stage. It should be stressed that the solution accuracy of the two algorithms tested in the second stage was exactly the same: all digits in the reported errors were equal.
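The second-stage idea can be sketched for the quadratic \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x} - \mathbf{b}^T\mathbf{x}\): each of the n line searches from \(\mathbf{x}^0\) along a conjugate direction is an independent closed-form minimization, and adding all directional corrections gives the global minimizer. The following is a hedged sketch (function name ours), not the paper's implementation.

```python
import numpy as np

def conjugate_directional_solve(G, b, R, x0):
    """Minimize f(x) = 0.5 x^T G x - b^T x given mutually G-conjugate
    directions in the columns of R: run n independent exact line
    searches from the common point x0 and sum the corrections."""
    g0 = G @ x0 - b                         # gradient at the common starting point
    GR = G @ R
    curv = np.einsum('ij,ij->j', R, GR)     # d_j^T G d_j for each column d_j
    alpha = -(R.T @ g0) / curv              # exact minimizing step along each direction
    return x0 + R @ alpha                   # sum of all directional corrections
```

Each component of `alpha` depends only on its own direction, so the n line searches can be evaluated concurrently. Any \(\mathbf{G}\)-conjugate basis works; for instance, the orthonormal eigenvectors of \(\mathbf{G}\) are trivially conjugate.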
There are still some open questions which require further investigation. The first is: what are the classes of sparse problems for which the approach presented in the paper will be better than the standard factorization of \(\mathbf{G}\) followed by the solution of two sets of linear equations with triangular matrices? The second is how to apply those ideas to problems with block structure. Perhaps the fact that the columns of the factor \(\mathbf{L}\) in the decompositions \(\mathbf{G} = \mathbf{L}\mathbf{D}\mathbf{L}^T\) or \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\) are mutually conjugate with respect to \(\mathbf{G}^{-1}\) would serve as a hint. Another possibility is to combine the block approach of Bientinesi et al. (2008) with our approach applied to the particular blocks. The factorization generated by means of our approach may also be used as the starting point in the refinement technique studied by Ogita and Oishi (2012) and Yanagisawa et al. (2014).
The reported computational results should be treated as preliminary. There are of course some issues which should be clarified. The most important are the identification of suitable sparsity structures and the selection of appropriate values of the parameter c; we are not restricted to the choice \(c=1\), which was used in our test calculations. This may be especially important in establishing the numerical stability of the method. For that task the most promising idea seems to be the normalization of the consecutive conjugate vectors.
References
Alléon G, Carpentieri B, Duff IS, Giraud L, Langou J, Martin É, Sylvand G (2003) Efficient parallel iterative solvers for the solution of large dense linear systems arising from the boundary element method in electromagnetism. In: Proceedings of the international conference on supercomputing in nuclear application (SNA), Paris
Caliciotti A, Fasano G, Roma M (2017) Novel preconditioners based on quasi-Newton updates for nonlinear conjugate gradient methods. Optim Lett 11(4):835–853
Benoit C (1924) Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues (Procédé du Commandant Cholesky). Bull Géod 2:67–77
Bertsekas DP (1995) Nonlinear programming. Athens Scientific, Belmont
Bertsekas DP, Tsitsiklis JN (1989) Parallel and distributed computation, numerical methods. Prentice Hall, Englewood Cliffs
Bientinesi P, Gunter B, van de Geijn RA (2008) Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans Math Softw 35(1):3
Boisvert RF, Pozo R, Remington K, Barrett R, Dongarra J, Miller B, Lipman B (1997a) The matrix market. http://math.nist.gov/MatrixMarket. Accessed October 2016
Boisvert RF, Pozo R, Remington K, Barrett R, Dongarra JJ (1997b) The matrix market: a web resource for test matrix collections. In: Boisvert RF (ed) Quality of numerical software, assessment and enhancement. Chapman & Hall, London, pp 125–137
Chen X, Zhou W (2010) Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization. SIAM J Imaging Sci 3(4):765–790
Dai YH (2011) Nonlinear conjugate gradient methods. ftp://lsec.cc.ac.cn/pub/dyh/papers/CGoverview.pdf. Accessed 26 May 2017
Dai YH, Yuan Y (1999) A nonlinear conjugate gradient method with a strong global convergence property. SIAM J Optim 10(1):177–182
Davis TA (2014) University of Florida sparse matrix collection. University of Florida, Gainesville
Davis TA, Hu Y (2011a) The University of Florida sparse matrix collection. ACM Trans Math Softw (TOMS) 38(1), article 1 http://www.cise.ufl.edu/research/sparse/matrices/. Accessed October 2016
Davis TA, Hu Y (2016) The University of Florida sparse matrix collection. http://yifanhu.net/PUB/matrices.pdf. Accessed October 2016
Davis TA, Rajamanickam S, Sid-Lakhdar WM (2016) A survey of direct methods for sparse linear systems. Technical report, Department of Computer Science and Engineering, Texas A&M University. http://faculty.cse.tamu.edu/davis/publications.html. Accessed May 2017
Duff IS, Grimes RG, Lewis JG (1989a) Harwell/Boeing collection. http://math.nist.gov/MatrixMarket/collections/hb.html. Accessed October 2016
Duff IS, Grimes RG, Lewis JG (1997) The Rutherford–Boeing sparse matrix collection. Tech report RAL-TR-97-031, Rutherford Appleton Laboratory, Oxon, UK
Duff IS, Grimes RG, Lewis JG (1989b) Sparse matrix test problems. ACM Trans Math Softw 15(1):1–14
El Guennouni A, Jbilou K, Sadok H (2003) A block version of BiCGSTAB for linear systems with multiple right-hand sides. ETNA El Trans Numer Anal 16:129–142. https://eudml.org/doc/124803
Fasano G, Roma M (2016) A novel class of approximate inverse preconditioners for large positive definite linear systems in optimization. Comp Optim Appl 65(2):399–429
Felippa CA (2001) A historical outline of matrix structural analysis: a play in three acts. Comp Struct 79(14):1313–1324. doi:10.1016/S0045-7949(01)00025-6
Feng YT, Owen DRJ, Peri D (1995) A block conjugate gradient method applied to linear systems with multiple right-hand sides. Comp Meth Appl Mech Eng 127(1–4):203–215
Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, Chichester
Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Comput J 7(2):149–154
Gratton S, Sartemaer A, Ilunga JT (2011) On a class of limited memory preconditioners for large scale linear systems with multiple right-hand sides. SIAM J Optim 21(3):912–935
Hager WW, Zhang H (2005) A new conjugate gradient method with guaranteed descent and an effcient line search. SIAM J Optim 16(1):170–192
Hestenes MR, Stiefel EL (1952) Methods of conjugate gradients for solving linear systems. J Res Nat Bur Stand 49(6):409–436
Inverse problems. In: Wikipedia. https://en.wikipedia.org/wiki/Inverse_problem. Accessed May 2017
Lin L, Yang C, Meza JC, Lu J, Ying L (2011) SelInv-an algorithm for selected inversion of a sparse symmetric matrix. J ACM Trans Math Softw (TOMS) 37(4):40. doi:10.1145/1916461.1916464
McGuire W, Gallagher RH, Ziemian RD (2000) Matrix structural analysis, 2nd edn. Wiley, New York
Ogita T, Oishi S (2012) Accurate and robust inverse Cholesky factorization. Nonlinear Theory Appl IEICE 3(1):103–111
O’Leary DP (2000) Symbiosis between linear algebra and optimization. J Comp Appl Math 123(1–2):447–465
Polak E, Ribière G (1969) Note sur la convergence de méthodes de directions conjuguées. Rev Française Inf Rech Opér 3(16):35–43
Polyak BT (1969) The conjugate gradient method in extreme problems. USSR Comp Math Math Phys 9(4):94–112
Powell MJD (1964) An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comp J 7(2):155–162
Simoncini V, Szyld D (2007) Recent computational developments in Krylov subspace methods for linear systems. Numer Linear Algebra Appl 14(1):1–59
Stachurski A (2012) Quadratic approximations in nonlinear continuous optimization (in Polish: Aproksymacje kwadratowe w ciągłej optymalizacji nieliniowej). Prace Naukowe Politechniki Warszawskiej, Elektronika 184, Publishing House of WUT, Warsaw
Stoer J (1972) Einführung in die numerische Mathematik I. Springer, Berlin
Stoer J, Bulirsch R (1993) Introduction to numerical analysis. Springer, Berlin
Wilkinson JH (1963) Rounding errors in algebraic processes. Prentice-Hall Inc., Englewood Cliffs
Yanagisawa Y, Ogita T, Oishi S (2014) Convergence analysis of an algorithm for accurate inverse Cholesky factorization. Jpn J Ind Appl Math 31(3):461–482. doi:10.1007/s13160-014-0154-4
Yunlan F, Huirao Z, Wuhan XH (1997) Fast inverse Cholesky decomposition for rectangular Toeplitz-block matrix. Univ J Nat Sci 2(4):418–422. doi:10.1007/BF02830245
Zhu H, Chen W, Li B, Gao F (2011) An improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization. IEEE Trans Wireless Commun 10(1):43–48
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Stachurski, A. On a conjugate directions method for solving strictly convex QP problem. Math Meth Oper Res 86, 523–548 (2017). https://doi.org/10.1007/s00186-017-0607-z