1 Introduction

In this paper we consider the problem of solving the following unconstrained optimization problem

$$\begin{aligned} \min _{\mathbf{x}\in R^n}\; f(\mathbf{x}) = -\mathbf{b}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x}. \end{aligned}$$
(1)

The first order necessary optimality conditions for problem (1) read

$$\begin{aligned} \nabla f(\mathbf{x}) = \mathbf{0}\;\;\;\equiv \;\;\; -\mathbf{b}+\mathbf{G}\mathbf{x}= \mathbf{0}, \end{aligned}$$
(2)

where matrix \(\mathbf{G}\) is symmetric and positive definite.

This problem is equivalent to finding the solution of the set of linear equations

$$\begin{aligned} \mathbf{G}\mathbf{x} = \mathbf{b}. \end{aligned}$$
(3)

The equivalence of (3) and (1) is easily verified.

There are two main approaches to solve the quadratic problem (1) or the set of linear equations (3). In the first, a sequence of consecutive directional minimizations along conjugate directions (most notably conjugate gradients) is carried out. In the second approach, one first finds the triangular Cholesky decomposition of matrix \(\mathbf{G}\) and then solves two sets of linear equations with triangular matrices.

The notion of conjugate directions plays a very important role in solution methods for the QP problem (1) and for general unconstrained nonlinear minimization problems. Powell (1964) proposed a conjugate directions method that does not require calculating derivatives. Each consecutive direction was generated as the difference between the minima found on two parallel straight lines. The most popular are the conjugate gradient methods. The first one was developed by Hestenes and Stiefel (1952), who proposed it for solving symmetric, positive-definite linear algebraic systems. In 1964 the method was extended to nonlinear problems by Fletcher and Reeves (1964). The procedure is recursive

$$\begin{aligned} \mathbf{x}^{k+1} = \mathbf{x}^k+\alpha ^k\mathbf{d}^k, \end{aligned}$$

where \(\mathbf{d}^0=-\nabla f(\mathbf{x}^0)\) and every consecutive search direction is the linear combination of the previous one and the Cauchy steepest descent direction

$$\begin{aligned} \mathbf{d}^{k+1} = -\nabla f(\mathbf{x}^{k+1}) + \beta ^k\mathbf{d}^k. \end{aligned}$$

It means that starting from an initial guess \(\mathbf{x}^0\) (with corresponding residual \(\mathbf{r}^0 = \nabla f(\mathbf{x}^0) = \mathbf{G}\mathbf{x}^0 -\mathbf{b}\)), this method generates a sequence of iterates \(\left\{ \mathbf{x}^k\right\} \) that minimize f over the Krylov subspaces \(\mathbf{x}^0 + span(\mathbf{r}^0,\mathbf{G}\mathbf{r}^0,\ldots ,\mathbf{G}^{k-1}\mathbf{r}^0)\). Since the publication of Hestenes and Stiefel (1952), conjugate gradient methods have become an area of intensive research and a large number of variants of conjugate gradient algorithms have been designed. Some well-known choices for \(\beta ^k\) are those of Fletcher and Reeves (1964), Polak and Ribiére (1969), Polyak (1969), Dai and Yuan (1999) and Hestenes and Stiefel (1952):

$$\begin{aligned} \beta ^k_{FR} = \displaystyle \frac{\Vert \mathbf{g}^{k+1}\Vert ^2}{\Vert \mathbf{g}^k\Vert ^2},\;\; \beta ^k_{PRP} = \displaystyle \frac{\left( \mathbf{g}^{k+1}\right) ^T\mathbf{r}^k}{\Vert \mathbf{g}^k\Vert ^2},\;\; \beta ^k_{DY} = \displaystyle \frac{\Vert \mathbf{g}^{k+1}\Vert ^2}{(\mathbf{d}^k)^T\mathbf{r}^k},\;\; \beta ^k_{HS} = \displaystyle \frac{(\mathbf{g}^{k+1})^T\mathbf{r}^k}{(\mathbf{d}^k)^T\mathbf{r}^k} \end{aligned}$$

respectively, where \(\mathbf{g}^k = \nabla f(\mathbf{x}^k)\), \(\mathbf{r}^k = \mathbf{g}^{k+1} - \mathbf{g}^k\) and \(\Vert \cdot \Vert \) denotes the Euclidean norm. Surveys of developments in that area may be found, for instance, in Hager and Zhang (2005), Stachurski (2012) and Dai (2011).
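For concreteness, below is a minimal dense sketch of this recursion (linear conjugate gradient with the Fletcher–Reeves choice of \(\beta^k\); on the quadratic (1) with exact line searches all four choices above coincide). It assumes Python with numpy, which the paper itself does not use, and the function name and tolerance are illustrative.

```python
import numpy as np

def linear_cg(G, b, x0, tol=1e-10):
    """Conjugate gradient sketch for min f(x) = -b^T x + 0.5 x^T G x,
    i.e. for G x = b with G symmetric positive definite."""
    x = x0.copy()
    g = G @ x - b                          # gradient of f, residual of (3)
    d = -g                                 # d^0: Cauchy steepest descent direction
    for _ in range(b.size):
        if np.linalg.norm(g) < tol:
            break
        Gd = G @ d
        alpha = -(g @ d) / (d @ Gd)        # exact minimizer of f(x + alpha d)
        x = x + alpha * d
        g_new = G @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta^k
        d = -g_new + beta * d
        g = g_new
    return x
```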

The standard algebraic approach to solve either the set of linear equations (3) or the quadratic problem (1) is two-phase. In phase one, the triangular Cholesky decomposition (published posthumously by Benoit (1924)) of the hessian matrix is found, i.e., the lower triangular matrix \(\mathbf{L}\) such that \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\). In phase two, two sets of linear equations with triangular matrices are solved consecutively, using forward substitution followed by backward substitution, in order to find the solution of (1)

$$\begin{aligned} \begin{array}{lcl} \mathbf{L}\mathbf{y} &{} = &{} \mathbf{b} \\ \mathbf{L}^T\mathbf{x} &{} = &{} \mathbf{y}. \end{array} \end{aligned}$$
(4)
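A hedged sketch of the two-phase solve (4), using SciPy's dense Cholesky and triangular solvers (the paper's experiments use MATLAB and sparse storage; the function name is illustrative):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_spd_cholesky(G, b):
    """Two-phase solve of G x = b as in (4)."""
    L = cholesky(G, lower=True)                # phase one: G = L L^T
    y = solve_triangular(L, b, lower=True)     # forward substitution: L y = b
    x = solve_triangular(L.T, y, lower=False)  # backward substitution: L^T x = y
    return x
```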

A closely related approach is to express the hessian as \(\mathbf{G} = \mathbf{L}\mathbf{D}\mathbf{L}^T\), where \(\mathbf{L}\) is lower triangular with ones on the main diagonal and \(\mathbf{D}\) is a diagonal matrix. As in formula (4), after finding the decomposition, two sets with triangular matrices and one with a diagonal matrix are solved

$$\begin{aligned} \begin{array}{lcl} \mathbf{L}\mathbf{y} &{} = &{} \mathbf{b} \\ \mathbf{D}\mathbf{z} &{} = &{} \mathbf{y} \\ \mathbf{L}^T\mathbf{x} &{} = &{} \mathbf{z}. \end{array} \end{aligned}$$
(5)
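A corresponding sketch of (5), with a hand-rolled, pivot-free \(\mathbf{L}\mathbf{D}\mathbf{L}^T\) factorization that is adequate for a symmetric, positive definite \(\mathbf{G}\); again a dense illustration, not a production routine:

```python
import numpy as np

def solve_spd_ldl(G, b):
    """Solve G x = b via G = L D L^T as in (5): unit lower triangular L,
    diagonal D, no pivoting (safe for symmetric positive definite G)."""
    n = G.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    for j in range(n):
        D[j] = G[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, n):
            L[i, j] = (G[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    # Solve L y = b, D z = y, L^T x = z; dedicated forward/backward
    # substitution routines would be used in practice.
    y = np.linalg.solve(L, b)
    z = y / D
    x = np.linalg.solve(L.T, z)
    return x
```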

A survey of direct methods for sparse linear systems may be found in the Davis et al. (2016) technical report. Problems of type (1) or (3) are a kind of inverse problem and have many applications. An inverse problem in science is the process of calculating the unknown input values from a set of observations: for instance, reconstructing an image in computed tomography, deblurring a digital picture, reconstructing the source in acoustics or calculating the density of the Earth from measurements of its gravity field (for more information see Wikipedia (2017)). Many of them involve large-scale problems with symmetric, positive definite matrices and multiple right-hand sides.

Recently we have observed a renewed interest in large, sparse, unconstrained quadratic optimization problems in that context. Currently the research goes into many directions. There are many papers devoted to the search for the best preconditioner for the problem, see for instance Gratton et al. (2011), Andrea et al. (2017), Fasano and Roma (2016) and many others. Feng et al. (1995) consider block conjugate gradient methods for solving linear systems with multiple right-hand sides. A block algorithm is also applied to problems with multiple right-hand sides in El Guennouni et al. (2003) (this time the considered matrices are not symmetric). Some researchers concentrate on finding the solution of the unconstrained quadratic optimization problem or of the associated system of linear equations effectively. For instance, in Alléon et al. (2003) and Simoncini and Szyld (2007) one may find information on algorithms combining direct methods with the iterative approach. Noisy picture deblurring tasks are frequently formulated as quadratic programming problems with nonnegativity conditions on the variables and regularization, see for instance Chen and Zhou (2010) and references therein. The problem in Chen and Zhou (2010) is essentially quadratic and nonsmooth due to the regularization term formulated in the \(l_1\)-norm. Another trend is to find a factorization of the inverse matrix directly. Let us mention in that context the papers of Yunlan et al. (1997) and Zhu et al. (2011), where the structure of the hessian matrix is deeply exploited. Let us also mention SelInv – the algorithm for the selected inversion of a sparse symmetric matrix developed by Lin et al. (2011). In the first stage of SelInv the \(\mathbf{L}\mathbf{L}^T\) decomposition of \(\mathbf{G}\) is calculated and in the second stage selected subsets of entries of the inverse are found. SelInv requires a special structure of the matrix to do this effectively. Electronic structure calculations are among the possible application areas. In our approach we start from the notion of conjugate vectors and do not exploit the matrix structure. It is also worth mentioning here the iterative refinement technique introduced by Ogita and Oishi (2012).

The content of the paper may be treated as a result at the junction of two disciplines: optimization and linear algebra. Currently a close relationship between optimization and numerical linear algebra algorithms is observed. As O’Leary (2000) stated

“The efficiency and effectiveness of most optimization algorithms hinges on the numerical linear algebra algorithms that they utilize. Effective linear algebra is crucial to their success, and because of this, optimization applications have motivated fundamental advances in numerical linear algebra.”

Our paper falls into that line of research. In our opinion it may be of interest to researchers working on optimization algorithms as well as in linear algebra. In our paper, we use the notion of conjugacy, frequently exploited in optimization. We propose a nonconventional way of using a given set of conjugate directions. In that approach n independent directional minimizations starting from the same point \(\mathbf{x}^0\) are executed simultaneously, and finally the solution is obtained by adding to \(\mathbf{x}^0\) the linear combination of the search directions with combination coefficients equal to the calculated directional step sizes. Section 2 contains some basic definitions and properties of conjugate directions, and in the consecutive Sect. 3 the main theorem justifying correctness of the proposed procedure using n independent, simultaneous directional searches is formulated and proved. In Sect. 4 the algorithm for recursive generation of the conjugate directions is presented. In Sect. 5 it is shown that the generated conjugate directions form the \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) decomposition of the inverse of the second order derivative matrix (hessian) – \(\mathbf{G}^{-1}\). The particular values on the main diagonal of matrix \(\mathbf{R}\) and, what is strongly related, the main diagonal entries of the \(\mathbf{D}\) matrix depend on the choice of the parameters controlling the length of the generated vectors. Let us stress that the recursive presentation of the Cholesky factorization in the book by Stoer (1976) (see also Stoer and Bulirsch (1993)) was to some extent an inspiration for us (the same representation was used by Wilkinson (1963) to prove the numerical stability of the Cholesky decomposition). Our method also works recursively, passing from a subspace of a given dimension to a subspace of dimension larger by one.

The resulting algorithm is two-phase. Its details are discussed in Sect. 6. In the first phase, we generate a set of conjugate directions. The second phase consists of performing n independent directional minimizations. In our opinion this may be useful in distributed calculations and cloud computing. The second phase may also be executed by simply applying three consecutive matrix/vector multiplications. Both approaches are compared numerically on a set of 128 different strictly convex quadratic programming problems of different sizes (the space dimension varies from 14 to 2000). They were created with the aid of 64 symmetric, positive definite matrices taken from the University of Florida repository of Davis and Hu (2016) and two different versions of the \(\mathbf{b}\) vector (see Sect. 7). All those matrices come from real life applications and they are stored in the sparse matrix format.

The computational effort of the second phase in our approach would be much lower than in the standard way of decomposing the matrix \(\mathbf{G}\) followed by the solution of two sets of linear equations with triangular matrices, plus possibly a set with a diagonal matrix. The computational cost of the first phase is identical to the cost of calculating the standard \(\mathbf{L}\mathbf{D}\mathbf{L}^T\) factorization of matrix \(\mathbf{G}\). In the final Sect. 8 some conclusions and corollaries are presented.

2 Some definitions and properties of the conjugate directions

To obtain the factorization of the inverse \(\mathbf{G}^{-1}\) of the considered symmetric, positive definite matrix \(\mathbf{G}\) we shall construct directions which are conjugate with respect to \(\mathbf{G}\). The formal definition of \(\mathbf{G}\)–conjugacy is recalled below. The definition of conjugate vectors and the proof of their linear independence are taken from Bertsekas (1995).

Definition 1

Nonzero vectors \(\mathbf{d}_1\), \(\mathbf{d}_2\), \(\ldots \), \(\mathbf{d}_k\) are conjugate with respect to symmetric, positive definite matrix \(\mathbf{G}\) if and only if

$$\begin{aligned} \mathbf{d}_i^T\mathbf{G}\mathbf{d}_j = 0,\;\; \forall \; i\ne j \end{aligned}$$

Let us recall that \(\mathbf{G}\)–conjugate directions are linearly independent.

Lemma 1

If \(\mathbf{G}\) is a symmetric, positive definite, \(n\times n\) matrix and the vectors \(\mathbf{d}_i\) (for \(i=1, \ldots ,k\le n\)) are nonzero and mutually conjugate with respect to \(\mathbf{G}\), then they are linearly independent.

Proof

Let us assume that the thesis is not valid. Then, there exists a nontrivial linear combination of the vectors \(\mathbf{d}_i\) equal to \(\mathbf{0}\)

$$\begin{aligned} \sum _{i=1}^k \alpha ^i \mathbf{d}_i = \mathbf{0}\;\;\hbox {and}\;\; \exists \; j,\;\;\hbox {such that}\;\; \alpha ^j \ne 0. \end{aligned}$$

Let us multiply the equation from the left by \(\mathbf{d}_j^T\mathbf{G}\). Then,

$$\begin{aligned} \sum _{i=1}^k \alpha ^i \mathbf{d}_j^T\mathbf{G}\mathbf{d}_i = \alpha ^j \mathbf{d}_j^T\mathbf{G}\mathbf{d}_j = 0 \end{aligned}$$

due to the \(\mathbf{G}\)–conjugacy property. From the last equality we deduce that \(\alpha ^j = 0\), because \(\mathbf{d}_j^T\mathbf{G}\mathbf{d}_j> 0\) by the positive definiteness assumption. We obtain a contradiction, which concludes the proof.
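A quick numerical illustration of Definition 1 and Lemma 1 (a numpy sketch; the eigenvectors of \(\mathbf{G}\) serve here merely as one convenient example of mutually conjugate vectors):

```python
import numpy as np

# Illustration of Definition 1 and Lemma 1 on a random SPD matrix:
# the eigenvectors of G are mutually G-conjugate and linearly independent.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
G = A @ A.T + 5 * np.eye(5)          # symmetric, positive definite
_, V = np.linalg.eigh(G)             # columns of V: eigenvectors of G
C = V.T @ G @ V                      # entry (i, j) equals d_i^T G d_j
print(np.allclose(C - np.diag(np.diag(C)), 0.0))   # True: mutual conjugacy
print(np.linalg.matrix_rank(V) == 5)               # True: linear independence
```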

3 Solving QP problem by means of simultaneous directional minimizations

The key property used in this section is formulated as the theorem.

Theorem 1

Let the vectors \(\mathbf{d}^i\) (for \(i=1, \ldots ,n\)) be a set of directions mutually conjugate with respect to the symmetric, positive definite matrix \(\mathbf{G}\), where \(\mathbf{G}\) is the second order derivative matrix of problem (1). Let us take any point \(\mathbf{x}^0\in R^n\) and execute n exact directional minimization searches along each vector \(\mathbf{d}^i\), i.e., solve the following one-dimensional minimization problems

$$\begin{aligned} \min _{\alpha \in R^1} \; \left\{ \bar{f}_i(\alpha ) = f(\mathbf{x}^0+\alpha \mathbf{d}^i)\right\} ,\;\;\; \forall \; i=1, \ldots ,n \end{aligned}$$
(6)

and denote their solutions by \(\alpha ^i\), respectively.

Then point \(\bar{\mathbf{x}}\) defined as

$$\begin{aligned} \bar{\mathbf{x}} = \mathbf{x}^0 + \sum _{i=1}^n \alpha ^i \mathbf{d}^i \end{aligned}$$
(7)

is the optimal solution of the quadratic programming problem (1).

Proof

We shall show that \(\mathbf{G}\bar{\mathbf{x}} - \mathbf{b} = \mathbf{0}\). To prove the thesis it is sufficient to show that the gradient \(\bar{\mathbf{g}} = \nabla f(\bar{\mathbf{x}})\) (the first derivative of function f calculated at the point \(\bar{\mathbf{x}}\)) is orthogonal to all vectors \(\mathbf{d}^i\), i.e.,

$$\begin{aligned} \bar{\mathbf{g}}^T\mathbf{d}^i = 0,\;\; \forall \; i=1, \ldots ,n. \end{aligned}$$

Let us select any of the conjugate vectors \(\mathbf{d}^i\) (let it be \(\mathbf{d}^k\)). Then

$$\begin{aligned} \left( \mathbf{d}^k\right) ^T\nabla f(\bar{\mathbf{x}})= & {} \left( \mathbf{d}^k\right) ^T \left[ \mathbf{G}\left( \mathbf{x}^0 + \displaystyle \sum _{i=1}^n\alpha ^i\mathbf{d}^i\right) - \mathbf{b} \right] \\= & {} \left( \mathbf{d}^k\right) ^T \mathbf{G}\mathbf{x}^0 + \displaystyle \sum _{i=1}^n\alpha ^i \left( \mathbf{d}^k\right) ^T\mathbf{G}\mathbf{d}^i - \left( \mathbf{d}^k\right) ^T\mathbf{b}. \end{aligned}$$

The conjugacy property ensures that inside the sum there is only one term with a nonzero value, namely the one for \(i=k\). Therefore, after a simple rearrangement we obtain

$$\begin{aligned} \left( \mathbf{d}^k\right) ^T\nabla f(\bar{\mathbf{x}})= & {} \left( \mathbf{d}^k\right) ^T \mathbf{G}\mathbf{x}^0 + \alpha ^k \left( \mathbf{d}^k\right) ^T\mathbf{G}\mathbf{d}^k - \left( \mathbf{d}^k\right) ^T\mathbf{b} \\&\\= & {} \left( \mathbf{d}^k\right) ^T \left[ \mathbf{G}\left( \mathbf{x}^0 + \alpha ^k \mathbf{d}^k\right) - \mathbf{b}\right] . \end{aligned}$$

Let us notice that

$$\begin{aligned} \bar{f}'_i(\alpha ) = \left( \nabla f(\mathbf{x}^0 + \alpha \mathbf{d}^i)\right) ^T\mathbf{d}^i \end{aligned}$$

by the rule of differentiating composite functions. Hence

$$\begin{aligned} \left( \mathbf{d}^k\right) ^T\nabla f(\bar{\mathbf{x}}) = \bar{f}'_k(\alpha ^k) = 0 \end{aligned}$$

due to the exact directional minimization on the straight line defined by the set of points \(\left\{ \mathbf{x}(\alpha )\in R^n\; | \; \mathbf{x}(\alpha ) = \mathbf{x}^0 + \alpha \mathbf{d}^k, \alpha \in R^1 \right\} \).

From Lemma 1 we deduce that the vectors \(\mathbf{d}^i\) (\(i=1, \ldots ,n\)) form a basis of \(R^n\). We have proved that the derivative \(\nabla f(\bar{\mathbf{x}})\) is orthogonal to all vectors from that basis. Therefore, it must be equal to \(\mathbf{0}\).

A similar theorem may be found in Fletcher (1987). However, he considered it in an essentially sequential way – a directional search along one direction, followed by the move to a new point, then another directional minimization from the second point followed by the move to the third point, and so on. Fletcher’s theorem is the basis for the conjugate directions methods in optimization. Here, we consider n independent directional minimizations, all of them starting from the same point \(\mathbf{x}^0\).

Theorem 1 suggests an obvious numerical procedure to solve the quadratic programming problem (1) with a symmetric, positive definite matrix \(\mathbf{G}\) in a computational cloud or on a machine with many parallel processors. We may simply run n independent directional minimizations simultaneously, all starting from the same point \(\mathbf{x}^0\). The crucial problem is the effective generation of the conjugate directions. A recursive procedure for their generation is proposed in the next section.
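The following numpy sketch expresses Theorem 1 directly in code; the sequential loop stands in for parallel or cloud execution, since the n searches do not depend on one another. The function name is illustrative and the columns of `directions` are assumed to be mutually \(\mathbf{G}\)-conjugate.

```python
import numpy as np

def simultaneous_search_solution(G, b, directions, x0):
    """Columns of `directions` are assumed mutually G-conjugate.
    Returns x0 + sum_i alpha^i d^i, which by Theorem 1 minimizes (1)."""
    g0 = G @ x0 - b                           # gradient at the common start point
    x = x0.copy()
    for i in range(directions.shape[1]):      # each pass is an independent search
        d = directions[:, i]
        alpha = -(g0 @ d) / (d @ (G @ d))     # exact minimizer of f(x0 + alpha d)
        x = x + alpha * d                     # accumulate the sum in (7)
    return x
```

For instance, with the orthonormal eigenvectors of \(\mathbf{G}\) as `directions`, the returned point coincides with the solution of (3) up to rounding errors.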

4 Recursive generation of conjugate directions

We shall start with a vector having only one nonzero entry, namely the first one (for instance equal to 1), and every consecutive vector will have one nonzero entry more than the previous one, i.e., the k-th vector will have its first k entries possibly nonzero, while the rest will be equal to zero. Furthermore, every \((k+1)\)-st vector should be conjugate to all previous vectors with respect to matrix \(\mathbf{G}\).

So, let us assume now that the starting vector is \(\mathbf{d}^1=\left[ c, 0, \ldots ,0\right] ^T\), where \(c>0\) is a positive constant, and let us try to find the consecutive vector \(\mathbf{d}^2\) verifying the following requirements:

  • \(\mathbf{d}^2\) is conjugate to \(\mathbf{d}^1\) with respect to \(\mathbf{G}\),

  • only the first two entries of \(\mathbf{d}^2\) may have nonzero values, all others are equal to 0, i.e., \(\mathbf{d}^2 = \left[ d_1, d_2, 0, \ldots , 0\right] ^T\).

Then

$$\begin{aligned} \left( \mathbf{d}^1\right) ^T\mathbf{G}\mathbf{d}^1 = c^2G_{1,1} = D_1 \end{aligned}$$

Defining vector \(\mathbf{d}^2\) as a vector with two nonzero entries as follows

$$\begin{aligned} \mathbf{d}^2 = \left[ d^2_1, d^2_2, 0,\ldots ,0\right] ^T, \end{aligned}$$

we shall have the following situation

$$\begin{aligned} \left[ \mathbf{d}^1 \right] = \left[ \begin{array}{c} d_1^1 \\ 0 \\ \vdots \\ 0 \end{array} \right] \;\;\hbox {and}\;\; \mathbf{d}^2 = \left[ \begin{array}{c} d_1^2 \\ d_2^2 \\ 0 \\ \vdots \\ 0 \end{array} \right] . \end{aligned}$$

Vector \(\mathbf{d}^2\) should be \(\mathbf{G}\) conjugate with \(\mathbf{d}^1\), i.e.,

$$\begin{aligned} \left( \mathbf{d}^1\right) ^T\mathbf{G}\mathbf{d}^2 = 0. \end{aligned}$$

Let us calculate appropriate vector \(\mathbf{d}^2\). Due to the choice of \(\mathbf{d}^1\) and \(\mathbf{d}^2\) we have

$$\begin{aligned}&\begin{array}{c} \left[ \mathbf{d}^1\right] ^T\mathbf{G}\mathbf{d}^2 = \left[ \begin{array}{ccccc} d_1^1 &{} 0 &{} 0 &{} \ldots &{} 0 \end{array} \right] \left[ \begin{array}{ccccc} G_{1,1} &{} G_{1,2} &{} G_{1,3} &{} \ldots &{} G_{1,n} \\ G_{2,1} &{} G_{2,2} &{} G_{2,3} &{} \ldots &{} G_{2,n} \\ G_{3,1} &{} G_{3,2} &{} G_{3,3} &{} \ldots &{} G_{3,n} \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ G_{n,1} &{} G_{n,2} &{} G_{n,3} &{} \ldots &{} G_{n,n} \end{array} \right] \left[ \begin{array}{c} d_1^2 \\ d_2^2 \\ 0 \\ \vdots \\ 0 \end{array} \right] \\ \\ = \left[ \begin{array}{cc} d_1^1 &{} 0 \end{array} \right] \left[ \begin{array}{cc} G_{1,1} &{} G_{1,2} \\ G_{2,1} &{} G_{2,2} \end{array} \right] \left[ \begin{array}{c} d_1^2 \\ d_2^2 \\ \end{array} \right] . \end{array} \end{aligned}$$

Now, simple matrix calculations lead to

$$\begin{aligned} \left[ \mathbf{d}^1\right] ^T\mathbf{G}\mathbf{d}^2 = \left[ \begin{array}{ll} d_1^1G_{1,1}&d_1^1G_{1,2} \end{array} \right] \left[ \begin{array}{c} d_1^2 \\ d_2^2 \\ \end{array} \right] = d_1^1G_{1,1}d_1^2 + d_1^1G_{1,2}d_2^2. \end{aligned}$$

In order to preserve the conjugacy property this quantity should be equal to 0

$$\begin{aligned} d_1^1G_{1,1}d_1^2 + d_1^1G_{1,2}d_2^2 = 0. \end{aligned}$$

Therefore the coefficients of vector \(\mathbf{d}^2\) should verify the equality

$$\begin{aligned} G_{1,1}d_1^2 + G_{1,2}d_2^2 = 0. \end{aligned}$$
(8)

We’ve got one degree of freedom in Eq. (8). We may take for instance \(d_2^2 = 1\) and solve Eq. (8) with respect to \(d_1^2\).

In what follows we shall derive a similar representation for every k. In step number \(k>1\) we permit only the first k elements of the new direction \(\mathbf{d}^k\) to assume nonzero values; all others are set to 0. Furthermore, we preserve the conjugacy with respect to \(\mathbf{G}\) to all previous directions \(\mathbf{d}^1, \ldots ,\mathbf{d}^{k-1}\), where in every vector \(\mathbf{d}^i\) (\(i<k\)) only the first i coefficients may assume nonzero values.

Let us assume the following block representation of matrix \(\mathbf{G}\)

$$\begin{aligned} \mathbf{G} = \left[ \begin{array}{ll} \mathbf{G}_k &{} \mathbf{G}_r \\ \mathbf{G}_r^T &{} \mathbf{G}_{n-k} \end{array} \right] ,\;\;\hbox {where}\;\; \mathbf{G}_k = \left[ \begin{array}{ll} \mathbf{G}_{k-1} &{} \mathbf{g} \\ \mathbf{g}^T &{} a \end{array} \right] \end{aligned}$$
(9)

and let \(\mathbf{d}^k\) have the following structure

$$\begin{aligned} \mathbf{d}^k = \left[ \begin{array}{c} \bar{\mathbf{d}}^k \\ \mathbf{0} \end{array} \right] , \;\;\hbox {where}\;\;\bar{\mathbf{d}}^k = \left[ \begin{array}{c} \mathbf{d} \\ c \end{array} \right] \end{aligned}$$
(10)

with \(\mathbf{d}\) being a column vector of size \(k-1\) and c is a scalar.

Similarly \(\mathbf{d}^i\) (\(i<k\)) may be represented as

$$\begin{aligned} \mathbf{d}^i = \left[ \begin{array}{c} \bar{\mathbf{d}}^i \\ \mathbf{0} \end{array} \right] , \end{aligned}$$
(11)

where \(\bar{\mathbf{d}}^i\) belongs to the space \(R^{k-1}\) (the vector \(\bar{\mathbf{d}}^i\) is padded with zeros up to the length \(k-1\)).

The conjugacy requirement, due to the splitting formulated in Eqs. (9)–(11), may be written as

$$\begin{aligned} \left( \mathbf{d}^k\right) ^T\mathbf{G}\mathbf{d}^i = \left[ \mathbf{d}^T, c\right] \left[ \begin{array}{cc} \mathbf{G}_{k-1} &{} \mathbf{g} \\ \mathbf{g}^T &{} a \end{array} \right] \left[ \begin{array}{c} \bar{\mathbf{d}}^i \\ 0 \end{array} \right] = 0, \;\;\forall \; i<k. \end{aligned}$$
(12)

Hence, the considerations may be restricted to the k-dimensional space and to the conjugacy with respect to the \(k\times k\) principal submatrix \(\mathbf{G}_k\). Applying (9) again we arrive at the following equivalent form of (12)

$$\begin{aligned} \left( \mathbf{d}^k\right) ^T\mathbf{G}\mathbf{d}^i = \left[ \mathbf{d}^T, c\right] \left[ \begin{array}{c} \mathbf{G}_{k-1}\bar{\mathbf{d}}^i \\ \mathbf{g}^T\bar{\mathbf{d}}^i \end{array} \right] = \mathbf{d}^T\mathbf{G}_{k-1}\bar{\mathbf{d}}^i +c \mathbf{g}^T\bar{\mathbf{d}}^i = 0, \;\;\forall \; i<k. \end{aligned}$$
(13)

Vector \(\mathbf{d}\) belongs to the space spanned by the vectors \(\bar{\mathbf{d}}^1, \ldots ,\bar{\mathbf{d}}^{k-1}\). Therefore it may be expressed as their linear combination

$$\begin{aligned} \mathbf{d} = \sum _{j=1}^{k-1} \tau _j\bar{\mathbf{d}}^j. \end{aligned}$$

Let us substitute the above expression for \(\mathbf{d}\) into (13) and calculate the coefficients \(\tau _j\) which ensure that vector \(\mathbf{d}^k\) verifies the conjugacy condition

$$\begin{aligned} \begin{array}{rcl} \left( \mathbf{d}^k\right) ^T\mathbf{G}\mathbf{d}^i &{} = &{} \displaystyle \sum _{j=1}^{k-1} \tau _j\left[ \bar{\mathbf{d}}^j\right] ^T\mathbf{G}_{k-1}\bar{\mathbf{d}}^i +c \mathbf{g}^T\bar{\mathbf{d}}^i \\ &{} = &{} \tau _i\left[ \bar{\mathbf{d}}^i\right] ^T\mathbf{G}_{k-1}\bar{\mathbf{d}}^i +c \mathbf{g}^T\bar{\mathbf{d}}^i \\ &{} = &{} \tau _i D_i + c \mathbf{g}^T\bar{\mathbf{d}}^i = 0 \end{array} \;\;\; \forall \; i<k, \end{aligned}$$
(14)

where \(D_i = \left[ \bar{\mathbf{d}}^i\right] ^T\mathbf{G}_{k-1}\bar{\mathbf{d}}^i\) for each \(i<k\). Now, formula (14) results in the following values of coefficients

$$\begin{aligned} \tau _i = -c D_i^{-1}\left( \bar{\mathbf{d}}^i\right) ^T\mathbf{g} \end{aligned}$$
(15)

and finally we obtain the formula specifying new vector

$$\begin{aligned} \mathbf{d} = -\sum _{j=1}^{k-1} cD_j^{-1}\left( \bar{\mathbf{d}}^j\right) ^T\mathbf{g}\bar{\mathbf{d}}^j = -\sum _{j=1}^{k-1} c \bar{\mathbf{d}}^j D_j^{-1}\left( \bar{\mathbf{d}}^j\right) ^T\mathbf{g}. \end{aligned}$$
(16)

Hence, we may formulate the following algorithm for generation of vectors mutually conjugate with respect to matrix \(\mathbf{G}\).

Algorithm I

[Initialization] Set \(k=1\). Select c, for instance \(c=1\). Build vector

$$\begin{aligned} \mathbf{d}^1 = \left[ \begin{array}{c} c \\ \mathbf{0} \end{array} \right] . \end{aligned}$$

Calculate \(D_1 = \left( \mathbf{d}^1\right) ^T\mathbf{G}\mathbf{d}^1 = c^2G_{1,1}\).

[Loop]

Increase the dimension, set k = k+1.

while (k ≤ n)

\(\{\)

  1. Calculate vector \(\mathbf{d} = -\sum _{j=1}^{k-1} c \bar{\mathbf{d}}^j D_j^{-1}\left( \bar{\mathbf{d}}^j\right) ^T\mathbf{g}\).

  2. Build vector \(\mathbf{d}^k\) as follows: \(\left( \mathbf{d}^k\right) ^T = \left[ \mathbf{d}^T,\;c, \; \mathbf{0}\right] \).

  3. Calculate \(D_k = \left( \bar{\mathbf{d}}^k\right) ^T\mathbf{G}_k\bar{\mathbf{d}}^k\).

  4. Set \(k = k+1\).

\(\}\)

The inductive argument presented above proves the following lemma.

Lemma 2

Vectors \(\mathbf{d}^1\), \(\mathbf{d}^2\), \(\ldots \), \(\mathbf{d}^n\) generated by Algorithm I are mutually conjugate with respect to matrix \(\mathbf{G}\).
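A dense numpy sketch of Algorithm I (illustrative names; the paper's experiments use MATLAB and sparse storage):

```python
import numpy as np

def conjugate_directions(G, c=1.0):
    """Algorithm I, dense sketch: returns R whose columns d^1,...,d^n are
    mutually conjugate with respect to G, and the values D_k = (d^k)^T G d^k."""
    n = G.shape[0]
    R = np.zeros((n, n))
    Dvals = np.zeros(n)
    R[0, 0] = c                                  # d^1 = [c, 0, ..., 0]^T
    Dvals[0] = c * c * G[0, 0]                   # D_1
    for k in range(1, n):                        # build direction number k+1
        g = G[:k, k]                             # the vector g of the block form (9)
        d = np.zeros(k)
        for j in range(k):                       # formula (16)
            d -= c * R[:k, j] * (R[:k, j] @ g) / Dvals[j]
        R[:k, k] = d
        R[k, k] = c
        Dvals[k] = R[:k + 1, k] @ G[:k + 1, :k + 1] @ R[:k + 1, k]   # D_{k+1}
    return R, Dvals
```

For a symmetric, positive definite \(\mathbf{G}\), `np.allclose(R.T @ G @ R, np.diag(Dvals))` should return True, which is exactly the statement of Lemma 2.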

4.1 Matrix form of the algorithm for generating conjugate directions

The algorithm presented above may be written in matrix form.

Let us denote by \(\mathbf{R}^i\) (\(1 \le i \le k-1\)) the rectangular matrix whose columns are the consecutive vectors \(\mathbf{d}^j\) (\(1\le j\le i\)) and by \(\bar{\mathbf{R}}^i\) (\(1 \le i \le k-1\)) the rectangular matrix whose columns are the consecutive vectors \(\bar{\mathbf{d}}^j\) (\(1\le j\le i\)), i.e.,

$$\begin{aligned} \mathbf{R}^i = \left[ \begin{array}{lcl} \mathbf{d}^1&\ldots&\mathbf{d}^i \end{array} \right] \;\;\hbox {and} \;\;\bar{\mathbf{R}}^i = \left[ \begin{array}{lcl} \bar{\mathbf{d}}^1&\ldots&\bar{\mathbf{d}}^i \end{array} \right] . \end{aligned}$$

Due to their construction

$$\begin{aligned} \mathbf{R}^i = \left[ \begin{array}{c} \bar{\mathbf{R}}^i \\ \mathbf{0} \end{array} \right] \end{aligned}$$

and

$$\begin{aligned} \left( \mathbf{R}^k\right) ^T\mathbf{G}\mathbf{R}^k = \left( \bar{\mathbf{R}}^k\right) ^T\mathbf{G}_k\bar{\mathbf{R}}^k = \mathbf{D}_k = diag\{ D_1, D_2, \ldots , D_k\}. \end{aligned}$$

Taking the above relation into account we may express formula (16) in the following matrix form

$$\begin{aligned} \mathbf{d} = -c \bar{\mathbf{R}}^{k-1} \mathbf{D}_{k-1}^{-1}\left( \bar{\mathbf{R}}^{k-1}\right) ^T\mathbf{g}. \end{aligned}$$
(17)

Let us introduce the following notation

$$\begin{aligned} \mathbf{L} = \left( \bar{\mathbf{R}}^{k-1}\right) ^T. \end{aligned}$$

Then formula (17) may be expressed as

$$\begin{aligned} \mathbf{d} = -c \mathbf{L}^T \mathbf{D}_{k-1}^{-1}\mathbf{L}\mathbf{g}. \end{aligned}$$
(18)

Below we give the formal pseudocode of the matrix form of Algorithm I.

Algorithm II

[Initialization] Set \(k=1\). Select c, for instance \(c=1\), and set \(\mathbf{L}^1 = [c]\).

[Loop]

while (k < n)

\(\{\)

  1. Calculate \(\mathbf{D} = \mathbf{L}^k\mathbf{G}_k\left( \mathbf{L}^k\right) ^T\).

  2. Increase the dimension by 1; set \(k = k+1\). Calculate vector \(\mathbf{d}\) using formula (18).

  3. Build the new matrix \(\mathbf{L}^k\) as follows

    $$\begin{aligned}\mathbf{L}^k = \left[ \begin{array}{ll} \mathbf{L}^{k-1} &{} \mathbf{0} \\ \mathbf{d}^T &{} c \end{array} \right] . \end{aligned}$$

\(\}\)

To calculate vector \(\mathbf{d}\) we perform the matrix/vector multiplication \(\mathbf{L}\mathbf{g}\), divide each component of the result by the corresponding main-diagonal entry of the diagonal matrix \(\mathbf{D}\) and multiply it by the scalar c (these two operations can be carried out simultaneously), and finally multiply the result by \(\mathbf{L}^T\).

The new lower triangular matrix \(\mathbf{L}^k\) is obtained at each step by appending a new row containing the transposed vector \(\bar{\mathbf{d}}^k\) and a new column containing exclusively zero entries above the main diagonal.
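A dense numpy sketch of Algorithm II (illustrative names), growing \(\mathbf{L}\) by one row and one zero column per step exactly as described above:

```python
import numpy as np

def conjugate_directions_matrix_form(G, c=1.0):
    """Algorithm II, dense sketch: returns the lower triangular L whose rows
    are the transposed conjugate vectors, and the diagonal of D = L G L^T."""
    n = G.shape[0]
    L = np.array([[float(c)]])                       # L^1
    for k in range(1, n):
        Gk = G[:k, :k]                               # leading principal submatrix
        Dk = np.einsum('ij,jk,ik->i', L, Gk, L)      # diagonal of L G_k L^T
        g = G[:k, k]
        d = -c * (L.T @ ((L @ g) / Dk))              # formula (18)
        L = np.block([[L, np.zeros((k, 1))],         # new zero column above diagonal
                      [d.reshape(1, -1), np.array([[float(c)]])]])  # new row [d^T, c]
    Dvals = np.einsum('ij,jk,ik->i', L, G, L)        # diagonal of D^n = L G L^T
    return L, Dvals
```

The returned pair satisfies \(\mathbf{L}\mathbf{G}\mathbf{L}^T = \mathbf{D}\) up to rounding, which is relation (19) of the next section.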

5 Generated directions and their relation with the factorization of the inverse matrix

The last matrix \(\mathbf{L}^n\) contains n rows, which are the transposes of the n mutually \(\mathbf{G}\)-conjugate vectors, and it verifies the following matrix relation

$$\begin{aligned} \mathbf{L}^n\mathbf{G}\left[ \mathbf{L}^n\right] ^T = \mathbf{D}^n, \end{aligned}$$
(19)

where \(\mathbf{L}^n\) is lower triangular and \(\mathbf{D}^n\) is the diagonal matrix.

Below we shall show that Algorithm II produces the LDL factorization of the inverse of \(\mathbf{G}\).

Lemma 3

The following equality holds

$$\begin{aligned} \mathbf{G}^{-1} = \left[ \mathbf{L}^n\right] ^T\left[ \mathbf{D}^n\right] ^{-1}\mathbf{L}^n = \mathbf{R}^n\left[ \mathbf{D}^n\right] ^{-1}\left[ \mathbf{R}^n\right] ^T. \end{aligned}$$
(20)

Proof

Taking the inverses in Eq. (19) we obtain

$$\begin{aligned} \left[ \mathbf{L}^n\right] ^{-T}\mathbf{G}^{-1}\left[ \mathbf{L}^n\right] ^{-1} = \left[ \mathbf{D}^n\right] ^{-1}. \end{aligned}$$
(21)

Multiplying (21) from the left-hand side by \(\left[ \mathbf{L}^n\right] ^T\) and by \(\mathbf{L}^n\) from the right-hand side we shall obtain the desired expression (20) of the inverse of \(\mathbf{G}\).

It means that the proposed algorithm produces directly an \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) factorization of the inverse matrix \(\mathbf{G}^{-1}\). It is easily observed that the same statement is valid for the intermediate set of generated vectors.

Lemma 4

The following equality holds for each \(1\le k\le n\)

$$\begin{aligned} \mathbf{G}_{kk}^{-1} = \left[ \mathbf{L}^k\right] ^T\left[ \mathbf{D}^k\right] ^{-1}\mathbf{L}^k = \mathbf{R}^k\left[ \mathbf{D}^k\right] ^{-1}\left[ \mathbf{R}^k\right] ^T. \end{aligned}$$
(22)

Deeper analysis and stability issues of this approach to obtain directly the factorization of the inverse matrix will be pursued elsewhere.
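A quick numerical check of Lemma 3, reusing the `conjugate_directions_matrix_form` sketch from Sect. 4.1 (all names are illustrative assumptions):

```python
import numpy as np

# Check of Lemma 3 on a random SPD matrix, reusing the sketch of Algorithm II.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
G = A @ A.T + 6 * np.eye(6)                       # symmetric, positive definite
L, Dvals = conjugate_directions_matrix_form(G)
R = L.T                                           # columns of R: conjugate vectors
print(np.allclose(L @ G @ L.T, np.diag(Dvals)))   # relation (19)
print(np.allclose(R @ np.diag(1.0 / Dvals) @ R.T, np.linalg.inv(G)))  # formula (20)
```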

6 Two-phase algorithm

A two-phase algorithm is proposed on the basis of the above theoretical considerations. In the first phase we calculate the conjugate directions. For their calculation we may apply either Algorithm I or Algorithm II formulated above. As follows from the above discussion, the proposed algorithm is recursive: we start with a space of dimension equal to 1 and increase the dimension by one at each step. It is also important in this procedure that the only algebraic operations involved are matrix/vector multiplications and divisions or multiplications of matrix entries by an appropriate number.

It is possible to further reduce the computational effort if we restrict the calculation of the entries of the diagonal matrix \(\mathbf{D}\) in step 1 to the last entry on the main diagonal, i.e., \(D_{kk}\). All other entries on the main diagonal are the same as in the previous iteration.

In the second phase the optimal solution of the QP (quadratic programming) problem (1) is found. Once the calculation of the conjugate directions is completed, it is clear whether the QP problem has an optimal solution or is unbounded. So, let us now assume that the QP problem is bounded from below.

The algorithm executing n independent simultaneous directional minimizations is presented in Sect. 6.1. It is based on Theorem 1.

The second possibility, presented in Sect. 6.2, is to solve the stationarity condition. By Lemma 4 in Sect. 5 we know that as a byproduct of the conjugate direction generation we obtain a factorization of the inverse hessian. Therefore we can find the solution of the stationarity condition (3) by simple matrix/vector multiplications.

6.1 Finding the solution of the QP problem

The key property used in this subsection is the result of Theorem 1.

A similar theorem may be found in Fletcher (1987). However, he considered it in an essentially sequential way – a directional search along one direction, followed by the move to a new point, then another directional minimization from that second point followed by the move to a third point, and so on. Fletcher’s theorem is the basis for the conjugate directions methods in nonlinear unconstrained optimization. Here, we consider n independent directional minimizations, all of them starting from the same point \(\mathbf{x}^0\).

Theorem 1 suggests an obvious numerical procedure to find the minimum of the quadratic function (1) with a symmetric, positive definite matrix \(\mathbf{G}\) in a computational cloud or on a machine with many parallel processors. We may simply run n independent directional minimizations simultaneously, all starting from the same point \(\mathbf{x}^0\).

Algorithm III

  A0. Select the starting point \(\mathbf{x}^0\). \(\mathbf{x}^0=\mathbf{0}\) may be a suitable choice, because then \(\nabla f(\mathbf{x}^0) = -\mathbf{b}\).

  A1. Execute n independent directional minimizations, i.e., find the solutions of all the problems

    $$\begin{aligned} \min _\alpha \;\; \left\{ \bar{f}^i(\alpha ) = f(\mathbf{x}^0 + \alpha \mathbf{d}^i)\right\} , \;\; \forall \; i=1, \ldots ,n. \end{aligned}$$
    (23)

  A2. Calculate the solution of the problem using the formula

    $$\begin{aligned} \bar{\mathbf{x}} = \mathbf{x}^0 + \sum _{i=1}^n \alpha ^i \mathbf{d}^i. \end{aligned}$$
    (24)

Function f in formula (23) is quadratic (see formula (1)). Therefore, each directional step size \(\alpha ^i\) is the stationary point of the function \(\bar{f}^i\) minimized in Step A1, i.e., a zero of its derivative. Hence the step size verifies the following equation

$$\begin{aligned} \frac{d\bar{f}^i}{d\alpha }(\alpha ^i) = (\mathbf{G}(\mathbf{x}^0+\alpha ^i\mathbf{d}^i) - \mathbf{b})^T\mathbf{d}^i = 0. \end{aligned}$$
(25)

Simple arithmetic operations on Eq. (25) lead to the following expression for \(\alpha ^i\)

$$\begin{aligned} \alpha ^i = - \frac{(\mathbf{G}\mathbf{x}^0 - \mathbf{b})^T\mathbf{d}^i}{\left( \mathbf{d}^i\right) ^T\mathbf{G}\mathbf{d}^i} = - \frac{\left( \nabla f(\mathbf{x}^0)\right) ^T\mathbf{d}^i}{\left( \mathbf{d}^i\right) ^T\mathbf{G}\mathbf{d}^i}. \end{aligned}$$
(26)

What is important, the derivative \(\nabla f(\mathbf{x}^0)\) may be calculated once; the same value is used for every index i. To calculate \(\alpha ^i\) we need only the following data: \(\nabla f(\mathbf{x}^0)\), \(\mathbf{d}^i\) and \(\mathbf{G}\).
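A numpy sketch of this phase (illustrative names): since every \(\alpha^i\) in (26) depends only on the gradient at the common point \(\mathbf{x}^0\), all n step sizes can be computed independently, here in a single vectorized expression.

```python
import numpy as np

def phase_two_line_searches(G, b, R, Dvals, x0=None):
    """Steps A0-A2 of Algorithm III with the closed-form step sizes (26).
    R holds the conjugate directions d^i as columns, Dvals[i] = (d^i)^T G d^i."""
    x0 = np.zeros(b.size) if x0 is None else x0
    g0 = G @ x0 - b                  # nabla f(x^0); equals -b for x^0 = 0
    alphas = -(R.T @ g0) / Dvals     # all step sizes alpha^i from (26) at once
    return x0 + R @ alphas           # formula (24)
```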

6.2 Finding the solution of the set of equations

The conjugacy condition may be equivalently written as follows

$$\begin{aligned} \mathbf{R}^T\mathbf{G}\mathbf{R} = \mathbf{D}. \end{aligned}$$
(27)

Multiplying Eq. (27) from the left-hand side by the inverse of \(\mathbf{R}^T\) and from the right-hand side by \(\mathbf{R}^{-1}\) we shall obtain

$$\begin{aligned} \mathbf{G} = \mathbf{R}^{-T}\mathbf{D}\mathbf{R}^{-1}. \end{aligned}$$
(28)

Now an application of the well-known rule for calculating the inverse of a product of square nonsingular matrices yields

$$\begin{aligned} \mathbf{G}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T. \end{aligned}$$
(29)
Table 1 Problems of size from 14 to 500
Table 2 Problems of size from 500 to 2000

Let us apply formula (29) to equation

$$\begin{aligned} \mathbf{G}\mathbf{x} = \mathbf{b}. \end{aligned}$$

This results in

$$\begin{aligned} \mathbf{x} = \mathbf{G}^{-1}\mathbf{b} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\mathbf{b}. \end{aligned}$$
(30)

In view of equality (30) it is easy to distribute the calculation of \(\mathbf{x}\) verifying Eq. (3). It involves exclusively matrix/vector multiplications and the calculation of the image of the matrix \(\mathbf{D}^{-1}\) on an appropriate vector. Matrix \(\mathbf{D}\) is diagonal, hence multiplication by \(\mathbf{D}^{-1}\) requires only n arithmetic divisions. There are efficient methods to realize matrix/vector multiplication effectively, see for instance Bertsekas and Tsitsiklis (1989).
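A sketch of this variant of the second phase (illustrative names; \(\mathbf{R}\) and the diagonal of \(\mathbf{D}\) are assumed to come from Algorithm I or II):

```python
import numpy as np

def solve_via_inverse_factorization(R, Dvals, b):
    """x = R D^{-1} R^T b, formula (30); Dvals holds the diagonal of D."""
    y = R.T @ b        # first matrix/vector multiplication
    z = y / Dvals      # image of D^{-1} on y: just n divisions
    return R @ z       # second matrix/vector multiplication
```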

Table 3 Results for problems with size from 14 to 500 (all components of \(\mathbf{b}\) are equal to 1)
Table 4 Results for problems with size from 501 to 2000 (all components of \(\mathbf{b}\) are equal to 1)

7 Numerical experiments

The sample matrix test problems from the repository of the University of Florida were used for the numerical verification of the proposed algorithm (a description may be found in Davis and Hu (2011a)). The repository is available on-line and is maintained by Davis and Hu (2016). We restricted the calculations to the subset of symmetric, positive definite matrices from the repository. They are stored in three different formats: the Rutherford-Boeing format (see Duff et al. (1989a, b, 1997)), the Matrix Market format (Boisvert et al. (1997a) and Boisvert et al. (1997b)) and the MATLAB format. In the MATLAB format used by us, each problem set is a single MATLAB variable of struct type, stored in a single MAT-file.

Table 5 Results for problems with size from 14 to 500 (all components of \(\mathbf{x}\) are equal to 1)
Table 6 Results for problems with size from 501 to 2000 (all components of \(\mathbf{x}\) are equal to 1)

Tables 1 and 2 contain information about the test problems. We selected from the repository all problems with symmetric, positive definite matrices of size between 0 and 2000. Many of them are structural problems, where matrix \(\mathbf{G}\) is the so-called stiffness matrix of the structure and is usually symmetric and positive definite. It arises as the result of the finite element discretization of the problem. The details may be found for instance in Felippa (2001) or McGuire et al. (2000).

The computational results are presented in Tables 3–6. The quantities appearing in particular columns are defined below:

Figures a and b (definitions of the column quantities for Tables 3–6)
Table 7 Results for problems with size from 14 to 500 (application of classical Cholesky factorization)

Tables 3 and 4 contain the results for problems with the right-hand side vector \(\mathbf{b}=[1,1, \ldots ,1]^T\). Unfortunately, almost all problems in the repository do not contain the vector \(\mathbf{b}\).

Table 8 Results for problems with size from 501 to 2000 (application of classical Cholesky factorization)

Tables 5 and 6 contain results for problems with the same matrices \(\mathbf{G}\) as above. The only difference is that the solution vector of the linear system is assumed to have all components equal to one (\(\hat{\mathbf{x}}_i = 1, \;\; \forall i\)) and the vector \(\mathbf{b} = \mathbf{G}\hat{\mathbf{x}}\) is set accordingly. We omitted the factorization accuracy and the decomposition times; they are the same as in Tables 3 and 4 presented above.

Below, in Tables 7 and 8, we present the results obtained by means of the standard Cholesky decomposition of matrix \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\). Having this factorization available, two sets of linear equations with triangular matrices are solved: the first one by means of forward substitution, the second by means of backward substitution. Those calculations were suggested by one of the reviewers for comparison purposes. Some quantities stored in the columns of Tables 7 and 8 have a different meaning than in the previous tables. Therefore, they are redefined below.

Figures c and d (redefinitions of the column quantities for Tables 7 and 8)

The following observations can be drawn on the basis of the results presented in the paper:

  • Factorization of the inverse \(\mathbf{G}^{-1}\) is calculated without decomposition of \(\mathbf{G}\).

  • Decomposition of the inverse matrix involves exclusively matrix/vector multiplication, calculation of some scalar products and calculation of the image of an inverse of a diagonal matrix on a vector (it is equivalent to the division of vector components by the main diagonal entries of the matrix).

  • The second stage is easily distributed, which may be useful in cloud computing.

  • Presented results of sequential calculations seem to be encouraging.

  • They failed only on the problems bcsstk20, bcsstk19, bcsstk11, bcsstk12, ex3, ex33 and plat1919. Let us stress that we have not used any scaling. Further numerical experimentation is necessary.

  • The variant with the matrix/vector operations was usually faster than the n simultaneous directional minimizations. The accuracy was exactly the same (all digits of the solution and of the forward and backward errors were equal).

  • The standard approach (Cholesky decomposition of matrix \(\mathbf{G}\) followed by solving two sets of linear equations with triangular matrices) was usually more robust. In every case when the method proposed in the paper succeeded, the results were similar. The accuracy was sometimes slightly better in our approach, sometimes in the standard one. In the cases where our approach failed, the standard approach generated solutions with better quality indicators; however, they were not fully satisfactory either. This comparison is not fully fair, because we compare the results of the factorization \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\) of \(\mathbf{G}^{-1}\) with the \(\mathbf{L}\mathbf{L}^T\) decomposition of \(\mathbf{G}\).

8 Conclusions

In this paper we investigated a two-stage approach to solving strictly convex quadratic programming problems. In the first stage we generate a set of vectors mutually conjugate with respect to the hessian of the QP problem. It is followed in the second stage by n independent directional minimizations executed concurrently from the same starting point \(\mathbf{x}^0\). We think that this may be useful for distributed calculations in cloud computing.

Additionally, in that way we find a factorization of the inverse matrix \(\mathbf{G}^{-1}\) without calculating a factorization of \(\mathbf{G}\) itself. It has the form \(\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^T\), where the columns of matrix \(\mathbf{R}\) are the conjugate vectors found by the first stage algorithm. Knowledge of such an inverse factorization may be very useful both for optimization problems (when we have a family of QP problems with different linear terms) and for linear algebra problems (with multiple right-hand sides). Our approach may also be used in conjunction with the one sweep procedure of Bientinesi et al. (2008) for calculating the inverse of a symmetric, positive definite matrix; it may be applied to find the factorization of the particular Schur block components.

The first stage is essentially sequential, but our second idea of applying n independent directional minimizations to the associated quadratic programming problem permits the distribution of calculations in the second stage. It should be stressed that the solution accuracy of the two algorithms tested in the second stage was exactly the same; all digits in the reported errors were equal.

There are still some open questions which require further investigation. The first one is what the classes of sparse problems are for which the approach presented in the paper will be better than the standard factorization of \(\mathbf{G}\) followed by the solution of two sets of linear equations with triangular matrices. The second is how to apply those ideas to problems with block structure. Maybe the fact that the columns of the factor \(\mathbf{L}\) in the decompositions \(\mathbf{G} = \mathbf{L}\mathbf{D}\mathbf{L}^T\) or \(\mathbf{G} = \mathbf{L}\mathbf{L}^T\) are mutually conjugate with respect to \(\mathbf{G}^{-1}\) could serve as a hint. Another possibility is to combine the block approach of Bientinesi et al. (2008) with our approach applied to particular blocks. The factorization generated by means of our approach may also be used as the starting one in the refinement technique studied by Ogita and Oishi (2012) and Yanagisawa et al. (2014).

The reported computational results should be treated as preliminary. There are of course some issues which should be clarified. The most important are the identification of the suitable sparsity structure and the selection of appropriate values of the parameter c. We are not restricted to the choice of \(c=1\), which was used in our test calculations. This may be especially important for establishing the numerical stability of the method. To address that task, the most promising idea would be the normalization of the consecutive conjugate vectors.