Introduction

ABS methods have been used broadly for solving linear and nonlinear systems of equations comprising a large number of constraints and variables. In addition, ABS methods provide a unification of the field of finitely terminating methods for the solution of linear systems of equations. ABS methods were introduced by Abaffy, Broyden, and Spedicato initially for solving a determined or underdetermined linear system and later extended to linear least squares, nonlinear equations, optimization problems, Diophantine equations, and fuzzy linear systems [1, 11, 13]. These extended ABS algorithms offer some new approaches that are better than the classical ones in several respects. In addition, extensive computational experience has shown that ABS methods are implementable in a stable way, being often more accurate than the corresponding initial algorithm. ABS methods can be more effective than some of the other traditional methods. See more about ABS algorithms in [15,16,17,18]. A review of ABS algorithms is given in [19]. The basic ABS algorithm works on a system of the form:

$$\begin{aligned} Ax=b, \end{aligned}$$
(1)

where \(A=[a_1,\ldots ,a_m]^{\rm T}, a_{i} \in \mathbb {R}^{n}, 1\le i\le m, x \in \mathbb {R}^{n}, b \in \mathbb {R}^{m}, m\le n.\)

There is another notation for (1) as the following form:

$$\begin{aligned} A^{\rm T}x=b, \end{aligned}$$
(2)

where \(A^{\rm T}=[a_1,\ldots ,a_m]^{\rm T}, a_{i} \in \mathbb {R}^{n}, 1\le i\le m, x \in \mathbb {R}^{n}, b \in \mathbb {R}^{m} , m\le n.\)

Notice that systems (1) and (2) are equivalent. Matrix computations are presented in [14]. The basic ABS methods determine a solution of the above systems, or detect that none exists, in at most m iterations. Amini et al. proposed two-step ABS algorithms for the general solution of full row rank linear systems of equations [5, 6]. The purpose of our paper is to present two new extended two-step ABS models and to analyze their error propagation. The structure of this paper is organized as follows: Sect. 2 is devoted to the construction of a new two-step ABS model for computing the general solution of full row rank linear systems of equations. The rank reducing process is done in two phases per iteration. The first phase provides a solution for the ith iteration, and the second phase computes the general solution of that iteration. In addition, we state and prove related theorems. We present our first two-step ABS algorithm for the general solution of full row rank linear systems of equations in this section. In Sect. 3, to compress the required space, we present our second algorithm, in which the number of rows of the Abaffian matrix is reduced by two per iteration. Thus, in the matrix multiplications, the related parameters are chosen with the proper dimensions. In Sect. 4, we investigate the stability using the backward error analysis techniques of Broyden and Galantai. The errors in the steps are determined in terms of projectors constructed from the conjugate directions. A class of methods having optimal stability characteristics is defined. Computational complexity and numerical results are discussed in detail in Sect. 5. In Sect. 6, we summarize our achievements.

A new class of two-step ABS model

The basic ABS algorithm starts with an initial vector \(x_{0} \in \mathbb R^{n}\) and a nonsingular matrix \(H_{0} \in \mathbb R^{n\times n}\) (Spedicato’s parameter). Given that \(x_{i}\) is a solution of the first i equations, the ABS algorithm computes \(x_{i+1}\) as a solution of the first \(i+1\) equations by the following steps [1]:

  1. 1.

    Determine \(z_{i}\) (Broyden’s parameter), so that \(z_{i}^{\rm T}H_{i}a_{i}\ne 0\) and set \(p_{i}=H_{i}^{\rm T}z_{i}.\)

  2. 2.

    Update the solution by \(x_{i+1}=x_{i}+\alpha _{i}p_{i}\), where the stepsize \(\alpha _{i}\) is given by \(\alpha _{i}=\frac{b_{i}-a_{i}^{\rm T}x_{i}}{a_{i}^{\rm T}p_{i}}.\)

  3. 3.

    Update the Abaffian matrix \(H_{i}\) by \(H_{i+1}=H_{i}-\frac{H_{i}a_{i}w_{i}^{\rm T}H_{i}}{w_{i}^{\rm T}H_{i}a_{i}},\) where \(w_{i}\in \mathbb {R}^n\) (Abaffy’s parameter) is chosen, so that \(w_{i}^{\rm T}H_{i}a_{i}\ne 0\).
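To make the above steps concrete, the following is a minimal sketch of the basic ABS iteration in Python/NumPy. It is our own illustration, not code from [1]: the Huang-like choice \(z_{i}=w_{i}=H_{i}a_{i}\) is only one admissible selection of the parameters, and the function also returns the matrix of search directions, which we reuse in Sect. 4 to illustrate conjugacy.

```python
import numpy as np

def basic_abs(A, b, x0=None):
    """Sketch of the basic ABS method for a system whose i-th row is a_i^T,
    assuming the rows are linearly independent (m <= n)."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    H = np.eye(n)                      # Spedicato's parameter H_0
    P = []                             # search directions p_1, ..., p_m
    for i in range(m):
        a = A[i]
        Ha = H @ a
        z = w = Ha                     # one admissible (Huang-like) choice
        p = H.T @ z                    # p_i = H_i^T z_i
        alpha = (b[i] - a @ x) / (a @ p)
        x = x + alpha * p              # x now solves the first i+1 equations
        H = H - np.outer(Ha, H.T @ w) / (w @ Ha)   # Abaffian update
        P.append(p)
    return x, np.column_stack(P)
```

For a compatible system with linearly independent rows, `x, P = basic_abs(A, b)` returns a point with `np.allclose(A @ x, b)`.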

Here, we are motivated to study a method that satisfies two new equations at a time. We consider the system (1) under the assumption that A is of full row rank, i.e., rank(A) = m and \(m\le n\). Suppose that \(m=2l\) (if m is odd, we can add a trivial equation to the system). Take \(A^{2i}= [a_{1}, \ldots , a_{2i} ]^{\rm T},\) \(b^{2i}= [b_{1}, \ldots , b_{2i}]^{\rm T}\) and \(r_{j}(x)=a_{j}^{\rm T}x-b_{j} (j=1,\ldots ,m).\) Assume that we are at the ith iteration and \(x_{i-1}\) satisfies \(A^{2(i-1)}x=b^{2(i-1)}\). We determine \(H_{i}\in \mathbb {R}^{n\times n}\), \(z_{i}\in \mathbb {R}^{n}\) and \(\lambda _{i}\in \mathbb {R}\), so that \(x_{i}=x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i}\) is a solution of the first 2i equations of the system (1), that is, \(A^{2i}x_{i}=b^{2i}.\) As a result, we have \(r_{j}(x_{i})=0, j=1,\ldots ,2i.\) Thus, for \(j=2i-1\) and \(j=2i\), we have

$$\begin{aligned} \left\{ \begin{array}{l} a_{2i-1}^{\rm T} (x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i} )-b_{2i-1} = 0,\\ a_{2i}^{\rm T} (x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i} )-b_{2i} = 0,\\ \end{array}\right. \end{aligned}$$

or equivalently

$$\begin{aligned} \left\{ \begin{array}{l} \lambda _{i}(H_{i}a_{2i-1})^{\rm T}z_{i} = r_{2i-1}(x_{i-1}),\\ \lambda _{i}(H_{i}a_{2i})^{\rm T}z_{i} = r_{2i}(x_{i-1}).\\ \end{array}\right. \end{aligned}$$

Suppose that \(r_{2i-1}(x_{i-1})\ne 0\) and \(r_{2i}(x_{i-1})\ne 0\). Then, \(\lambda _{i}\) must be nonzero, and the above system is compatible if and only if we take

$$\begin{aligned} \lambda _{i}=\frac{\overline{r}_{2i-1}(x_{i-1})}{(H_{i}a_{2i-1})^{\rm T}z_{i}} =\frac{\overline{r}_{2i}(x_{i-1})}{(H_{i}a_{2i})^{\rm T}z_{i}}, \end{aligned}$$
(3)

where \(\overline{r}_{2i-1}(x_{i-1})=\overline{r}_{2i}(x_{i-1})=r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})\) denote the residuals after the two equations are rescaled by each other's residuals (this is exactly step 2(b) of Algorithm 1 below), so that both right-hand sides of (3) coincide. There are various ways to satisfy (3). We consider the following model:

$$\begin{aligned} \left\{ \begin{array}{l} {\text {1. Choose an appropriate update for }}H_{i}, {\text { so that }} H_{i}a_{2i-1} = H_{i}a_{2i}\ne 0, \\ {\text {2. Select a vector }}z_{i}, {\text { so that }} z_{i}^{\rm T} H_{i}a_{2i}\ne 0.\\ \end{array}\right. \end{aligned}$$

Now, since two new equations are considered in each iteration, we use one rank one update to achieve (4) and another to achieve (5). Therefore, we have a new rank two update for each iteration. Here, we present \(H_{i}\) and \(H_{l_{i}}\) satisfying the following properties:

$$\begin{aligned} H_{i}a_{2i-1}=H_{i}a_{2i}\ne 0,\quad i=1,\ldots ,l, \end{aligned}$$
(4)

and

$$\begin{aligned} H_{l_{i}}a_{j}=0, \quad j=1,\ldots ,2i. \end{aligned}$$
(5)

Now, assume

$$\begin{aligned} c_{j}=\left\{ \begin{array}{ll} a_{2i}-a_{j}, &\quad j\ne 2i,\\ a_{2i}, &\quad j=2i.\\ \end{array}\right. \end{aligned}$$
(6)

Using relations (4)–(6), we construct \(H_{i}\) and \(H_{l_{i}}\) such that (7) and (8) hold:

$$\begin{aligned} H_{i}c_{j}=0,\quad j=1,\ldots ,2i-1, \end{aligned}$$
(7)

and

$$\begin{aligned} H_{l_{i}}c_{j}=0,\quad j=1,\ldots ,2i. \end{aligned}$$
(8)

We compute \(H_{i+1}\) from \(H_{l_{i}}\), such that the relations (7) and (8) hold and proceed inductively. We define \(H_{i+1}=H_{l_{i}}+g_{2i+1}d_{2i+1}^{\rm T}\) where \(g_{2i+1},d_{2i+1}\in \mathbb {R}^{n}\). We need to have

$$\begin{aligned} \left\{ \begin{array}{ll} H_{i+1}c_{j}=0, &\quad j=1,\ldots , 2i+1,\\ H_{l_{i}}c_{j}=0, &\quad j=1,\ldots , 2i.\\ \end{array}\right. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \left\{ \begin{array}{ll} (H_{l_{i}}+g_{2i+1}d_{2i+1}^{\rm T} )c_{j}=0, &{} j=1,\ldots ,2i+1,\\ H_{l_{i}}c_{j}=0, &{} j=1,\ldots ,2i.\\ \end{array}\right. \end{aligned}$$

Thus, we must define \(g_{2i+1},d_{2i+1}\in \mathbb { R}^{n}\) in such a way that

$$\begin{aligned} H_{l_{i}}c_{j}+ (d_{2i+1}^{\rm T}c_{j} )g_{2i+1}=0, \quad j=1,\ldots ,2i+1, \end{aligned}$$
(9)

and \(H_{l_{i}}c_{j}=0, j=1,\ldots ,2i\). By the induction hypothesis, the condition (9) is satisfied for \(j\le 2i\) once \(d_{2i+1}\) is chosen in the range of \(H_{l_{i}}^{\rm T}\), as below. By taking \(j=2i+1\) in (9), we get

$$\begin{aligned} (d^{\rm T}_{2i+1}c_{2i+1})g_{2i+1}=-H_{l_i}c_{2i+1}. \end{aligned}$$
(10)

We consider the choice \(g_{2i+1}=-H_{l_{i}}c_{2i+1}\) with \(d^{\rm T}_{2i+1}c_{2i+1}=1,\) which clearly satisfies (10). Now, we define \(d_{2i+1}=H_{l_{i}}^{\rm T}{w}_{2i+1}\) for \({w}_{2i+1}\in \mathbb {R}^{n}\), such that

$$\begin{aligned} w^{\rm T}_{2i+1}H_{l_i}c_{2i+1}=1. \end{aligned}$$
(11)

Later, in view of Theorem 2.3, we will conclude that the above equation has a solution and \(H_{l_i}\) is well defined. Therefore, the updating formula for \(H_{i}\) is given by the following:

$$\begin{aligned} H_{i+1}=H_{l_{i}}-H_{l_{i}}c_{2i+1}w_{2i+1}^{\rm T}H_{l_{i}}, \end{aligned}$$
(12)

where \(w_{2i+1}\) can be any vector satisfying (11). Now, to complete the induction, \(H_{1}\) should be chosen so that \(H_{1}a_{1}=H_{1}a_{2}\ne 0\) or

$$\begin{aligned} H_{1}c_{1}=0. \end{aligned}$$
(13)

Let \(H_{0}\) be an arbitrary nonsingular matrix. We obtain \(H_{1}\) from \(H_{0}\) using a rank one update. Take \(H_{1}=H_{0}-u_{1}v_{1}^{\rm T}\), where \(u_{1},v_{1}\in \mathbb {R}^{n}\) are chosen, so that (13) is satisfied. Therefore, we have \(H_{0}c_{1}-(v_{1}^{\rm T}c_{1})u_{1}=0.\) The previous equation is satisfied if we set \(u_{1}=H_{0}c_{1}, v_{1}=H_{0}^{\rm T}w_{1},\) and we choose \(w_{1}\in \mathbb {R}^{n}\) satisfying the next condition:

$$\begin{aligned} w_1^{\rm T}H_0c_1=1. \end{aligned}$$
(14)

Clearly, (14) can be satisfied with a proper choice of \(w_{1}\in \mathbb {R}^n\), whenever \(a_{1}\) and \(a_{2}\) are linearly independent. Thus, we have a rank one update as follows:

$$\begin{aligned} H_{1}=H_{0}-H_{0}c_{1}w_{1}^{\rm T}H_{0}, \end{aligned}$$
(15)

where \(w_{1}\) is an arbitrary vector satisfying (14). To compute the general solution and carry out the second phase of the ith iteration, we introduce a matrix \(H_{l_{i}}\) with the properties \(H_{l_{i}}c_{j}=H_{l_{i}}a_{j}=0,\; j=1,\ldots ,2i.\) Therefore, we define the matrix \(H_{l_{i}}\) by a rank one update as the next formula:

$$\begin{aligned} H_{l_{i}}=H_{i}-H_{i}a_{2i}w_{2i}^{\rm T}H_{i}=H_{i}-H_{i}c_{2i}w_{2i}^{\rm T}H_{i},\quad i=1,\ldots ,l. \end{aligned}$$
(16)

Notice that \(w_{2i}\in \mathbb {R}^{n}\) is an arbitrary vector satisfying the following condition:

$$\begin{aligned} w_{2i}^{\rm T}H_{i}a_{2i}=1. \quad (w_{2i}^{\rm T}H_{i}c_{2i}=1). \end{aligned}$$
(17)

Hence, the general solution of the ith iteration is given by \(x_{l_{i}}=x_{i}-H_{l_{i}}^{\rm T}s\), where \(s\in \mathbb {R}^{n}\) is arbitrary. Clearly, the general solution for the last iteration is presented by the following:

$$\begin{aligned} x_{l_{l}}=x_{l}-H_{l_{l}}^{\rm T}s, \end{aligned}$$
(18)

where \(s\in \mathbb {R}^{n}\) is arbitrary.

Lemma 2.1

The vectors \(a_{1},\ldots ,a_{m}\) are linearly independent if and only if the vectors \(c_{1},\ldots ,c_{m}\) are linearly independent.
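The lemma admits a one-line justification (our sketch; the \(c_{j}\) are taken pairwise as in (6)): for each pair,

$$\begin{aligned} (c_{2i-1},\, c_{2i})=(a_{2i-1},\, a_{2i})\left( \begin{array}{cc} -1 &{} 0\\ 1 &{} 1 \end{array}\right) ,\quad i=1,\ldots ,l, \end{aligned}$$

so \([c_{1},\ldots ,c_{m}]=[a_{1},\ldots ,a_{m}]M\), where M is block diagonal with the above nonsingular \(2\times 2\) blocks; such a transformation preserves linear independence in both directions.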

With the above construction and Lemma 2.1, we have proved the following theorem.

Theorem 2.2

Assume that we have \(m=2l\) arbitrary linearly independent vectors \(a_{1}, \ldots ,a_{m}\in \mathbb {R}^{n}\) and an arbitrary nonsingular matrix \(H_{0}\in \mathbb {R}^{n\times n}\). Let \(H_{1}\) be generated by (15) with \(w_{1}\) satisfying (14), and let the sequence of matrices \(H_{i}\), \(i=2,\ldots ,l\), be generated by the following:

$$\begin{aligned} H_{i}=H_{l_{i-1}}-H_{l_{i-1}}c_{2i-1}w_{2i-1}^{\rm T}H_{l_{i-1}}, \end{aligned}$$
(19)

with \(w_{2i-1}\in \mathbb {R}^{n}\) satisfying the following condition:

$$\begin{aligned} w^{\rm T}_{2i-1}H_{l_{i-1}}c_{2i-1}=1. \end{aligned}$$
(20)

In addition, let the sequence of matrices \(H_{l_{1}},\ldots ,H_{l_{l}}\) be generated by (16) with \(w_{2i}\in \mathbb {R}^{n}\) satisfying (17). Then, when we are at the ith iteration, the following properties (1)–(4) hold for \(i=1,\ldots ,l\).

  1. (1)

    \(H_{i}a_{2i-1}=H_ia_{2i}\ne 0\).

  2. (2)

    \(H_{i}a_{j}=0, \; j=1,\ldots ,2(i-1)\).

  3. (3)

    \(H_{l_i}a_{j}=0,\; j=1,\ldots ,2i\).

  4. (4)

    \(H_{l_i}c_{j}=0,\; j=1,\ldots ,2i\).


Theorem 2.3

Assume that \(a_{1},\ldots ,a_{m}\) are linearly independent vectors in \(\mathbb {R}^{n}\). Let \(H_{0}\in \mathbb {R}^{n\times n}\) be an arbitrary nonsingular matrix and \(H_{1}\) be defined by (15), with \(w_{1}\in \mathbb {R}^n\) satisfying (14), and for \(i=2,\ldots ,l\), the sequence of matrices \(H_{i}\) be generated by (19) with \(w_{2i-1}\in \mathbb {R}^{n}\) satisfying (20). Then, for all i, \(1\le i\le l\), and j, \(2i\le j\le m\), the vectors \(H_{i}a_{j}\) are nonzero and linearly independent.

Proof

We proceed by induction. For \(i=1\), the theorem is true: if \(\sum \nolimits _{j=2}^{m}\alpha _{j}H_{1}a_{j}=0\), we have

$$\begin{aligned} \sum _{j=2}^{m}\alpha _{j} (H_{0}-H_{0}c_{1}w_{1}^{\rm T}H_{0} )a_{j}=0, \end{aligned}$$
$$\begin{aligned} \sum _{j=2}^{m}\alpha _{j}H_{0}a_{j}- \left( \sum _{j=2}^{m}\alpha _{j}w_{1}^{\rm T}H_{0}a_{j} \right) H_{0}c_{1}=0, \end{aligned}$$

or

$$\begin{aligned} \beta _{1}H_{0}a_{1}-(\beta _1-\alpha _2)H_0a_2+ \sum _{j=3}^{m}\alpha _{j}H_{0}a_{j}=0, \end{aligned}$$

where \(\beta _{1}=\sum \nolimits _{j=2}^{m}\alpha _jw_1^{\rm T}H_0a_j.\) Now, since \(a_{1},\ldots ,a_{m}\) are nonzero and linearly independent and \(H_{0}\) is nonsingular, \(H_{0}a_{j}\) for \(1\le j\le m\) are nonzero and linearly independent.

Hence, \(\beta _{1}=\alpha _{2}=\alpha _{3}=\cdots =\alpha _{m}=0\). Therefore, the vectors \(H_{1}a_{j}\), for \(2\le j\le m\), are nonzero and linearly independent. Now, we assume that the theorem is true for some \(i\), \(1\le i\le l-1\), and we prove it for \(i+1\). From (12), for \(2i+2\le j\le m\), we have

$$\begin{aligned} H_{i+1}a_{j}=H_{l_{i}}a_{j}-(w_{2i+1}^{\rm T}H_{l_{i}}a_{j})H_{l_{i}}c_{2i+1}. \end{aligned}$$
(21)

We need to show that the relation

$$\begin{aligned} \sum _{j=2i+2}^{m}\alpha _{j}H_{i+1}a_{j}=0, \end{aligned}$$
(22)

implies that \(\alpha _{j}=0\), for \(2i+2\le j\le m\). Using (21), we can write (22) as follows:

$$\begin{aligned} \sum _{j=2i+2}^{m}\alpha _{j}H_{l_{i}}a_{j}-\beta '_{1} H_{l_{i}}c_{2i+1}=0, \end{aligned}$$

where \(\beta '_{1}=\sum \nolimits _{j=2i+2}^{m}\alpha _{j}w_{2i+1}^{\rm T}H_{l_{i}}a_{j}\). Thus, we have \(\sum _{j=2i+2}^{m}\alpha _{j}H_{l_{i}}a_{j}-\beta '_{1}H_{l_{i}}(a_{2i+2}-a_{2i+1})=0.\) As a result

$$\begin{aligned} \sum _{j=2i+3}^m\alpha _{j}H_{l_{i}}a_{j}+\beta '_{1}H_{l_{i}}a_{2i+1}-(\beta '_{1} -\alpha _{2i+2})H_{l_{i}}a_{2i+2}=0. \end{aligned}$$
(23)

By the induction hypothesis, the vectors \(H_{i}a_{j}\), for \(2i\le j\le m\), are nonzero and linearly independent. Using this hypothesis, we first prove that the vectors \(H_{l_i}a_j\) are nonzero and linearly independent for \(2i+1\le j\le m\). Using (16), we have

$$\begin{aligned} H_{l_{i}}a_j=H_{i}a_j-H_{i}a_{2i}w_{2i}^{\rm T}H_{i}a_j. \end{aligned}$$
(24)

We must prove the relation

$$\begin{aligned} \sum ^m_{j=2i+1}\alpha '_jH_{l_i}a_j=0, \end{aligned}$$
(25)

implies that \(\alpha '_j=0\), for \(j\ge 2i+1\). Using (24), we can write (25) as follows:

$$\begin{aligned} \sum ^m_{j=2i+1}\alpha '_jH_ia_j-\left( \sum ^m_{j=2i+1}\alpha '_jw^{\rm T}_{2i}H_ia_j\right) H_ia_{2i}=0. \end{aligned}$$

By the linear independence of \(H_ia_{j}\), for \(2i\le j\le m\), we conclude that \(\alpha '_j=0\), for \(j\ge 2i+1\). Consequently, in relation (23), we have \(\beta '_{1}=\alpha _{2i+2}=\alpha _{2i+3}=\cdots =\alpha _{m}=0.\) Hence, the vectors \(H_{i+1}a_{j}\), for \(2i+2\le j\le m\), are nonzero and linearly independent. \(\square\)

Corollary 2.4

Considering the assumptions of Theorem 2.3, the following statements are true.

  1. (i)

    When we are at the ith iteration, we have \(H_{i}a_{2i-1}=H_{i}a_{2i}\ne 0.\) In addition, there exist \(z_i\in \mathbb {R}^n\) and \(w_{2i}\in \mathbb {R}^n\), such that \(z_{i}^{\rm T}H_{i}a_{2i}\ne 0\) and \(w_{2i}^{\rm T}H_{i}a_{2i}\ne 0\).

  2. (ii)

    Each of the systems (14) and (20) has a solution.

  3. (iii)

    \(H_i\), \(H_{l_i}\), \(x_i\), and \(x_{l_i}\) are well defined for \(i=1,\ldots ,l.\)

  4. (iv)

    By taking \(H_{0}=H_{l_{0}}\) and considering the sequence of matrices \(H_{l_{0}},\ldots ,H_{l_{i-1}}\), \(0\le i\le l-1\), we have \(H_{l_i}a_{j}\ne 0\), \(j> 2i\). Thus, by taking \(j=2i+1\), there exists \(z'_j\in \mathbb {R}^n\), such that \({z'}_{j}^{\rm T}H_{l_ i}a_{j}\ne 0\). (It is noted again that when we choose \(j=2i\), we are at the ith step. In fact, for the ith iteration, we have two choices, \(j=2i-1\) and \(j=2i\).)

Theorem 2.5

For the matrices \(H_{i}\) generated by (15) and (19) and the matrices \(H_{l_{i}}\) given by (16), we have

$$\begin{aligned} \dim R(H_{i})&= n-2i+1, \quad 1\le i\le l,\\ \dim N(H_{i})&= 2i-1,\quad 1\le i\le l,\\ \dim R(H_{l_{i}})&= n-2i,\quad 1\le i\le l,\\ \dim N(H_{l_{i}})&= 2i,\quad 1\le i\le l,\\ \dim R(H_{l_{l}})&= n-m,\\ \dim N(H_{l_{l}})&= m. \end{aligned}$$

The notations R and N stand for the range and nullspace, respectively.

Theorem 2.6

Consider the matrix \(P^i=(\tilde{p}_1,\ldots ,\tilde{p}_i), i\le \frac{m}{2}\), where \(\tilde{p}_i=(p'_i,p_i)\) and \(p_i=H_i^{\rm T}z_i\) is obtained from the first-phase rank reducing process of the ith iteration. The vectors \(p'_i=H_{l_{i-1}}^{\rm T}z'_i\) form the sequence \(H_{l_0}^{\rm T}z'_1,\ldots ,H_{l_{i-1}}^{\rm T}z'_i\) (with \(H_{l_0}=H_0\)). In addition, take \(A^i=(a_1,\ldots ,a_{2i})\), \(i\le \frac{m}{2}\). Then, the matrix \(L^i\) defined by \(L^i =(A^i)^{\rm T}P^i\) is lower triangular and nonsingular.

Proof

Since \((L ^i)_{j,k}=a^{\rm T}_j \tilde{p}_k\), Theorem 2.2 implies that \((L ^i)_{j,k}\) is zero for \(j<k\), and thus, \(L^i\) is lower triangular. By Theorem 2.3 and Corollary 2.4, we have \(a^{\rm T}_j \tilde{p}_k\ne 0\), for \(j=k\). Consequently, \(L^i\) is nonsingular. \(\square\)

Corollary 2.7

If \(m=n\), the matrix A can be factorized in the form \(A=LS\), with \(L=L^{\frac{n}{2}}\) and \(S=(P^{\frac{n}{2}})^{-1}\).

Remark 2.8

For the matrices \(H_{i}\) generated by (15) and (19), we have \(x_{i}=x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i}, (i=1,\ldots ,l),\) where \(x_{i}\) is a solution of the first 2i equations of the system and \(\lambda _{i}\) will be discussed in Algorithms 1 and 2. The matrix \(H_{0}\) can be any arbitrary nonsingular matrix, but to reduce the computational complexity, and for stability reasons, we choose \(H_{0}\) to be the identity matrix \(I_{n}\). Numerical stability and computational complexity will be discussed in Sects. 4 and 5, respectively.

Algorithm 1

Assume that \(A_{m\times n}=\big [a_1,a_2,\ldots ,a_m\big ]^{\rm T}\) is a full row rank linear matrix, where \(m=2l\) and \(Ax=b\) is a compatible full row rank linear system, \(x\in \mathbb {R}^n\) and \(b\in \mathbb {R}^m\).

  1. 1.

    Let \(x_{0}\in \mathbb {R}^{n}\) be an arbitrary vector and choose \(H_{0}=I_{n}\in \mathbb {R}^{n\times n}\) (the identity matrix). Set \(i=1\).

  2. 2.
    1. (a)

      Compute \(r_{1}(x_{0})=a_{1}^{\rm T}x_{0}-b_{1}\) and \(r_2(x_0)=a_{2}^{\rm T}x_{0}-b_{2}.\)

    2. (b)

      If \(r_{1}(x_{0})r_{2}(x_{0})\ne 0\), we let

      $$\begin{aligned} \left\{ \begin{array}{ll} a_{1}=r_{2}(x_{0})a_{1},\\ b_{1}=r_{2}(x_{0})b_{1},\\ \end{array}\right. \quad \left\{ \begin{array}{ll} a_{2}=r_{1}(x_{0})a_{2},\\ b_{2}=r_{1}(x_{0})b_{2}.\\ \end{array}\right. \end{aligned}$$
    3. (c)

      If \(r_1(x_0) r_2(x_0) =0\) and one of the residual values is nonzero, without loss of generality, we assume that \(r_2(x_{0})\ne 0\) and we let

      $$\begin{aligned} \left\{ \begin{array}{l} a_{1}=a_{2}+a_{1},\\ b_{1}=b_{2}+b_{1},\\ \end{array}\right. \quad \left\{ \begin{array}{l} a_{2}=a_2,\\ b_{2}=b_2.\end{array}\right. \end{aligned}$$
    4. (d)

      If \(r_{1}(x_{0})r_{2}(x_{0})=0\), and both of the residual values are zero, take \(x_{1}=x_{0}\) and go to step 5.

  3. 3.
    1. (a)

      Take \(c_{1}=a_{2}-a_{1}\).

    2. (b)

      Select \(w_{1}\in \mathbb {R}^{n}\), such that \(w_1^{\rm T}H_0c_1=1\) and compute \(H_1=H_0-H_0c_1w_1^{\rm T}H_0.\)

    3. (c)

      Select \(w_{2}\in \mathbb {R}^n\), such that \(w_{2}^{\rm T}H_{1}a_{2}=1\) and compute \(H_{l_{1}}=H_{1}-H_{1}a_{2}w_{2}^{\rm T}H_{1}.\)

    4. (d)

      Select \(z_{1}\in \mathbb {R}^{n}\), so that \(z_{1}^{\rm T}H_{1}a_{2}\ne 0\) and compute \(\lambda _{1}=\frac{r_{1}(x_{0}) r_{2}(x_{0})}{z_{1}^{\rm T}H_{1}a_{2}}\), if \(r_{1}(x_{0}) r_{2}(x_{0}) \ne 0,\) \(\lambda _{1}=\frac{r_2(x_{0})}{z_1^{\rm T}H_1a_2}\), if \(r_{1}(x_{0})=0\) and \(r_{2}(x_{0})\ne 0 .\)

  4. 4.

    Take \(x_{1}=x_{0}-\lambda _{1}H_{1}^{\rm T}z_{1}\).

  5. 5.

    Set \(i=2\) and go to 6.

  6. 6.

    While \(i\le \frac{m}{2}\), do steps 6(a)–8(b).

    1. (a)

      Compute \(r_{2i-1}(x_{i-1})=a_{2i-1}^{\rm T}x_{i-1}-b_{2i-1}\) and \(r_{2i}(x_{i-1})=a^{\rm T}_{2i}x_{i-1}-b_{2i}\).

    2. (b)

      If \(r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})\ne 0\), then we let

      $$\begin{aligned} \left\{ \begin{array}{l} a_{2i-1}=r_{2i}(x_{i-1})a_{2i-1},\\ b_{2i-1}=r_{2i}(x_{i-1})b_{2i-1}, \end{array}\right. \quad \left\{ \begin{array}{l} a_{2i}=r_{2i-1}(x_{i-1})a_{2i},\\ b_{2i}=r_{2i-1}(x_{i-1})b_{2i}. \end{array}\right. \end{aligned}$$
    3. (c)

      If \(r_{2i-1}(x_{i-1})r_{2i}(x_{i-1})=0\) and one of the residual values is nonzero, without loss of generality, we assume that \(r_{2i}(x_{i-1})\ne 0\) and we let

      $$\begin{aligned} \left\{ \begin{array}{l}a_{2i-1}=a_{2i}+a_{2i-1},\\ b_{2i-1}=b_{2i}+b_{2i-1},\end{array}\right. \quad \left\{ \begin{array}{l}a_{2i}=a_{2i},\\ b_{2i}=b_{2i}.\end{array}\right. \end{aligned}$$
    4. (d)

      If \(r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})=0\) and both of the residual values are zero, take \(x_{i}=x_{i-1}\) and go to 8(b).

  7. 7.
    1. (a)

      Take \(c_{2i-1}=a_{2i}-a_{2i-1}\).

    2. (b)

      Select \(w_{2i-1}\in \mathbb {R}^{n}\), such that \(w^{\rm T}_{2i-1}H_{l_{i-1}}c_{2i-1}=1,\) and compute \(H_i=H_{l_{i-1}}-H_{l_{i-1}}c_{2i-1}w^{\rm T}_{2i-1}H_{l_{i-1}}.\)

    3. (c)

      Select \(w_{2i}\in \mathbb {R}^n\), such that \(w_{2i}^{\rm T}H_{i}a_{2i}=1\) and compute \(H_{l_{i}}=H_{i}-H_{i}a_{2i}w_{2i}^{\rm T}H_{i}.\)

    4. (d)

      Select \(z_{i}\in \mathbb {R}^n\), so that \(z_{i}^{\rm T}H_{i}a_{2i}\ne 0\), and compute \(\lambda _{i}=\frac{r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})}{z_{i}^{\rm T}H_{i}a_{2i}}\), if \(r_{2i-1}(x_{i-1})r_{2i}(x_{i-1})\ne 0\), \(\lambda _{i}=\frac{r_{2i}(x_{i-1})}{z_{i}^{\rm T}H_{i}a_{2i}}\), if \(r_{2i-1}(x_{i-1})= 0\) and \(r_{2i}(x_{i-1})\ne 0\).

  8. 8.
    1. (a)

      Take \(x_{i}=x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i}\).

    2. (b)

      Set \(i=i+1.\) Endwhile.

  9. 9.

    Stop (\(x_{l}\) is a solution of the system). From (18), we can compute the general solution of the system after the final iteration by \(x_{l_{l}}=x_{l}-H_{l_{l}}^{\rm T}s\), where \(s\in \mathbb {R}^{n}\) is arbitrary.

Remark 2.9

To reduce computational complexity of Algorithm 1, we propose the following tactics:

  1. 1.

    Choose a proper starting vector \(x_{0}\) in step (1) of Algorithm 1.

  2. 2.

    To compute \(w_1\in \mathbb {R}^n\), we take \(t=H_0c_1\), with components \(t_j\), and then, we define \(w_1\) componentwise as follows:

    $$\begin{aligned} (w_1)_i=\left\{ \begin{array}{ll}\dfrac{1}{ {\rm sign}(t_{j_M})t_{j_M}},&{}i=j_M,\\ \\ 0,&{}i\ne j_M, \end{array}\right. \quad i=1,\ldots ,n. \end{aligned}$$

    Here, \(j_M\) is an index with \(|t_{j_M}|=\max \{|t_j|: j\in \{1,\ldots ,n\}\}\), so that \(w_1^{\rm T}t=1\). To determine the other \(w_{2i-1}\in \mathbb {R}^n\), \(i=2,\ldots ,l\), we let \(t'=H_{l_{i-1}}c_{2i-1}\), and then, we continue in a similar way.

  3. 3.

    We can choose \(w_{2i}=z_i\in \mathbb {R}^n\), \(i=1,\ldots ,l\), by defining the vector \(d=H_ia_{2i}\), with components \(d_j\), and

    $$\begin{aligned} (w_{2i})_j=(z_i)_j=\left\{ \begin{array}{ll}\dfrac{1}{\text{ sign }(d_{j_M})d_{j_M}},&{}j=j_M,\\ \\ 0,&{}j\ne j_M, \end{array}\right. \quad j=1,\ldots ,n. \end{aligned}$$

    Now, we take \(j_M\) with \(|d_{j_M}|=\max \{|d_j|: j\in \{1,\ldots ,n\}\}\), so that \(w_{2i}^{\rm T}d=z_i^{\rm T}d=1\).

Now, assume that the vectors \(a_{1},\ldots ,a_{m}\) are linearly independent. According to Theorem 2.5, we have \(\dim N(H_{i})=2i-1\) and \(\dim N(H_{l_{i}})=2i\). Therefore, \(2i-1\) rows of the matrix \(H_{i}\) depend on the other rows of \(H_{i}\), and 2i rows of the matrix \(H_{l_{i}}\) depend on the other rows of \(H_{l_{i}}\). As we have defined the matrices \(H_{i}\) and \(H_{l_{i}}\) in such a way that exactly \(2i-1\) rows of \(H_{i}\) and 2i rows of \(H_{l_{i}}\) are zero vectors, we can delete these zero rows using an appropriate operator. Next, we will show how to reduce the number of rows of the Abaffian matrix by two in every iteration and economize the space needed for our new model.
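Before turning to the compressed variant, we record a sketch of Algorithm 1 in Python/NumPy that combines it with the parameter choices of Remark 2.9. Two details are our own assumptions rather than part of the algorithm as stated: the pivot choice is normalized so that \(w^{\rm T}t=1\) holds exactly regardless of sign, and the rank reducing updates are carried out even when both residuals vanish, so that later iterates keep the already satisfied equations.

```python
import numpy as np

def pivot_vector(t):
    # Choose w with w^T t = 1 by picking the entry of largest modulus
    # (a normalized variant of the choice in Remark 2.9).
    jM = np.argmax(np.abs(t))
    w = np.zeros_like(t)
    w[jM] = 1.0 / t[jM]
    return w

def two_step_abs(A, b, x0=None, s=None):
    """Sketch of Algorithm 1 for a compatible full row rank system Ax = b
    with m = 2l rows, m <= n.  Returns x_l, or the general solution
    x_l - H_ll^T s when s is given."""
    A, b = A.astype(float), b.astype(float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    H = np.eye(n)                       # H_0 = I_n (step 1)
    for i in range(m // 2):
        j1, j2 = 2 * i, 2 * i + 1       # 0-based indices of rows 2i-1, 2i
        r1, r2 = A[j1] @ x - b[j1], A[j2] @ x - b[j2]   # steps 2/6(a)
        if r1 != 0 and r2 != 0:         # step (b): cross-scale both equations
            A[j1], b[j1] = r2 * A[j1], r2 * b[j1]
            A[j2], b[j2] = r1 * A[j2], r1 * b[j2]
        elif r1 != 0 or r2 != 0:        # step (c): exactly one zero residual
            if r2 == 0:                 # w.l.o.g. make the second one nonzero
                A[[j1, j2]], b[[j1, j2]] = A[[j2, j1]], b[[j2, j1]]
                r1, r2 = r2, r1
            A[j1], b[j1] = A[j1] + A[j2], b[j1] + b[j2]
        c = A[j2] - A[j1]               # steps 3/7(a)-(c)
        w = pivot_vector(H @ c)
        H = H - np.outer(H @ c, w @ H)              # H_i, eqs. (15)/(19)
        Ha = H @ A[j2]
        z = w2 = pivot_vector(Ha)                   # Remark 2.9, point 3
        Hl = H - np.outer(Ha, w2 @ H)               # H_{l_i}, eq. (16)
        if r1 != 0 and r2 != 0:                     # steps 3/7(d): stepsize
            lam = r1 * r2 / (z @ Ha)
        elif r2 != 0:
            lam = r2 / (z @ Ha)
        else:
            lam = 0.0                               # both residuals were zero
        x = x - lam * (H.T @ z)                     # steps 4/8(a)
        H = Hl                                      # next phase starts from H_{l_i}
    return x if s is None else x - Hl.T @ s         # general solution, eq. (18)
```

For instance, with a random full row rank `A` of size \(4\times 6\) and `b = A @ np.ones(6)`, the returned point satisfies `np.allclose(A @ x, b)`, and varying `s` sweeps the general solution (18).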

New compression two-step ABS algorithm

Algorithm 2

Assume that \(A_{m\times n}= [a_1,a_2,\ldots ,a_m ]^{\rm T}\) is a full row rank linear matrix, where \(m=2l\) and \(Ax=b\) is a compatible full row rank linear system, \(x\in \mathbb {R}^n\) and \(b\in \mathbb {R}^m\).

  1. 1.

    Let \(x_{0}\in \mathbb {R}^{n}\) be an arbitrary vector and take \(H_{0}=I_{n}\). Set \(i=1\).

  2. 2.
    1. (a)

      Compute \(r_{1}(x_{0})=a_{1}^{\rm T}x_{0}-b_{1}\) and \(r_2(x_0)=a_{2}^{\rm T}x_{0}-b_{2}.\)

    2. (b)

      If \(r_{1}(x_{0})r_{2}(x_{0})\ne 0\), we let

      $$\begin{aligned} \left\{ \begin{array}{l} a_{1}=r_{2}(x_{0})a_{1},\\ b_{1}=r_{2}(x_{0})b_{1},\\ \end{array}\right. \quad \left\{ \begin{array}{l} a_{2}=r_{1}(x_{0})a_{2},\\ b_{2}=r_{1}(x_{0})b_{2}.\\ \end{array}\right. \end{aligned}$$
    3. (c)

      If \(r_1(x_0) r_2(x_0) =0\) and one of the residual values is nonzero, without loss of generality, we assume that \(r_2(x_{0})\ne 0\) and we let

      $$\begin{aligned} \left\{ \begin{array}{l} a_{1}=a_{2}+a_{1},\\ b_{1}=b_{2}+b_{1},\\ \end{array}\right. \quad \left\{ \begin{array}{l} a_{2}=a_2,\\ b_{2}=b_2.\end{array}\right. \end{aligned}$$
    4. (d)

      If \(r_{1}(x_{0})r_{2}(x_{0})=0\), and both of the residual values are zero, take \(x_{1}=x_{0}\) and go to step 5.

  3. 3.
    1. (a)

      Take \(c_{1}=a_{2}-a_{1}\).

    2. (b)

      Select \(w_{1}\in \mathbb {R}^n\), such that \(w_1^{\rm T}H_0c_1=1\) and compute \(H_1=D(H_0-H_0c_1w_1^{\rm T}H_0).\) (We define D as an operator that deletes the zero rows of a matrix.)

    3. (c)

      Select \(w_{2}\in \mathbb {R}^{n-1}\), such that \(w_{2}^{\rm T}H_{1}a_{2}=1\) and compute \(H_{l_{1}}=D(H_{1}-H_{1}a_{2}w_{2}^{\rm T}H_{1}).\)

    4. (d)

      Select \(z_{1}\in \mathbb {R}^{n-1}\), so that \(z_{1}^{\rm T}H_{1}a_{2}\ne 0\) and compute \(\lambda _{1}=\frac{r_{1}(x_{0}) r_{2}(x_{0})}{z_{1}^{\rm T}H_{1}a_{2}}\), if \(r_{1}(x_{0}) r_{2}(x_{0}) \ne 0,\) \(\lambda _{1}=\frac{r_2(x_{0})}{z_1^{\rm T}H_1a_2}\), if \(r_{1}(x_{0})=0\) and \(r_{2}(x_{0})\ne 0 .\)

  4. 4.

    Take \(x_{1}=x_{0}-\lambda _{1}H_{1}^{\rm T}z_{1}\).

  5. 5.

    Set \(i=2\) and go to 6.

  6. 6.

    While \(i\le \frac{m}{2}\), do steps 6(a)–8(b).

    1. (a)

      Compute \(r_{2i-1}(x_{i-1})=a_{2i-1}^{\rm T}x_{i-1}-b_{2i-1}\) and \(r_{2i}(x_{i-1})=a^{\rm T}_{2i}x_{i-1}-b_{2i}\).

    2. (b)

      If \(r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})\ne 0\), then we let

      $$\begin{aligned} \left\{ \begin{array}{l} a_{2i-1}=r_{2i}(x_{i-1})a_{2i-1},\\ b_{2i-1}=r_{2i}(x_{i-1})b_{2i-1}, \end{array}\right. \quad \left\{ \begin{array}{l} a_{2i}=r_{2i-1}(x_{i-1})a_{2i},\\ b_{2i}=r_{2i-1}(x_{i-1})b_{2i}. \end{array}\right. \end{aligned}$$
    3. (c)

      If \(r_{2i-1}(x_{i-1})r_{2i}(x_{i-1})=0\) and one of the residual values is nonzero, without loss of generality, we assume that \(r_{2i}(x_{i-1})\ne 0\) and we let

      $$\begin{aligned} \left\{ \begin{array}{l}a_{2i-1}=a_{2i}+a_{2i-1},\\ b_{2i-1}=b_{2i}+b_{2i-1},\end{array}\right. \quad \left\{ \begin{array}{l}a_{2i}=a_{2i},\\ b_{2i}=b_{2i}.\end{array}\right. \end{aligned}$$
    4. (d)

      If \(r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})=0\) and both of the residual values are zero, take \(x_{i}=x_{i-1}\) and go to 8(b).

  7. 7.
    1. (a)

      Take \(c_{2i-1}=a_{2i}-a_{2i-1}\).

    2. (b)

      Select \(w_{2i-1}\in \mathbb {R}^{n-(2i-2)}\), such that \(w^{\rm T}_{2i-1}H_{l_{i-1}}c_{2i-1}=1,\) and compute \(H_i=D(H_{l_{i-1}}-H_{l_{i-1}}c_{2i-1}w^{\rm T}_{2i-1}H_{l_{i-1}}).\)

    3. (c)

      Select \(w_{2i}\in \mathbb {R}^{n-(2i-1)}\), such that \(w_{2i}^{\rm T}H_{i}a_{2i}=1\) and compute \(H_{l_{i}}=D(H_{i}-H_{i}a_{2i}w_{2i}^{\rm T}H_{i}).\)

    4. (d)

      Select \(z_{i}\in \mathbb {R}^{n-(2i-1)}\), so that \(z_{i}^{\rm T}H_{i}a_{2i}\ne 0\), and compute \(\lambda _{i}=\frac{r_{2i-1}(x_{i-1}) r_{2i}(x_{i-1})}{z_{i}^{\rm T}H_{i}a_{2i}}\), if \(r_{2i-1}(x_{i-1})r_{2i}(x_{i-1})\ne 0\), \(\lambda _{i}=\frac{r_{2i}(x_{i-1})}{z_{i}^{\rm T}H_{i}a_{2i}}\), if \(r_{2i-1}(x_{i-1})= 0\) and \(r_{2i}(x_{i-1})\ne 0\).

  8. 8.
    1. (a)

      Take \(x_{i}=x_{i-1}-\lambda _{i}H_{i}^{\rm T}z_{i}\).

    2. (b)

      Set \(i=i+1.\) Endwhile.

  9. 9.

    Stop (\(x_{l}\) is a solution of the system). The general solution of the system is obtained after the final iteration by \(x_{l_{l}}=x_{l}-H_{l_{l}}^{\rm T}s\), where \(s\in \mathbb {R}^{n}\) is arbitrary.
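A sketch of the deletion operator D used above: in exact arithmetic, the deleted rows are exactly zero, so the tolerance below is our own concession to floating point.

```python
import numpy as np

def D(H, tol=1e-12):
    """Delete the (numerically) zero rows of H, as the operator D in Algorithm 2."""
    return H[np.linalg.norm(H, axis=1) > tol]
```

With this operator, \(H_{i}\) has \(n-(2i-1)\) rows and \(H_{l_{i}}\) has \(n-2i\) rows, which is why \(w_{2i}\) and \(z_{i}\) are taken from \(\mathbb {R}^{n-(2i-1)}\).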

Remark 3.1

We can present the stepsize for Algorithms 1 and 2 as the following form:

$$\begin{aligned} \lambda _i=\frac{a^{\rm T}_{2i-1}x_{i-1}-b_{2i-1}}{z_i^{\rm T}H_ia_{2i}}=\frac{a^{\rm T}_{2i}x_{i-1}-b_{2i}}{z_i^{\rm T}H_ia_{2i}}=\frac{\tilde{r_{i}}}{z_i^{\rm T}H_ia_{2i}}. \end{aligned}$$

Notice that \(a_{2i-1}, a_{2i}, b_{2i-1}\), and \(b_{2i}\) are updated in steps 2 and 6, and \(\tilde{r_{i}}\) is defined as the updated residual value of the ith step. Thus, we have the exact values presented in 3(d) and 7(d) for the stepsize. Notice that the dimensions of \(H_{i}\) and \(z_{i}\) are compressed in Algorithm 2.

Remark 3.2

To reduce computational complexity of Algorithm 2, we propose the following tactics.

  1. 1.

    Choose a proper starting vector \(x_{0}\) in step (1) of our new algorithm.

  2. 2.

    To compute \(w_1\in \mathbb {R}^n\), we take \(t=H_0c_1\), with components \(t_j\), and then, we define \(w_1\) componentwise as follows:

    $$\begin{aligned} (w_1)_i=\left\{ \begin{array}{ll}\dfrac{1}{ {\rm sign}(t_{j_M})t_{j_M}},&{}i=j_M,\\ \\ 0,&{}i\ne j_M, \end{array}\right. \quad i=1,\ldots ,n. \end{aligned}$$

    Here, \(j_M\) is an index with \(|t_{j_M}|=\max \{|t_j|: j\in \{1,\ldots ,n\}\}\), so that \(w_1^{\rm T}t=1\). To compute the other \(w_{2i-1}\in \mathbb {R}^{n-(2i-2)}\), \(i=2,\ldots ,l\), we let \(t'=H_{l_{i-1}}c_{2i-1}\), and then, we continue in a similar way.

  3. 3.

    We can choose \(w_{2i}=z_i\in \mathbb {R}^{n-(2i-1)}\), \(i=1,\ldots ,l\), by defining the vector \(d=H_ia_{2i}\), with components \(d_j\), and

    $$\begin{aligned} (w_{2i})_j=(z_i)_j=\left\{ \begin{array}{ll}\dfrac{1}{\text{ sign }(d_{j_M})d_{j_M}},&{}j=j_M,\\ \\ 0,&{}j\ne j_M, \end{array}\right. \quad j=1,\ldots ,n-(2i-1). \end{aligned}$$

    Now, we take \(j_M\) with \(|d_{j_M}|=\max \{|d_j|: j\in \{1,\ldots ,n-(2i-1)\}\}\), so that \(w_{2i}^{\rm T}d=z_i^{\rm T}d=1\).

Notice that, by reducing the number of rows of the Abaffian matrix by two per step, we choose the other parameters with the proper dimensions.

Analysis of error propagation in the new versions of two-step ABS models

We investigate the stability of Algorithm 1, for the nonsingular system (2), by the backward error analysis technique due to Broyden and Galantai, as in [8, 9, 12]. Some basic results given by Broyden [9] and Galantai [12] are extended here. Galantai’s method plays an important role in our work. Systems (1) and (2) are equivalent, but to proceed with Galantai’s method, we choose system (2) throughout this section. We carry out the error propagation analysis for nonsingular systems and then conclude that it is valid for every full row rank system. Galantai studied the numerical stability of the ABS class for linear systems of the form:

$$\begin{aligned} A^{\rm T}x=b, \quad (A\in \mathbb {R}^{n\times n}) \end{aligned}$$

where the matrix \(A^{\rm T}\) is nonsingular. \(P=[p_1,\ldots ,p_n]\) and \(V=[v_1,\ldots ,v_n]\) are nonsingular \(n\times n\) matrices with column vectors \(p_j\) and \(v_j (j=1,\ldots ,n)\). Denote by I the \(n\times n\) identity matrix. The ABS class has the following form:

Algorithm 3

Let \(x_0\in \mathbb {R}^n\) be arbitrary

For \(k=1,\ldots ,n\);

Compute:

\(\alpha _k=\frac{v^{\rm T}_k(A^{\rm T}x_{k-1}-b)}{p^{\rm T}_k Av_k}\);

\(x_k=x_{k-1}-\alpha _kp_{k}\);

end for;

where the ABS update algorithm is given by

Set \(H_1=I\);

For \(k=1,\ldots ,n\);

Compute:

\(p_k=H_k^{\rm T} z_k; (p^{\rm T}_kAv_k\ne 0)\)

\(H_{k+1}=H_k-\frac{H_k Av_k w^{\rm T}_kH_k}{(w^{\rm T}_k H_k Av_k)}; (w^{\rm T}_k H_k Av_k\ne 0)\)

end for.

Algorithm 3 terminates finitely in n steps. The matrix V scales the system \(A^{\rm T}x=b\). As in [20], the pair \((P,V)\) is said to be \(A^{\rm T}\)-conjugate if \(V^{\rm T}A^{\rm T}P=L\) is lower triangular. The ABS algorithm generates all \(A^{\rm T}\)-conjugate directions for suitable choices of parameters [2]. Therefore, the ABS class of methods coincides with those studied by Stewart [20]. Results on the numerical stability of conjugate direction methods are given in [9, 21]. A stability analysis for descent methods is given in [7]. See also [3, 4, 10]. By Corollary 2.7, we conclude that the pair \((P,I)\) is \(A^{\rm T}\)-conjugate for Algorithm 1, considering the nonsingular matrix of system (2). Using the basic idea of Broyden’s backward error analysis method, we present Algorithm 1 in the following form:

$$\begin{aligned} X_k=\Psi _{k-1}(X_{k-1}), \quad \left( k=1,\ldots ,\frac{n}{2}\right) , \end{aligned}$$
(26)

where \(X_k\) is the solution of the problem. Assume that an error \(\xi _j\) occurs at the jth step and that this error propagates further. It is also assumed that no other source of error occurs. The exact solution \(X_{\frac{n}{2}}\) is given by the following:

$$\begin{aligned} X_{\frac{n}{2}}=\Psi _{\frac{n}{2}-1} \{\Psi _{\frac{n}{2}-2} \{\ldots \{\Psi _j (X_j)\}\ldots \} \}=\Omega ^{\frac{n}{2}-j}(X_j), \end{aligned}$$
(27)

while the perturbed solution \(X'_{\frac{n}{2}}\) is given by the following:

$$\begin{aligned} X'_{\frac{n}{2}}=\Omega ^{\frac{n}{2}-j}(X_j+\xi _j). \end{aligned}$$
(28)

If the quantity \(||X_{\frac{n}{2}}-X'_{\frac{n}{2}}||\) is large, then the algorithm (26) is very unstable. Following Broyden’s ideas, it must be small for stable algorithms. Consequently, \(||X_{\frac{n}{2}}-X'_{\frac{n}{2}}||\) is a measure of stability for our new version of the algorithm.

Now, we recall some of the important parameters of Algorithm 1 for the system \(A^{\rm T}x=b\), \(A\in \mathbb {R}^{n\times n}\), where the matrix \(A^{\rm T}\) is nonsingular, considering system (2). Let \(P=[\tilde{p}_1,\ldots ,\tilde{p}_\frac{n}{2}]\), with column vectors defined from Theorem 2.6 in the form \(\tilde{p}_i=(p'_i,p_i)\). We know that \(p_i=H_i^{\rm T}z_i\) is obtained from the first-phase rank reducing process of the ith iteration. Thus, P is an \(n\times n\) nonsingular matrix. Here, we take \(V=I_n=[v_1,\ldots ,v_{\frac{n}{2}}]\) with \(v_{k}=(e_{2k-1},e_{2k})\), \((k=1,\ldots ,\frac{n}{2})\). Therefore, \(V=I_n=[e_1,\ldots ,e_n]\) is the \(n\times n\) identity matrix. By Corollary 2.4, we have \(p^{\rm T}_kAe_{2k-1}=p^{\rm T}_kAe_{2k}\ne 0\). Taking \(v_{k}\) for the stepsize means that we can arbitrarily choose one of the values \(e_{2k-1}\) or \(e_{2k}\). Also, from Remark 3.1, we have

$$\begin{aligned} \lambda _k=\frac{e^{\rm T}_{2k-1}(A^{\rm T}x_{k-1}-b)}{p^{\rm T}_kAe_{2k-1}}=\frac{e^{\rm T}_{2k}(A^{\rm T}x_{k-1}-b)}{p^{\rm T}_kAe_{2k}}. \end{aligned}$$

We choose \(H_{0}=I_n\) and we present \(\lambda _k,x_k,p_k\), and \(H_k\) as the following form:

$$\begin{aligned} \lambda _k=\frac{v_k^{\rm T}(A^{\rm T}x_{k-1}-b)}{p^{\rm T}_k Av_k},\end{aligned}$$
(29)
$$\begin{aligned} x_k=x_{k-1}-\lambda _kp_k, \end{aligned}$$
(30)
$$\begin{aligned} p_k=H_k^{\rm T} z_k, \quad (p^{\rm T}_kAv_k\ne 0)\end{aligned}$$
(31)
$$\begin{aligned} H_k=H_{l_{k-1}}-H_{l_{k-1}}A(v_{2k}-v_{2k-1})w^{\rm T}_{2k-1}H_{l_{k-1}},\quad (w^{\rm T}_{2k-1}H_{l_{k-1}}A(v_{2k}-v_{2k-1})=1), \end{aligned}$$
(32)

where \(H_{l_{k}}=H_{k}-H_{k}Av_{2k}w^{\rm T}_{2k}H_{k}\), \(w^{\rm T}_{2k}H_{k}Av_{2k}=1\). Updating vectors and residual values can be computed as in Algorithm 1, for \(k=1,\ldots ,\frac{n}{2}\). Next, we investigate the stability of the conjugate direction methods of the form (29) and (30) with special emphasis on the ABS updates (31) and (32). It is noted again that for (29)–(32) and the other related parameters, we choose V to be the identity matrix, and the pair \((P,I)\) is \(A^{\rm T}\)-conjugate for a nonsingular matrix as in system (2). Considering these points, Algorithm 1 coincides with the methods studied in [12] for a unitary V and \(\frac{n}{2}\) steps. Some basic results of [9, 12, 20] are extended here. Let us introduce the following notation:

$$\begin{aligned} P_k=\frac{p_kv^{\rm T}_kA^{\rm T}}{v_k^{\rm T}A^Tp_k}. \end{aligned}$$
(33)

It is easily verified that \(P^2_k=P_k\), i.e., \(P_k\) is a projector. The matrix \(P_k\) is a rank one projector onto the space spanned by \(p_k\) along the orthogonal complement of the space spanned by \(Av_k\). Therefore, we have \(R(P_k)=R(p_k)\) and \(N(P_k)=R^\bot (Av_k)\). Note that the projectors \(P_1,\ldots ,P_n\) are called conjugate by Stewart [20] if \(P_tP_j=0 (t<j)\); Lemma 4.1 below is an immediate consequence of Stewart’s definition. First, these properties are easy to check numerically.
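The following sketch builds the projectors (33) from the directions returned by the basic_abs function of the introduction (with \(V=I\), so \(v_k=e_k\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
AT = rng.standard_normal((n, n))        # rows a_k^T, i.e., the matrix A^T
_, P = basic_abs(AT, rng.standard_normal(n))
# P_k = p_k v_k^T A^T / (v_k^T A^T p_k) with V = I, eq. (33)
Pk = [np.outer(P[:, k], AT[k]) / (AT[k] @ P[:, k]) for k in range(n)]
print(all(np.allclose(Q @ Q, Q) for Q in Pk))            # each P_k is a projector
print(all(np.allclose(Pk[t] @ Pk[j], 0)                  # conjugacy: P_t P_j = 0
          for t in range(n) for j in range(t + 1, n)))   # for t < j
```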

Lemma 4.1

Let \(P_1,P_2,\ldots\) be conjugate projectors. Then, for \(i\le k\), we have:

$$\begin{aligned} P_i(I-P_k)(I-P_{k-1})\cdots (I-P_1)=0. \end{aligned}$$
(34)

Proof

By conjugacy, for \(i<j\), we have

$$\begin{aligned} P_i(I-P_j)=P_i-P_iP_j=P_i, \end{aligned}$$
(35)

and since \(P_i\) is a projector,

$$\begin{aligned} P_i(I-P_i)=P_i-P^2_i=P_i-P_i=0. \end{aligned}$$
(36)

Combining (35) and (36), we obtain (34). \(\square\)

With the notation \(P_k\) as in (33), the methods of the class (29) and (30) have the following form (motivated by techniques in [12, 20] and extending them to our model):

\(\displaystyle x_k=(I-P_k)x_{k-1}+d_k. \quad \left( d_k=\frac{p_kv_k^{\rm T}b}{v^{\rm T}_k A^{\rm T}p_k}\right)\)
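Indeed, substituting the stepsize (29) into the update (30) and using \(p_k^{\rm T}Av_k=v_k^{\rm T}A^{\rm T}p_k\) gives

$$\begin{aligned} x_k=x_{k-1}-\frac{v_k^{\rm T}(A^{\rm T}x_{k-1}-b)}{v^{\rm T}_kA^{\rm T}p_k}\,p_k =\left( I-\frac{p_kv^{\rm T}_kA^{\rm T}}{v^{\rm T}_kA^{\rm T}p_k}\right) x_{k-1} +\frac{p_kv^{\rm T}_kb}{v^{\rm T}_kA^{\rm T}p_k} =(I-P_k)x_{k-1}+d_k. \end{aligned}$$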

Denote by \(x^*\) the solution of the linear system. By taking \(E_k=x^*-x_k\), we have the recursion \(E_k=(I-P_k)E_{k-1}\) with the following solution:

$$\begin{aligned} E_k=(I-P_k)\cdots (I-P_1)E_0. \end{aligned}$$

Now, we introduce the notation

$$\begin{aligned} (I-P_k)\cdots (I-P_j)=Q_{k,j}, \end{aligned}$$

for \(k\ge j\). Now, let us suppose that an error occurs at the \((k-1)\)th step and only there. The perturbed result of the \((k-1)\)th step is denoted by \(x'_{k-1}\). The perturbed results of further steps are denoted by \(x'_j (j=k,\ldots ,\frac{n}{2})\). Then, we have

$$\begin{aligned} x'_{\frac{n}{2}}=\prod ^{\frac{n}{2}-k}_{j=0}(I-P_{\frac{n}{2}-j})x'_{k-1}+\sum ^{\frac{n}{2}}_{j=k}\left( \prod ^{\frac{n}{2}-j-1}_{t=0}(I-P_{\frac{n}{2}-t})\right) d_j, \end{aligned}$$

from which it follows that the error in the final iteration is

$$\begin{aligned} x^*-x'_{\frac{n}{2}}=x_{\frac{n}{2}}-x'_{\frac{n}{2}}=\prod ^{\frac{n}{2}-k}_{j=0}(I-P_{\frac{n}{2}-j})(x_{k-1}-x'_{k-1})=Q_{\frac{n}{2},k}(x_{k-1}-x'_{k-1}). \end{aligned}$$

The matrix \(Q_{\frac{n}{2},k}\) can be considered as the error matrix. Hence, we have the error bound

$$\begin{aligned} ||x_{\frac{n}{2}}-x'_{\frac{n}{2}}||\le ||Q_{\frac{n}{2},k}|| ||x_{k-1}-x'_{k-1}||. \end{aligned}$$
(37)

A method of the classes (29) and (30) is considered to be optimal in the sense of Broyden [9], if \(||Q_{\frac{n}{2},k}||\) is minimal for all k. First, we characterize \(Q_{\frac{n}{2},k}\). Using the \(A^{\rm T}\)-conjugate property, we have \(P_tP_j=0 (t<j)\). Therefore, \(P_tQ_{\frac{n}{2},k}=0 (k\le t\le \frac{n}{2})\) and \((I-P_t)Q_{\frac{n}{2},k}=Q_{\frac{n}{2},k} (k\le t\le \frac{n}{2})\), which means that \(Q_{\frac{n}{2},k}Q_{\frac{n}{2},k}=Q_{\frac{n}{2},k}\), i.e., \(Q_{\frac{n}{2},k}\) is a projector. By observing that \(R(I-P_k)=N(P_k)\) and \(N(I-P_k)=R(P_k)\), we conclude that

$$\begin{aligned} R(Q_{\frac{n}{2},k})=\bigcap ^{\frac{n}{2}}_{j=k}N(P_j), \quad N(Q_{\frac{n}{2},k})=\sum ^{\frac{n}{2}}_{j=k}R(P_j). \end{aligned}$$

Thus, we have

$$\begin{aligned} R(Q_{\frac{n}{2},k})=\bigcap ^{\frac{n}{2}}_{j=k}R^\bot (Av_j)=R^\bot (AV^{\frac{n}{2}-k+1 \mid }), \end{aligned}$$

and

$$\begin{aligned} N(Q_{\frac{n}{2},k})=\sum ^{\frac{n}{2}}_{j=k}R(P_j)=R(P^{\frac{n}{2}-k+1 \mid })=R^\bot (AV^{\mid k-1}). \end{aligned}$$

Here, following Householder’s notation, for any matrix A, we define the matrices \(A^{\overline{k}}\), \(A^{| k}\), \(A^{\underline{k}}\), and \(A^{k \mid }\) as the submatrices consisting of, respectively, the first 2k rows, the first 2k columns, the last 2k rows, and the last 2k columns of A.

The following lemmas show that the symmetric projectors have a minimum property both in the Frobenius and in the spectral norm. See the proof of the following lemmas in [12].

Lemma 4.2

If \(A^2=A\), \(A\ne 0\), and \({\rm rank}(A)=m\), then \(||A||_{F}\ge \sqrt{m}\), and \(||A||_{F}=\sqrt{m}\) if and only if A is symmetric.

Lemma 4.3

If \(A^2=A\) and \(A\ne 0\), then \(||A||_{sp}\ge 1\), and \(||A||_{sp}=1\) if and only if \(A=A^{\rm T}\).
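A quick numerical sanity check of the equality cases (our own sketch): a symmetric projector of rank m, built from a matrix with orthonormal columns, attains both bounds.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 6, 3
U, _ = np.linalg.qr(rng.standard_normal((n, m)))
A = U @ U.T                               # symmetric projector of rank m
print(np.allclose(A @ A, A))              # A^2 = A
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(m)))  # ||A||_F = sqrt(m)
print(np.isclose(np.linalg.norm(A, 2), 1))               # ||A||_sp = 1
```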

Theorem 4.4

The methods of the classes (29) and (30) are optimal in the sense of Broyden if and only if (39) holds, or equivalently, \(V^{\rm T}A^{\rm T}AV=D\) for a diagonal matrix D.

Proof

A projector P is symmetric if and only if \(R(P)=N^\bot (P)\). Hence, \(Q_{\frac{n}{2},k}\) is symmetric (and has minimal norm) if and only if

$$\begin{aligned} R(AV^{| k-1})=R^\bot (AV^{\frac{n}{2}-k+1 |}). \end{aligned}$$
(38)

A method is optimal in Broyden’s sense if and only if (38) is satisfied for all k. The latter condition is equivalent to

$$\begin{aligned} Av_i\bot Av_j.\quad (i\ne j). \end{aligned}$$
(39)

In matrix formulation, it means that \(V^{\rm T}A^{\rm T}AV=D\), where D is diagonal. Notice that, since we choose V to be the identity matrix, the theorem is true if and only if \(A^{\rm T}A=D\) holds. \(\square\)

This result was originally obtained by Broyden [9] in a different way for Algorithm 3; here, we extended the proof of Galantai [12] to Algorithm 1.

Theorem 4.5

For the error propagation (26)–(28) and (37), the classes (29) and (30) yield the bound

$$\begin{aligned} ||x_{\frac{n}{2}}-x'_{\frac{n}{2}}||\le k(A)||x_{k-1}-x'_{k-1}||, \end{aligned}$$

where k(A) is the condition number of matrix A.

Proof

Since \(Q_{\frac{n}{2},k}\) is a projector onto \(R^\bot (AV^{\frac{n}{2}-k+1 |})\) along \(R^\bot (AV^{| k-1})\), it can be represented in the form:

$$\begin{aligned} Q_{\frac{n}{2},k}=(A^{-T}V^{-T})^{| k-1}(V^{\rm T}A^{\rm T})^{\overline{k-1}}. \end{aligned}$$
(40)

Since \(||B^{| k}||\le ||B||\) and \(||B^{\overline{k}}||\le ||B||\) hold both in the Frobenius and in the spectral norm, we may bound \(Q_{\frac{n}{2},k}\) by \(||Q_{\frac{n}{2},k}||\le ||A^{-T}V^{-T}|| \, ||V^{\rm T}A^{\rm T}||\le k(A)k(V)\). Since we choose V to be the identity matrix, \(k(V)=1\) and \(||Q_{\frac{n}{2},k}||\le k(A)\). From (37), we conclude that

$$\begin{aligned} ||x_{\frac{n}{2}}-x'_{\frac{n}{2}}||\le k(A)||x_{k-1}-x'_{k-1}||. \end{aligned}$$

\(\square\)
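As a rough numerical illustration of this bound (our own sketch: it uses the n-step basic ABS projector model with \(V=I\), built from the basic_abs function of the introduction; the two-step version of Theorem 4.5 is obtained in the same way with \(\frac{n}{2}\) composite steps):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
AT = rng.standard_normal((n, n))         # the system A^T x = b
_, P = basic_abs(AT, rng.standard_normal(n))
Pk = [np.outer(P[:, k], AT[k]) / (AT[k] @ P[:, k]) for k in range(n)]
k = 4                                    # error injected at step k-1
Q = np.eye(n)
for j in range(k - 1, n):                # error matrix (I-P_n)...(I-P_k)
    Q = (np.eye(n) - Pk[j]) @ Q
xi = rng.standard_normal(n)              # injected error x_{k-1} - x'_{k-1}
print(np.linalg.norm(Q @ xi) <= np.linalg.cond(AT) * np.linalg.norm(xi))
```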

Theorem 4.6

The residual error \(r'_k=A^{\rm T}(x_k-x'_k)\) of Algorithm 1 is minimal for all k.

Proof

Defining the residual perturbation as \(r'_k=A^{\rm T}(x_k-x'_k)\), we have \(r'_\frac{n}{2}=A^{\rm T}Q_{\frac{n}{2},k}A^{-T}r'_k\) for the model (26)–(28) and (37). Using the relation \((AB)^{|k}(CD)^{\overline{k}}=A\{B^{|k}C^{\overline{k}}\}D\) and (40), we see that

$$\begin{aligned} A^{\rm T}Q_{\frac{n}{2},k}A^{-T}=(V^{-T})^{| k-1}(V^{\rm T})^{\overline{k-1}} \end{aligned}$$
(41)

is a projector onto \(R((V^{-T})^{| k-1})\) along \(R((V^{-T})^{\frac{n}{2}-k+1 |})\). The bound \(||r'_\frac{n}{2}||\le || A^{\rm T}Q_{\frac{n}{2},k}A^{-T}|| ||r'_k||\) is minimal if and only if

$$\begin{aligned} R((V^{-T})^{| k-1})=R^\bot ((V^{-T})^{\frac{n}{2}-k+1 |}). \end{aligned}$$

The classes (29) and (30) can be called optimal for the residual perturbation \(r'_k\) if the above condition holds for all k. This condition is satisfied if and only if \((V^{-T})^{\rm T}(V^{-T})=D^{-1}\) for a suitable diagonal matrix D, from which the condition \(V^{\rm T}V=D\) follows. Since we choose V to be the identity matrix, \(r'_k\) is minimal for all k. \(\square\)

Corollary 4.7

Assume that instead of (29) and (30) we have the next recursion

$$\begin{aligned} x'_k= (I-P_k)(x'_{k-1}+\xi _{k-1} )+d_k, \left( k=1,\ldots ,\frac{n}{2}\right) , \end{aligned}$$

where \(\xi _{k-1}\) denotes the error which occurred at the \((k-1)\)th iteration. Then, we have \(x_{\frac{n}{2}}-x'_{\frac{n}{2}}=\sum ^{\frac{n}{2}}_{k=1}Q_{\frac{n}{2},k}\xi _{k-1}\), from which the bound

$$\begin{aligned} ||x_{\frac{n}{2}}-x'_{\frac{n}{2}}||\le \sum ^{\frac{n}{2}}_{k=1}||Q_{\frac{n}{2},k}|| \, ||\xi _{k-1}||\le k(A)\sum ^{\frac{n}{2}}_{k=1}||\xi _{k-1}|| \end{aligned}$$

follows. For the optimal method, k(A) is obviously replaced by \(\sqrt{m}\) or by 1 depending on the norm chosen.

Corollary 4.8

Since the achievements of this section are valid for every nonsingular linear system of the form (2), they are valid for full row rank systems of this class. Algorithm 2 just deletes the zero rows of the Abaffian matrix of Algorithm 1 and consequently reduces the dimensions of the related parameters. Thus, the results of the error propagation analysis remain true for the new compression two-step ABS scheme. Notice that systems (1) and (2) are equivalent. Moreover, points 2 and 3 of Remarks 2.9 and 3.2 are useful for increasing the stability of our algorithms.

Computational complexity and numerical results

First, considering Remark 2.9, we compute the number of multiplications for Algorithm 1, but for an arbitrary nonsingular matrix \(H_{0}\), as follows (Table 1):

Table 1 Number of multiplications required to solve m linear equations in \(l=\frac{m}{2}\) iterations, for the main parameters

Hence, the total number of multiplications for the l iterates is

$$\begin{aligned} N&=\sum ^l_{i=1} [2n+1+2n+2+(2n+1)(n-2i+2)+(2n+1)(n-2i+1)+2n ],\\ N&=2mn^2-m^2n+O(mn)-O(m^2)+O(m). \end{aligned}$$
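The closed form of this sum is easy to verify symbolically; a quick check (our sketch, using SymPy):

```python
import sympy as sp

i, l, m, n = sp.symbols('i l m n', positive=True)
term = (2*n + 1) + (2*n + 2) + (2*n + 1)*(n - 2*i + 2) \
       + (2*n + 1)*(n - 2*i + 1) + 2*n
N = sp.expand(sp.summation(term, (i, 1, l)).subs(l, m/2))
print(N)   # leading terms 2*m*n**2 - m**2*n, plus lower order terms
```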

The total number for Huang’s method is \(\frac{3}{2}mn^2+O(mn)\) multiplications, and it is \(3mn^2-\frac{7}{4}m^2n+\frac{1}{6}m^3+O(mn)+O(m^2)+O(n^2)\) for the two-step methods presented by Amini et al. [5, 6]. Our computations indicate that our operation counts are better than those of Amini et al.’s two-step algorithms. In addition, when \(n<2m\), we need fewer multiplications than the corresponding Huang algorithm. In addition, when m tends to n, the total number for our two-step method is \(n^3\) multiplications, while it is \(\frac{3}{2}n^3\) for Huang’s algorithm and \(\frac{17}{12}n^3\) for the two-step methods given by Amini et al. [5, 6]. Obviously, when m and n are not too large, the lower order terms have an influence on our results.

Remark 5.1

Now, considering Algorithm 1 with Remark 2.9 and setting \(H_{0}\) to the identity matrix \(I_{n}\), we obtain better results. In this case, for the ith iteration, the number of multiplications for the matrix \(H_{i}\) is \((n-2i+2)(4i-1)\), and it is \((n-2i+1)(4i+1)\) multiplications for the matrix \(H_{l_{i}}\). Therefore, our algorithm needs \(nm^2-\frac{2}{3}m^3+O(nm)+O(m^2)\) multiplications. Hence, for \(m=n\), \(\frac{1}{3}n^3\) multiplications plus lower order terms are needed. Notice that the main storage requirement is the storage of \(H_{l_{i}}\), which has at most \(\frac{1}{4}n^2\) positions. This is half of the storage needed by the Gaussian elimination method. Our new version of the ABS algorithm is computationally better than the classical Gaussian elimination method, having the same arithmetic cost but using less memory, and no pivoting is necessary.
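The leading terms of this count come from the two Abaffian updates alone and can be checked in the same way (again a SymPy sketch):

```python
import sympy as sp

i, m, n = sp.symbols('i m n', positive=True)
S = sp.summation((n - 2*i + 2)*(4*i - 1) + (n - 2*i + 1)*(4*i + 1),
                 (i, 1, m/2))
print(sp.expand(S))   # leading terms n*m**2 - 2*m**3/3, as in Remark 5.1
```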

Remark 5.2

The operator D is used to compress and economize our required space, by deleting the two zero rows of the Abaffian matrix per iteration. Therefore, it has no effect on the number of multiplications, and Algorithms 1 and 2 have the same number of multiplications. Thus, both of our algorithms are better than Amini et al.’s two-step ABS algorithms and Huang’s method. Furthermore, they compete well with the classical Gaussian elimination method.

Conclusion

This paper presents two new versions of the two-step ABS model for the general solution of compatible full row rank linear systems of equations in at most \(\left[ \frac{m+1}{2}\right]\) iterations. For the second algorithm, the number of rows of the Abaffian matrix is reduced by two in every iteration, and the space is compressed as much as possible. A theory of conjugate projectors is developed and used for the analysis of error propagation. We evaluate the numerical stability of our schemes using Broyden’s backward error analysis method and Galantai’s technique. In addition, both our models need less computation than the corresponding two-step ABS algorithms and Huang’s method. Our new two-step ABS algorithms are computationally better than the classical Gaussian elimination method, having the same arithmetic cost but using less memory, and no pivoting is necessary.