1 Introduction

In many fields of scientific computing, it is important to solve linear matrix equations of the form

$$ \begin{array}{@{}rcl@{}} \mathcal{A}(X) = B, \end{array} $$
(1)

where \(\mathcal {A}: \mathbb {R}^{n\times s} \rightarrow \mathbb {R}^{n\times s}\) is a linear operator, and \(X, B \in \mathbb {R}^{n\times s}\). In the present work, we focus mainly on the case where the representation of \(\mathcal {A}\) is a large sparse nonsymmetric matrix and \(s \ll n\).

The most basic form of (1) is the standard linear system of equations Ax = b, where \(A \in \mathbb {R}^{n\times n}\) and \(\boldsymbol {b} \in \mathbb {R}^{n}\). For this problem, short-recurrence Krylov subspace methods, such as the conjugate gradient squared (CGS) method [1], bi-conjugate gradient stabilized (BiCGSTAB) method [2], and BiCGStab2 method [3], are typical iterative solvers that are widely used. The CGS, BiCGSTAB, and BiCGStab2 methods are known to belong to the so-called Lanczos-type product methods (LTPMs) (also called hybrid BiCG methods), wherein the residuals are defined by the product of the stabilizing polynomials and BiCG residuals. The BiCGstab(L) method [4] and GPBiCG method [5] also belong to LTPMs and can be viewed as two different generalizations of the abovementioned typical methods. Moreover, the GPBiCGstab(L) method [6], which unifies BiCGstab(L) and GPBiCG, has been developed recently. Numerical experiments have been performed to demonstrate that the GPBiCGstab(L) method has superior convergence when compared to the conventional LTPMs, especially for linear systems with complex spectra.

Because \(\mathcal {A}\) is defined on a finite-dimensional space, there always exists a standard linear system of equations that is mathematically equivalent to (1), to which LTPMs can in principle be applied. However, it is often difficult to construct the representation matrix of \(\mathcal {A}\), i.e., the coefficient matrix of the converted linear system. For example, it is well known that the Sylvester equation

$$ \begin{array}{@{}rcl@{}} \mathcal{A}(X) = AX - XC = B,\quad A \in \mathbb{R}^{n\times n},\quad C \in \mathbb{R}^{s\times s} \end{array} $$
(2)

can be converted to a standard linear system \(\tilde {A} \boldsymbol {x} = \tilde {\boldsymbol {b}}\) with \(\tilde {A} := I_{s} \otimes A - C^{\top } \otimes I_{n} \in \mathbb {R}^{ns\times ns}\) and \(\tilde {\boldsymbol {b}} := \textbf {vec}(B) \in \mathbb {R}^{ns}\), but it is costly to construct \(\tilde {A}\) explicitly when ns is large. Here, ⊗ denotes the Kronecker product, Ik is the identity matrix of order k, and \(\textbf {vec}: \mathbb {R}^{n\times s} \rightarrow \mathbb {R}^{ns}\) is the vectorization operator, i.e., \(\textbf {vec}(V) = [\boldsymbol {v}_{1}^{\top }, \boldsymbol {v}_{2}^{\top }, \dots , \boldsymbol {v}_{s}^{\top }]^{\top } \in \mathbb {R}^{ns}\) for \(V = [\boldsymbol {v}_{1}, \boldsymbol {v}_{2}, \dots , \boldsymbol {v}_{s}] \in \mathbb {R}^{n\times s}\). More generally, there are also cases where the representation matrix of \(\mathcal {A}\) cannot be expressed explicitly. Therefore, application of iterative solvers to the matrix equations without explicitly using the representation matrix is desirable. The global Krylov subspace methods (cf., e.g., [7,8,9,10,11,12]) are known to be such matrix-free approaches and generate approximate solutions using the matrix Krylov subspace \(\mathcal K_{k}(\mathcal {A}, R_{0}) := \left \{ {\sum }_{i=0}^{k-1} c_{i} \mathcal {A}^{i}(R_{0}) \mid c_{i} \in \mathbb {R} \right \}\), where \(R_{0} := B-\mathcal {A}(X_{0})\) is the initial residual with an initial guess X0, and \(\mathcal {A}^{i}\) indicates that \(\mathcal {A}\) acts i times. Thus, the global methods can be implemented without using the representation matrix if the linear transformation \(\mathcal {A}(V)\) is available for an arbitrary \(V \in \mathbb {R}^{n\times s}\).
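As a concrete illustration, the following minimal MATLAB sketch (with our own variable names) checks on a tiny random instance that the matrix-free evaluation \(AX - XC\) agrees with the Kronecker representation \(\tilde{A}\,\textbf{vec}(X)\); in MATLAB, X(:) computes vec(X).

```matlab
% Verify the equivalence of the Sylvester operator and its Kronecker form
n = 5; s = 3;
A = rand(n); C = rand(s); X = rand(n, s);
Atilde = kron(eye(s), A) - kron(C', eye(n));  % representation matrix of (2)
y1 = Atilde * X(:);                           % vec(A(X)) via the Kronecker form
Y2 = A*X - X*C;                               % matrix-free evaluation of A(X)
norm(y1 - Y2(:))                              % ~1e-15 in double precision
```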

Several LTPMs have been extended thus far to their global versions. For example, the Gl-CGS-type methods [13, 14], Gl-BiCGSTAB method [9], and Gl-GPBiCG method [15] have been proposed, where “Gl-” represents the global version of the method. Moreover, relating to the methods based on the global Lanczos process [9], the global quasi-minimal residual (Gl-QMR) method [16] and the global bi-conjugate residual (Gl-BiCR) type methods [17] have also been developed. However, to the best of our knowledge, the global versions of the classical BiCGstab(L) and recent GPBiCGstab(L) methods have not been studied previously. We therefore develop a novel global GPBiCGstab(L) method (including Gl-BiCGstab(L)) for further convergence improvement. Then, based on the aforementioned works, we restrict the target matrix equation to a large sparse nonsymmetric linear system with multiple right-hand sides AX = B and discuss the preconditioned algorithms of Gl-GPBiCGstab(L). It is comparatively easy to extend the original GPBiCGstab(L) algorithm [6, Algorithm 3] naively to its global version. However, when applying the so-called right preconditioning to this naive Gl-GPBiCGstab(L), several concerns arise regarding numerical stability and computational costs. To overcome these problems, we also derive a refined variant of Gl-GPBiCGstab(L), which is mathematically equivalent to the naive version but uses alternative recursion formulas to update the iteration matrices. This refined variant enables right preconditioning to be applied with greater robustness and at lower computational cost. Through numerical experiments on model problems (i.e., the linear system AX = B and the Sylvester equation AX − XC = B), we show that the refined Gl-GPBiCGstab(L) (with or without preconditioning) is more effective than other typical Gl-LTPMs.

The remainder of this paper is organized as follows. In Section 2, we describe a naive Gl-GPBiCGstab(L) algorithm for (1) by extending the original GPBiCGstab(L). In Section 3, we derive a refined variant of Gl-GPBiCGstab(L) by partly using different recursion formulas. In Section 4, we present the preconditioned algorithms of the naive and refined Gl-GPBiCGstab(L) for the representative matrix equation AX = B as well as note their advantages and disadvantages. In Section 5, numerical experiments are presented to demonstrate that the proposed methods have superior convergence when compared to the conventional ones. Finally, some concluding remarks and a note on the future directions are given in Section 6.

2 Global GPBiCGstab(L) algorithm

This section presents the extension of the GPBiCGstab(L) algorithm proposed in [6] to its global version for solving linear matrix equations of the form (1).

2.1 Extension to global algorithms

We first describe a generic method for constructing global algorithms. When an algorithm of the LTPMs for the standard linear system of equations is given, its global version can be derived directly through the following simple steps (see also [9, 10]):

  1. Applying the given LTPM algorithm to \(\tilde {A}\boldsymbol {x} = \tilde {\boldsymbol {b}}\) that is equivalent to (1), where \(\tilde {A} \in \mathbb {R}^{ns\times ns}\) is a representation matrix of \(\mathcal {A}\) and \(\tilde {\boldsymbol {b}} := \textbf {vec}(B)\).

  2. Rewriting the operations in the algorithm using the following equivalent reformulations. For \(\boldsymbol {x}, \boldsymbol {y} \in \mathbb {R}^{ns}\) and \(a \in \mathbb {R}\),

    $$ \begin{array}{@{}rcl@{}} \text{linear transformation: }\tilde{A} \boldsymbol{x} &\longrightarrow& \mathcal{A}(X), \end{array} $$
    (3)
    $$ \begin{array}{@{}rcl@{}} \text{vector update (AXPY): } a\boldsymbol{x} + \boldsymbol{y} &\longrightarrow& aX+Y, \end{array} $$
    (4)
    $$ \begin{array}{@{}rcl@{}} \text{inner product (DOT): }(\boldsymbol{x}, \boldsymbol{y}) := \boldsymbol{x}^{\top} \boldsymbol{y} &\longrightarrow & \langle X, Y\rangle_{F} := \text{tr}(X^{\top} Y), \end{array} $$
    (5)

    where \(X, Y \in \mathbb {R}^{n\times s}\) are the matrices satisfying x = vec(X) and y = vec(Y). Note that the 2-norm ∥x∥2 is also converted to the associated Frobenius norm (F-norm) \(\|X\|_{F} := \sqrt {\langle X, X\rangle _{F}}\).

From (3), the resulting global algorithm can be implemented without using the representation matrix \(\tilde {A}\) explicitly. This is the main advantage of the global methods in actual computations. Moreover, when the transformation by \(\mathcal {A}\) is based mainly on matrix–matrix products, we can also implement it with high-performance level-3 BLAS routines.
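For illustration, the reformulations (4) and (5) are one-liners in MATLAB. A minimal sketch (the handle names are ours):

```matlab
% Global building blocks acting on n-by-s iteration matrices
dotF  = @(X, Y) sum(sum(X .* Y));  % F-inner product (5); equals trace(X'*Y)
normF = @(X) sqrt(dotF(X, X));     % associated F-norm
axpy  = @(a, X, Y) a*X + Y;        % matrix AXPY (4)
% e.g., the initial residual and its norm for a given operator handle Aop:
% R0 = B - Aop(X0); res0 = normF(R0);
```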

2.2 Simple derivation

Herein, we describe an algorithm of the Gl-GPBiCGstab(L) method obtained by the simple transformations noted above. For details regarding the GPBiCGstab(L) method itself, we refer to [6] and the next section.

By applying the original GPBiCGstab(L) algorithm [6, Algorithm 3] to \(\tilde {A}\boldsymbol {x} = \tilde {\boldsymbol {b}}\) and rewriting the iteration process with (3)–(5), the naive extension to the Gl-GPBiCGstab(L) algorithm is obtained as shown in Algorithm 1. The notations in the algorithm are based on MATLAB conventions. For example, the variables R and P consist of n × s submatrices Ri and Pi for \(i=0,1,\dots , j\); that is, \(\mathrm {R} = [\mathrm {R}_{0}; \mathrm {R}_{1};{\dots } ;\mathrm {R}_{j}]\) and \(\mathrm {P} = [\mathrm {P}_{0}; \mathrm {P}_{1};{\dots } ; \mathrm {P}_{j}]\), where \([\mathrm {V}_{0}; \mathrm {V}_{1}; \dots ; \mathrm {V}_{j}] := [\mathrm {V}_{0}^{\top },\mathrm {V}_{1}^{\top },\dots , \mathrm {V}_{j}^{\top }]^{\top }\), and the submatrix Vi theoretically corresponds to \(\mathcal {A}^{i}(\mathrm {V}_{0})\).

Algorithm 1: Gl-GPBiCGstab(L) for (1) (naive)

Remark 1

When using MATLAB, there are several approaches for storing the variables comprising n × s submatrices, such as R and P above. One method is to construct \(\mathrm {R} = [\mathrm {R}_{0}; \mathrm {R}_{1};{\dots } ;\mathrm {R}_{j}]\) directly as an n(j + 1) × s long rectangular matrix. Another approach is to use a multidimensional array. Because the matrix R can be viewed as a three-dimensional tensor, it can be stored in an n × s × (j + 1) three-dimensional array. In this case, the i th submatrix Ri is accessed as R(:,:,i) in MATLAB. Alternatively, we can use a structure array. For example, the matrix R can be defined using the MATLAB command ‘struct’ as follows:

R_struct = struct('R', {R0, R1, ..., Rj}),

where the first argument ‘R’ is the field name and second argument is a cell array containing the submatrices. In this case, the i th submatrix Ri is accessed as R_struct(i).R. Because the implementation using the structure array is faster than the other approaches in our experience, we use this in the actual computations.
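For illustration, the three storage options can be written as follows (a minimal sketch for j = 2; variable names are ours):

```matlab
% Three ways to store R consisting of n-by-s submatrices R0, R1, R2
Rlong   = [R0; R1; R2];               % n(j+1)-by-s long rectangular matrix
R3d     = cat(3, R0, R1, R2);         % n-by-s-by-(j+1) three-dimensional array
Rstruct = struct('R', {R0, R1, R2});  % structure array with field 'R'
```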

Line 19 of Algorithm 1 requires solving a minimization problem

$$ \begin{array}{@{}rcl@{}} \underset{\zeta_{i}, \eta}{\min} \left\| \mathrm{R}_{0} - \sum\limits_{i=1}^{L} \zeta_{i}\mathrm{R}_{i} - \eta \mathrm{Y} \right\|_{F} \end{array} $$

for L + 1 scalar variables \(\zeta _{1},\dots ,\zeta _{L}\) and η, where \(\mathrm {R}_{i} \in \mathbb {R}^{n\times s}\) for \(i=0,1,\dots ,L\) and \(\mathrm {Y} \in \mathbb {R}^{n\times s}\) are given in the iteration process. Note that the integer L is a predetermined constant in Gl-GPBiCGstab(L). The above problem is equivalent to the standard least-squares problem

$$ \begin{array}{@{}rcl@{}} &&\quad\quad\quad\quad\quad \underset{\boldsymbol{\zeta}, \eta}{\min} \left\| \mathbf{vec}(\mathrm{R}_{0}) - M \left[\begin{array}{cc} \boldsymbol{\zeta}\\ \eta \end{array}\right] \right\|_{2}, \\ M & := & \left[\mathbf{vec}(\mathrm{R}_{1}), \dots, \mathbf{vec}(\mathrm{R}_{L}), \mathbf{vec}(\mathrm{Y})\right],\quad \boldsymbol{\zeta} := [\zeta_{1},\dots,\zeta_{L}]^{\top}, \end{array} $$
(6)

and can be converted to the normal equation

$$ \begin{array}{@{}rcl@{}} M^{\top} M \left[\begin{array}{cc} \boldsymbol{\zeta}\\ \eta \end{array}\right] = M^{\top} \mathbf{vec}(\mathrm{R}_{0}). \end{array} $$
(7)

As is well known, we can use a direct method such as Cholesky factorization to solve (7), and a more stable approach such as QR factorization is useful for solving the least-squares problem (6) directly when M is ill-conditioned.
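A minimal MATLAB sketch of this parameter computation, assuming that R0 and Y are n-by-s matrices and that the three-dimensional array Rs holds \(\mathrm{R}_{1},\dots,\mathrm{R}_{L}\) (all names are ours):

```matlab
% Build M = [vec(R1), ..., vec(RL), vec(Y)] and solve for zeta and eta
[n, s, L] = size(Rs);
M = [reshape(Rs, n*s, L), Y(:)];
coef = (M.'*M) \ (M.'*R0(:));  % normal equation (7)
% coef = M \ R0(:);            % QR-based solve of (6) when M is ill-conditioned
zeta = coef(1:L); eta = coef(end);
```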

Note that the first iteration of GPBiCGstab(L) corresponds to that of BiCGstab(L) and that η needs to be 0. Hence, the first iteration (i.e., one BiCGstab(L) iteration) is displayed separately from the main GPBiCGstab(L) iterations in the original algorithm [6, Algorithm 3], whereas we display all iterations in the same form using a branch at line 19 in Algorithm 1.

3 Alternative implementation of Gl-GPBiCGstab(L)

In this section, we present the derivation of another Gl-GPBiCGstab(L) algorithm that is mathematically equivalent to Algorithm 1 but exploits alternative recursion formulas for updating the iteration matrices. The resulting algorithm uses a slightly simpler expression and is also useful for designing a robust right preconditioned Gl-GPBiCGstab(L) with fewer additional costs, as shown in the next section.

3.1 Basic concepts

Similar to standard LTPMs, the global versions of the LTPMs can be characterized by generating the residuals Rk defined by the form

$$ \begin{array}{@{}rcl@{}} R_{k} := H_{k}(\mathcal{A})\left[ R_{k}^{gbcg} \right], \end{array} $$
(8)

where Hk(λ) is the so-called stabilizing polynomial of degree k satisfying Hk(0) = 1, and \(R_{k}^{gbcg}\) is the k th Gl-BiCG residual that is generated by the coupled two-term recurrences

$$ \begin{array}{@{}rcl@{}} R_{k+1}^{gbcg} &=& R_{k}^{gbcg} - \alpha_{k} \mathcal{A}\left( P_{k}^{gbcg}\right), \end{array} $$
(9)
$$ \begin{array}{@{}rcl@{}} P_{k+1}^{gbcg} &=& R_{k+1}^{gbcg} - \beta_{k} P_{k}^{gbcg} \end{array} $$
(10)

with recurrence coefficients \(\alpha _{k}, \beta _{k} \in \mathbb {R}\) and direction matrix \(P_{k}^{gbcg} \in \mathbb {R}^{n\times s}\) of Gl-BiCG [9]. Note that \(H_{k}(\mathcal {A})\) is a linear operator of the polynomial form, i.e., \(H_{k}(\mathcal {A}) = \mathcal I - \omega _{1}\mathcal {A} - {\dots } - \omega _{k} \mathcal {A}^{k}\) with \(\omega _{j} \in \mathbb {R}\) and an identity operator \(\mathcal I\), and we define its operation for \(X \in \mathbb {R}^{n\times s}\) as

$$ \begin{array}{@{}rcl@{}} H_{k}(\mathcal{A})[X] := X - \omega_{1}\mathcal{A}(X) - {\dots} - \omega_{k} \mathcal{A}^{k}(X). \end{array} $$

Specific choices of the stabilizing polynomials give specific Gl-LTPMs.
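Since \(H_{k}(\mathcal {A})\) involves only repeated applications of \(\mathcal {A}\), its action can be evaluated with k operator calls. A minimal sketch, assuming a function handle Aop for the action of \(\mathcal {A}\) and a coefficient vector omega (our names):

```matlab
function HX = applyH(Aop, omega, X)
% Evaluate H_k(A)[X] = X - omega(1)*A(X) - ... - omega(k)*A^k(X)
    HX = X;
    V = X;
    for j = 1:numel(omega)
        V = Aop(V);              % V now holds A^j(X)
        HX = HX - omega(j) * V;  % subtract omega_j * A^j(X)
    end
end
```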

The original GPBiCGstab(L) [6] is a novel framework for the standard LTPMs that uses the following comprehensive stabilizing polynomials.

$$ \begin{array}{@{}rcl@{}} H_{k+L}(\lambda) &:=& \left( 1-\zeta_{k,1} \lambda - {\dots} -\zeta_{k,L} \lambda^{L}\right)H_{k}(\lambda) - \eta_{k} \lambda G_{k-1}(\lambda), \end{array} $$
(11)
$$ \begin{array}{@{}rcl@{}} G_{k-1}(\lambda) &:=& \frac{H_{k-L}(\lambda) - H_{k}(\lambda)}{\lambda}, \end{array} $$
(12)

where H0(λ) := 1, η0 := 0, and k is a multiple of L. The L + 1 coefficients \(\zeta _{k,1}, \dots , \zeta _{k,L}, \eta _{k} \in \mathbb {R}\) are independent parameters. The stabilizing polynomials reduce to those of BiCGstab(L) if ηk is always set to 0, and they become equivalent to those of GPBiCG when L = 1. Thus, we can also derive CGS, BiCGSTAB, and BiCGStab2 from the above framework. In the practical computation of GPBiCGstab(L), the L + 1 parameters are determined by locally minimizing the residual norm in each iteration.

In the following subsections, we show the derivation of the Gl-GPBiCGstab(L) algorithm that generates the residuals (8) with the stabilizing polynomials (11) and (12). The derivation is somewhat different from the one in [6].

3.2 Recursion formulas for updating the iteration matrices

Herein, the operators \(H_{k}(\mathcal {A})\) and \(G_{k-1}(\mathcal {A})\) are denoted as Hk and Gk− 1, respectively, and we omit the parentheses for simplicity; specifically, \(H_{k}(\mathcal {A})[X]\), \(G_{k-1}(\mathcal {A})[X]\), and \(\mathcal {A}(X)\) are expressed as HkX, Gk− 1X, and \(\mathcal {A}X\), respectively.

Let Xk, \(R_{k} = \mathrm {H}_{k}R_{k}^{gbcg}\), and \(P_{k} = \mathrm {H}_{k}P_{k}^{gbcg}\) be the approximation, residual, and direction matrix of Gl-GPBiCGstab(L), respectively. These matrices are updated to Xk+L, \(R_{k+L} = \mathrm {H}_{k+L}R_{k+L}^{gbcg}\), and \(P_{k+L} = \mathrm {H}_{k+L}P_{k+L}^{gbcg}\), respectively, and this updating process is counted as one cycle below. We now introduce the following auxiliary matrices for \(j=1,2,\dots ,L\).

$$ \begin{array}{@{}rcl@{}} R_{k}^{(j)}:=\mathrm{H}_{k}R_{k+j}^{gbcg},\quad P_{k}^{(j)}:=\mathrm{H}_{k}P_{k+j}^{gbcg}, \end{array} $$
(13)
$$ \begin{array}{@{}rcl@{}} Y_{k}^{(j)}:=\mathcal{A}\mathrm{G}_{k-1}R_{k+j}^{gbcg},\quad U_{k}^{(j)}:=\mathcal{A}\mathrm{G}_{k-1}P_{k+j}^{gbcg},\quad Z_{k}^{(j)} := \mathrm{G}_{k-1}R_{k+j}^{gbcg}. \end{array} $$
(14)

From (11), (13), and (14), the updated residual and direction matrices can be expanded as

$$ \begin{array}{@{}rcl@{}} R_{k+L} &=& R_{k}^{(L)} - \zeta_{k,1}\mathcal{A}R_{k}^{(L)} - {\dots} - \zeta_{k,L}\mathcal{A}^{L}R_{k}^{(L)} - \eta_{k}Y_{k}^{(L)}, \end{array} $$
(15)
$$ \begin{array}{@{}rcl@{}} P_{k+L} &=& P_{k}^{(L)} - \zeta_{k,1}\mathcal{A}P_{k}^{(L)} - {\dots} - \zeta_{k,L}\mathcal{A}^{L}P_{k}^{(L)} - \eta_{k}U_{k}^{(L)}, \end{array} $$
(16)

and the associated approximation can be expressed as

$$ \begin{array}{@{}rcl@{}} X_{k+L} = X_{k}^{(L)} + \zeta_{k,1}R_{k}^{(L)} + {\dots} + \zeta_{k,L}\mathcal{A}^{L-1}R_{k}^{(L)} + \eta_{k}Z_{k}^{(L)}, \end{array} $$
(17)

where we note that \(Y_{k}^{(L)} = \mathcal {A}Z_{k}^{(L)}\) holds. When the auxiliary matrices in (15)–(17) are obtained, we can determine the L + 1 parameters \(\zeta _{k,1}, \dots , \zeta _{k,L}, \eta _{k}\) to minimize the residual norm ∥Rk+LF, as described in Section 2.2.

As mentioned in [6], the stabilizing polynomial Hk+L(λ) consists of two parts: multiplication of an L th degree polynomial with the previous Hk(λ) and the relaxation term \(-\eta_{k} \lambda G_{k-1}(\lambda)\). We therefore describe the generation of auxiliary matrices for each part separately.

3.2.1 Multiplication of the L th degree polynomial

We describe an iteration process to generate the auxiliary matrices \(\mathcal {A}^{i}R_{k}^{(L)}\) and \(\mathcal {A}^{i}P_{k}^{(L)}\) for \(i=0,1,\dots ,L\), and the associated approximation \(X_{k}^{(L)}\) in (15)–(17). This process corresponds to the well-known L-times-BiCG steps in BiCGstab(L), and the description below is based on that in [6, Section 2.2].

Assume that the approximation \(X_{k}^{(0)} := X_{k}\), the corresponding residual \(R_{k}^{(0)} := R_{k}\), and the direction matrix \(P_{k}^{(0)} := P_{k}\) are given at the beginning of the cycle. Then, the j th repetition (\(j=1,2,\dots ,L\)) of the BiCG steps is described as follows.

From the Gl-BiCG recursions (9) and (10), applying operators \(\mathcal {A}^{i}\mathrm {H}_{k}\) to \(R_{k+j}^{gbcg}\) for \(i=0,1,\dots , j-1\) and to \(P_{k+j}^{gbcg}\) for \(i=0,1,\dots , j\), we have

$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}\mathrm{H}_{k}R_{k+j}^{gbcg} &=& \mathcal{A}^{i}\mathrm{H}_{k}R_{k+j-1}^{gbcg} - \alpha_{k+j-1} \mathcal{A}^{i+1}\mathrm{H}_{k} P_{k+j-1}^{gbcg},\quad i=0,1,\dots, j-1, \\ \mathcal{A}^{i}\mathrm{H}_{k}P_{k+j}^{gbcg} &=& \mathcal{A}^{i}\mathrm{H}_{k}R_{k+j}^{gbcg} - \beta_{k+j-1} \mathcal{A}^{i}\mathrm{H}_{k}P_{k+j-1}^{gbcg},\quad i=0,1,\dots, j, \end{array} $$

where it is noted that \(\mathrm {H}_{k} \mathcal {A}V = \mathcal {A} \mathrm {H}_{k}V\) holds for an arbitrary \(V \in \mathbb {R}^{n\times s}\). Rewriting the above recursions with the auxiliary matrices (13) gives

$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}R_{k}^{(j)} &=& \mathcal{A}^{i}R_{k}^{(j-1)} - \alpha_{k}^{(j-1)} \mathcal{A}^{i+1}P_{k}^{(j-1)},\quad i=0,1,\dots, j-1, \end{array} $$
(18)
$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}P_{k}^{(j)} &=& \mathcal{A}^{i}R_{k}^{(j)} - \beta_{k}^{(j-1)} \mathcal{A}^{i}P_{k}^{(j-1)},\quad i=0,1,\dots, j, \end{array} $$
(19)

where \(\alpha _{k}^{(j)} := \alpha _{k+j}\) and \(\beta _{k}^{(j)} := \beta _{k+j}\). The approximation \(X_{k}^{(j)}\) associated with \(R_{k}^{(j)}\) can be expressed as

$$ \begin{array}{@{}rcl@{}} X_{k}^{(j)} = X_{k}^{(j-1)} + \alpha_{k}^{(j-1)}P_{k}^{(j-1)}. \end{array} $$
(20)

Following [6, Eqs. (14) and (15)] and using the iteration matrices, we compute the Gl-BiCG coefficients \(\alpha _{k}^{(j)}\) and \(\beta _{k}^{(j)}\) as follows:

$$ \begin{array}{@{}rcl@{}} &&\quad\alpha_{k}^{(j)} = \frac{\rho_{k}^{(j)}}{\sigma_{k}^{(j)}},\quad \beta_{k}^{(j)} = \frac{\rho_{k}^{(j+1)}}{\sigma_{k}^{(j)}},\\ \rho_{k}^{(j)}&:=&\langle \tilde{R}_{0}, \mathcal{A}^{j}R_{k}^{(j)}\rangle_{F},\quad \sigma_{k}^{(j)} := \langle \tilde{R}_{0}, \mathcal{A}^{j+1} P_{k}^{(j)}\rangle_{F}, \end{array} $$
(21)

where \(\tilde {R}_{0}\) is the initial shadow residual.

Thus, when \(X_{k}^{(j-1)}\), \(\mathcal {A}^{i}R_{k}^{(j-1)}\), and \(\mathcal {A}^{i}P_{k}^{(j-1)}\) for \(i=0,1,\dots ,j-1\) are given at the beginning of the j th repetition, the new matrices \(X_{k}^{(j)}\), \(\mathcal {A}^{i}R_{k}^{(j)}\), and \(\mathcal {A}^{i}P_{k}^{(j)}\) for \(i=0,1,\dots ,j\) can be generated using (18)–(21). This updating scheme is summarized as follows:

$$ \begin{array}{cccccc} P_{k}^{(j-1)} \quad & X_{k}^{(j-1)}\quad & \quad\xrightarrow{\text{Eq.}(20)}\quad & \quad X_{k}^{(j)}\quad & \quad & \quad P_{k}^{(j)} \\ & & & &\quad \overset{\text{Eq.}(19)}{\nearrow}_{i=0} \quad & \\ \mathcal{A}P_{k}^{(j-1)} \quad& \quad R_{k}^{(j-1)} \quad & \xrightarrow[i=0]{\text{Eq.}(18)} \quad & \quad R_{k}^{(j)} \quad & \quad & \quad \mathcal{A}P_{k}^{(j)} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \mathcal{A}^{j-1}P_{k}^{(j-1)} \quad & \quad \mathcal{A}^{j-2}R_{k}^{(j-1)}\quad & \quad \xrightarrow[i=j-2]{\text{Eq.}(18)} \quad & \quad \mathcal{A}^{j-2}R_{k}^{(j)}\quad & \quad & \quad \mathcal{A}^{j-1}P_{k}^{(j)} \\ \downarrow^{\text{Apply } \mathcal{A}} \quad& & & & \quad\overset{\text{Eq.}(19)}{\nearrow}_{i=j-1} & \\ \mathcal{A}^{j}P_{k}^{(j-1)}\quad &\quad \mathcal{A}^{j-1}R_{k}^{(j-1)} \quad & \quad \xrightarrow[i=j-1]{\text{Eq.}(18)} \quad & \mathcal{A}^{j-1}R_{k}^{(j)} \quad & \quad & \quad \mathcal{A}^{j}P_{k}^{(j)} \\ & & & \quad\downarrow^{\text{Apply } \mathcal{A}} \quad & \quad \overset{\text{Eq.}(19)}{\nearrow}_{i=j}\quad & \\ & & & \quad \mathcal{A}^{j}R_{k}^{(j)} \quad & & \end{array} $$
(22)

Note that \(\mathcal {A}^{j}P_{k}^{(j-1)}\) and \(\mathcal {A}^{j}R_{k}^{(j)}\) are obtained by explicitly applying \(\mathcal {A}\) to \(\mathcal {A}^{j-1}P_{k}^{(j-1)}\) and \(\mathcal {A}^{j-1}R_{k}^{(j)}\), respectively. By repeating scheme (22) for \(j=1,2,\dots , L\), we obtain the auxiliary matrices \(X_{k}^{(L)}\), \(\mathcal {A}^{i}R_{k}^{(L)},\) and \(\mathcal {A}^{i}P_{k}^{(L)}\) for \(i=0,1,\dots ,L\). This process is identical to that used in Algorithm 1.
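A minimal MATLAB sketch of this cycle, assuming a function handle Aop for the action of \(\mathcal {A}\), cell arrays R and P whose (i + 1)-th entries hold \(\mathcal {A}^{i}R_{k}^{(\cdot )}\) and \(\mathcal {A}^{i}P_{k}^{(\cdot )}\) (only the first entries are set on entry), and the initial shadow residual Rt0 (all names are ours):

```matlab
% L-times BiCG steps, scheme (22)
rho = sum(sum(Rt0 .* R{1}));               % rho_k^(0)
for j = 1:L
    P{j+1} = Aop(P{j});                    % A^j P^(j-1) by explicit application
    sigma  = sum(sum(Rt0 .* P{j+1}));      % sigma_k^(j-1)
    alpha  = rho / sigma;                  % alpha_k^(j-1), Eq. (21)
    X = X + alpha * P{1};                  % Eq. (20)
    for i = 0:j-1
        R{i+1} = R{i+1} - alpha * P{i+2};  % Eq. (18)
    end
    R{j+1} = Aop(R{j});                    % A^j R^(j) by explicit application
    rho  = sum(sum(Rt0 .* R{j+1}));        % rho_k^(j)
    beta = rho / sigma;                    % beta_k^(j-1), Eq. (21)
    for i = 0:j
        P{i+1} = R{i+1} - beta * P{i+1};   % Eq. (19)
    end
end
```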

3.2.2 Relaxation

Next, we describe the iterations to generate the auxiliary matrices \(Y_{k}^{(L)}\), \(U_{k}^{(L)}\), and \(Z_{k}^{(L)}\) for (15)–(17). This process corresponds to [6, Section 3.1], but we exploit different recursion formulas, and the resulting algorithm will thus be different from Algorithm 1.

First, we consider the generation of \(Y_{k}^{(L)} = \mathcal {A}\mathrm {G}_{k-1}R_{k+L}^{gbcg}\) and \(U_{k}^{(L)} = \mathcal {A}\mathrm {G}_{k-1}P_{k+L}^{gbcg}\). We introduce additional auxiliary matrices

$$ \begin{array}{@{}rcl@{}} S_{k}^{(j)} := \mathrm{H}_{k-L}R_{k+j}^{gbcg},\quad Q_{k}^{(j)} := \mathrm{H}_{k-L}P_{k+j}^{gbcg} \end{array} $$

for \(j=0,1,\dots ,L\), where \(S_{k}^{(0)} = R_{k-L}^{(L)}\) and \(Q_{k}^{(0)} = P_{k-L}^{(L)}\) hold. Using the relation \(\mathcal {A}\mathrm {G}_{k-1} = \mathrm {H}_{k-L} - \mathrm {H}_{k}\), we have

$$ \begin{array}{@{}rcl@{}} \mathcal{A}\mathrm{G}_{k-1}R_{k+L}^{gbcg} &=& \mathrm{H}_{k-L}R_{k+L}^{gbcg} - \mathrm{H}_{k}R_{k+L}^{gbcg},\\ \mathcal{A}\mathrm{G}_{k-1}P_{k+L}^{gbcg} &=& \mathrm{H}_{k-L}P_{k+L}^{gbcg} - \mathrm{H}_{k}P_{k+L}^{gbcg}, \end{array} $$

which can be rewritten as follows:

$$ \begin{array}{@{}rcl@{}} Y_{k}^{(L)} &=& S_{k}^{(L)} - R_{k}^{(L)}, \end{array} $$
(23)
$$ \begin{array}{@{}rcl@{}} U_{k}^{(L)} &=& Q_{k}^{(L)} - P_{k}^{(L)}. \end{array} $$
(24)

Because \(R_{k}^{(L)}\) and \(P_{k}^{(L)}\) are generated as the i = 0 entries in the L repetitions of (22), it remains to consider the method of obtaining \(S_{k}^{(L)}\) and \(Q_{k}^{(L)}\). At the j th repetition, applying \(\mathcal {A}^{i}\mathrm {H}_{k-L}\) for \(i=0,1,\dots , L-j\) to the Gl-BiCG recursions, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}\mathrm{H}_{k-L}R_{k+j}^{gbcg} &=& \mathcal{A}^{i}\mathrm{H}_{k-L}R_{k+j-1}^{gbcg} - \alpha_{k+j-1} \mathcal{A}^{i+1}\mathrm{H}_{k-L}P_{k+j-1}^{gbcg}, \\ \mathcal{A}^{i}\mathrm{H}_{k-L}P_{k+j}^{gbcg} &=& \mathcal{A}^{i}\mathrm{H}_{k-L}R_{k+j}^{gbcg} - \beta_{k+j-1} \mathcal{A}^{i}\mathrm{H}_{k-L}P_{k+j-1}^{gbcg}. \end{array} $$

Rewriting these recursions with the auxiliary matrices gives

$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}S_{k}^{(j)} &=& \mathcal{A}^{i}S_{k}^{(j-1)} - \alpha_{k}^{(j-1)} \mathcal{A}^{i+1}Q_{k}^{(j-1)}, \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} \mathcal{A}^{i}Q_{k}^{(j)} &=& \mathcal{A}^{i}S_{k}^{(j)} - \beta_{k}^{(j-1)} \mathcal{A}^{i}Q_{k}^{(j-1)}. \end{array} $$
(26)

Thus, when \(\mathcal {A}^{i}S_{k}^{(j-1)}\) for \(i=0,1,\dots ,L-j\) and \(\mathcal {A}^{i}Q_{k}^{(j-1)}\) for \(i=0,1,\dots ,L-j+1\) are given at the beginning of the j th repetition, the new matrices \(\mathcal {A}^{i}S_{k}^{(j)}\) and \(\mathcal {A}^{i}Q_{k}^{(j)}\) for \(i=0,1,\dots ,L-j\) can be generated using (21), (25), and (26). This is summarized by the following scheme:

$$ \begin{array}{cccccc} Q_{k}^{(j-1)} & & & & & Q_{k}^{(j)} \\ & & & & \overset{\text{Eq.}(26)}{\nearrow}_{i=0} & \\ \mathcal{A}Q_{k}^{(j-1)} & S_{k}^{(j-1)} & \xrightarrow[i=0]{\text{Eq.}(25)} & S_{k}^{(j)} & & \mathcal{A}Q_{k}^{(j)} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \mathcal{A}^{L-j}Q_{k}^{(j-1)} & \mathcal{A}^{L-j-1}S_{k}^{(j-1)} & \xrightarrow[i=L-j-1]{\text{Eq.}(25)} & \mathcal{A}^{L-j-1}S_{k}^{(j)} & & \mathcal{A}^{L-j}Q_{k}^{(j)} \\ & & & & \overset{\text{Eq.}(26)}{\nearrow}_{i=L-j} & \\ \mathcal{A}^{L-j+1}Q_{k}^{(j-1)} & \mathcal{A}^{L-j}S_{k}^{(j-1)} & \xrightarrow[i=L-j]{\text{Eq.}(25)} & \mathcal{A}^{L-j}S_{k}^{(j)} & & \end{array} $$
(27)

Here, the starting matrices \(\mathcal {A}^{i}S_{k}^{(0)} = \mathcal {A}^{i}R_{k-L}^{(L)}\) for \(i=0,1,\dots ,L-1\) and \(\mathcal {A}^{i}Q_{k}^{(0)} = \mathcal {A}^{i}P_{k-L}^{(L)}\) for \(i=0,1,\dots ,L\) are generated in the previous cycle. We note that when j increases, the number of matrix updates decreases in the scheme (27), whereas it increases in the scheme (22). Thus, repeating (27) for \(j=1,2,\dots ,L\) gives \(S_{k}^{(L)}\) and \(Q_{k}^{(L)}\).

We now consider the computation of \(Z_{k}^{(L)} = \mathrm {G}_{k-1}R_{k+L}^{gbcg}\). From \(\mathcal {A}\mathrm {G}_{k-1} = \mathrm {H}_{k-L} - \mathrm {H}_{k}\) and the Gl-BiCG recursion for the residuals, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{A}\mathrm{G}_{k-1}P_{k+j-1}^{gbcg} &=& \mathrm{H}_{k-L}P_{k+j-1}^{gbcg} - \mathrm{H}_{k}P_{k+j-1}^{gbcg},\\ \mathrm{G}_{k-1}R_{k+j}^{gbcg} &=& \mathrm{G}_{k-1}R_{k+j-1}^{gbcg} - \alpha_{k+j-1} \mathcal{A}\mathrm{G}_{k-1}P_{k+j-1}^{gbcg}, \end{array} $$

and rewriting them gives

$$ \begin{array}{@{}rcl@{}} U_{k}^{(j-1)} &=& Q_{k}^{(j-1)} - P_{k}^{(j-1)}, \end{array} $$
(28)
$$ \begin{array}{@{}rcl@{}} Z_{k}^{(j)} &=& Z_{k}^{(j-1)} - \alpha_{k}^{(j-1)} U_{k}^{(j-1)}. \end{array} $$
(29)

Therefore, we generate \(Z_{k}^{(L)}\) by repeating (29) with (28) for \(j=1,2,\dots ,L\), where \(U_{k}^{(0)} := Q_{k}^{(0)} - P_{k}^{(0)}\) and \(Z_{k}^{(0)} := \mathrm {G}_{k-1}R_{k}^{gbcg}\). Because it holds that

$$ \begin{array}{@{}rcl@{}} \mathrm{G}_{k+L-1} = \zeta_{k,1}\mathrm{H}_{k} + \zeta_{k,2}\mathcal{A}\mathrm{H}_{k} + {\dots} + \zeta_{k,L}\mathcal{A}^{L-1}\mathrm{H}_{k} + \eta_{k}\mathrm{G}_{k-1}, \end{array} $$

application to \(R_{k+L}^{gbcg}\) produces a recursion formula for computing the next starting matrix \(Z_{k+L}^{(0)}\) as follows:

$$ \begin{array}{@{}rcl@{}} Z_{k+L}^{(0)} = \zeta_{k,1}R_{k}^{(L)} + \zeta_{k,2}\mathcal{A}R_{k}^{(L)} + {\dots} + \zeta_{k,L}\mathcal{A}^{L-1}R_{k}^{(L)} + \eta_{k}Z_{k}^{(L)}. \end{array} $$
(30)

We now have all the auxiliary matrices required for the cycles.
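Once all the auxiliary matrices are available, the end-of-cycle updates (15)–(17), (23), (24), and (30) reduce to linear combinations. A minimal sketch, reusing the cell arrays R and P from Section 3.2.1 and writing S, Q, and Z for \(S_{k}^{(L)}\), \(Q_{k}^{(L)}\), and \(Z_{k}^{(L)}\), with zeta and eta obtained from the local residual minimization (all names are ours):

```matlab
Y = S - R{1};                        % Eq. (23)
U = Q - P{1};                        % Eq. (24)
Xnew = X    + eta * Z;               % Eq. (17), relaxation part
Rnew = R{1} - eta * Y;               % Eq. (15), relaxation part
Pnew = P{1} - eta * U;               % Eq. (16), relaxation part
Znew = eta * Z;                      % Eq. (30), relaxation part
for i = 1:L
    Xnew = Xnew + zeta(i) * R{i};    % + zeta_i A^(i-1) R_k^(L)
    Rnew = Rnew - zeta(i) * R{i+1};  % - zeta_i A^i R_k^(L)
    Pnew = Pnew - zeta(i) * P{i+1};  % - zeta_i A^i P_k^(L)
    Znew = Znew + zeta(i) * R{i};    % + zeta_i A^(i-1) R_k^(L), Eq. (30)
end
```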

3.3 Refined Gl-GPBiCGstab(L) algorithm

By combining (15)–(30), we obtain the refined Gl-GPBiCGstab(L) algorithm shown in Algorithm 2. We refer to Section 2 for the notation. Algorithms 1 and 2 generate the same approximations in exact arithmetic, but they use different formulas, especially for computing Y and U, so their numerical behaviors differ in finite-precision arithmetic. The total computational costs of Algorithms 1 and 2 are identical. However, Algorithm 2 has a slightly simpler implementation because, unlike Algorithm 1, it has no branch in the loops over j and no recursions for Y and U. Moreover, this simplicity is useful when applying preconditioning. As we show in later sections, Algorithm 2 with right preconditioning is numerically more stable and has a lower computational cost than Algorithm 1.

Algorithm 2: Gl-GPBiCGstab(L) for (1) (refined)

We briefly describe the related Gl-LTPMs derived from Gl-GPBiCGstab(L). If the parameter η is set to 0 in Algorithms 1 and 2, the computations for the relaxation vanish, and both algorithms reduce to the same Gl-BiCGstab(L) algorithm. To the best of our knowledge, this is also a new Gl-LTPM. In contrast, Algorithms 1 and 2 with L = 1 are mathematically equivalent to Gl-GPBiCG [15, Algorithm 2] (and also Gl-GPBiCG-plus [15, Algorithm 4]), but they use different implementations. As in the case of standard LTPMs, the global versions of CGS, BiCGSTAB, and BiCGStab2 can be derived from the framework of Gl-GPBiCG; therefore, Gl-GPBiCGstab(L) includes them as well. Thus, Algorithms 1 and 2 can be reduced to various global methods with different implementations. However, because the main purpose of this study is to develop an effective Gl-GPBiCGstab(L) method, we do not discuss these simplified algorithms or their implementations further. We consider several key algorithms and compare their convergence later in numerical experiments.

4 Preconditioning for linear systems with multiple right-hand sides

In this section, we focus on linear systems with multiple right-hand sides

$$ \begin{array}{@{}rcl@{}} AX = B,\quad A\in \mathbb{R}^{n\times n},\quad B \in \mathbb{R}^{n\times s}, \end{array} $$
(31)

which constitute one of the most important types of linear matrix equations [7,8,9,10]. Here, A is a large sparse nonsymmetric and nonsingular matrix. Because the preconditioning technique is useful for enhancing the convergence of Gl-LTPMs when solving (31), we discuss the preconditioned algorithms of Gl-GPBiCGstab(L).

There are several approaches for converting the above system into a well-conditioned system using a preconditioner \(K \approx A\). Multiplying (31) by K− 1 from the left side gives the left preconditioned system

$$ \begin{array}{@{}rcl@{}} \hat{A}X = \hat{B},\quad \hat{A} := K^{-1}A,\quad \hat{B}:= K^{-1}B. \end{array} $$

In general, the left preconditioned algorithm is easy to implement by replacing only the multiplications by A with those by K− 1A, but the residuals generated in the iterations change to \(\hat {R}_{k} = K^{-1}R_{k}\); we therefore need special care in setting the stopping rule. Conversely, the right preconditioned system is given in the form

$$ \begin{array}{@{}rcl@{}} \hat{A}\hat{X} = B,\quad \hat{A} := AK^{-1},\quad \hat{X}:= KX. \end{array} $$
(32)

In this case, the residuals coincide with the standard ones, i.e., \(\hat {R}_{k} = R_{k}\), and the approximations generated in the iterations change to \(\hat {X}_{k} = KX_{k}\). The approximation \(X_{k} = K^{-1}\hat {X}_{k}\) must be computed accurately once the iterations are terminated. Alternatively, to obtain the approximations during the iterations, it is well known that we can update Xk recursively instead of \(\hat {X}_{k}\) through some changes of variables (cf., e.g., [2, 18]). We describe such algorithms of Gl-GPBiCGstab(L) with right preconditioning.
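For concreteness, the following minimal sketch sets up the action of a right ILU(0) preconditioner on an n-by-s matrix, as used in our experiments below (the handle names are ours):

```matlab
% ILU(0) factorization K = Lf*Uf ~ A and the actions needed in (32)
setup.type = 'nofill';       % no fill-in, i.e., ILU(0)
[Lf, Uf] = ilu(A, setup);    % A must be a sparse matrix
Kinv  = @(V) Uf \ (Lf \ V);  % V -> K^{-1} V (two triangular solves, column-wise)
AKinv = @(V) A * Kinv(V);    % V -> A K^{-1} V, the operator in (32)
```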

4.1 Naive right preconditioned algorithm

Algorithm 3: Right preconditioned Gl-GPBiCGstab(L) for (31) (naive)

Algorithm 3 is a naive right preconditioned Gl-GPBiCGstab(L) algorithm that is obtained by applying Algorithm 1 to (32). Similar to [2, 18], several variables are changed to update Xk recursively, and a hat symbol ‘\(\hat {~}\) ’ on a variable denotes that K− 1 acts on the underlying variable (without the hat). For example, we have the following relations in Algorithm 3:

$$ \begin{array}{@{}rcl@{}} \mathrm{R} & = & \left[\mathrm{R}_{0}; \mathrm{R}_{1}; {\dots} ;\mathrm{R}_{j}\right] = \left[R_{k}^{(j)}; \left( AK^{-1}\right)R_{k}^{(j)}; {\dots} ;\left( AK^{-1}\right)^{j}R_{k}^{(j)}\right],\\ \hat{\mathrm{R}} & = & \left[\hat{\mathrm{R}}_{0}; \hat{\mathrm{R}}_{1}; {\dots} ;\hat{\mathrm{R}}_{j}\right] = \left[\!K^{-1}R_{k}^{(j)}; K^{-1}\left( AK^{-1}\right)R_{k}^{(j)}; {\dots} ;K^{-1}\left( AK^{-1}\right)^{j}R_{k}^{(j)}\!\right], \end{array} $$
$$ \begin{array}{@{}rcl@{}} \mathrm{P} & = & [\mathrm{P}_{0}; \mathrm{P}_{1}; {\dots} ;\mathrm{P}_{j-1}] = \left[\left( AK^{-1}\right)P_{k}^{(j)}; \left( AK^{-1}\right)^{2}P_{k}^{(j)}; {\dots} ;\left( AK^{-1}\right)^{j}P_{k}^{(j)}\right],\\ \hat{\mathrm{P}} & = & \left[\hat{\mathrm{P}}_{0}; \hat{\mathrm{P}}_{1}; {\dots} ;\hat{\mathrm{P}}_{j}\right] = \left[\!K^{-1}P_{k}^{(j)}; K^{-1}\left( AK^{-1}\right)P_{k}^{(j)}; {\dots} ;K^{-1}\left( AK^{-1}\right)^{j}P_{k}^{(j)}\right], \end{array} $$

where we note that \(P_{k}^{(j)}\) itself is not needed for updating Xk, and we set \(\mathrm {P}_{0} := (AK^{-1})P_{k}^{(j)}\). Similar relationships also hold between other variables with and without hats.

We briefly describe some concerns regarding Algorithm 3. Because \(\hat {\mathrm {R}}\) is computed by a linear combination of \(\hat {\mathrm {R}}_{i}\) for \(i=0,1,\dots ,L\) and \(\hat {\mathrm {Y}}\) in line 26, it may differ significantly from K− 1R owing to the accumulation of rounding errors. In our experience, this causes numerical instabilities, such as stagnation and divergence of the residual norms in the late stages of the iterations (see Section 5.1 below). In our preliminary experiments, we confirmed that the convergence can be improved by computing \(\hat {\mathrm {R}}\) explicitly in the form K− 1R. However, this approach is costly because it requires additional multiplications by K− 1. We note that because \(\hat {\mathrm {Y}}\) and \(\hat {\mathrm {U}}\) are computed by their coupled recursions with the starting matrix \(\hat {\mathrm {Y}} = \hat {\mathrm {R}}^{\prime } - \hat {\mathrm {R}}\) given in line 26, it is difficult to remove this line without using additional multiplications by K− 1.

4.2 Refined right preconditioned algorithm

We can overcome the above difficulty by exploiting Algorithm 2. Algorithm 4 presents the refined right preconditioned Gl-GPBiCGstab(L) algorithm obtained by applying Algorithm 2 to (32). Here, because \(\hat {\mathrm {U}}\) is computed directly in the form \(\hat {\mathrm {U}} = \hat {\mathrm {Q}}_{0} - \hat {\mathrm {P}}_{0}\), we do not need \(\hat {\mathrm {Y}}\), which enables removal of the linear combination for \(\hat {\mathrm {R}}\). Then, unlike in Algorithm 3, multiplication by K− 1 is used only to compute the following variables in the j th repetition.

$$ \begin{array}{@{}rcl@{}} \hat{\mathrm{R}}_{j-1} &=& K^{-1}\mathrm{R}_{j-1} = K^{-1}\left( AK^{-1}\right)^{j-1}R_{k}^{(j)},\\ \hat{\mathrm{P}}_{j} &=& K^{-1}\mathrm{P}_{j-1} = K^{-1}\left( AK^{-1}\right)^{j}P_{k}^{(j)}. \end{array} $$

Hence, \(\hat {\mathrm {R}}\) is always a good approximation of K− 1R in finite-precision arithmetic, and we expect that Algorithm 4 is numerically more stable than Algorithm 3.

Note that because multiplications with K− 1 are used in different parts of Algorithms 3 and 4, the algorithms with η = 0 result in different formulations of the right preconditioned Gl-BiCGstab(L).

Algorithm 4: Right preconditioned Gl-GPBiCGstab(L) for (31) (refined)

4.3 Computational costs and memory requirements

We compare the computational costs and memory requirements among the related right preconditioned algorithms. We consider the original Gl-GPBiCG and Gl-GPBiCG-plus with right preconditioning given in [15] as well as the naive and refined variants of Gl-BiCGstab(L) and Gl-GPBiCGstab(L) with right preconditioning. Note that the evaluations for the naive and refined Gl-GPBiCG derived from Gl-GPBiCGstab(L) are obtained by substituting L = 1 in the results of Algorithms 3 and 4, respectively.

Below, “MM” denotes a matrix–matrix product with A. Although Gl-GPBiCG and Gl-GPBiCG-plus require two MMs per iteration while the others require 2L MMs per cycle, all methods require one MM per unit increase of the Krylov dimension on average. A single multiplication by K− 1 is also required per MM for all the methods. The numbers of AXPYs and DOTs per MM as well as the memory requirements are summarized in Table 1, where AXPY and DOT correspond to the forms (4) and (5), respectively. Following [6], the form aX or X + Y is counted as 1/2 AXPYs. The computational costs for checking the stopping rule are not included. The memory requirements indicate the number of n × s matrices that must be stored in the algorithms. The memory for A and B is not counted.

Table 1 Computational cost per MM and memory requirements with right preconditioning

From Table 1, we see that Algorithm 4 has the additional advantage of requiring lower computational cost and less memory than Algorithm 3. For a modest value of L, e.g., L ≤ 4, the costs and memory requirements of Algorithm 4 are not high compared with those of the Gl-GPBiCG and Gl-BiCGstab(L) algorithms.

5 Numerical experiments

We present numerical experiments to show the effectiveness of the proposed Gl-GPBiCGstab(L) method. Numerical calculations were carried out in double-precision floating-point arithmetic on a PC (Intel Core i7-1185G7 CPU with 32 GB of RAM) running MATLAB R2021a. The right-hand side \(B \in \mathbb {R}^{n\times s}\) was given by a random matrix. The initial guess X0 and initial shadow residual \(\tilde {R}_{0}\) were set to O and R0 (= B), respectively. The least-squares problem (6) was converted to the normal equation (7) and solved using the backslash command in MATLAB. The other computational conditions were set individually for each example, as noted below.

5.1 Comparison of the naive and refined algorithms

We first compare convergence between the naive and refined Gl-GPBiCGstab(L) algorithms with and without preconditioning. Algorithms 1–4 were applied to (31). Following [6], the coefficient matrix A was given by the following Toeplitz matrix:

$$ \begin{array}{@{}rcl@{}} A := \left[\begin{array}{cccc} 2 & 1 & & \\ 0 & 2 & 1 & \\ 0 & 0 & 2 & {\ddots} \\ 0 & 0 & 0 & {\ddots} \\ 1.4 & 0 & 0 & {\ddots} \\ & 1.4 & 0 & {\ddots} \\ & & 1.4 & {\ddots} \\ & & & \ddots \end{array}\right] \in \mathbb{R}^{500 \times 500}. \end{array} $$

The iterations were stopped when the relative residual norms ∥RkF/∥BF were less than 10− 14. The parameters s and L were set to s = 1,2,4,8,16,32 and L = 2,4,8, respectively. The maximum number of MMs was set to 2n. ILU(0) [19] was used as the preconditioner in Algorithms 3 and 4.
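For reference, this test problem can be set up with a few lines of MATLAB (a sketch of our own construction of the displayed banded Toeplitz matrix):

```matlab
n = 500; e = ones(n, 1);
A  = spdiags([1.4*e, 2*e, e], [-4, 0, 1], n, n);  % 2 on the diagonal, 1 above,
                                                  % 1.4 on the 4th subdiagonal
B  = rand(n, s);                                  % random right-hand side
X0 = zeros(n, s);                                 % initial guess, so R0 = B
```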

Figure 1 shows the convergence histories of the relative residual norms of Algorithms 3 and 4, i.e., the naive and refined Gl-GPBiCGstab(L) with right preconditioning, for s = 1 and 32. The plots indicate the number of MMs along the horizontal axis versus \(\log _{10}\) of the relative residual F-norm along the vertical axis. Note that “MV” (matrix–vector product) and 2-norm are utilized for s = 1 instead of MM and F-norm, respectively. Table 2 shows the number of MMs required for successful convergence of the four algorithms. The symbol ‡ in Table 2 indicates that the residual norms stagnate or diverge at the late stages of the iterations, as displayed in Fig. 1.

Fig. 1: Convergence histories of Algorithms 3 and 4 (naive and refined Gl-GPBiCGstab(L) with right preconditioning, respectively) for s = 1 (left) and s = 32 (right)

Table 2 Number of MMs for Algorithms 1–4 for the Toeplitz matrix

From Fig. 1 and Table 2, we observe the following. The numbers of MMs required for successful convergence of Algorithms 1 and 2 (without preconditioning) are comparable for each value of s and L. In the preconditioned case, by contrast, Algorithm 4 converges faster than in the non-preconditioned case for all s and L, whereas Algorithm 3 often does not converge. As noted in Section 4.1, \(\hat {\mathrm {R}}\) in Algorithm 3 may differ from K− 1R because of rounding errors, and this difference is expected to grow with increasing L. Indeed, Algorithm 3 becomes unstable as L increases for a fixed s. Similar results have been observed for other matrices. Moreover, because stagnation of the residual norms occurs even when s = 1, there is an inherent problem in the original GPBiCGstab(L) with right preconditioning, but the numerical instability seems to grow with increasing s. For the above reasons, we conclude that our refined Gl-GPBiCGstab(L) algorithm is more robust than the naive version with right preconditioning.

5.2 Experiments for linear systems with multiple right-hand sides

Next, we apply several Gl-LTPMs with preconditioning to (31) and compare their convergence. We use Gl-BiCGSTAB, Gl-GPBiCG, Gl-BiCGstab(L), and Gl-GPBiCGstab(L), which can be obtained from Algorithm 4. For Gl-GPBiCG, we also use the Gl-GPBiCG-plus implementation [15, Algorithm 5]. Table 3 displays the abbreviations of the solvers and their corresponding algorithms. Right preconditioning was applied to all methods, and the ILU(0) preconditioner was used.

Table 3 Solvers used in Section 5.2
Table 4 Characteristics of test matrices for linear systems with multiple right-hand sides (31)

We consider test matrices from the SuiteSparse Matrix Collection [20]. Table 4 shows the dimension (n), number of nonzero entries (nnz), and 2-norm condition number (κ2) of each matrix. The condition number is displayed only if it is given in the above collection. The iterations were stopped when the relative residual norms ∥Rk∥F/∥B∥F were less than 10− 10. The maximum number of MMs was set to 2n. For Gl-BiCGstab(L) and Gl-GPBiCGstab(L), we used L = 2,4. The number of right-hand sides s was set to 16.

Figure 2 displays the convergence histories of the relative residual norms of Gl-LTPMs with right preconditioning for sme3Db and garon2. We refer to Section 5.1 for the plotting conventions. Table 5 shows the number of MMs required for successful convergence (MMs), the computation time (Time), and the explicitly computed relative residual norm (referred to as the true relative residual norm) ∥B − AXk∥F/∥B∥F (TRR) at the time of termination of the solvers. Following [6], for each problem the smallest MMs and Time are displayed in bold italics, and the second smallest values in italics only. “Inf” and “NaN” in the table indicate that the iterated values became “infinity” or “not a number”, respectively, in MATLAB and that the iterations could not proceed thereafter.

Fig. 2: Convergence histories of the relative residual norms of Gl-LTPMs with right preconditioning for sme3Db (left) and garon2 (right)

From Fig. 2 and Table 5, we observe the following. The fastest convergence in terms of the number of MMs is achieved by the proposed Gl-GPBiCGstab(L) for all problems. In particular, Gl-GPBiCGstab(4) often converges faster than not only the other Gl-LTPMs but also Gl-GPBiCGstab(2). As noted in [6], moderately increasing L seems to be useful for reducing the number of MMs required for successful convergence. These are important measures of the effectiveness of Gl-GPBiCGstab(L) as a global Krylov subspace method because the number of MMs coincides with the dimension of the matrix Krylov subspace. In terms of computation time, Gl-BiCGstab(L) and Gl-GPBiCGstab(L) are comparable to the conventional Gl-GPBiCG-plus on average. Comparing the true relative residual norms, there are cases where Gl-BiCGstab(L) and Gl-GPBiCGstab(L) generate slightly more accurate approximate solutions than Gl-GPBiCG-plus.

Table 5 Numbers of MMs, computation times, and true relative residual norms for Gl-LTPMs with right preconditioning for linear systems with multiple right-hand sides (31)

We also remark briefly on the numerical results when the stopping criterion was set to ∥Rk∥F/∥B∥F < 10− 12, because the observations differ slightly from the above. With respect to the convergence speed in terms of the number of MMs, Gl-BiCGstab(4) and Gl-GPBiCGstab(4) are superior to the other Gl-LTPMs. Moreover, only these two solvers converge for all the test matrices; the other conventional Gl-LTPMs fail to converge at the late stages of the iterations for some problems. On the other hand, the attainable accuracy in terms of the true residual norm is limited for all the solvers; that is, a so-called large residual gap (the difference between Rk and B − AXk) appears in most cases. This problem could be mitigated by combining Gl-LTPMs with the techniques described in [21].

5.3 Experiments for the Sylvester equation

Herein, we present the numerical results of the Gl-LTPMs for the Sylvester equation (2). The solvers shown in Table 6 are applied to (2) without preconditioning.

Table 6 Solvers used in Section 5.3

Note that the linear operation of \(\mathcal {A}\) on \(X \in \mathbb {R}^{n\times s}\) is defined as \(\mathcal {A}(X) := AX - XC\) for the given matrices \(A\in \mathbb {R}^{n\times n}\) and \(C\in \mathbb {R}^{s\times s}\), and “MM” is replaced by “OP” to denote an operation with \(\mathcal {A}\). Motivated by [15, 22], we set the test matrices A and C as displayed in Table 7. The matrices are derived from the SuiteSparse Matrix Collection [20], except for the tridiagonal matrix in problems #11 and #12. The tridiagonal matrix C = [cij] is defined by \(c_{i+1,i} := 11\), \(c_{ii} := -2\), and \(c_{i,i+1} := -9\) for each i (otherwise, \(c_{ij} := 0\)). The value s is determined by the size of C, and the maximum number of OPs was set to 2sn. All other computational conditions were similar to those in Section 5.2.
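A minimal sketch of this setup (our own construction from the stated entries; Aop denotes the operation counted as one OP):

```matlab
e = ones(s, 1);
C = spdiags([11*e, -2*e, -9*e], [-1, 0, 1], s, s);  % c_{i+1,i}=11, c_{ii}=-2, c_{i,i+1}=-9
Aop = @(X) A*X - X*C;                               % the Sylvester operator in (2); one OP per call
```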

Table 7 Characteristics of the test matrices for the Sylvester equation (2)

Figure 3 displays the convergence histories of the relative residual norms of the Gl-LTPMs for problems #9 and #11. The number of OPs and \(\log _{10}\) of the relative residual F-norms are plotted on the horizontal and vertical axes, respectively. Table 8 shows the number of OPs required for successful convergence (OPs), computation time (Time), and true relative residual norm \(\|B-\mathcal {A}(X_{k})\|_{F}/\|B\|_{F}\) (TRR) at the time of termination. The other notations are as defined in Table 5.

Fig. 3: Convergence histories of the relative residual norms of Gl-LTPMs for problems #9 (left) and #11 (right)

Table 8 Numbers of OPs, computation times, and true relative residual norms for Gl-LTPMs for the Sylvester equation (2)

From Fig. 3 and Table 8, we observe the following. Similar to the results in Section 5.2, Gl-GPBiCGstab(L) (especially with L = 4) often converges faster than the other Gl-LTPMs with respect to the number of OPs. The conventional Gl-GPBiCG-plus is efficient in terms of computation time. However, Gl-GPBiCG-plus does not converge for problem #11, whereas Gl-BiCGstab(L) and Gl-GPBiCGstab(L) do. Similar situations have occasionally occurred in our experience, and the proposed methods appear to be more robust than the conventional Gl-LTPMs.

We also note that the numerical results with the stopping criterion ∥Rk∥F/∥B∥F < 10− 12 are similar to the above, although Gl-BiCGstab(4) and Gl-GPBiCGstab(4) have a slightly larger residual gap for problems #9 and #10. We will not discuss the residual gap further in the present study, but will seek more refined algorithms of Gl-LTPMs based on [21] in the future.

6 Concluding remarks

We have proposed a novel global Lanczos-type product method, Gl-GPBiCGstab(L), for solving linear matrix equations. The original GPBiCGstab(L) can easily be extended naively to its global version, but the resulting method suffers from numerical instability when right preconditioning is applied for solving linear systems with multiple right-hand sides. We therefore reconstructed Gl-GPBiCGstab(L) using alternative recursion formulas to update the iteration matrices. The resulting refined algorithm with right preconditioning is more robust and computationally cheaper than the naive version. Moreover, the results of numerical experiments show that the refined Gl-GPBiCGstab(L) converges quickly and stably compared with other Gl-LTPMs for linear systems with multiple right-hand sides and for the Sylvester equation.

Based on the above results, there are two main prospects for the proposed approach. First, we can apply the proposed approach to more difficult classes of matrix equations, such as the general coupled matrix equations including the generalized Sylvester equation [11]. Second, the proposed approach can be carried over to block Krylov subspace methods, such as the block BiCGstab(L) [23] and block GPBiCG [24] methods. Because these block-type methods are effective, especially when solving linear systems with multiple right-hand sides, it is natural to develop a block version of GPBiCGstab(L), as noted in [6]. We have not elaborated on these points in the present work but expect to discuss them in future studies.