Global GPBiCGstab(L) method for solving linear matrix equations

Global Krylov subspace methods are effective iterative solvers for large linear matrix equations. Several Lanczos-type product methods (LTPMs) for solving standard linear systems of equations have been extended to their global versions. Recently, the GPBiCGstab(L) method, which unifies two well-known LTPMs (the BiCGstab(L) and GPBiCG methods), was developed, and it has been shown to converge better than the conventional LTPMs. In the present study, we therefore extend the GPBiCGstab(L) method to its global version. We present not only a naive extension of the original GPBiCGstab(L) algorithm but also an alternative implementation that enables the preconditioning technique to be applied stably and efficiently. Numerical experiments demonstrate the effectiveness of the proposed global GPBiCGstab(L) method.


Introduction
In many fields of scientific computing, it is important to solve linear matrix equations of the form

A(X) = B, (1)

where A : R^{n×s} → R^{n×s} is a linear operator and X, B ∈ R^{n×s}. In the present work, we focus mainly on the case where the representation of A is a large sparse nonsymmetric matrix and s ≪ n. The most basic form of (1) is the standard linear system of equations Ax = b, where A ∈ R^{n×n} and b ∈ R^n. For this problem, short-recurrence Krylov subspace methods, such as the conjugate gradient squared (CGS) method [1], the bi-conjugate gradient stabilized (BiCGSTAB) method [2], and the BiCGStab2 method [3], are widely used iterative solvers. The CGS, BiCGSTAB, and BiCGStab2 methods belong to the so-called Lanczos-type product methods (LTPMs), also called hybrid BiCG methods, whose residuals are defined as the product of stabilizing polynomials and BiCG residuals. The BiCGstab(L) method [4] and the GPBiCG method [5] also belong to the LTPMs and can be viewed as two different generalizations of the abovementioned methods. Moreover, the GPBiCGstab(L) method [6], which unifies BiCGstab(L) and GPBiCG, has been developed recently. Numerical experiments have demonstrated that GPBiCGstab(L) converges better than the conventional LTPMs, especially for linear systems with complex spectra.
Because A is defined on a finite-dimensional space, there always exists a standard linear system of equations that is mathematically equivalent to (1), to which the LTPMs can be applied in theory. However, it is often difficult to construct the representation matrix of A, i.e., the coefficient matrix of the converted linear system. For example, it is well known that the Sylvester equation

A(X) = AX − XC = B,  A ∈ R^{n×n},  C ∈ R^{s×s}, (2)

can be converted to a standard linear system Ãx = b̃ with Ã := I_s ⊗ A − C^⊤ ⊗ I_n ∈ R^{ns×ns} and b̃ := vec(B) ∈ R^{ns}, but it is costly to construct Ã explicitly when ns is large. Here, ⊗ denotes the Kronecker product, I_k is the identity matrix of order k, and vec : R^{n×s} → R^{ns} is the vectorization operator, i.e., vec(V) = [v_1^⊤, v_2^⊤, . . . , v_s^⊤]^⊤ ∈ R^{ns} for V = [v_1, v_2, . . . , v_s] ∈ R^{n×s}. More generally, there are also cases where the representation matrix of A cannot be expressed explicitly. Therefore, it is desirable to apply iterative solvers to matrix equations without explicitly forming the representation matrix. The global Krylov subspace methods (cf., e.g., [7-12]) are such matrix-free approaches; they generate approximate solutions from the matrix Krylov subspace K_k(A, R_0) := { Σ_{i=0}^{k−1} c_i A^i(R_0) | c_i ∈ R }, where R_0 := B − A(X_0) is the initial residual with an initial guess X_0, and A^i indicates that A acts i times. Thus, the global methods can be implemented without the representation matrix, provided the linear transformation A(V) is available for an arbitrary V ∈ R^{n×s}. Several LTPMs have already been extended to their global versions; for example, the Gl-CGS-type methods [13,14], the Gl-BiCGSTAB method [9], and the Gl-GPBiCG method [15] have been proposed, where "Gl-" denotes the global version of a method.
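To make the conversion concrete, the following NumPy sketch (variable names are ours) checks that the matrix-free Sylvester operator A(X) = AX − XC agrees, after vectorization, with its explicit representation matrix built from Kronecker products; note the transpose on C, which the identity vec(XC) = (C^⊤ ⊗ I_n) vec(X) requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 6, 3
A = rng.standard_normal((n, n))
C = rng.standard_normal((s, s))
X = rng.standard_normal((n, s))

# Matrix-free application of the Sylvester operator A(X) = AX - XC.
def op(X):
    return A @ X - X @ C

# Explicit representation matrix of size ns x ns (costly for large ns).
A_tilde = np.kron(np.eye(s), A) - np.kron(C.T, np.eye(n))

# vec stacks the columns of a matrix into a single vector of length ns.
vec = lambda V: V.reshape(-1, order="F")

# The two formulations agree: vec(A(X)) = A_tilde @ vec(X).
assert np.allclose(A_tilde @ vec(X), vec(op(X)))
```

For n and s of realistic size, only `op` is affordable, which is precisely the setting where the global methods are attractive.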
Moreover, relating to methods based on the global Lanczos process [9], the global quasi-minimal residual (Gl-QMR) method [16] and the global bi-conjugate residual (Gl-BiCR) type methods [17] have also been developed. However, to the best of our knowledge, global versions of the classical BiCGstab(L) and the recent GPBiCGstab(L) methods have not been studied previously. We therefore develop a novel global GPBiCGstab(L) method (including Gl-BiCGstab(L)) for further convergence improvement. Then, following the aforementioned works, we restrict the target matrix equation to a large sparse nonsymmetric linear system with multiple right-hand sides AX = B and discuss preconditioned algorithms of Gl-GPBiCGstab(L). It is comparatively easy to extend the original GPBiCGstab(L) algorithm [6, Algorithm 3] naively to its global version. However, when applying so-called right preconditioning to this naive Gl-GPBiCGstab(L), several concerns regarding numerical stability and computational cost arise. To overcome these problems, we also derive a refined variant of Gl-GPBiCGstab(L), which is mathematically equivalent to the naive version but uses alternative recursion formulas to update the iteration matrices. This refined variant enables right preconditioning to be applied with greater robustness and lower computational cost. Through numerical experiments on model problems (the linear system AX = B and the Sylvester equation AX − XC = B), we show that the refined Gl-GPBiCGstab(L), with or without preconditioning, is more effective than other typical Gl-LTPMs.
The remainder of this paper is organized as follows. In Section 2, we describe a naive Gl-GPBiCGstab(L) algorithm for (1) by extending the original GPBiCGstab(L). In Section 3, we derive a refined variant of Gl-GPBiCGstab(L) that partly uses different recursion formulas. In Section 4, we present the preconditioned algorithms of the naive and refined Gl-GPBiCGstab(L) for the representative matrix equation AX = B and note their advantages and disadvantages. In Section 5, numerical experiments are presented to demonstrate that the proposed methods have superior convergence compared with the conventional ones. Finally, concluding remarks and notes on future directions are given in Section 6.

Global GPBiCGstab(L) algorithm
This section presents the extension of the GPBiCGstab(L) algorithm proposed in [6] to its global version for solving linear matrix equations of the form (1).

Extension to global algorithms
We first describe a generic procedure for constructing global algorithms. Given an LTPM algorithm for the standard linear system of equations, its global version can be derived directly through the following simple steps (see also [9,10]):
1. Apply the given LTPM algorithm to the system Ãx = b̃ that is equivalent to (1), where Ã ∈ R^{ns×ns} is a representation matrix of A and b̃ := vec(B).
2. Rewrite the operations in the algorithm using the following equivalent reformulations: for x, y ∈ R^{ns} and a ∈ R,

Ãx = vec(A(X)),  x + ay = vec(X + aY), (3)
x^⊤y = ⟨X, Y⟩_F := tr(X^⊤Y), (4)

where X, Y ∈ R^{n×s} are the matrices satisfying x = vec(X) and y = vec(Y).
Note that the 2-norm ‖x‖_2 is also converted to the associated Frobenius norm (F-norm) ‖X‖_F := √⟨X, X⟩_F.
From (3), the resulting global algorithm can be implemented without using the representation matrix Ã explicitly. This is the main advantage of the global methods in actual computations. Moreover, when the transformation by A is based mainly on matrix-matrix products, it can be implemented with high-performance level-3 BLAS routines.
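These reformulations amount to replacing vector operations with their matrix counterparts. A small NumPy check (our notation) of the underlying identities:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 5, 3
X = rng.standard_normal((n, s))
Y = rng.standard_normal((n, s))
a = 0.7

vec = lambda V: V.reshape(-1, order="F")  # column-stacking vectorization

# Frobenius inner product <X, Y>_F = trace(X^T Y) equals the dot product of
# the vectorized matrices, so a global algorithm can work with n x s
# matrices directly instead of length-ns vectors.
assert np.isclose(np.trace(X.T @ Y), vec(X) @ vec(Y))

# The vector update x + a*y corresponds to the matrix update X + a*Y.
assert np.allclose(vec(X) + a * vec(Y), vec(X + a * Y))

# The 2-norm of vec(X) is the F-norm of X.
assert np.isclose(np.linalg.norm(vec(X)), np.linalg.norm(X, "fro"))
```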

Simple derivation
Herein, we describe an algorithm of the Gl-GPBiCGstab(L) method obtained by the simple transformations noted above. For details regarding the GPBiCGstab(L) method itself, we refer to [6] and the next section. By applying the original GPBiCGstab(L) algorithm [6, Algorithm 3] to Ãx = b̃ and rewriting the iteration process with (3)-(4), the naive extension to the Gl-GPBiCGstab(L) algorithm is obtained as shown in Algorithm 1. The notations in the algorithm are based on MATLAB conventions. For example, the variables R and P consist of n × s submatrices R_i and P_i for i = 0, 1, . . . , j; that is, R = [R_0; R_1; . . . ; R_j] and P = [P_0; P_1; . . . ; P_j], where [V_0; V_1; . . . ; V_j] := [V_0^⊤, V_1^⊤, . . . , V_j^⊤]^⊤, and the submatrix V_i theoretically corresponds to A^i(V_0).

Remark 1 When using MATLAB, there are several approaches for storing variables comprising n × s submatrices, such as R and P above. One approach is to construct R = [R_0; R_1; . . . ; R_j] directly as an n(j + 1) × s rectangular matrix. Another is to use a multidimensional array: because R can be viewed as a third-order tensor, it can be stored in an n × s × (j + 1) array, in which case the ith submatrix R_i is accessed as R(:, :, i). Alternatively, we can use a structure array. For example, R can be defined using the MATLAB command 'struct' as R_struct = struct('R', {R_0, R_1, . . . , R_j}), where the first argument 'R' is the field name and the second argument is a cell array containing the submatrices; the ith submatrix R_i is then accessed as R_struct(i).R. Because the implementation using the structure array is faster than the other approaches in our experience, we use it in the actual computations.

Line 19 of Algorithm 1 requires solving a minimization problem for the L + 1 scalar variables ζ_1, . . . , ζ_L and η, where R_i ∈ R^{n×s} for i = 0, 1, . . . , L and Y ∈ R^{n×s} are given by the iteration process.
Algorithm 1 Gl-GPBiCGstab(L) for (1) (naive)

Note that the integer L is a predetermined constant in Gl-GPBiCGstab(L). The above problem is equivalent to a standard least-squares problem (6) and can be converted to the normal equation (7). As is well known, a direct method such as Cholesky factorization can be used to solve (7), whereas a more stable approach such as QR factorization is useful for solving the least-squares problem (6) directly when M is ill-conditioned. Note that the first iteration of GPBiCGstab(L) corresponds to that of BiCGstab(L), for which η must be 0. Hence, the first iteration (i.e., one BiCGstab(L) iteration) is displayed separately from the main GPBiCGstab(L) iterations in the original algorithm [6, Algorithm 3], whereas we display all iterations in the same form using a branch at line 19 of Algorithm 1.
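As a hedged illustration of this local minimization (with random stand-ins for the iteration matrices, which in the actual method are produced by the current cycle), the following NumPy sketch compares the normal-equation solve with a QR-based least-squares solve:

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, L = 8, 2, 3
vec = lambda V: V.reshape(-1, order="F")

# Stand-ins for the iteration matrices: R[0] plays R_0, R[1..L] play
# R_1..R_L, and Y plays the relaxation matrix (all random here).
R = [rng.standard_normal((n, s)) for _ in range(L + 1)]
Y = rng.standard_normal((n, s))

# Columns of M are vec(R_1), ..., vec(R_L), vec(Y); we solve
#   min_z || vec(R_0) - M z ||_2   for z = (zeta_1, ..., zeta_L, eta).
M = np.column_stack([vec(Ri) for Ri in R[1:]] + [vec(Y)])
r0 = vec(R[0])

# Normal equations (M^T M) z = M^T r0: cheap, Cholesky-friendly ...
z_ne = np.linalg.solve(M.T @ M, M.T @ r0)
# ... versus a QR/SVD-based solve, more stable when M is ill-conditioned.
z_qr, *_ = np.linalg.lstsq(M, r0, rcond=None)
assert np.allclose(z_ne, z_qr)
```

The small (L + 1) × (L + 1) normal equation is negligible in cost next to the operator applications, which is why either solve is acceptable in practice.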

Alternative implementation of Gl-GPBiCGstab(L)
In this section, we present the derivation of another Gl-GPBiCGstab(L) algorithm that is mathematically equivalent to Algorithm 1 but exploits alternative recursion formulas for updating the iteration matrices. The resulting algorithm uses a slightly simpler expression and is also useful for designing a robust right preconditioned Gl-GPBiCGstab(L) with fewer additional costs, as shown in the next section.

Basic concepts
Similar to the standard LTPMs, the global versions of the LTPMs can be characterized by residuals R_k of the form

R_k = H_k(A) R_k^gbcg, (8)

where H_k(λ) is the so-called stabilizing polynomial of degree k satisfying H_k(0) = 1, and R_k^gbcg is the kth Gl-BiCG residual generated by the coupled two-term recurrences

R_{k+1}^gbcg = R_k^gbcg − α_k A(P_k^gbcg), (9)
P_{k+1}^gbcg = R_{k+1}^gbcg + β_k P_k^gbcg, (10)

with recurrence coefficients α_k, β_k ∈ R and direction matrix P_k^gbcg ∈ R^{n×s} of Gl-BiCG [9]. Note that H_k(A) is a linear operator of polynomial form, i.e., H_k(A) = I − ω_1 A − · · · − ω_k A^k with ω_j ∈ R and the identity operator I, and we define its operation on X ∈ R^{n×s} as H_k(A)X = X − ω_1 A(X) − · · · − ω_k A^k(X). Specific choices of the stabilizing polynomials give specific Gl-LTPMs.
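Such a polynomial operator can be applied with one operator application per power, accumulating the terms of H_k(A)X. A minimal NumPy sketch (names and coefficients are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 6, 2
A = rng.standard_normal((n, n)) / n

def op(X):
    """The linear operator A(X); here a plain matrix product."""
    return A @ X

def apply_H(omega, X):
    """Apply H_k(A)X = X - omega_1 A(X) - ... - omega_k A^k(X),
    using one operator application per power of A."""
    out = X.copy()
    V = X
    for w in omega:
        V = op(V)        # after i steps, V = A^i(X)
        out -= w * V
    return out

X = rng.standard_normal((n, s))
omega = [0.5, 0.25]
expected = X - 0.5 * op(X) - 0.25 * op(op(X))
assert np.allclose(apply_H(omega, X), expected)
```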
The original GPBiCGstab(L) [6] is a novel framework for the standard LTPMs that uses the comprehensive stabilizing polynomials defined by (11) and (12), where H_0(λ) := 1, η_0 := 0, and k is a multiple of L. The L + 1 coefficients ζ_{k,1}, . . . , ζ_{k,L}, η_k ∈ R are independent parameters. The stabilizing polynomials reduce to those of BiCGstab(L) if η_k is always set to 0, and they are equivalent to those of GPBiCG when L = 1; thus, CGS, BiCGSTAB, and BiCGStab2 can also be derived from this framework. In the practical computation of GPBiCGstab(L), the L + 1 parameters are determined by locally minimizing the residual norm in each iteration. In the following subsections, we derive the Gl-GPBiCGstab(L) algorithm that generates the residuals (8) with the stabilizing polynomials (11) and (12); the derivation differs somewhat from that in [6].

Recursion formulas for updating the iteration matrices
Herein, the operators H_k(A) and G_{k−1}(A) are denoted by H_k and G_{k−1}, respectively, and we omit parentheses for simplicity; e.g., H_k R_k^gbcg stands for H_k(A)(R_k^gbcg). Let X_k, R_k = H_k R_k^gbcg, and P_k = H_k P_k^gbcg be the approximation, residual, and direction matrix of Gl-GPBiCGstab(L), respectively. These matrices are updated to X_{k+L}, R_{k+L} = H_{k+L} R_{k+L}^gbcg, and P_{k+L} = H_{k+L} P_{k+L}^gbcg, respectively, and this process is counted as one cycle below. We now introduce the auxiliary matrices (13) and (14) for j = 1, 2, . . . , L.
From (11), (13), and (14), the updated residual and direction matrices can be expanded as (15) and (16), and the associated approximation can be expressed as (17). Once the auxiliary matrices in (15)-(17) are obtained, we can determine the L + 1 parameters ζ_{k,1}, . . . , ζ_{k,L}, η_k to minimize the residual norm ‖R_{k+L}‖_F, as described in Section 2.2.
As mentioned in [6], the stabilizing polynomial H_{k+L}(λ) consists of two parts: the multiplication of an Lth-degree polynomial with the previous H_k(λ) and the relaxation term −η_k λ G_{k−1}(λ). We therefore describe the generation of the auxiliary matrices for each part separately.

Multiplication of the Lth-degree polynomial
We describe an iteration process to generate the auxiliary matrices appearing in (15)-(17). This process corresponds to the well-known L-times BiCG steps in BiCGstab(L), and the description below is based on [6, Section 2.2].
Assume that the approximation X_k^{(0)} := X_k, the corresponding residual R_k^{(0)} := R_k, and the direction matrix P_k^{(0)} := P_k are given at the beginning of the cycle. Then, the jth repetition (j = 1, 2, . . . , L) of the BiCG steps is described as follows.
From the Gl-BiCG recursions (9) and (10), we apply the operators A^i H_k to R_{k+j}^gbcg for i = 0, 1, . . . , j − 1 and to P_{k+j}^gbcg for i = 0, 1, . . . , j, noting that H_k A V = A H_k V holds for an arbitrary V ∈ R^{n×s}. Rewriting the resulting recursions with the auxiliary matrices (13) gives (18)-(21), where α_k^{(j)} := α_{k+j} and β_k^{(j)} is defined analogously. Following [6, Eqs. (14) and (15)] and using the iteration matrices, we compute the Gl-BiCG coefficients α_k^{(j)} and β_k^{(j)}, where R̃_0 is the initial shadow residual. Thus, when the auxiliary matrices for i = 0, 1, . . . , j − 1 are given at the beginning of the jth repetition, the new matrices for i = 0, 1, . . . , j can be generated using (18)-(21). This updating scheme is summarized as (22); by repeating it for j = 1, 2, . . . , L, we obtain the auxiliary matrices required in (15)-(17). This process is identical to that used in Algorithm 1.

Relaxation
Next, we describe the iterations used to generate the auxiliary matrices Y appearing in (15)-(17). This process corresponds to [6, Section 3.1], but we exploit different recursion formulas, and the resulting algorithm thus differs from Algorithm 1.
For now, we consider the generation of Y_k^{(L)}. We introduce additional auxiliary matrices in (23)-(25) and rewrite the corresponding recursions in terms of these matrices, which yields (26) and is summarized by the scheme (27). Rewriting the remaining relations gives (28) and (29), from which we generate Z_k^{(j)}. Finally, applying the resulting relation to R_{k+L}^gbcg produces the recursion formula (30) for computing the next starting matrix Z_{k+L}^{(0)}. We now have all the auxiliary matrices required for the cycles.

Refined Gl-GPBiCGstab(L) algorithm
By combining (15)-(30), we obtain the refined Gl-GPBiCGstab(L) algorithm shown in Algorithm 2; we refer to Section 2 for the notation. Algorithms 1 and 2 generate the same approximations in exact arithmetic, but they use different formulas, especially for computing Y and U, so their numerical behaviors differ in finite-precision arithmetic. The total computational costs of Algorithms 1 and 2 are identical. However, Algorithm 2 has a slightly simpler implementation because, unlike Algorithm 1, it contains no branch in the loops over j and no recursions for Y and U. Moreover, this simplicity is useful when applying preconditioning: as shown in later sections, Algorithm 2 with right preconditioning is numerically more stable and computationally cheaper than Algorithm 1.

Algorithm 2 Gl-GPBiCGstab(L) for (1) (refined)
We briefly describe the related Gl-LTPMs derived from Gl-GPBiCGstab(L). If the parameter η is set to 0 in Algorithms 1 and 2, the relaxation computations vanish, and both algorithms reduce to the same Gl-BiCGstab(L) algorithm; to the best of our knowledge, this is also a new Gl-LTPM. Meanwhile, Algorithms 1 and 2 with L = 1 are mathematically equivalent to Gl-GPBiCG [15, Algorithm 2] (and also Gl-GPBiCGplus [15, Algorithm 4]), although their implementations differ. As in the case of the standard LTPMs, the global versions of CGS, BiCGSTAB, and BiCGStab2 can be derived from the Gl-GPBiCG framework; therefore, Gl-GPBiCGstab(L) includes them as well. Thus, Algorithms 1 and 2 can be reduced to various global methods with different implementations. However, because the main purpose of this study is to develop an effective Gl-GPBiCGstab(L) method, we do not discuss the simplified algorithms or their implementations further; we compare the convergence of several key algorithms in the numerical experiments below.
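Among the methods derivable from this framework, Gl-BiCGSTAB has the simplest structure. The following matrix-free NumPy sketch (ours, following the standard BiCGSTAB recurrences with the Frobenius inner product; it is not the paper's Algorithm 1 or 2) illustrates the structure shared by these Gl-LTPMs:

```python
import numpy as np

def gl_bicgstab(op, B, X0, tol=1e-10, maxit=500):
    """Matrix-free global BiCGSTAB sketch for A(X) = B, where op(V)
    applies the linear operator A to an n x s matrix V."""
    ip = lambda X, Y: np.tensordot(X, Y)   # Frobenius inner product <X,Y>_F
    X = X0.copy()
    R = B - op(X)
    Rt = R.copy()                          # shadow residual R~_0
    P = R.copy()
    rho = ip(Rt, R)
    nb = np.linalg.norm(B, "fro")
    for _ in range(maxit):
        V = op(P)
        alpha = rho / ip(Rt, V)
        S = R - alpha * V
        T = op(S)
        omega = ip(T, S) / ip(T, T)        # local residual minimization
        X += alpha * P + omega * S
        R = S - omega * T
        if np.linalg.norm(R, "fro") <= tol * nb:
            break
        rho_new = ip(Rt, R)
        beta = (rho_new / rho) * (alpha / omega)
        P = R + beta * (P - omega * V)
        rho = rho_new
    return X

# Toy multiple right-hand-side system with a diagonally dominant A.
rng = np.random.default_rng(4)
n, s = 50, 4
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, s))
X = gl_bicgstab(lambda V: A @ V, B, np.zeros((n, s)))
assert np.linalg.norm(B - A @ X, "fro") <= 1e-8 * np.linalg.norm(B, "fro")
```

For the Sylvester equation, one would instead pass `lambda V: A @ V - V @ C` as the operator, with no other changes.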

Preconditioning for linear systems with multiple right-hand sides
In this section, we focus on linear systems with multiple right-hand sides,

AX = B, (31)

which constitute one of the most important types of linear matrix equations [7-10].

Here, A ∈ R^{n×n} is a large sparse nonsymmetric and nonsingular matrix. Because preconditioning is useful for enhancing the convergence of Gl-LTPMs when solving (31), we discuss preconditioned algorithms of Gl-GPBiCGstab(L). There are several approaches for converting the above system into a well-conditioned one using a preconditioner K ≈ A. Multiplying (31) by K^{−1} from the left gives the left preconditioned system ÂX = B̂, Â := K^{−1}A, B̂ := K^{−1}B.

In general, the left preconditioned algorithm is easy to implement by replacing the multiplications by A with multiplications by K^{−1}A, but the residuals generated in the iterations change to R̂_k = K^{−1}R_k; special care is therefore needed in setting the stopping rule. Conversely, the right preconditioned system is given in the form

ÂX̂ = B,  Â := AK^{−1},  X̂ := KX. (32)

In this case, the residuals coincide with the standard ones, i.e., R̂_k = R_k, whereas the approximations generated in the iterations change to X̂_k = KX_k; the approximation X_k = K^{−1}X̂_k must be computed accurately once the iterations terminate. Alternatively, to obtain the approximations during the iterations, it is well known that X_k can be updated recursively instead of X̂_k through some changes of variables (cf., e.g., [2,18]). We describe such algorithms of Gl-GPBiCGstab(L) with right preconditioning.
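The following NumPy sketch (with a simple diagonal stand-in for K ≈ A; all names are ours) illustrates why the two options treat the stopping rule differently: the right preconditioned residual coincides with the unpreconditioned one, whereas the left preconditioned residual is scaled by K^{−1}.

```python
import numpy as np

rng = np.random.default_rng(5)
n, s = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, s))
K = np.diag(np.diag(A))            # simple diagonal stand-in for K ~ A
X = rng.standard_normal((n, s))    # some current approximation

# Right preconditioning iterates on (A K^{-1}) X_hat = B with X_hat = K X,
# so its residual equals the true residual B - A X.
X_hat = K @ X
R_right = B - A @ np.linalg.solve(K, X_hat)
assert np.allclose(R_right, B - A @ X)

# Left preconditioning iterates on (K^{-1} A) X = K^{-1} B, and its
# residual is K^{-1}(B - A X), which complicates the stopping rule.
R_left = np.linalg.solve(K, B) - np.linalg.solve(K, A) @ X
assert np.allclose(R_left, np.linalg.solve(K, B - A @ X))
```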

Naive right preconditioned algorithm
Algorithm 3 is a naive right preconditioned Gl-GPBiCGstab(L) algorithm obtained by applying Algorithm 1 to (32). Similar to [2,18], several variables are changed so that X_k is updated recursively, and a variable with the hat symbol 'ˆ' denotes that K^{−1} acts on the underlying variable (without the hat). For example, relations such as P̂_k^{(j)} = K^{−1}P_k^{(j)} hold in Algorithm 3, where we note that P_k^{(j)} itself is not needed for updating X_k. Similar relationships also hold between the other variables with and without hats.
We briefly describe some concerns regarding Algorithm 3. Because R̂ is computed by a linear combination of R̂_i for i = 0, 1, . . . , L and Ŷ in line 26, it may differ significantly from K^{−1}R owing to the accumulation of rounding errors. In our experience, this causes numerical instabilities, such as stagnation or divergence of the residual norms in the late stages of the iterations (see Section 5.1 below). In preliminary experiments, we confirmed that the convergence can be improved by computing R̂ explicitly in the form K^{−1}R; however, this approach is costly because it requires additional multiplications by K^{−1}. We note that because Ŷ and Û are computed by their coupled recursions with the starting matrix Ŷ given in line 26, it is difficult to remove this line without additional multiplications by K^{−1}.

Refined right preconditioned algorithm
We can overcome the above difficulty by exploiting Algorithm 2. Algorithm 4 presents the refined right preconditioned Gl-GPBiCGstab(L) algorithm obtained by applying Algorithm 2 to (32). Here, because Û is computed directly in the form Û = Q_0 − P_0, we do not need Ŷ, which enables removal of the linear combination for R̂. Then, unlike in Algorithm 3, multiplication by K^{−1} can be used directly to compute the required variables in the jth repetition.
Hence, R̂ is always a good approximation of K^{−1}R in finite-precision arithmetic, and we expect Algorithm 4 to be numerically more stable than Algorithm 3. Note that because the multiplications by K^{−1} appear in different parts of Algorithms 3 and 4, the two algorithms with η = 0 yield different formulations of the right preconditioned Gl-BiCGstab(L).

Computational costs and memory requirements
We compare the computational costs and memory requirements of the related right preconditioned algorithms, considering the original Gl-GPBiCG and Gl-BiCGstab(L) algorithms as references; the comparison is summarized in Table 1. The computational costs for checking the stopping rule are not included, and the memory requirements indicate the number of n × s matrices that must be stored (the memory for A and B is not counted). From Table 1, we see that Algorithm 4 has the additional advantage of lower computational cost and memory requirements compared with Algorithm 3. For a modest value of L, e.g., L ≤ 4, the costs and memory requirements of Algorithm 4 are not high compared with those of the Gl-GPBiCG and Gl-BiCGstab(L) algorithms.

Numerical experiments
We present numerical experiments to show the effectiveness of the proposed Gl-GPBiCGstab(L) method. Numerical calculations were carried out in double-precision floating-point arithmetic on a PC (Intel Core i7-1185G7 CPU with 32 GB of RAM) using MATLAB 2021a. The right-hand side B ∈ R^{n×s} was given by a random matrix. The initial guess X_0 and the initial shadow residual R̃_0 were set to O and R_0 (= B), respectively. The least-squares problem (6) was converted to the normal equation (7) and solved using the backslash command in MATLAB. The other computational conditions were set individually for each example, as noted below.

Comparison of the naive and refined algorithms
We first compare convergence between the naive and refined Gl-GPBiCGstab(L) algorithms with and without preconditioning. Algorithms 1-4 were applied to (31).
Following [6], the coefficient matrix A was given by a nonsymmetric Toeplitz matrix; see [6] for the specific entries.
The iterations were stopped when the relative residual norm ‖R_k‖_F/‖B‖_F fell below 10^{−14}. The parameters s and L were set to s = 1, 2, 4, 8, 16, 32 and L = 2, 4, 8, respectively, and the maximum number of matrix-matrix products (MMs) was set to 2n. ILU(0) [19] was used as the preconditioner in Algorithms 3 and 4. Figure 1 shows the convergence histories of the relative residual norms of Algorithms 3 and 4, i.e., the naive and refined Gl-GPBiCGstab(L) with right preconditioning, for s = 1 and 32. The plots show the number of MMs on the horizontal axis versus log_10 of the relative residual F-norm on the vertical axis; note that "MV" (matrix-vector product) and the 2-norm are used for s = 1 instead of MM and the F-norm, respectively. Table 2 shows the number of MMs required for successful convergence of the four algorithms. The symbol † in Table 2 indicates that the residual norms stagnated or diverged in the late stages of the iterations, as displayed in Fig. 1.
From Fig. 1 and Table 2, we observe the following. The numbers of MMs required for successful convergence of Algorithms 1 and 2 (without preconditioning) are comparable for each value of s and L. In the preconditioned case, Algorithm 4 achieves faster convergence than in the non-preconditioned case for all s and L, whereas Algorithm 3 often does not converge. As noted in Section 4.1, R̂ in Algorithm 3 may differ from K^{−1}R because of rounding errors, and this difference is expected to grow with increasing L. Indeed, Algorithm 3 becomes unstable as L increases for a fixed s, and similar results have been observed for other matrices. Moreover, because stagnation of the residual norms occurs even when s = 1, the original GPBiCGstab(L) with right preconditioning has an inherent problem, and the numerical instability appears to grow with increasing s. For these reasons, we conclude that the refined Gl-GPBiCGstab(L) algorithm is more robust than the naive version under right preconditioning.

Experiments for linear systems with multiple right-hand sides
Next, we apply several Gl-LTPMs with preconditioning to (31) and compare their convergence. We use Gl-BiCGSTAB, Gl-GPBiCG, Gl-BiCGstab(L), and Gl-GPBiCGstab(L), which can be obtained from Algorithm 4. For Gl-GPBiCG, we also use the Gl-GPBiCG-plus implementation [15, Algorithm 5]. Table 3 lists the abbreviations of the solvers and their corresponding algorithms. Right preconditioning with the ILU(0) preconditioner was applied to all methods.
We consider test matrices from the SuiteSparse Matrix Collection [20]. Table 4 shows the dimension (n), number of nonzero entries (nnz), and 2-norm condition number (κ_2) of each matrix; the condition number is displayed only when it is provided in the collection. The iterations were stopped when the relative residual norm ‖R_k‖_F/‖B‖_F fell below 10^{−10}, and the maximum number of MMs was set to 2n. For Gl-BiCGstab(L) and Gl-GPBiCGstab(L), we used L = 2, 4, and the number of right-hand sides s was set to 16. Figure 2 displays the convergence histories of the relative residual norms of the Gl-LTPMs with right preconditioning for sme3Db and garon2; we refer to Section 5.1 for the plotting conventions. Table 5 shows the number of MMs required for successful convergence (MMs), the computation time (Time), and the explicitly computed relative residual norm ‖B − AX_k‖_F/‖B‖_F (referred to as the true relative residual norm, TRR) at termination of the solvers. Following [6], the smallest MMs and Time are displayed in bold italics, and the second smallest values are displayed in italics only, for each problem. "Inf" and "NaN" in the table indicate that values in the iterations became "infinity" or "not a number" in MATLAB, after which the iterations could not proceed. From Fig. 2 and Table 5, we observe the following. The fastest convergence in terms of the number of MMs is achieved by the proposed Gl-GPBiCGstab(L) for all problems. In particular, Gl-GPBiCGstab(4) often converges faster than not only the other Gl-LTPMs but also Gl-GPBiCGstab(2). As noted in [6], moderately increasing L appears useful for reducing the number of MMs required for successful convergence. These are important indicators of the effectiveness of Gl-GPBiCGstab(L) as a global Krylov subspace method because the number of MMs coincides with the dimension of the matrix Krylov subspace.
In terms of the computational time, Gl-BiCGstab(L) and Gl-GPBiCGstab(L) are comparable to the conventional Gl-GPBiCG-plus on average. Comparing the true relative residual norms, there are cases where Gl-BiCGstab(L) and Gl-GPBiCGstab(L) generate slightly more accurate approximate solutions than Gl-GPBiCG-plus.
We also briefly remark on the numerical results obtained when the stopping criterion was set to ‖R_k‖_F/‖B‖_F < 10^{−12}, for which we obtained slightly different observations. With respect to the convergence speed in terms of the number of MMs, Gl-BiCGstab(4) and Gl-GPBiCGstab(4) were superior to the other Gl-LTPMs.

Experiments for the Sylvester equation
Herein, we present the numerical results of the Gl-LTPMs for the Sylvester equation (2). The solvers listed in Table 6 were applied to (2) without preconditioning. Note that the linear operation of A on X ∈ R^{n×s} is defined as A(X) := AX − XC for the given matrices A ∈ R^{n×n} and C ∈ R^{s×s}, and "MM" is replaced by "OP" to denote an operation with A. Motivated by [15,22], we set the test matrices A and C as displayed in Table 7; the matrices are taken from the SuiteSparse Matrix Collection [20], except for the tridiagonal matrix in problems #11 and #12. This tridiagonal matrix C = [c_{ij}] is defined by c_{i+1,i} := 11, c_{ii} := −2, and c_{i,i+1} := −9 for each i (otherwise, c_{ij} := 0). The value of s is determined by the size of C, and the maximum number of OPs was set to 2sn. All other computational conditions were the same as in Section 5.2. Figure 3 displays the convergence histories of the relative residual norms of the Gl-LTPMs for problems #9 and #11, with the number of OPs and log_10 of the relative residual F-norm plotted on the horizontal and vertical axes, respectively. Table 8 shows the number of OPs required for successful convergence (OPs), the computation time (Time), and the true relative residual norm ‖B − A(X_k)‖_F/‖B‖_F (TRR) at termination; the other notations are as defined in Table 5.
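The tridiagonal matrix C and the matrix-free Sylvester operator described above can be sketched in NumPy as follows (function names are ours):

```python
import numpy as np

def tridiag_C(s):
    """Tridiagonal C with c_{i+1,i} = 11, c_{i,i} = -2, c_{i,i+1} = -9,
    and all other entries zero, as in problems #11 and #12."""
    C = -2.0 * np.eye(s)
    C += 11.0 * np.eye(s, k=-1)   # subdiagonal entries c_{i+1,i}
    C -= 9.0 * np.eye(s, k=1)     # superdiagonal entries c_{i,i+1}
    return C

def make_op(A, C):
    """One 'OP' applies the Sylvester operator A(X) = AX - XC."""
    return lambda X: A @ X - X @ C

C = tridiag_C(4)
assert C[1, 0] == 11 and C[0, 0] == -2 and C[0, 1] == -9
```

A solver receiving `make_op(A, C)` never needs the ns × ns representation matrix, so the number of OPs directly measures the work spent in the matrix Krylov subspace.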
From Fig. 3 and Table 8, we observe the following. Similar to the results in Section 5.2, Gl-GPBiCGstab(L) (especially with L = 4) often converges faster than the other Gl-LTPMs with respect to the number of OPs. The conventional Gl-GPBiCG-plus is efficient in terms of the computation time; however, it does not converge for problem #11, whereas Gl-BiCGstab(L) and Gl-GPBiCGstab(L) do. Similar situations have occasionally occurred in our experience, and the proposed methods appear to be more robust than the conventional Gl-LTPMs. We also note that the numerical results with the stopping criterion ‖R_k‖_F/‖B‖_F < 10^{−12} are similar to the above, although Gl-BiCGstab(4) and Gl-GPBiCGstab(4) exhibit a slightly larger residual gap for problems #9 and #10. We do not discuss the residual gap further in the present study but will seek more refined Gl-LTPM algorithms based on [21] in the future.

Concluding remarks
We have proposed a novel global Lanczos-type method, Gl-GPBiCGstab(L), for solving linear matrix equations. The original GPBiCGstab(L) can easily be extended naively to its global version, but such a method suffers from numerical instabilities when right preconditioning is applied to linear systems with multiple right-hand sides. We therefore reconstructed Gl-GPBiCGstab(L) using alternative recursion formulas to update the iteration matrices; the resulting refined algorithm with right preconditioning is more robust and computationally cheaper than the naive version. Moreover, numerical experiments show that the refined Gl-GPBiCGstab(L) converges quickly and stably compared with other Gl-LTPMs for linear systems with multiple right-hand sides and for the Sylvester equation.
Based on the above results, there are two main prospects for the proposed approach. First, it can be applied to more difficult classes of matrix equations, such as the general coupled matrix equations including the generalized Sylvester equation [11]. Second, the proposed approach can inform block Krylov subspace methods, such as the block BiCGstab(L) [23] and block GPBiCG [24] methods. Because these block-type methods are effective, especially for solving linear systems with multiple right-hand sides, it is natural to develop a block version of GPBiCGstab(L), as noted in [6]. We have not elaborated on these points in the present work but expect to address them in future works.