We study the performance of GPLS and the resulting iterative methods on various standard test problems with randomly generated dense system matrices, different types of randomly generated band matrices, and real-world applications.
For comparability, we use the same settings for our GP approach in all experiments. The individuals in the first generation (\(i = 0\)) are generated by the ramped half-and-half method. As variation operators, we use standard subtree crossover with a crossover probability of \(p_c = 0.8\) and standard subtree mutation with a mutation probability of \(p_m = 0.05\). For selection, we use tournament selection of size 3. The population size is set to 1,500 and we stop a GP run after 30 generations.
The weights for the objective function were determined by manual parameter tuning. We set the weight for the spectral radius to \(w_s = 0.8997\), the weight for the non-zero values to \(w_z = 0.1\), and the weight for the method’s complexity (the number of nodes in the parse tree) to \(w_c = 0.0003\). The largest value is assigned to \(w_s\) because we require \(\rho (G) < 1\) to guarantee convergence of the iterative method. To favor small solutions, we assign a very low value to \(w_c\).
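This setup is easy to reproduce with a standard GP library. The following is a minimal, illustrative sketch using DEAP and NumPy; the terminal set, the tree depth limits, the pseudo-inverse guard, and the normalization of the non-zero count are assumptions made for illustration, not a reproduction of our implementation.

```python
import numpy as np
from deap import algorithms, base, creator, gp, tools

n = 10
rng = np.random.default_rng(0)
A = rng.integers(-10, 11, size=(n, n)).astype(float)  # random system matrix
D = np.diag(np.diag(A))                               # diagonal part
L = np.tril(A, -1)                                    # strictly lower triangular part
U = np.triu(A, 1)                                     # strictly upper triangular part

def inv(M):
    return np.linalg.pinv(M)  # pseudo-inverse guards against singular subterms

pset = gp.PrimitiveSet("MAIN", 0)
pset.addPrimitive(np.add, 2)
pset.addPrimitive(np.subtract, 2)
pset.addPrimitive(np.matmul, 2)
pset.addPrimitive(inv, 1)
for name, mat in (("A", A), ("L", L), ("D", D), ("U", U), ("LpD", L + D)):
    pset.addTerminal(mat, name=name)

W_S, W_Z, W_C = 0.8997, 0.1, 0.0003  # weights from the text

def evaluate(ind):
    G = gp.compile(ind, pset)            # evaluate the parse tree to a matrix
    if not np.all(np.isfinite(G)):
        return (np.inf,)                 # discard numerically broken terms
    rho = max(abs(np.linalg.eigvals(G)))
    z = np.count_nonzero(G) / G.size     # assumed normalization of non-zeros
    return (W_S * rho + W_Z * z + W_C * len(ind),)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=4)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=1500)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.8, mutpb=0.05, ngen=30, verbose=False)
best = tools.selBest(pop, 1)[0]
```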
Performance of GPLS for random system matrices
The main application of the iterative methods found by GPLS is the solution of linear systems arising from the discretization of PDEs. However, for a first analysis of the GP performance we use randomly generated system matrices as input. We study how the three components of the objective function (spectral radius, non-zero values, and the number of nodes in the parse tree) change for the best individual over time in a GP run on randomly generated system matrices of increasing size. To generate a random system matrix of a given size, we fill the elements with integer values drawn uniformly at random from the interval \([-10, 10]\).
Figure 3 shows the median spectral radius \({\tilde{\rho }}\) of 100 GP runs over the number of generations. As input we use a randomly generated \(100\times 100\) dense system matrix. We find that the median spectral radius decreases from about 11 at the beginning of a run to about 3e−14 at the end.
For the same randomly generated \(100\times 100\) system matrix used as input, Figs. 4 and 5 show the median number of non-zero entries \({\tilde{z}}\) and the median number of nodes \({\tilde{c}}\), respectively. We find that the number of non-zero entries decreases over the run, with a strong reduction between generations 10 and 15. The median number of nodes increases slightly over a run, starting at about seven tree nodes and growing to 13 nodes. Due to the low weight \(w_c\) for parsimony pressure, the number of nodes increases slightly while still allowing GPLS to improve on the spectral radius and sparsity. This reflects the choice of the weights in the objective function, where spectral radius and sparsity are more important than the size of the resulting iterative numerical method (\(w_s, w_z>w_c\)).
Table 1 Median and interquartile range of spectral radius, number of non-zero entries, and number of nodes in the parse tree (method’s complexity) in the first (\(i=0\)) and last (\(i=29\)) generation, for different problem sizes

Table 1 extends the analysis and presents results for the spectral radius, the number of non-zero entries, and the number of nodes in the parse tree for random problems of size \(10\times 10\) to \(100\times 100\). For each problem size, we perform 100 runs with a random system matrix. We show the median as well as the interquartile range (IQR; in parentheses) of the best solution in the initial (\(i = 0\)) and last generation (\(i = 29\)). We use the IQR, defined as the difference between the 75th and the 25th percentile, as a proxy for the variance of the results. Best median results of a run are printed in bold. All differences between the first and last generation were tested for significance with a Wilcoxon rank-sum test (\(p < 0.001\)).
We find that GPLS reliably finds solutions with a low spectral radius (median spectral radius \({\tilde{\rho }} < 1.0\) for all studied problem instances). For some problem sizes, we observe a rather large IQR because the search space is complex and GPLS does not always find a successful solution (one with \(\rho < 1.0\)). However, this is not a problem for the practical use of GPLS, since we can simply check the found solution for its suitability (calculate the spectral radius) and, if necessary, restart the GPLS run. In addition to the spectral radius, the GP approach also improves the sparsity of the found iteration matrices for all problem sizes. Only the number of nodes increases during a GP run. This is expected, as the weight \(w_c\) is chosen very low to act only as slight bloat control; a median size of 15 nodes is acceptable (comparable to the Gauss–Seidel and the Jacobi methods).
Generalization of iteration matrices found by GPLS
A direct comparison of GPLS and classical stationary iterative methods is difficult as GPLS’ main effort comes from the search for a suitable term that builds an iteration matrix from a system matrix. This effort is high, especially if the considered linear systems are very large. In contrast, classical stationary iterative methods like Gauss–Seidel do not require any search process but are directly applicable.
A relevant question is whether GPLS finds iteration matrices that are general and can (analogously to classical stationary iterative methods) be applied to a wide range of different problems. When searching for such generalizable expressions, we can utilize the fact that linear systems discretized from PDEs often have similar structures and characteristics independently of their degree of detail and size. We can take advantage of this and evolve iteration matrices with GPLS for small linear systems and subsequently use the found solutions on larger systems with a similar structure, based on the assumption that the found solutions for the small systems also yield satisfactory results for the larger systems.
We study the generalization of the found solutions with a set of \(n\times n\) band matrices used as system matrices, which are also relevant for real-world problems (see tridiagonal Toeplitz matrices [8]). A band matrix is a sparse matrix with a main diagonal and additional non-zero diagonals on both sides of the main diagonal [7]. We use band matrices from 1D and 2D problems with additional diagonals above and below the main diagonal [9]. The structure of these matrices is independent of the problem size n because, for each matrix, we use constant values along the diagonals.
In our experiments, we randomly generate 100 system matrices of small size (\(n=5\) and \(n=9\)). For each of these problems, GPLS determines an iteration matrix. Next, for each of the 100 system matrices (and for each considered problem type), we generate corresponding system matrices with larger n; these larger system matrices are band matrices with the same structure. We apply the solution that GPLS found for the small value of n to the larger system matrices and evaluate the corresponding spectral radii \(\rho \) of the iteration matrices. Our hope is that the solutions found for small n are general and also work well for larger n.
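This re-use experiment is straightforward to express in code. The following minimal NumPy sketch assumes a diagonally dominant tridiagonal Toeplitz system matrix with illustrative diagonal values, and a placeholder evolved_term that stands in for a term found by GPLS on the \(n=5\) instance; neither the concrete diagonal values nor the term are taken from our experiments.

```python
import numpy as np

def band_matrix(n, diags):
    """Toeplitz band matrix with constant values along its diagonals.
    diags maps diagonal offsets to values, e.g. {-1: 1.0, 0: 4.0, 1: 1.0}."""
    A = np.zeros((n, n))
    for offset, value in diags.items():
        A += value * np.eye(n, k=offset)
    return A

def spectral_radius(G):
    return max(abs(np.linalg.eigvals(G)))

def evolved_term(L, D, U):
    # Placeholder for a term evolved by GPLS on the n = 5 instance;
    # re-used unchanged for every larger n.
    return U + np.linalg.inv(D)

diags = {-1: 1.0, 0: 4.0, 1: 1.0}  # illustrative, diagonally dominant values
for n in (5, 10, 100, 1000):
    A = band_matrix(n, diags)
    L, D, U = np.tril(A, -1), np.diag(np.diag(A)), np.triu(A, 1)
    print(n, spectral_radius(evolved_term(L, D, U)))
```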
Figure 6 shows box-plots of the spectral radius \(\rho \) over the problem size n of the \(n\times n\) system matrices. Each box-plot contains the spectral radii of 100 iteration matrices. The dashed line marks a spectral radius of 1.0. In this experiment, GPLS was only applied to the diagonally dominant band system matrices in 1D of size \(5\times 5\). Thus, only the spectral radii of the iteration matrices in the first box-plot are a direct result of GPLS, and for this first box-plot we considered only found iteration matrices with a spectral radius \(\rho < 1.0\). For the larger system matrices, we did not apply GPLS anew but re-used the iterative methods evolved for the small system matrices (\(n=5\)).
As expected, the spectral radii become larger with increasing n. Nevertheless, the median spectral radius is always lower than 1.0 for the analyzed matrix sizes. For \(n=5\), GPLS finds solutions with a median spectral radius \({\tilde{\rho }}\) of 9.23e−6. Applying these solutions to a problem with \(n=1000\) still yields a median spectral radius \({\tilde{\rho }} < 1.0\).
Figure 7 shows the same analysis, but this time we start from \(9\times 9\) diagonal system matrices in 2D. Again, the median spectral radius is always lower than 1.0. However, with an increasing problem size n, we see an increase of the number of outliers with a spectral radius \(\rho > 1.0\).
In summary, on the analyzed problems, the iterative methods found by GPLS for small system matrices are generalizable and can be re-used for larger n, if the basic structure of the problem stays the same.
GPLS overcomes limitations of existing stationary iterative methods
The well-known Gauss–Seidel method converges if the system matrix A is either symmetric positive definite or strictly diagonally dominant. If this is not the case, there is no guarantee that the Gauss–Seidel method yields an appropriate iteration matrix G [7, 21]. Such cases are a good challenge for GPLS, because GP searches the whole space of potential methods and may find solutions for problems where the Gauss–Seidel method fails.
Consequently, we generate typical random system matrices for which the Gauss–Seidel method cannot find an appropriate iteration matrix and study the properties of the iteration matrices generated by GPLS. For the visual inspection of system and iteration matrices, we use heat maps, which are graphical representations of the numerical values in a matrix. Heat maps make it easier to see structural characteristics like diagonals and the sparsity of a matrix, as each entry is represented by a specific color.
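Such heat maps can be produced with any plotting library; a minimal sketch with Matplotlib (the random matrix here is only a stand-in for the matrices discussed below):

```python
import matplotlib.pyplot as plt
import numpy as np

G = np.random.default_rng(0).integers(-10, 11, size=(25, 25))  # stand-in matrix

plt.imshow(G, cmap="coolwarm")       # one colored cell per matrix entry
plt.colorbar(label="entry value")
plt.show()
```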
Figure 8 shows a randomly generated dense system matrix of size \(25\times 25\). For this example, we filled the matrix with integer values drawn uniformly at random from the interval \([-10, 10]\). The Gauss–Seidel method only finds an iteration matrix with a spectral radius of around 28,000. Hence, the Gauss–Seidel method cannot be used. In contrast, GPLS finds a solution for this example, described by the term \((((A D)+((U+D)+(L+D)))-(((D^{-1}+U)+(((L+D)^{-1}-U)-(D^{-1}+(L+D)^{-1})))+((A D)+((U+D)+(L+D)))))\). Figure 9 shows the resulting iteration matrix. The matrix has a spectral radius of 2.22e−16 as well as high sparsity. The few non-zero values are concentrated in the upper triangular area because the found term is dominated by the terminals \(L+D\) and U.
A second example uses a randomly generated tridiagonal band matrix of size \(25\times 25\) as system matrix. For each diagonal, we used a random integer value drawn uniformly from the interval \([-10, 10]\). The resulting band matrix is not diagonally dominant. Figure 10 shows the heat map for this system matrix. The spectral radius of the iteration matrix found by the Gauss–Seidel method is 6.0. Thus, the Gauss–Seidel method is not usable in this case.
In contrast, GPLS again finds an expression that solves the problem. The term found by GPLS is \(U + D^{-1}\). The resulting iteration matrix (see Fig. 11) has a spectral radius of 0.2 and is similar to the system matrix but has one diagonal fewer.
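This term is easy to verify numerically. Since U is strictly upper triangular and \(D^{-1}\) is diagonal, \(U + D^{-1}\) is upper triangular, so its eigenvalues are its diagonal entries. A minimal sketch follows; the concrete diagonal values are assumptions (a main diagonal of \(\pm 5\) reproduces the reported spectral radius of 0.2), since the random draw used in the experiment is not reported.

```python
import numpy as np

# Assumed diagonal values: |a| < |b| + |c| makes the matrix non-diagonally
# dominant, and |a| = 5 reproduces the reported spectral radius 1/|a| = 0.2.
a, b, c = 5, 8, -7
n = 25
A = a*np.eye(n) + b*np.eye(n, k=1) + c*np.eye(n, k=-1)

D = np.diag(np.diag(A))   # diagonal part of A
U = np.triu(A, 1)         # strictly upper triangular part of A
G = U + np.linalg.inv(D)  # term found by GPLS

print(max(abs(np.linalg.eigvals(G))))  # 0.2
```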
Convergence analysis of iteration matrices found by GPLS
This section studies the convergence speed of the iterative numerical methods found by GPLS for two types of dominant band matrices. We compare the solutions found by GPLS with those of the Jacobi, Gauss–Seidel, and SOR methods. For this purpose, we consider linear systems of the form
$$\begin{aligned} A x = 0. \end{aligned}$$ (5.1)
Sparse diagonally dominant band matrices
In a first set of experiments, we study the convergence behavior for linear equations that arise from the discretization of PDEs. In particular, we consider Poisson’s equation in 1D, 2D, and 3D with the following boundary condition (Dirichlet):
$$\begin{aligned} u(x) = 0. \end{aligned}$$
We transform the PDEs into a system of linear equations (compare Sect. 3) using FDM, which leads to a system of the form of Eq. 5.1. In all three cases (1D, 2D, and 3D), the resulting system matrices are sparse diagonally dominant band matrices, for which, e.g., Jacobi and Gauss–Seidel are guaranteed to converge. GPLS evolved the following terms to calculate the iteration matrix G:
Table 2 Spectral radii of the iteration matrices for the discretized Poisson equations

Table 2 compares the spectral radii of the iteration matrices of the Jacobi, Gauss–Seidel, SOR, and GPLS methods for all three cases of the discretized Poisson equation. For SOR, we set the relaxation parameter \(\omega = 0.8\) [we tested values from the interval (0, 2) with step size 0.1]. As expected, \(\rho \) is lowest for the iteration matrices found by GPLS. The spectral radii of the iteration matrices constructed by the Jacobi or Gauss–Seidel method are only slightly lower than one.
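For reference, the classical iteration matrices follow directly from the splitting \(A = L + D + U\). A minimal sketch for the 1D Poisson matrix (assuming the standard three-point stencil):

```python
import numpy as np

n = 64
# 1D FDM Poisson matrix with Dirichlet boundaries (stencil -1, 2, -1).
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)

def rho(G):
    return max(abs(np.linalg.eigvals(G)))

G_jac = -np.linalg.solve(D, L + U)                 # Jacobi: -D^{-1}(L+U)
G_gs  = -np.linalg.solve(D + L, U)                 # Gauss-Seidel: -(D+L)^{-1}U
w = 0.8                                            # SOR relaxation parameter
G_sor = np.linalg.solve(D + w*L, (1 - w)*D - w*U)  # SOR iteration matrix

print(rho(G_jac), rho(G_gs), rho(G_sor))  # all below one, but close to it
```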
To study the convergence behavior of the resulting iterative methods more closely, we employ the iteration scheme
$$\begin{aligned} x^{(i+1)} = G x^{(i)}, \end{aligned}$$
where \(x^{(i)}\) is the current solution and G the iteration matrix. As initial guess \(x^{(0)}\) for the solution of the system we use
$$\begin{aligned} x_{j}^{(0)} = 1 \quad \forall j = 1, \dots , n, \end{aligned}$$
with n as the number of discretization points. As we know that the solution of the system defined in Eq. 5.1 is 0, the absolute error \(\epsilon \) is equal to the current approximation \(x^{(i)}\) during each iteration i:
$$\begin{aligned} \epsilon = x^{(i)} - 0 = x^{(i)}. \end{aligned}$$
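The experiment then reduces to repeated matrix–vector products. A minimal sketch (assuming an iteration matrix G, e.g. one of those computed above):

```python
import numpy as np

def error_history(G, num_iters=100):
    # Initial guess x_j = 1 for all j; since the exact solution of Ax = 0
    # is the zero vector, the current iterate x is itself the error.
    x = np.ones(G.shape[0])
    errors = []
    for _ in range(num_iters):
        x = G @ x
        errors.append(np.linalg.norm(x))  # L2-norm of the error
    return errors
```

Plotting error_history(G) for each method yields convergence curves analogous to those shown in the figures below.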
Figures 12, 13, and 14 plot the \(L^2\)-norm of the error \(\epsilon \) over the number of iterations for the Jacobi, Gauss–Seidel, SOR, and GPLS-evolved iteration methods for the solution of Poisson’s equation in 1D, 2D, and 3D, respectively. As expected, all iteration schemes converge to the solution of the system, although, as reflected in the lower spectral radii of their iteration matrices, the schemes evolved by GPLS converge much faster than Gauss–Seidel and Jacobi. For example, in the 1D and 2D cases GPLS achieves convergence within only a few iterations. In the 3D case, the error increases during the first few iterations, followed by a fast decrease. In all three instances, the convergence speed of SOR is similar to that of GPLS; however, SOR's convergence speed strongly depends on the choice of a suitable relaxation parameter \(\omega \).
Surprised by the extremely fast convergence of the iterative numerical methods evolved by GPLS (especially for the 1D case of Poisson’s equation), we study whether GPLS has found as iteration matrix G the inverse of the system matrix A, or a matrix that is very similar to it. If this were the case, the fast convergence behavior would be inevitable. Consequently, Fig. 15 shows the heat map of the product of A and the iteration matrix G found by GPLS. If the product were the identity matrix I, then GPLS would have found \(A^{-1}\). However, the figure shows that \(A G \ne I\): the product has four diagonals in the upper triangular part of the matrix and no main diagonal.
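The same check can also be done numerically instead of visually; a one-line sketch (for A and an evolved G as NumPy arrays):

```python
import numpy as np

def is_inverse(A, G, tol=1e-8):
    # True only if A @ G equals the identity up to round-off.
    return np.allclose(A @ G, np.eye(A.shape[0]), atol=tol)
```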
Non-diagonally dominant band matrices
As a second and more challenging test case, we consider the class of non-diagonally dominant band matrices. For this class of matrices, e.g., the Jacobi and Gauss–Seidel methods are not guaranteed to converge in the general case. Thus, it is uncertain if a stationary iterative method that converges to the solution of an arbitrary linear system with a non-diagonally dominant system matrix can be evolved. To generate a suitable instance of this class of matrices, we randomly generate a tridiagonal matrix of the form
$$\begin{aligned} A_{1} = \begin{bmatrix} a & b & & & \\ c & a & b & & \\ & c & \ddots & \ddots & \\ & & \ddots & \ddots & b \\ & & & c & a \end{bmatrix}, \end{aligned}$$
that satisfies \(|a| < |b| + |c|\). As a test case, we randomly choose the values \(a = 4\), \(b = 8\), and \(c = 2\). We assume that this matrix corresponds to a one-dimensional problem. Thus, we can generate higher-dimensional problems of the same instance by computing the Kronecker sum of the matrix with itself:
$$\begin{aligned} A_{2}&= A_{1} \oplus A_{1},\\ A_{3}&= A_{2} \oplus A_{1}. \end{aligned}$$
The resulting system matrices are also non-diagonally dominant.
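The Kronecker sum is not a NumPy built-in but follows directly from its definition \(A \oplus B = A \otimes I + I \otimes B\); a minimal sketch (with a small n for illustration):

```python
import numpy as np

def kron_sum(A, B):
    # Kronecker sum: kron(A, I_m) + kron(I_n, B).
    n, m = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)

n = 10                                   # small size for illustration
a, b, c = 4, 8, 2                        # |a| < |b| + |c|
A1 = a*np.eye(n) + b*np.eye(n, k=1) + c*np.eye(n, k=-1)
A2 = kron_sum(A1, A1)                    # 2D instance, size n^2 x n^2
A3 = kron_sum(A2, A1)                    # 3D instance, size n^3 x n^3
```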
Table 3 shows the spectral radii of the resulting Jacobi, Gauss–Seidel, and SOR iteration matrices, as well as of the iteration matrices evolved by GPLS. For SOR, we set the relaxation parameter \(\omega = 0.6\) [again, we tested values from the interval (0, 2) with step size 0.1]. The spectral radii of the iteration matrices generated by Jacobi and Gauss–Seidel are all larger than one. Thus, convergence cannot be guaranteed. In contrast, both SOR and GPLS yield iteration matrices with a spectral radius smaller than one. For the band matrices in 1D, 2D, and 3D, GPLS evolved the following terms to calculate the iteration matrix G:
Table 3 Spectral radii of the iteration matrices for a non-diagonally dominant band matrix

Analogous to the Poisson case, we study the convergence of the resulting iterative methods by solving the system defined in Eq. 5.1, using the same initial guess \(x^{(0)} = 1\). Again, we measure the \(L^2\)-norm of the error \(\epsilon \) with respect to the exact solution 0 during each iteration.
Figures 16, 17, and 18 plot the error over the number of iterations. As expected, the Jacobi and Gauss–Seidel methods do not converge in any of the three cases; instead, the error grows with each iteration. In contrast, GPLS was able to evolve an iteration matrix that leads to convergence in all three cases, with a convergence speed similar to that of the SOR method in all three studied instances.
If we compare the convergence behavior of GPLS on non-diagonally dominant band matrices to the Poisson case (see Figs. 12, 13, and 14), we find that the evolved schemes on average require more iterations and that convergence is only achieved after an initial stagnation or even an increase of the error. Nevertheless, the evolved iteration matrices always lead to low errors in less than 100 iterations. The initial error increase can be explained by the fact that a stationary iterative method cannot eliminate all error components simultaneously. Consequently, the reduction of certain error components can cause an increase in the remaining ones and, thus, lead to the observed overall growth of the approximation error. However, after this initial error increase, the total error quickly decreases (with GPLS and SOR), which means that once particular error components are eliminated within the first few iterations, the remaining ones are efficiently reducible.