1 Introduction

The numerical solution of time-dependent partial differential equations (PDEs) often leads to sequences of linear systems of the form

$$\begin{aligned} A(t_{i}) {\varvec{x}}(t_{i}) = {\varvec{b}}(t_{i}) \quad \quad i=0,1,2 \cdots , \end{aligned}$$
(1)

where \(t_{0}< t_1<t_2 < \cdots \) is a discretization of time t, and both the system matrix \(A(t_{i}) \in {\mathbb {R}}^{n \times n}\) and the right-hand side \({\varvec{b}}(t_{i}) \in {\mathbb {R}}^n\) depend on time. Typically, the systems (1) become available only consecutively. Such sequences of linear systems arise in a number of applications, including implicit time stepping schemes for the solution of PDEs and the iterative solution of non-linear equations and optimization problems. A relevant example is given by time-dependent PDEs solved in the presence of algebraic constraints. In this case, even when an explicit time stepping method is used to evolve the nonlinear PDE, the discretization of the algebraic constraints leads to linear systems that need to be solved at every (sub-)timestep. This is the case in the simulation of turbulent plasma dynamics [10], where a linear constraint (Maxwell equations) is imposed upon the plasma dynamics described by a set of non-linear fluid or kinetic equations. The linear systems resulting from the discretized algebraic constraints may feature millions of degrees of freedom, hence their solution is often computationally very expensive.

One usually expects that the linear system (1) changes slowly over subsequent time steps. This work is focused on exploiting this property to accelerate iterative solvers, such as CG [17] for symmetric positive definite matrices and GMRES [24] for general matrices. An obvious way to do so is to supply the iterative solver for the timestep \(t_{i+1}\) with the solution of (1) at timestep \(t_{i}\) as initial guess. As a more advanced technique, in the context of Krylov subspace methods, subspace recycling methods [26] such as GCROT [7] and GMRES-DR [21] have been proposed. Such methods have been developed in the case of a single linear system, to enrich the information when restarting the iterative solver. The underlying idea is often to accelerate convergence by suppressing parts of the spectrum of the matrix, through the inclusion of the corresponding approximate invariant subspace in the Krylov minimization subspace. GCROT and GMRES-DR have then been adapted to sequences of linear systems in [22], recycling selected subspaces from one system to the next. For this class of methods to be efficient, it is necessary that the sequence of matrices undergoes local changes only, that is, the difference \(A(t_{i+1})- A(t_{i})\) is computationally cheap to apply. For example, one can expect this difference matrix to be sparse when time dependence is restricted to a small part of the computational domain, e.g., through time-dependent boundary conditions. We refer to [26] for a more complete survey of subspace recycling methods and their applications. In [5], subspace recycling was combined with goal-oriented POD (Proper Orthogonal Decomposition) in order to limit the size of the subspaces involved in an augmented CG approach. Simplifications occur when the matrices \(A(t_i)\) are actually a fixed matrix A shifted by different scalar multiples of the identity matrix, because Krylov subspaces are invariant under such shifts. In the context of subspace recycling, this property has been exploited in, e.g., [27], and in [25] it is shown how a smoothly varying right-hand side can be incorporated.

When \(A(t_i)\) and \({{\textbf {b}}}(t_i)\) in (1) are samples of smooth matrix/vector-valued functions, one expects that the subspace of the previously computed solutions contains a very good approximation of the current one. This can be exploited to construct a better initial guess, either explicitly through (polynomial) extrapolation, or implicitly through projection techniques. Examples of the extrapolation approach include polynomial POD extrapolation [14], weighted group extrapolation methods [30] and a stabilized, least-squares polynomial extrapolation method [1], for the case in which only the right-hand side evolves in time. For the same setting, projection techniques have been introduced by Fischer [11]. Following this first work, several approaches have been developed to extract an initial guess from the solution of a reduced-order model, constructed by projecting the problem onto a low-dimensional subspace spanned by previous solutions. In [28], such an approach is applied to fully implicit discretizations of nonlinear evolution problems, while [20] applies the same idea to the so-called IMPES scheme used for simulating two-phase flows through heterogeneous porous media.

In this paper, we develop a new projection technique for solving sequences of linear systems that combines projection with randomized linear algebra techniques, leading to considerably reduced cost. Moreover, a novel convergence analysis of the algorithm is carried out to show its efficiency. This efficiency is also demonstrated numerically by applying the algorithm to the numerical simulation of turbulent plasma in the boundary of a fusion device.

The rest of this paper is organized as follows. In Sect. 2, we first discuss general subspace acceleration techniques based on solving a projected linear system and then explain how randomized techniques can be used to speed up existing approaches. In Sect. 3, a convergence analysis of these subspace acceleration techniques is presented. In Sect. 4 we discuss numerical results for a test case, demonstrating the improvements that can be attained by the new algorithm in a somewhat idealized setting. In Sect. 5 our algorithm is applied to a large-scale turbulent simulation of plasma in a tokamak, showing a significant reduction of computational time.

2 Algorithm

The algorithm proposed in this work for accelerating the solution of the sequence of linear systems (1) uses randomized techniques to lower the cost of a POD-based strategy, such as the one proposed in [20]. Recall that we aim at solving the linear systems \(A(t_{i}) {\varvec{x}}(t_{i}) = {\varvec{b}}(t_{i})\) consecutively for \(i = 0,1,\cdots \). We make no assumption on the symmetry of \(A(t_{i}) \in {\mathbb {R}}^{n \times n}\) and thus GMRES is an appropriate choice for solving each linear system. Supposing that, at the ith timestep, M previous solutions are available, we arrange them into the history matrix

$$\begin{aligned} X = \left[ {\varvec{x}}(t_{i-M})\,|\,\cdots \,|\,{\varvec{x}}(t_{i-1})\right] \in {\mathbb {R}}^{n\times M}, \end{aligned}$$

where the notation on the right-hand side indicates the concatenation of columns. Instead of using the complete history, which may contain redundant information, one usually selects a subspace \({\mathcal {S}} \subset \textrm{span}(X)\) of lower dimension \(m\le M\). Then, the initial guess for the ith linear system is obtained from choosing the element of \({\mathcal {S}}\) that minimizes the residual:

$$\begin{aligned} \min \limits _{{\varvec{s}} \in {\mathcal {S}}} \Vert A(t_{i}){\varvec{s}} - {\varvec{b}}(t_{i})\Vert _2 = \min \limits _{{\varvec{z}} \in {\mathbb {R}}^{m}}\Vert A(t_{i})Q {\varvec{z}} - {\varvec{b}}(t_{i})\Vert _2, \end{aligned}$$

where the columns of \(Q \in {\mathbb {R}}^{n\times m}\) contain an orthonormal basis of \({\mathcal {S}}\). We use \(\Vert \cdot \Vert _2\) to denote the Euclidean norm for vectors and the spectral norm for matrices. The described approach is summarized in Algorithm 1, which is a template that needs to be completed by an appropriate choice of the subspace \({\mathcal {S}}\), as discussed in Sects. 2.1 and 2.2.

Algorithm 1: Solution of ith linear system \( A(t_{i}) \varvec{x}(t_{i}) = \varvec{b}(t_{i})\)

If the complete history is used, \({\mathcal {S}} = \textrm{span}(X)\), then computing Q via a QR decomposition [13], as required in Step 2, costs \({\mathcal {O}}(M^2 n)\) operations. In addition, setting up the linear least-squares problem in Step 3 of Algorithm 1 requires M (sparse) matrix–vector products in order to compute \(A(t_{i}) Q\). The standard approach for solving the linear least-squares problem proceeds through the QR decomposition of that matrix and costs another \({\mathcal {O}}(M^3 + M^2 n)\) operations. This strong dependence of the cost on M effectively forces a rather small choice of M, neglecting relevant solution components that may be contained only in older solutions. In the following, we discuss two strategies to overcome this problem.
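To make the template concrete, the following minimal sketch (in Python/NumPy, with illustrative names; the implementations referred to in this paper use MATLAB and Fortran) spells out the full-history variant of Algorithm 1, with the dominant costs noted in the comments.

```python
import numpy as np

def initial_guess_full_history(A_i, b_i, X):
    """Algorithm 1 with S = span(X): minimize ||A(t_i) Q z - b(t_i)||_2
    over the subspace spanned by the n x M history matrix X."""
    # Step 2: orthonormal basis of span(X) via a reduced QR decomposition, O(M^2 n)
    Q, _ = np.linalg.qr(X)                        # Q is n x M
    # Step 3: small least-squares problem for the coefficients z, O(M^3 + M^2 n)
    AQ = A_i @ Q                                  # M (sparse) matrix-vector products
    z, *_ = np.linalg.lstsq(AQ, b_i, rcond=None)
    # Step 4: initial guess s* = Q z, passed to the iterative solver
    return Q @ z
```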

2.1 Proper Orthogonal Decomposition

An existing strategy [20] to arrive at a low-dimensional subspace \({\mathcal {S}} \subset \textrm{span}(X)\) uses a POD approach [19] and computes the orthonormal basis Q for \({\mathcal {S}}\) through a truncated SVD (Singular Value Decomposition) of X; see Algorithm 2. Note that only the first m left singular vectors \(\varvec{\Psi }_{1}, \cdots , \varvec{\Psi }_{m}\) need to be computed in Step 2.

Algorithm 2: Method 1 (POD) to generate basis \(Q = Q_{{\textsf{POD}}}\)
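A possible NumPy realization of this step is sketched below (illustrative, not taken from the paper's code); only the leading m left singular vectors are retained.

```python
import numpy as np

def pod_basis(X, m):
    """Method 1 (POD): orthonormal basis from the m dominant left
    singular vectors of the n x M history matrix X."""
    Psi, sigma, _ = np.linalg.svd(X, full_matrices=False)  # truncated SVD of X
    return Psi[:, :m], sigma           # Q_POD (n x m) and the singular values
```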

Thanks to basic properties of the SVD, the basis \(Q_{{\textsf{POD}}}\) enjoys the following optimality property [29]:

$$\begin{aligned} \Vert (I - Q_{{\textsf{POD}}} Q_{{\textsf{POD}}}^{T}) X \Vert _{F}^{2} = \sum _{k=m+1}^{M} \sigma _{k}^{2} = \min _{\begin{array}{c} Q \in {\mathbb {R}}^{n\times m} \\ Q^T Q = I_m \end{array}} \Vert (I - Q Q^{T}) X \Vert _{F}^{2}, \end{aligned}$$
(2)

where \(\Vert \cdot \Vert _F\) denotes the Frobenius norm and \(\sigma _1 \ge \sigma _2 \ge \cdots \ge \sigma _M \ge 0\) are the singular values of X. In words, the choice \(Q_{{\textsf{POD}}}\) minimizes the error of orthogonally projecting the columns of X onto an m–dimensional subspace. The relation to the singular values of X established in (2) also allows one to choose m adaptively, e.g., by choosing m such that most of the variability in the history matrix X is captured.

At every time step, the history matrix X gets modified by removing its first column and appending a new last column. The most straightforward implementation of Algorithm 2 would compute the SVD needed in Step 2 from scratch at every time step, leading to a complexity of \({\mathcal {O}}(nM^2)\) operations. In principle, SVD updating techniques, such as the ones presented in [4] and [6], could be used to reduce this complexity to \({\mathcal {O}}(mn + m^3)\) per time step. However, in the context of our application, there is no need to update a complete SVD (in particular, the right singular vectors are not needed) and the randomized techniques discussed in the next section seem preferable.

2.2 Randomized Range Finder

In this section, an alternative to the POD method (Algorithm 2) for generating the low-dimensional subspace \({\mathcal {S}} \subset \textrm{span}(X)\) is presented, relying on randomized techniques. The randomized SVD from [16] applied to the \(n\times M\) history matrix X proceeds as follows. First, we draw an \(M\times m\) Gaussian random matrix Z, that is, the entries of Z are independent and identically distributed (i.i.d.) standard normal variables. Then the so-called sketch

$$\begin{aligned} \Omega = X Z = \left[ {\varvec{x}}(t_{i-M})\,|\, \cdots \,|\, {\varvec{x}}(t_{i-1}) \right] Z \end{aligned}$$
(3)

is computed, followed by a reduced QR decomposition \(\Omega = QR\). This only involves the \(n\times m\) matrix \(\Omega \), which for \(m\ll M\) is a significant advantage compared to Algorithm 2, which requires the SVD of an \(n\times M\) matrix. The described procedure is contained in lines 2–4 and 11 of Algorithm 3 below.

According to [16, Theorem 10.5], the expected value (with respect to Z) of the error returned by the randomized SVD satisfies

$$\begin{aligned} {\mathbb {E}} \Vert (I- QQ^{T})X \Vert _{F} \le \Big (1+ \frac{r}{p-1}\Big )^{1/2} \Big (\sum _{k > r} \sigma _{k}^{2}\Big )^{1/2}, \end{aligned}$$
(4)

where we partition \(m = r + p\) for a small oversampling parameter \(p\ge 2\). Also, the tail bound from [16, Theorem 10.7] implies that it is highly unlikely that the error is much larger than the upper bound (4). Comparing (4) with the error (2), we see that the randomized method is only a factor \(\sqrt{2}\) worse than the optimal basis of roughly half the size produced by POD. As we also see in our experiments of Sect. 4, this bound is quite pessimistic and usually the randomized SVD performs nearly as well as POD using bases of the same size.

Algorithm 3: Method 2 (Randomized Range Finder) to generate basis Q

Instead of performing the randomized SVD from scratch in every timestep, one can easily exploit the fact that only a small part of the history matrix is modified. To see this, let us consider the sketch from the previous timestep:

$$\begin{aligned} \Omega _{{\textsf{prev}}} = \left[ {\varvec{x}}(t_{i-M-1})\,|\, \cdots \,|\, {\varvec{x}}(t_{i-2}) \right] {Z^{{\textsf{prev}}}}. \end{aligned}$$
(5)

Comparing with (3), we see that the sketch \(\Omega \) of the current timestep is obtained by removing the contribution from the solution \({\varvec{x}}(t_{i-M-1})\) and adding the contribution of \({\varvec{x}}(t_{i-1})\). The removal is accomplished in line 6 of Algorithm 3 by a rank-one update:

$$\begin{aligned} \Omega _{{\textsf{prev}}} - {\varvec{x}}(t_{i-M-1}) ({\varvec{z}}^{{\textsf{prev}}}_{1})^{T}= \left[ {\varvec{0}}\,|\,{\varvec{x}}(t_{i-M})\,|\, \cdots \,|\, {\varvec{x}}(t_{i-2}) \right] {Z^{{\textsf{prev}}}}. \end{aligned}$$

By a cyclic permutation, we can move the zero column to the last column, \(\left[ {\varvec{x}}(t_{i-M})\,|\, \cdots \,|\, {\varvec{x}}(t_{i-2}) \,|\, {\varvec{0}}\right] \), updating Z as in line 7 of Algorithm 3. Finally, the contribution of the latest solution is incorporated by adding the rank-one matrix \({\varvec{x}}(t_{i-1}) {\varvec{z}}_M^T\), where \({\varvec{z}}_M \in {\mathbb {R}}^m\) is a newly generated Gaussian random vector that is stored in the last row of Z. Under the (idealistic) assumption that all solutions are exactly computed (and hence deterministic), the described progressive updating procedure is mathematically equivalent to computing the randomized SVD from scratch. In particular, the error bound (4) continues to hold.

Lines 6–9 of Algorithm 3 require \({\mathcal {O}}(nm)\) operations. When using standard updating procedures for QR decomposition [13], line 11 has the same complexity. This compares favorably with the \({\mathcal {O}}(nM^2)\) operations needed by Algorithm 2 per timestep.

When performing the progressive update of \(\Omega \) over many timesteps, one can encounter numerical issues due to cancellation in the repeated subtraction and addition of contributions to the sketch matrix. To avoid this, the progressive update is carried out only for a fixed number of timesteps, after which a new random matrix Z is generated and \(\Omega \) is recomputed from scratch.
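The complete procedure, including the progressive update and the periodic recomputation from scratch, can be sketched as follows (NumPy; the class name, the refresh period, and the full QR at the end are illustrative simplifications — the paper's Algorithm 3 uses QR updating to reach \({\mathcal {O}}(nm)\) operations per timestep).

```python
import numpy as np

class RandomizedRangeFinder:
    """Method 2: maintain the sketch Omega = X Z of the n x M history matrix
    and return an orthonormal basis Q of span(Omega)."""

    def __init__(self, X, m, refresh=50, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        self.m, self.refresh, self.steps = m, refresh, 0
        self._from_scratch(X)

    def _from_scratch(self, X):
        self.Z = self.rng.standard_normal((X.shape[1], self.m))  # Gaussian test matrix
        self.Omega = X @ self.Z                                   # sketch, n x m

    def basis(self):
        # Full reduced QR for simplicity; QR updating would cost only O(nm).
        return np.linalg.qr(self.Omega)[0]

    def update(self, x_old, x_new, X=None):
        """Replace the oldest solution x_old in the sketch by the newest one x_new."""
        self.steps += 1
        if X is not None and self.steps % self.refresh == 0:
            self._from_scratch(X)            # periodic recomputation avoids cancellation
            return self.basis()
        self.Omega -= np.outer(x_old, self.Z[0])       # remove oldest contribution
        self.Z = np.roll(self.Z, -1, axis=0)           # cyclic shift of the rows of Z
        self.Z[-1] = self.rng.standard_normal(self.m)  # fresh Gaussian last row
        self.Omega += np.outer(x_new, self.Z[-1])      # add newest contribution
        return self.basis()
```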

3 Convergence Analysis

We start our convergence analysis of the algorithms from the preceding section by considering analytical properties of the history matrix \(X = [{\varvec{x}}(t_{i-M})\,|\,\cdots \,|\,{\varvec{x}}(t_{i-1})]\). After reparametrization, we may assume without loss of generality that each of the past timesteps is contained in the interval \([-1,1]\):

$$\begin{aligned} -1 = t_{i-M}< \cdots < t_{i-1} = 1. \end{aligned}$$

For notational convenience, we define

$$\begin{aligned} X \equiv X({\varvec{t}}):= \left[ {\varvec{x}}(t_{i-M})\,|\, \cdots \,|\,{\varvec{x}}(t_{i-1}) \right] , \quad {\varvec{t}}= \left[ t_{i-M}, \cdots ,t_{i-1} \right] , \end{aligned}$$
(6)

where \({\varvec{x}}(t)\) satisfies the (parametrized) linear system

$$\begin{aligned} A(t) {\varvec{x}}(t) = {\varvec{b}}(t), \quad A:[-1,1] \rightarrow {\mathbb {R}}^{n \times n}, \quad {\varvec{b}}: [-1,1] \rightarrow {\mathbb {R}}^{n}, \end{aligned}$$
(7)

that is, each entry of A and \({\varvec{b}}\) is a scalar function on the interval \([-1,1]\). Indeed, for the convergence analysis, we assume that each linear system of the sequence in (1) is obtained by sampling the parametrized system (7) at \(t_i \in [-1,1]\). In many practical applications, like the one described in Sect. 5, the time dependence in (7) arises from time-dependent coefficients in the underlying PDEs. Frequently, this dependence is real analytic, which prompts us to make the following smoothness assumption on A and \({\varvec{b}}\).

Assumption 1

Consider the open Bernstein ellipse \(E_{\rho } \subset {\mathbb {C}}\) for \(\rho > 1\), that is, the open ellipse with foci \(\pm 1\) and semi-minor/-major axes summing up to \(\rho \). We assume that \(A: \left[ -1, 1 \right] \rightarrow {\mathbb {C}}^{n \times n}\) and \( {\varvec{b}}: \left[ -1, 1 \right] \rightarrow {\mathbb {C}}^{n}\) admit extensions that are analytic on \(E_{\rho }\) and continuous on \(\bar{E_{\rho }}\) (the closed Bernstein ellipse), such that A(t) is invertible for all \(t\in \bar{E_{\rho }}\). In particular, this implies that \({\varvec{x}}(t) = A^{-1}(t) {\varvec{b}}(t)\) is analytic on \(E_{\rho }\) and \(\kappa _\rho := \max _{t \in \partial E_{\rho }} \Vert {\varvec{x}}(t) \Vert _2\) is finite.

3.1 Compressibility of the Solution Time History

The effectiveness of POD-based algorithms relies on the compressibility of the solution history, that is, the columns of X can be well approximated by an m–dimensional subspace with \(m \ll M\). According to (2), this is equivalent to stating that the singular values of X decrease rapidly to zero. Indeed, this property is implied by Assumption 1 as shown by the following result, which was stated in [18] in the context of low-rank methods for solving parametrized linear systems.

Theorem 2

([18, Theorem 2.4]) Under Assumption 1, the kth largest singular value \(\sigma _k\) of the history matrix \(X({\varvec{t}})\) from (6) satisfies

$$\begin{aligned} \sigma _{k} \le \frac{2 \rho \kappa _\rho \sqrt{M}}{1 - \rho ^{-1}} \rho ^{-k}. \end{aligned}$$

Combined with (2), Theorem 2 implies that the POD basis \(Q_{{\textsf{POD}}} \in {\mathbb {R}}^{n\times m}\) satisfies the error bound

$$\begin{aligned} \Vert (I - Q_{{\textsf{POD}}} Q_{{\textsf{POD}}}^{T}) X \Vert _{F}^{2} \le \frac{4 \rho ^{2} \kappa _\rho ^{2} M}{(1 - \rho ^{-1})^{2}} (\rho ^{-(m+1)} - \rho ^{-(M+1)}). \end{aligned}$$
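As a quick numerical illustration of this decay (not part of the paper, with an arbitrarily chosen smooth toy function), one can sample a vector-valued analytic function at equispaced points and inspect the singular values of the resulting history matrix:

```python
import numpy as np

# Toy illustration of Theorem 2: history matrix of a smooth vector-valued function
n, M = 2000, 40
t = np.linspace(-1.0, 1.0, M)                 # equispaced "timesteps" in [-1, 1]
u = np.linspace(0.0, 1.0, n)                  # surrogate spatial grid
X = np.column_stack([np.sin(3.0 * u * ti) + np.exp(-u * ti) for ti in t])  # n x M
sigma = np.linalg.svd(X, compute_uv=False)
print(sigma[:10] / sigma[0])                  # singular values decay rapidly
```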

3.2 Quality of Prediction Without Compression

Algorithm 1 determines the initial guess \({\varvec{s}}^*\) for the next time step \(t_{i} > t_{i-1} = 1\) by solving the minimization problem

$$\begin{aligned} {\varvec{s}}^* = \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\varvec{s}}\in {\mathcal {S}}} \Vert A(t_{i}) {\varvec{s}} - {\varvec{b}}(t_{i})\Vert _2. \end{aligned}$$
(8)

In this section, we will assume, in addition to Assumption 1, that \({\mathcal {S}} = \textrm{span}(X({\varvec{t}}))\), that is, \(X({\varvec{t}})\) is not compressed. Our analysis focuses on uniform timesteps \(\varvec{t_{{\textsf{equi}}}}= \left[ t_{i-M},\cdots , t_{i-1}\right] \) defined by

$$\begin{aligned} t_{i-M} = -1,\ t_{i-M+1} = -1+\Delta t,\ \cdots ,\ t_{i-2} = 1-\Delta t,\ t_{i-1} = 1, \quad \Delta t = 2/(M-1). \end{aligned}$$

Note that the next timestep \(t_{i} = 1 + \Delta t\) satisfies \(t_{i} \in E_\rho \) if and only if \(\rho > t_{i} + \sqrt{t_{i}^2-1} \approx 1 + \sqrt{2 \Delta t}\). The following result shows how the quality of the initial guess rapidly improves (at a square root exponential rate, compared to the exponential rate of Theorem 2) as M, the number of previous time steps in the history, increases.
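For small \(\Delta t\), the stated approximation follows from a direct expansion:

$$\begin{aligned} t_{i} + \sqrt{t_{i}^{2}-1} = 1 + \Delta t + \sqrt{2\Delta t + \Delta t^{2}} = 1 + \sqrt{2\Delta t} + {\mathcal {O}}(\Delta t). \end{aligned}$$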

Theorem 3

Under Assumption 1, the initial guess constructed by Algorithm 1 with \({\mathcal {S}} = \textrm{span}(X)\) satisfies the error bound

$$\begin{aligned} \Vert A(t_{i}) {\varvec{s}}^{*} - {\varvec{b}}(t_{i})\Vert _2 \le 2\Vert A(t_{i})\Vert _2 \kappa _\rho \Big [ \frac{1}{1-r}+ \frac{C(M,R) \rho }{ (\rho -1)\sqrt{\rho ^2 r^2 -1}} \Big ] r^{R+1}, \end{aligned}$$

with \(C(M,R) = 5 \sqrt{5} \sqrt{2R+1} \sqrt{M} / \sqrt{2(M-1)}\), for any \(R \le \frac{1}{2} \sqrt{M-1}\), where \(r = (t_{i} + \sqrt{t_{i}^{2} -1})/ \rho < 1\).

3.2.1 Proof of Theorem 3

The rest of this section is concerned with the proof of Theorem 3. We establish the result by making a connection to vector-valued polynomial extrapolation and extending results by Demanet and Townsend [8] on polynomial extrapolation to the vector-valued setting.

Let \({\mathbb {P}}_{R} \subset {\mathbb {R}}^n[t]\) denote the subspace of vector-valued polynomials of length n and degree at most R for some \(R \le M-1\). We recall that any \({{\textbf {v}}} \in {\mathbb {P}}_{R}\) takes the form \({{\textbf {v}}}(t) = {{\textbf {v}}}_0 + {{\textbf {v}}}_1 t + \cdots + {{\textbf {v}}}_R t^R\) for constant vectors \({{\textbf {v}}}_0, \cdots , {{\textbf {v}}}_R \in {\mathbb {R}}^n\). Equivalently, each entry of \({{\textbf {v}}}\) is a (scalar) polynomial of degree at most R. In our analysis we consider vector-valued polynomials of the particular form

$$\begin{aligned} {\varvec{p}}(t)= X(\varvec{t_{{\textsf{equi}}}}) {\varvec{y}}(t), \end{aligned}$$
(9)

for a vector-valued polynomial \({\varvec{y}}(t)\) of length M. A key observation is that the evaluation of \({\varvec{p}}\) at the next timestep \(t_{i}\) satisfies \({\varvec{p}}(t_{i}) \in \textrm{span}(X(\varvec{t_{{\textsf{equi}}}})) = {\mathcal {S}}\). According to (8), \({\varvec{s}}^*\) minimizes the residual over \({\mathcal {S}}\). Hence, the residual can only increase when we replace \({\varvec{s}}^*\) by \({\varvec{p}}(t_{i})\) in

$$\begin{aligned} \Vert A(t_{i}) \varvec{s^{*}} - {\varvec{b}}(t_{i}) \Vert _2&\le \Vert A(t_{i}) {\varvec{p}}(t_{i}) - {\varvec{b}}(t_{i}) \Vert _2 \nonumber \\&\le \Vert A(t_{i})\Vert _2 \Vert {\varvec{p}}(t_{i})- {\varvec{x}} (t_{i}) \Vert _2. \end{aligned}$$
(10)

Thus, it remains to find a polynomial of the form (9) for which we can establish convergence of the extrapolation error \( \Vert {\varvec{p}}(t_{i})- {\varvec{x}} (t_{i}) \Vert _2\). For this purpose, we will choose \({\varvec{p}}_R \in {\mathbb {P}}_{R}\) to be the least-squares approximation of the M function samples contained in \(X(\varvec{t_{{\textsf{equi}}}})\):

$$\begin{aligned} {\varvec{p}}_{R}:= \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\varvec{p}} \in {\mathbb {P}}_{R}}\Vert X(\varvec{t_{{\textsf{equi}}}}) - P(\varvec{t_{{\textsf{equi}}}}) \Vert _{F}, \quad P(\varvec{t_{{\textsf{equi}}}}) = \left[ {\varvec{p}}(t_{i-M})\,|\, \cdots \,|\,{\varvec{p}}(t_{i-1}) \right] . \end{aligned}$$
(11)

We will represent the entries of \({\varvec{p}}_{R}\) in the Chebyshev polynomial basis:

$$\begin{aligned} {\varvec{p}}_{R}(t) = q_{0}(t) {\varvec{c}}_{0,p} + q_{1}(t) {\varvec{c}}_{1,p} + \cdots + q_{R}(t) {\varvec{c}}_{R,p}, \end{aligned}$$
(12)

where \({\varvec{c}}_{k,p} \in {\mathbb {R}}^n\) and \(q_{k}\) denotes the Chebyshev polynomial of degree k, that is, \(q_{k}(t) = \cos (k \cos ^{-1}t)\) for \(t\in [-1,1]\). Setting

$$\begin{aligned} C_{p} = \left[ {\varvec{c}}_{0,p}| \cdots | {\varvec{c}}_{R,p} \right] \in {\mathbb {R}}^{n \times (R+1)}, \quad {\varvec{q}}_{R}(t) = \left[ q_{0}(t), \cdots , q_{R}(t) \right] ^{T}, \end{aligned}$$
(13)

we can express (12) more compactly as \({\varvec{p}}_{R}(t) = C_{p} {\varvec{q}}_{R}(t)\). Thus,

$$\begin{aligned} P_R(\varvec{t_{{\textsf{equi}}}}) = C_{p} Q_R(\varvec{t_{{\textsf{equi}}}}), \quad Q_R(\varvec{t_{{\textsf{equi}}}}) = \left[ {\varvec{q}}_{R}(t_{i-M})\,|\, \cdots \,|\,{\varvec{q}}_{R}(t_{i-1}) \right] . \end{aligned}$$

In view of (11), the matrix of coefficients \(C_{p}\) is determined by minimizing \(\Vert X(\varvec{t_{{\textsf{equi}}}}) - C_{p} Q_R(\varvec{t_{{\textsf{equi}}}})\Vert _{F}\). Because \(R \le M-1\), the matrix \(Q_R(\varvec{t_{{\textsf{equi}}}})\) has full row rank and thus the solution of this least-squares problem is given by \(C_{p} = X(\varvec{t_{{\textsf{equi}}}})Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger }\) with \(Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger } = Q_{R}(\varvec{t_{{\textsf{equi}}}})^{T} (Q_{R}(\varvec{t_{{\textsf{equi}}}}) Q_{R}(\varvec{t_{{\textsf{equi}}}})^{T})^{-1}\). In summary, we obtain that

$$\begin{aligned} {\varvec{p}}_{R}(t) = C_{p}{\varvec{q}}_{R}(t) = X(\varvec{t_{{\textsf{equi}}}}) Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger } {\varvec{q}}_{R}(t), \end{aligned}$$
(14)

which is of the form (9) and thus contained in \(\textrm{span}(X(\varvec{t_{{\textsf{equi}}}}))\), as desired.
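For illustration, the construction (14) can be reproduced numerically in a few lines (a NumPy sketch with illustrative names; this polynomial is only a tool of the analysis and is not formed by the algorithm itself).

```python
import numpy as np
from numpy.polynomial import chebyshev

def chebyshev_extrapolation(X, t_equi, t_next, R):
    """Evaluate p_R(t_next) = X Q_R(t_equi)^+ q_R(t_next), cf. (14): a degree-R
    least-squares Chebyshev fit to the columns of X, extrapolated to t_next."""
    QR = chebyshev.chebvander(t_equi, R).T                # (R+1) x M, rows q_0,...,q_R
    q_next = chebyshev.chebvander(np.atleast_1d(t_next), R).ravel()  # q_R(t_next)
    Cp = X @ np.linalg.pinv(QR)                           # coefficient matrix, n x (R+1)
    return Cp @ q_next
```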

In order to analyze the convergence of \({\varvec{p}}_{R}(t)\), we relate it to Chebyshev polynomial interpolation of \({\varvec{x}}\). The following lemma follows from classical approximation theory, see, e.g., [18, Lemma 2.2].

Lemma 4

Let \({\varvec{q}}_{R}(t) \in {\mathbb {R}}^{R+1}\) be defined as in (13), containing the Chebyshev polynomials up to degree R. Under Assumption 1 there exists an approximation of the form

$$\begin{aligned} {\varvec{x}}_{R}(t) = C_x {\varvec{q}}_{R}(t), \quad C_x = \left[ {\varvec{c}}_{0,x}\,|\, {\varvec{c}}_{1,x}\,|\, \cdots \,|\, {\varvec{c}}_{R,x} \right] \in {\mathbb {R}}^{n\times (R+1)}, \end{aligned}$$

such that \( \Vert {\varvec{c}}_{k,x} \Vert _2 \le 2 \kappa _{\rho } \rho ^{-k} \) and

$$\begin{aligned} \max _{ t \in \left[ -1,1\right] } \Vert {\varvec{x}}_{R}(t) -{\varvec{x}}(t) \Vert _2 \le \frac{2\kappa _{\rho }}{\rho -1 } \rho ^{-R}. \end{aligned}$$

Following the arguments in [8] for scalar functions, Lemma 4 allows us to estimate the extrapolation error for \({\varvec{p}}_{R}(t)\) if \(R \sim \sqrt{M}\).

Theorem 5

Suppose that Assumption 1 holds and \(R \le \frac{1}{2}\sqrt{M-1}\). Then the vector-valued polynomial \({\varvec{p}}_{R} \in {\mathbb {P}}_R\) defined in (14) satisfies for every \(t \in (1, (\rho + \rho ^{-1})/2)\) the error bound

$$\begin{aligned} \Vert {\varvec{x}}(t) - {\varvec{p}}_{R}(t)\Vert _2 \le 2\kappa _{\rho } \Big [ \frac{1}{1-r}+ \frac{C(M,R) \rho }{(\rho -1)\sqrt{\rho ^2 r^2 -1}} \Big ] r^{R+1}, \end{aligned}$$

with \(r = (t + \sqrt{t^{2} -1})/ \rho < 1\) and C(MR) defined as in Theorem 3.

Proof

Letting \({\varvec{x}}_{R}\) be the polynomial from Lemma 4, we write

$$\begin{aligned} \Vert {\varvec{x}}(t) - {\varvec{p}}_{R}(t) \Vert _2&\le \Vert {\varvec{x}}(t) - {\varvec{x}}_{R}(t) \Vert _2 + \Vert {\varvec{x}}_{R}(t) - {\varvec{p}}_{R}(t) \Vert _2 \nonumber \\&= \Big \Vert \sum _{k=R+1}^{\infty } {\varvec{c}}_{k,x} q_{k}(t)\Big \Vert _2 + \Vert (C_{x}- C_{p}) {\varvec{q}}_{R}(t) \Vert _2 \nonumber \\&\le \sum _{k=R+1}^{\infty } \Vert {\varvec{c}}_{k,x} \Vert _2 | q_{k}(t)| + \Vert C_{x}- C_{p} \Vert _2 \Vert {\varvec{q}}_{R}(t) \Vert _2. \end{aligned}$$
(15)

To treat the second term in (15), first note that, by definition, we have

$$\begin{aligned} X_R(\varvec{t_{{\textsf{equi}}}}) = \left[ {\varvec{x}}_R(t_{i-M})\,|\,\cdots \,|\,{\varvec{x}}_R(t_{i-1}) \right] = C_x Q_R(\varvec{t_{{\textsf{equi}}}}) \end{aligned}$$

and hence \(C_x = X_R(\varvec{t_{{\textsf{equi}}}}) Q_R(\varvec{t_{{\textsf{equi}}}})^\dagger \). Setting \(\sigma := \sigma _{\min }(Q_R(\varvec{t_{{\textsf{equi}}}})) = 1/\Vert Q_R(\varvec{t_{{\textsf{equi}}}})^\dagger \Vert _2\), we obtain

$$\begin{aligned} \Vert C_x - C_p\Vert _2&= \Vert (X_R(\varvec{t_{{\textsf{equi}}}}) - X(\varvec{t_{{\textsf{equi}}}}) ) Q_R(\varvec{t_{{\textsf{equi}}}})^\dagger \Vert _2 \le \Vert X_R(\varvec{t_{{\textsf{equi}}}}) - X(\varvec{t_{{\textsf{equi}}}})\Vert _2 / \sigma \\&\le \frac{\sqrt{M}}{\sigma } \cdot \max _{t \in \varvec{t_{{\textsf{equi}}}}} \Vert {\varvec{x}}_{R}(t) - {\varvec{x}}(t) \Vert _2 \le \frac{\sqrt{M}}{\sigma } \frac{2\kappa _{\rho }}{\rho -1} \rho ^{-R}, \end{aligned}$$

where we used Lemma 4 in the last inequality. Applying, once more, Lemma 4 to the first term in (15) gives

$$\begin{aligned} \Vert {\varvec{x}}(t) - {\varvec{p}}_{R}(t) \Vert _2&\le 2\kappa _{\rho } \Big [ \sum _{k=R+1}^{\infty } \rho ^{-k} |q_{k}(t)| + \frac{\sqrt{M}}{\sigma } \frac{\rho ^{-R}}{\rho -1 } \Vert {\varvec{q}}_{R}(t) \Vert _2 \Big ] \end{aligned}$$
(16)

Because \(|q_{k}(t)| \le (t + \sqrt{t^{2}-1})^{k} \le \rho ^k r^k\) for \(t>1\), we have that

$$\begin{aligned} \Vert {\varvec{q}}_{R}(t) \Vert _2^2&\le \sum _{k = 0}^R (\rho r)^{2k} = (\rho r)^{2R} \sum _{k = 0}^R (\rho r)^{-2k} \le \frac{(\rho r)^{2R+2}}{\rho ^2 r^2-1}. \end{aligned}$$
(17)

Inserted into (16), this gives

$$\begin{aligned} \Vert {\varvec{x}}(t) - {\varvec{p}}_{R}(t) \Vert _2&\le 2\kappa _{\rho } \Big [ \sum _{k=R+1}^{\infty } \rho ^{-k} \rho ^{k} r^{k} + \frac{\sqrt{M}}{\sigma } \frac{\rho ^{-R} (\rho r)^{R+1}}{(\rho -1)\sqrt{\rho ^2 r^2 -1} } \Big ] \\ {}&\le 2\kappa _{\rho } \Big [ \frac{1}{1-r}+ \frac{\sqrt{M}\rho }{\sigma (\rho -1)\sqrt{\rho ^2 r^2 -1}} \Big ] r^{R+1}. \end{aligned}$$

The proof is completed by inserting the lower bound

$$\begin{aligned} \sigma = \sigma _{\min }(Q_{R}(\varvec{t_{{\textsf{equi}}}})) \ge \frac{\sqrt{2}}{5\sqrt{5}} \frac{\sqrt{M-1}}{\sqrt{2R +1}}, \end{aligned}$$
(18)

which holds when \(R \le \frac{1}{2}\sqrt{M-1}\) according to [8, Theorem 4]. \(\square \)

Using Theorem 5 with \(t=t_{i}\) and inserting the result in (10), we have proven the statement of Theorem 3.

3.3 Optimality of the Prediction with Compression

When the matrix \(X({\varvec{t}})\) is compressed via POD (Algorithm 2) or the randomized range finder (Algorithm 3), the orthonormal basis \(Q \in {\mathbb {R}}^{n \times m}\) used in Algorithm 1 spans a lower-dimensional subspace \({\mathcal {S}} \subseteq \textrm{span}(X)\). The following corollary quantifies how this compression affects the quality of the initial guess.

Corollary 6

Suppose that Algorithm 1 is used with an orthonormal basis satisfying \(\Vert (QQ^{T} -I) X(\varvec{t_{{\textsf{equi}}}}) \Vert _{2} \le \varepsilon \) for some tolerance \(\varepsilon > 0\). Under Assumption 1, the initial guess \({\varvec{s}}^{*}\) constructed by the algorithm satisfies the error bound

$$\begin{aligned} \Vert A(t_{i}){\varvec{s}}^{*} - {\varvec{b}}(t_{i})\Vert _2 \le 2 \Vert A(t_{i})\Vert _2 \kappa _{\rho } \left[ \frac{1}{1-r}+ \frac{C(M,R) \rho }{\sqrt{\rho ^2 r^2-1}} \left( \frac{1}{\rho -1} + \frac{ \varepsilon \rho ^{R}}{2 \sqrt{M} \kappa _{\rho }} \right) \right] r^{R+1} \end{aligned}$$

for any \(R \le \frac{1}{2} \sqrt{M-1}\).

Proof

Let \({\varvec{p}}_{R}(t) = X(\varvec{t_{{\textsf{equi}}}}) Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger } {\varvec{q}}_{R}(t)\) be the polynomial constructed in (14). Using that \({\varvec{s}}^{*}\) satisfies the minimization problem (8) and \(QQ^T {\varvec{x}}(t_{i}) \in {\mathcal {S}} = \textrm{span}(Q)\), we obtain:

$$\begin{aligned} \Vert A(t_i) \varvec{s^{*}} - {\varvec{b}} (t_i) \Vert _2&\le \Vert A(t_i) Q Q^{T} {\varvec{x}} (t_i) - {\varvec{b}} (t_i) \Vert _2 \\ {}&\le \Vert A(t_{i})\Vert _2\big [ \Vert (QQ^{T}-I) ({\varvec{x}} (t_i) -{\varvec{p}}_{R}(t_i) ) \Vert _2 \\&\quad + \Vert (Q Q^{T}- I){\varvec{p}}_{R}(t_i) \Vert _2 \big ] \\ {}&\le \Vert A(t_{i})\Vert _2 \big [ \Vert {\varvec{x}} (t_i) -{\varvec{p}}_{R}(t_i) \Vert _2 \\&\quad + \Vert (Q Q^{T}- I)X(\varvec{t_{{\textsf{equi}}}}) Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger } {\varvec{q}}_{R}(t_i) \Vert _2 \big ]. \end{aligned}$$

The first term is bounded using Theorem 5 with \(t = t_i\). For the second term, we use the bound in (17) on \(\Vert {\varvec{q}}_{R}(t_{i})\Vert _2\) to obtain

$$\begin{aligned} \Vert (Q Q^{T}- I)X(\varvec{t_{{\textsf{equi}}}}) Q_{R}(\varvec{t_{{\textsf{equi}}}})^{\dagger } {\varvec{q}}_{R}(t_i) \Vert _2&\le \Vert (Q Q^{T}- I)X(\varvec{t_{{\textsf{equi}}}}) \Vert _{2} \Vert {\varvec{q}}_{R}(t_{i})\Vert _2 / \sigma \\&\le \frac{ \varepsilon (\rho r)^{R+1}}{\sigma \sqrt{\rho ^2 r^2-1}}, \end{aligned}$$

with \(\sigma := \sigma _{\min }(Q_R(\varvec{t_{{\textsf{equi}}}}))\). The proof is completed using the lower bound (18) on \(\sigma \). \(\square \)

4 Numerical Results: Test Case

To test the subspace acceleration algorithms proposed in Sect. 2, we first consider a simplified setting, an elliptic PDE with an explicitly given time- and space-dependent coefficient \(a({\varvec{x}},t)\) and source term \(g({\varvec{x}},t)\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \nabla \cdot (a({\varvec{x}},t) \nabla f({\varvec{x}},t)) = g({\varvec{x}},t) &{} \quad \quad \text {in } \Omega \\ f({\varvec{x}},t) = 0 &{}\quad \quad \text {on } \partial \Omega \end{array}\right. } \end{aligned}$$
(19)

We consider the domain \(\Omega = \left[ 0,1\right] ^{2} \subset {\mathbb {R}}^{2}\) and discretize (19) on a uniform two-dimensional Cartesian grid using a centered finite difference scheme of order 4. This leads to a linear system for the vector of unknowns \(\varvec{{f}}(t)\), for which both the matrix and the right-hand side depend on t:

$$\begin{aligned} A(t) \varvec{{f}}(t)= \varvec{{g}}(t). \end{aligned}$$
(20)

We discretize the time variable on the interval \(\left[ t_{0}, \, t_{f} \right] \) with a uniform timestep \(\Delta t\) on \(N_{t}\) points, such that \(t_{f} = t_{0} + N_{t} \Delta t\). Evaluating (20) at these \(N_{t}\) instants, we obtain a sequence of linear systems of the same type as (1).

We set \(a({\varvec{x}},t) = \exp \left[ -(x-0.5)^{2} - (y-0.5)^{2}\right] \cos (tx) +2.1\) and choose the right-hand side \(g({\varvec{x}},t)\) such that

$$\begin{aligned}f({\varvec{x}},t) = \sin (4 \pi y t) \sin (15 \pi x t) \left[ 1+ \sin (15 \pi x t) \cos (3 \pi y t) \exp \left[ (x-0.5)^{2} + (y-0.5)^{2} -0.25^{2} \right] \right] \end{aligned}$$

is the exact solution of (19). The tests are performed using MATLAB 2023a on an M1 MacBook Pro. We employ GMRES as the iterative solver for the linear systems, with tolerance \(10^{-7}\) and an incomplete LU factorization as preconditioner. We start the simulations at \(t_{0}= 2.3\, s\) and perform \(N_{t} = 200\) timesteps.
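Schematically, the experiment combines the ingredients of Sect. 2 as in the sketch below (Python/SciPy; `assemble_system` is a hypothetical helper returning the sparse matrix and right-hand side of (20), `pod_basis` refers to the sketch in Sect. 2.1, and the actual tests were run in MATLAB).

```python
import numpy as np
from scipy.sparse.linalg import gmres, spilu, LinearOperator

t0, dt, Nt = 2.3, 1e-5, 200          # settings of Fig. 1a
M, m = 20, 10                        # history size and reduced dimension
history, x = [], None
for step in range(Nt):
    A, g = assemble_system(t0 + step * dt)        # hypothetical assembly of (20)
    ilu = spilu(A.tocsc())                        # incomplete LU preconditioner
    prec = LinearOperator(A.shape, matvec=ilu.solve)
    if len(history) >= M:                         # enough history: Algorithm 1 guess
        X = np.column_stack(history[-M:])
        Q, _ = pod_basis(X, m)                    # or a RandomizedRangeFinder update
        z, *_ = np.linalg.lstsq(A @ Q, g, rcond=None)
        x0 = Q @ z
    else:
        x0 = x                                    # baseline: previous solution
    x, info = gmres(A, g, x0=x0, M=prec, rtol=1e-7)   # keyword 'tol' in older SciPy
    history.append(x)
```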

Fig. 1: GMRES iterations per timestep when solving Eq. (20) with different initial guesses

The results reported in Fig. 1 use a spatial grid of dimension \(100 \times 100\), leading to linear systems of size \(n = 10000\). Different values of M, the number of previous solutions retained in the history matrix X, and m, the dimension of the reduced-order model, were tested. We found that the choices \(M = 20, \, m = 10\) and \(M = 35,\, m = 20\) lead to good performance for \(\Delta t = 10^{-5}\) and \(\Delta t = 10^{-3}\), respectively. The baseline is (preconditioned) GMRES with the previous solution used as initial guess; the resulting number of iterations is indicated with the solid blue line (“Baseline”) in Fig. 1. This is compared to the number of iterations obtained by applying GMRES when Algorithm 1 is employed to compute the initial guess, in combination with both the POD basis of Algorithm 2 (“POD” in the graph) and the Randomized Range Finder of Algorithm 3 (“RAND” in the graph). For the Randomized Range Finder, the matrix \(\Omega \) is computed from scratch only every 50 timesteps, while in the other timesteps it is updated as described in Algorithm 3, resulting in a computationally efficient version of the algorithm. Both the POD and randomized versions of the acceleration method give a remarkable gain in computational time with respect to the baseline.

When employing \(\Delta t = 10^{-5}\), in Fig. 1a, the number of iterations required by the linear solver is zero most of the time, since the initial residual computed with the new initial guess is already below the tolerance, set to \(10^{-7}\) in this case. It is worth noting that the new randomized method gives an acceleration comparable to the existing POD one, but at a much lower computational cost, as described in Sect. 2.

The results obtained for larger timesteps, in Fig. 1b, are slightly worse, as expected, since it is harder to predict new solutions from the previous ones when they are further apart in time. Nevertheless, the gain of the acceleration method is still visible: the number of iterations is always less than half that of the baseline, at the additional cost of solving only a reduced-order system of dimension \(m= 20 \), compared to the full problem of dimension 10000. The resulting advantage of the new method can indeed be observed in Fig. 2, which compares the computational time needed by the solver using the baseline approach with the one obtained by using the new guess (including the time needed to compute the guess). The timings shown are those needed to produce the results in Fig. 1. The time required by the POD method has not been included since it is significantly higher than the baseline, as predicted by the analysis in Sect. 2.1.

Fig. 2: Computational time per timestep corresponding to Fig. 1a and b. The average speedup per iteration of the randomized method with respect to the baseline is a factor 9 for \(\Delta t = 10^{-5} \) and a factor 10 for \(\Delta t = 10^{-3}\)

5 Numerical Results: Plasma Simulation

In this section, we apply the subspace acceleration method to the numerical simulation of plasma turbulence in the outermost plasma region of tokamaks, where the plasma comes into contact with the surrounding solid walls, resulting in strongly non-linear phenomena occurring over a wide range of time and length scales.

Fig. 3: GBS computational domain. The toroidal direction is along \(\varphi \), the radial direction is along R, and the vertical direction is along Z. The domain consists of \(N_{\varphi }\) rectangular poloidal planes, each discretized on a \(N_{R} \times N_{Z}\) Cartesian grid

In this work, we consider GBS (Global Braginskii Solver) [12, 23], a three-dimensional, flux-driven, two-fluid code developed for the simulation of the plasma dynamics in the boundary of a fusion device. GBS implements the Braginskii two-fluid model [3], which describes a quasi-neutral plasma through the conservation of density, momentum, and energy. This results in six coupled three-dimensional time-evolving non-linear equations which evolve the plasma dynamics in \(\Omega \), a 3D toroidal domain with rectangular poloidal cross section, as represented in Fig. 3. The fluid equations are coupled with Maxwell equations, specifically the Poisson and Ampère equations, which are elliptic equations for the electromagnetic variables of the plasma. In the limit considered here, the elliptic equations reduce to a set of two-dimensional algebraic constraints decoupled along the toroidal direction, which therefore have to be satisfied independently on each poloidal plane. The differential equations are spatially discretized on a uniform Cartesian grid employing a finite difference method, resulting in a system of differential-algebraic equations of index one [15]:

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{t} {\varvec{f}}(t)= \varvec{{\mathcal {Y}}}({\varvec{f}}(t),{\varvec{x}}(t)) \quad \quad \quad \, \, \, \text {in } \Omega \\ A_{k}({\varvec{f}}(t)) {\varvec{x}}_{k}(t) = {\varvec{b}}_{k}({\varvec{f}}(t)) \quad \quad \text {for each { k}th poloidal plane } \end{array}\right. } \end{aligned}$$
(21)

where \(\varvec{{\mathcal {Y}}}({\varvec{f}}(t),{\varvec{x}}(t))\) is a non-linear, 6-dimensional differential operator and

$$\begin{aligned} {\varvec{x}}(t)&= \left[ {\varvec{x}}_{1}(t), \cdots , {\varvec{x}}_{k}(t), \cdots , {\varvec{x}}_{N_{\varphi }}(t) \right] \in {\mathbb {R}}^{N_{R}N_{\varphi }N_{Z}}, \\ {\varvec{f}}(t)&= \left[ {\varvec{f}}_{1}(t), \cdots , {\varvec{f}}_{k}(t), \cdots , {\varvec{f}}_{N_{\varphi }}(t) \right] \in {\mathbb {R}}^{N_{R}N_{\varphi }N_{Z}} \end{aligned}$$

are the vectors of, respectively, the electromagnetic and fluid quantities solved for by GBS, where the solutions of all the \(N_{\varphi }\) poloidal planes are stacked together. More precisely, the time evolution of the fluid variables, \({\varvec{f}}\), is coupled with the set of linear systems \(A_{k}({\varvec{f}}(t)) {\varvec{x}}_{k}(t) = {\varvec{b}}_{k}({\varvec{f}}(t))\) which results from the discretization of Maxwell equations. Indeed, the matrix \(A_{k} \in {\mathbb {R}}^{N_{R}N_{Z} \times N_{R}N_{Z} }\) and the right-hand side \(\varvec{b_{k}} \in {\mathbb {R}}^{N_{R}N_{Z}}\) depend on time through \({\varvec{f}}\).

In GBS, system (21) is integrated using a Runge–Kutta scheme of order four, on the discrete times \(\left\{ t_{i} \right\} _{i=1}^{N_{t}}\), with step-size \(\Delta t\). Given \({\varvec{f}}^{i}\) and \({\varvec{x}}^{i}\), the values of \({\varvec{f}}\) and \({\varvec{x}}\) at time \(t_{i}\), the computation of \(\varvec{ {f}}^{i+1}\) requires performing three intermediate substeps in which the quantities \(\varvec{ {f}}^{i+1, j}\) for \( j =1,2,3\) are computed. To guarantee the consistency and convergence of the Runge–Kutta integration method [15], the algebraic constraints are solved at every substep, computing \(\varvec{ {x}}^{i+1,j}_{k}\) for \( j =1,2,3\) and for each kth poloidal plane. As a consequence, the linear systems \(A_{k}({\varvec{f}}(t)) {\varvec{x}}_{k}(t) = {\varvec{b}}_{k}({\varvec{f}}(t))\) are assembled and solved four times for each of the \(N_{\varphi }\) poloidal planes in order to advance the full system (21) by one timestep. Since the timestep \(\Delta t\) is constrained to be small by the stiff nature of the GBS model, the solution of the linear systems is among the most computationally expensive parts of GBS simulations.

In GBS, the linear system is solved using GMRES, with the algebraic multigrid preconditioner boomerAMG from the HYPRE library [9], a choice motivated by previous investigations [12]. The subspace acceleration algorithm proposed in Sect. 2 is implemented in the GBS code and, given the results shown in Sect. 4, the randomized version of the algorithm is chosen. The results reported are obtained from GBS simulations on one computing node. The poloidal planes of the computational domain are distributed among 16 cores, specifically of type Intel(R) Core i7-10700F CPU at 2.90 GHz. GBS is implemented in Fortran 90, and relies on the PETSc library [2] for the linear solver and Intel MPI 19.1 for the parallelization.

We consider the simulation setting described in [12], taking as initial conditions the results of a simulation in a turbulent state. We use a Cartesian grid of size \(N_{R} = 150,\, N_{Z} = 300\) and \(N_{\varphi }=64\), with 4 additional ghost points in the R and Z directions. Therefore, the imposed algebraic constraints result in 64 sequences of linear systems of dimension \((N_{R}+4)(N_{Z}+4) \times (N_{R}+4)(N_{Z}+4) = 46816 \times 46816\). The timestep employed is \(\Delta t = 0.7 \times 10^{-5}\). The sequence of linear systems we consider represents the solution of the Poisson equation on one fixed poloidal plane, but the same considerations apply to the discretization of Ampère's equation.

Fig. 4: Performance of the algorithm applied to the solution of the Poisson equation in GBS simulations. The time for the RAND algorithm is on average approximately one fourth of the time for the baseline

In Fig. 4a the number of iterations obtained with the method proposed in Sect. 2, denoted as “RAND”, is compared with the ones obtained using the previous-step solution as initial guess, depicted in blue as “Baseline”. We notice that, employing the acceleration method, the number of GMRES iterations needed for each solution of the linear system is reduced by a factor 2.9, on average, at the cost of solving an \(m \times m \) reduced-order system. In Fig. 4b the wall-clock time required for the solution of the systems is shown. The baseline approach is compared to the accelerated method, where we also take into account the cost of computing the initial guess. Thanks to the randomized method employed, the process of generating the guess is fast enough to provide a speedup of a factor of 6.5 per iteration.

The employed values of \(M=15\), the number of previous solutions retained, and \(m=10\), the dimension of the reduced-order model, are those found to give a good balance between the decrease in the number of iterations and the computational cost of the reduced-order model. In Table 1 the results for different values of M and m are reported. It is worth noting that an average number of GMRES iterations per timestep smaller than one implies that the initial residual obtained with the initial guess is often already below the tolerance set for the solver. Higher values of m lead to a very small number of iterations, but the overall time speedup is reduced since the computation of the guess becomes more expensive.

Table 1 GBS simulation results corresponding to different values of M and m

6 Conclusions

In this paper, we propose a novel approach for accelerating the solution of a sequence of large-scale linear systems that arises from, e.g., the discretization of time-dependent PDEs. Our method generates an initial guess from the solution of a reduced-order model, obtained by extracting relevant components of previously computed solutions using dimensionality reduction techniques. Starting from an existing POD-like approach, we accelerate the process by employing a randomized algorithm. A convergence analysis, which applies to both approaches, POD and the randomized algorithm, is carried out and shows how the accuracy of the method increases with the history size. A test case shows that POD leads to a noticeable decrease in the number of iterations, while a nearly equal decrease is achieved by the cheaper randomized method, which leads to a time speedup per iteration of a factor of 9. In real applications such as the plasma simulations described in Sect. 5, the speedup is more modest, given the stiff nature of the problem, which constrains the timestep of the explicit integration method to be very small, but it remains practically relevant.