Matrix Equation Techniques for Certain Evolutionary Partial Differential Equations

We show that the discrete operator stemming from time-space discretization of evolutionary partial differential equations can be represented in terms of a single Sylvester matrix equation. A novel solution strategy that combines projection techniques with the full exploitation of the entry-wise structure of the involved coefficient matrices is proposed. The resulting scheme is able to efficiently solve problems with a tremendous number of degrees of freedom while maintaining a low storage demand as illustrated in several numerical examples.


Introduction
The numerical treatment of time-dependent partial differential equations (PDEs) often involves a first discretization phase which yields a discrete operator that needs to be inverted. In general, the discrete problem is written in terms of a sequence of large linear systems (1.1), where n is the number of spatial degrees of freedom and ℓ is the number of time steps. Well-established procedures, either direct or iterative, can be employed in the solution of (1.1). However, in many cases, the coefficient matrices in (1.1) are very structured and a different formulation of the algebraic problem in terms of a matrix equation can be employed. The matrix-oriented formulation of the algebraic problems arising from the discretization of certain deterministic and stochastic PDEs is not new. See, e.g., [39,50,52,53]. Nevertheless, many of the contributions available in the literature deal with elliptic PDEs. Moreover, only in the last decades has the development of efficient solvers for large-scale matrix equations allowed for a full exploitation of such a reformulation also during the solution phase. See, e.g., [9,10,25,36,40,51], and [48] for a thorough presentation of solvers for linear matrix equations.
In this paper, we discuss time-dependent PDEs and we show that the aforementioned reformulation in terms of a matrix equation can be performed also for this class of operators.
The model problem we have in mind is of the form

u_t = L(u) + f in Ω × (0, T], u = g on ∂Ω, u(x, 0) = u_0(x), (1.2)

where Ω ⊂ R^d, d = 1, 2, 3, and L is a linear differential operator involving only spatial derivatives. For the sake of simplicity in the presentation, in (1.2) we consider Dirichlet boundary conditions. However, the methodology presented in this paper can be used in case of Neumann or Robin boundary conditions as well. Moreover, we specialize some of our results to the case of tensorized spatial domains of the form Ω = Ω_1 × ⋯ × Ω_d and Laplace-like operators L. However, the matrix equation formulation we propose in this paper still holds for more general domains Ω and operators L.
We discretize the problem (1.2) in both space and time, and, for the sake of simplicity, we assume that a finite difference method is employed in the space discretization whereas we apply a backward differentiation formula (BDF) of order s, s = 1, . . ., 6, for the discretization in time.
If an "all-at-once" approach is considered, the algebraic problem arising from the discretization of (1.2) amounts to a single linear system (1.1) with A ∈ R^{nℓ×nℓ}. As shown in [32], the nℓ × nℓ coefficient matrix A possesses a Kronecker structure. While in [32] the authors exploit this Kronecker form to design an effective preconditioner for (1.1), we take advantage of the Kronecker structure to reformulate the algebraic problem in terms of a single matrix equation, and we show how appropriate projection techniques can be applied for its efficient solution.
Notice that the strategy proposed in this paper significantly differs from other matrix-equation-oriented schemes available in the literature. For instance, in [10], certain time-stepping schemes are rewritten in matrix form. Therefore, a sequence of matrix equations needs to be solved. Here we efficiently solve only one matrix equation by combining state-of-the-art projection methods with a novel approach which fully exploits the circulant-plus-low-rank structure of the discrete time operator. The resulting scheme manages to efficiently solve problems with a huge number of degrees of freedom while utilizing modest memory resources.
Here is a synopsis of the paper. In Sect. 2 we show how the all-at-once approach for the solution of (1.2) leads to a Sylvester matrix equation. We discuss the incorporation of the boundary conditions for the matrix equation formulation in Sect. 3. In particular, in Sect. 3.1 we illustrate an automatic procedure in case of tensorized spatial domains Ω and Laplace-like operators L. In Sect. 4 we discuss left-projection methods for Sylvester equations and a generic approximation space. Some computational details for the extended and rational Krylov subspaces (1.3) and (1.4) are given in Sects. 4.1.1 and 4.1.2, respectively. Projection methods can largely benefit from the possible Laplace-like structure of the obtained stiffness matrix. This structure can be further exploited in the solution process as illustrated in Sect. 4.2.
The projection framework we consider to reduce the complexity of the spatial operator may not be sufficient to obtain an efficient solution scheme, especially for large ℓ. We address this problem by fully exploiting the circulant-plus-low-rank structure of the time operator, and in Sect. 5 we illustrate a novel strategy to be combined with the aforementioned projection technique. The resulting solution scheme turns out to be very successful also when dealing with problems with a tremendous number of degrees of freedom in both space and time. As already mentioned, the novel framework we present can be employed in the solution of many different PDEs of the form (1.2). In Sect. 6 we briefly discuss the case of time-dependent convection-diffusion equations as an example of non-Laplace-like operators. Several results illustrating the potential of our new methodology are reported in Sect. 7, while our conclusions are given in Sect. 8.
Throughout the paper we adopt the following notation. The matrix inner product is defined as ⟨X, Y⟩_F := trace(Y^T X), so that the induced norm is ‖X‖_F^2 = ⟨X, X⟩_F. The Kronecker product is denoted by ⊗, while the operator vec : R^{n×n} → R^{n^2} is such that vec(X) is the vector obtained by stacking the columns of the matrix X one on top of the other. The identity matrix of order n is denoted by I_n. The subscript is omitted whenever the dimension of I is clear from the context. Moreover, e_i is the i-th basis vector of the canonical basis of R^n. The brackets [·] are used to concatenate matrices of conforming dimensions. In particular, a Matlab-like notation is adopted and [M, N] denotes the matrix obtained by putting M and N one next to the other. If w ∈ R^n, diag(w) denotes the n × n diagonal matrix whose i-th diagonal entry corresponds to the i-th component of w.
Given a suitable space K_m, we will always assume that a matrix V_m ∈ R^{n×r} with Range(V_m) = K_m has orthonormal columns and is full rank, so that dim(K_m) = r. Indeed, if this is not the case, deflation strategies to overcome the possible linear dependence of the spanning vectors can be adopted, as is customary in block Krylov methods. See, e.g., [21, Section 8].

A Matrix Equation Formulation
Assuming a BDF of order s is employed for the time integration, if Ω_h denotes a discretization of the closed domain Ω, and the time interval [0, T] is discretized with ℓ + 1 equidistant nodes {t_k}_{k=0,…,ℓ}, then the discretization of (1.2) leads to

(I_n + τβ K_d) u_k = Σ_{j=1}^s α_j u_{k−j} + τβ f_k, k = 1, …, ℓ, (2.1)

where α_j = α_j(s), β = β(s) ∈ R are the coefficients defining the selected BDF. See Table 1. It has been proved that for s > 6 the BDFs become unstable, see, e.g., [2, Section 5.2.3], and we thus restrict ourselves to the case s ≤ 6.
together with the boundary conditions, while u_k gathers the approximations to the space nodal values of the solution u at time t_k, i.e., u(x_{i_d}, t_k) for all x_{i_d} ∈ Ω_h. A generic BDF of order s, s ≤ 6, requires the s − 1 additional initial values u_{−1}, …, u_{−s+1} together with u_0. If u_{−1}, …, u_{−s+1} are not given, they must be carefully approximated, and such a computation must be O(τ^s) accurate to maintain the full convergence order of the method. In standard implementations of BDFs, the k-th initial value u_k, k = −1, …, −s + 1, is computed by a lower-order BDF with a time-step τ_k, τ_k ≤ τ. See, e.g., [2, Section 5.1.3]. Allowing for a variable time-stepping is crucial for preserving the convergence order of the method.
We anticipate that the solution scheme presented in this paper is designed for a uniform time grid and is not able to automatically handle a variable time-stepping. Therefore, even though the solution process is illustrated for a generic BDF of order s ≤ 6, in the experiments reported in Sect. 7 we make use of the implicit Euler scheme for the time discretization when the additional initial values u_{−1}, …, u_{−s+1} are not provided.
The generalization of the proposed algorithm to the case of variable and, more generally, adaptive time-stepping will be the topic of future work.
Rearranging the terms in (2.1) and applying an all-at-once approach, we get the nℓ × nℓ linear system (2.2), where u_0 collects the space nodal values of the initial condition u_0. The coefficient matrix A in (2.2) can be written as A = I_ℓ ⊗ (I_n + τβ K_d) − Σ_{j=1}^s α_j Σ_j ⊗ I_n, where Σ_j denotes the ℓ × ℓ matrix having ones only on the j-th subdiagonal and zeros elsewhere. With this notation at hand, (2.2) can be reformulated as the Sylvester matrix equation

(I_n + τβ K_d) U − U (Σ_{j=1}^s α_j Σ_j)^T = [Σ_{j=1}^s α_j u_{1−j}, Σ_{j=2}^s α_j u_{2−j}, …, α_s u_0][e_1, …, e_s]^T + τβ [f_1, …, f_ℓ]. (2.3)

Many numerical methods for the efficient solution of Sylvester matrix equations can be found in the literature; see, e.g., [48]. However, state-of-the-art solvers are not able to deal with the peculiar structure of the matrices Σ_j in general, and the solution of Eq. (2.3) may be unfeasible, especially for large ℓ. In this paper we propose a novel algorithm that combines projection techniques for the spatial component of (2.3) with a full exploitation of the circulant-plus-low-rank structure of the Σ_j's.
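As a sanity check, the equivalence between the Kronecker form of A and the Sylvester formulation can be verified numerically. The following NumPy sketch uses illustrative sizes, the backward Euler coefficients (s = 1, α_1 = 1, β = 1), and a hypothetical 1D stiffness matrix; it applies both operators to the same random matrix U:

```python
import numpy as np

# Illustrative sizes; backward Euler: s = 1, alpha_1 = 1, beta = 1.
n, ell, tau, beta = 6, 8, 1e-2, 1.0
alpha = [1.0]
rng = np.random.default_rng(0)

# Hypothetical 1D discrete negative Laplacian as the stiffness matrix K_d.
K = (n + 1)**2 * (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

# Sigma_j has ones only on the j-th subdiagonal.
Sigma = [np.eye(ell, k=-j) for j in range(1, len(alpha) + 1)]

# All-at-once Kronecker form: A = I_ell x (I_n + tau*beta*K) - sum_j alpha_j Sigma_j x I_n.
A = np.kron(np.eye(ell), np.eye(n) + tau*beta*K)
for a, S in zip(alpha, Sigma):
    A -= a * np.kron(S, np.eye(n))

# The same operator applied through the Sylvester formulation (2.3):
# A vec(U) = vec((I_n + tau*beta*K) U - sum_j alpha_j U Sigma_j^T).
U = rng.standard_normal((n, ell))
lhs = A @ U.flatten(order="F")          # vec stacks columns (Fortran order)
rhs = (np.eye(n) + tau*beta*K) @ U - sum(a * U @ S.T for a, S in zip(alpha, Sigma))
assert np.allclose(lhs, rhs.flatten(order="F"))
```

The identity (B^T ⊗ A)vec(X) = vec(AXB) is what links the two forms; the assertion confirms it for this operator.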
Notice that the formulation (2.3) does not require any particular assumption on the spatial domain Ω. In particular, we do not need any tensorized structure. The key factor here is the natural separability between the space and time components of the overall differential operator encoded in (1.2).
In what follows we always assume that the matrix [f_1, …, f_ℓ] admits an accurate low-rank representation F_1 F_2^T, F_1 ∈ R^{n×p}, F_2 ∈ R^{ℓ×p}, with p ≪ min(n, ℓ). Roughly speaking, this can be justified by assuming the functions f and g to be sufficiently smooth in time, so that f_k does not differ too much from f_{k+1} if the time-step size τ is sufficiently small. More precisely, if f_k contains entries having an analytic extension in an open elliptic disc with foci 0 and T for all k, then the results in [26] guarantee a fast decay of the singular values of [f_1, …, f_ℓ]. If g = 0, a different way to obtain such a low-rank representation may be to compute a separable approximation to f at the continuous level, namely f(x, t) ≈ Σ_{i=1}^p h_i(x)ϑ_i(t), e.g., by the Empirical Interpolation Method (EIM) [4]. With F_1 F_2^T at hand, Eq. (2.3) can be written with a low-rank right-hand side. If a finite element method is employed for the space discretization, also a mass matrix M has to be taken into account and the matrix equation we have to deal with has the form (2.4). See, e.g., [32]. The generalized Sylvester equation (2.4) can be easily treated as a standard Sylvester equation. Indeed, if M = LL^T denotes the Cholesky factorization of M, we can consider the standard equation in the unknown Ũ = L^T U. Once an approximation Ũ_m to Ũ is computed, we can retrieve an approximate solution to the original problem (2.4) by performing U_m = L^{−T} Ũ_m ≈ U. Notice that the matrix L^{−1} K_d L^{−T} does not need to be explicitly computed. See, e.g., [46, Example 5.4] and Example 7.2.
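The fast singular value decay behind the low-rank assumption can be observed numerically. The following sketch uses the hypothetical choice f(x, t) = e^{xt}, which is analytic in time, and computes the numerical rank of the snapshot matrix [f_1, …, f_ℓ] together with a truncated factorization F_1 F_2^T:

```python
import numpy as np

# f(x, t) = e^{xt} is analytic in time, so the snapshot matrix [f_1, ..., f_ell]
# has rapidly decaying singular values and admits a low-rank factorization F1 F2^T.
n, ell = 200, 300
x = np.linspace(0.0, 1.0, n)
t = np.linspace(0.0, 1.0, ell)
F = np.exp(np.outer(x, t))

U, sv, Vt = np.linalg.svd(F, full_matrices=False)
p = int(np.sum(sv > 1e-10 * sv[0]))      # numerical rank at relative tolerance 1e-10
F1 = U[:, :p] * sv[:p]                   # n x p factor
F2 = Vt[:p, :].T                         # ell x p factor

# p is tiny compared to min(n, ell), and the truncation error is negligible.
assert p < 25
assert np.linalg.norm(F - F1 @ F2.T) <= 1e-8 * np.linalg.norm(F)
```

In a practical implementation one would of course truncate an SVD (or use EIM, as mentioned above) rather than form the full snapshot matrix, but the rank behavior is the same.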

Imposing the Boundary Conditions
It is not difficult to equip the algebraic problem (2.3) with the proper boundary and initial conditions. Indeed, it is always possible to design the stiffness matrix K_d and the vectors u_0, u_{−1}, …, u_{−s+1}, f_1, …, f_ℓ by mimicking what one would do if the linear system (2.2) were solved. Once these factors are computed, the solution step involves the formulation (2.3) in place of (2.2). However, in the next section we show how to directly include Dirichlet boundary conditions in the matrix formulation (2.3) in case of tensorized spatial domains Ω = Ω_1 × ⋯ × Ω_d and Laplace-like operators L.

Tensorized Domains and Laplace-like Operators
In this section we consider the problem (1.2) in case of tensorized spatial domains Ω = Ω_1 × ⋯ × Ω_d and Laplace-like operators L, namely the stiffness matrix K_d has a Kronecker sum structure with the 1D matrix K_1 acting along each spatial direction, and we show how to directly impose the boundary conditions in the matrix formulation (2.3). For the sake of simplicity, we assume Ω = (0, 1)^d and that n̄ nodes in each of the d spatial directions have been employed, so that n = n̄^d.
We first consider d = 1 in (1.2). The boundary nodes correspond to the entries of index i, i ∈ {1, n}, in each column of U. Denote by P_1 the operator which selects only the boundary nodes, namely its entries are 1 for indices corresponding to boundary nodes and 0 otherwise; for 1-dimensional problems we have P_1 = e_1 e_1^T + e_n e_n^T. The operator I + τβ K_1 should act as the identity operator on the space of boundary nodes, which leads to the constraint (3.1). Therefore, if we define the matrix K_1 as in (3.2), we can consider I_n − P_1 + τβ K_1 in place of I_n + τβ K_1 as left coefficient matrix in (2.3). In (3.2), the matrix K̃_1 ∈ R^{(n−2)×n} corresponds to the discrete operator stemming from the selected finite difference scheme and acting only on the interior of Ω_h. Different choices with respect to the one in (3.2) can be considered to meet the constraint (3.1). For instance, we can select K_1 := [0^T; K̃_1; 0^T], with 0 the zero vector of length n, and consider I_n + τβ K_1 as coefficient matrix. However, such a K_1 is not suitable for the solution process we are going to present in Sect. 4 due to its singularity, and the matrix K_1 in (3.2) is thus preferred.
We now show how to select the right-hand side in (2.3) when the coefficient matrix is as in (3.2). Since all the initial values satisfy the boundary conditions, the boundary data can be incorporated directly in the right-hand side. A similar approach can be pursued also for 2- and 3-dimensional problems. In these cases, following the same ordering of the unknowns proposed in [36], it can be shown that the operator selecting the boundary nodes in U has an analogous Kronecker form. Since L is supposed to be a Laplace-like operator, the most natural choice for imposing the boundary conditions is to define K_2 and K_3 accordingly and use I_{n̄²} − P_2 + τβ K_2 and I_{n̄³} − P_3 + τβ K_3 as coefficient matrices in (2.3). The extra terms G_2, G_3 in (3.4)-(3.5) must then be taken into account when constructing the right-hand side [Σ_{j=1}^s α_j u_{1−j}, Σ_{j=2}^s α_j u_{2−j}, …, α_s u_0][e_1, …, e_s]^T + τβ [f_1, …, f_ℓ].
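A minimal sketch of the 1D construction, assuming a second-order centered finite-difference stencil for the interior rows (the particular stencil is our choice for illustration, not prescribed by the text), shows how the coefficient matrix acts as the identity on the boundary nodes, as required by (3.1):

```python
import numpy as np

n, tau, beta = 10, 1e-2, 1.0
h = 1.0 / (n - 1)

# Interior finite-difference rows of the 1D negative Laplacian: the matrix
# K~_1 in (3.2), acting only on the interior of Omega_h (hypothetical stencil).
Ktil = np.zeros((n - 2, n))
for i in range(1, n - 1):
    Ktil[i - 1, i - 1:i + 2] = np.array([-1.0, 2.0, -1.0]) / h**2

# Left coefficient matrix mimicking I_n - P_1 + tau*beta*K_1: identity on the
# boundary rows (indices 0 and n-1), I + tau*beta*K~_1 on the interior rows.
A1 = np.eye(n)
A1[1:-1, :] += tau * beta * Ktil

# Constraint (3.1): the operator acts as the identity on the boundary nodes.
assert np.allclose(A1[0], np.eye(n)[0])
assert np.allclose(A1[-1], np.eye(n)[-1])
```

The same pattern, applied direction by direction through the Kronecker structure, yields the 2- and 3-dimensional coefficient matrices described above.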

A Projection Framework for the Spatial Operator
In this section we show how to effectively tackle the spatial component of Eq. (2.3); state-of-the-art projection techniques for matrix equations are employed to this end. In particular, Eq. (2.3) is projected only from the left onto a suitable subspace K_m. The scheme we are going to present is very general, and it is given for a generic approximation space K_m. However, we will discuss certain implementation details, e.g., the residual norm computation, in case of the extended and rational Krylov subspaces (1.3) and (1.4). Notice that the projection scheme given in the following is able to reduce only the space component of Eq. (2.3), since the peculiar structure of the time component, namely the matrices Σ_j's, does not allow for generic two-sided projection schemes as is customary for large-scale Sylvester equations. However, in Sect. 5 we show how to combine the projection scheme with a novel procedure for dealing with the time component. The overall scheme results in a numerical procedure which is able to efficiently solve problems with a large number of degrees of freedom in both space and time.

Left Projection
Typically, projection methods for Sylvester equations of the form (2.3) construct an approximation U_m = V_m Y_m ∈ R^{n×ℓ}, where the columns of V_m represent an orthonormal basis of a suitable subspace K_m. The matrix Y_m can be computed, e.g., by imposing a Galerkin condition on the residual matrix R_m := (I_n + τβ K_d)U_m − U_m(Σ_{j=1}^s α_j Σ_j)^T − F̂, where F̂ denotes the right-hand side of (2.3). This Galerkin condition can be written as V_m^T R_m = 0, so that Y_m is the solution of the reduced Sylvester equation (4.1). Thanks to the reduction in the space dimension we perform, equation (4.1) can be solved by means of general-purpose dense solvers for Sylvester equations like the Bartels-Stewart method [5] or the Hessenberg-Schur method presented in [19], which may be particularly appealing in our context due to the lower Hessenberg pattern of the Σ_j^T's, whenever ℓ is moderate, say ℓ = O(10³). See also [7, Section 3]. However, the computational cost of these procedures grows cubically with the dimension of the coefficient matrices in (4.1). Therefore, for sizable values of ℓ, namely when fine time-grids need to be employed, such methods are too demanding, leading to excessive computational efforts. In Sect. 5 we illustrate a novel procedure whose computational cost is polylogarithmic in ℓ.
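For moderate ℓ, the reduced equation (4.1) can indeed be handled by a dense Bartels-Stewart-type routine. A sketch with SciPy's solve_sylvester and stand-in projected matrices (in the actual algorithm, T_m and the right-hand side come from the projection step) reads:

```python
import numpy as np
from scipy.linalg import solve_sylvester

m, ell, tau, beta = 20, 50, 1e-2, 1.0
rng = np.random.default_rng(0)

G = rng.standard_normal((m, m))
Tm = G @ G.T + m * np.eye(m)            # stand-in for the SPD projected matrix T_m
Sigma1 = np.eye(ell, k=-1)              # backward Euler: s = 1, alpha_1 = 1
Rhs = rng.standard_normal((m, ell))     # stand-in for the projected right-hand side

# Reduced Sylvester equation of the form (4.1):
#   (I + tau*beta*T_m) Y - Y Sigma_1^T = Rhs,
# solved as A Y + Y B = Q with B = -Sigma_1^T (dense Bartels-Stewart solver).
Y = solve_sylvester(np.eye(m) + tau * beta * Tm, -Sigma1.T, Rhs)
assert np.allclose((np.eye(m) + tau * beta * Tm) @ Y - Y @ Sigma1.T, Rhs)
```

The solvability condition is met here since the left coefficient matrix has eigenvalues bounded away from zero while Σ_1^T is nilpotent; for ℓ beyond a few thousand, this cubic-cost solve is exactly the bottleneck that the procedure of Sect. 5 removes.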
Once Y m is computed, the residual norm R m F is checked.If this is sufficiently small, the current approximation is returned.Otherwise, the space is expanded.
In Algorithm 1, a generic projection scheme with only left projection for Eq. (2.3) is summarized.
Notice that the initial residual norm δ in line 1 of Algorithm 1 can be computed at low cost by exploiting the properties of the Frobenius norm and the trace operator.
In many cases the dimension of the final space K_m, namely the number of columns of V_m, turns out to be much smaller than ℓ. See Sect. 7. Therefore, to reduce the memory demand of Algorithm 1, we suggest storing only V_m and Y_m and not explicitly assembling the solution matrix U_m = V_m Y_m ∈ R^{n×ℓ}. If desired, one can access the computed approximation to the solution u at time t_k by simply performing V_m(Y_m e_k).

The Extended Krylov Subspace Method
A valid option for the approximation space K_m is the extended Krylov subspace (1.3) generated by K_d and the starting block [u_{1−s}, …, u_0, F_1]. Notice that we use only K_d in the definition of the space instead of the whole coefficient matrix I + τβ K_d. Indeed, all the spectral information about the spatial operator is encoded in K_d. See, e.g., [47] for a similar strategy in the context of extended Krylov subspace methods for shifted linear systems.
The basis V_m ∈ R^{n×2m(p+s)} of the extended Krylov subspace can be constructed by the extended Arnoldi procedure presented in [46], and the Arnoldi relation (4.2) holds. By exploiting such a relation, once Y_m is computed, it is easy to show that the Frobenius norm of the residual matrix R_m can be cheaply evaluated as in (4.3). See, e.g., [37, Section 5.2].
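For illustration, a deliberately naive construction of an extended Krylov subspace collects positive and negative powers of the matrix applied to a single starting vector and orthogonalizes them in one shot; in practice, the numerically stable extended Arnoldi procedure of [46] is used instead. The matrix below is a hypothetical well-conditioned SPD stand-in:

```python
import numpy as np

n, m = 100, 5
rng = np.random.default_rng(1)
K = 4*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD tridiagonal stand-in
b = rng.standard_normal((n, 1))

# EK_m(K, b) = span{b, K^{-1}b, Kb, K^{-2}b, ..., K^{m-1}b, K^{-m}b}:
# only products and solves with K are needed.  Naive version: collect the
# powers K^j b, j = -m, ..., m-1, and orthogonalize via a single QR.
cols = [np.linalg.matrix_power(K, j) @ b for j in range(-m, m)]
V = np.linalg.qr(np.hstack(cols))[0]   # orthonormal basis, dimension 2m

assert V.shape == (n, 2*m)
assert np.allclose(V.T @ V, np.eye(2*m))
```

The point of the incremental extended Arnoldi procedure is to obtain the same space one block at a time, together with the projected matrix T_m and the relation (4.2), without ever forming explicit matrix powers.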

The Rational Krylov Subspace Method
In many contributions it has been shown that the rational Krylov subspace (1.4) is one of the most effective approximation spaces for the numerical solution of large-scale matrix equations. See, e.g., [14-16]. If Eq. (2.3) needs to be solved, we can construct the rational Krylov subspace (4.4) and perform a left projection as illustrated in Sect. 4.1. However, the employment of a rational Krylov subspace requires the careful implementation of certain technical aspects that we are going to discuss in the following.
The basis V_m can be computed by an Arnoldi-like procedure as illustrated in [15, Section 2], and it is well-known that the quality of the computed rational Krylov subspace deeply depends on the choice of the shifts ξ employed in the basis construction. Effective shifts can be computed at the beginning of the iterative method if, e.g., some additional information about the problem of interest is known. In practice, the shifts can be adaptively computed on the fly, and the strategy presented in [15] can be employed to calculate the (m + 1)-th shift ξ_{m+1}. The adaptive procedure proposed by Druskin and Simoncini in [15] only requires rough estimates of the smallest and largest eigenvalues of K_d together with the Ritz values, i.e., the eigenvalues of the projected matrix T_m, which can be efficiently computed in O(m³(p + s)³) flops. In all the examples reported in Sect. 7 such a scheme is adopted for the shift computation.
For the rational Krylov subspace method, the residual norm cannot be computed by performing (4.3), as an Arnoldi relation of the form (4.2) does not hold. An alternative but still cheap residual norm computation is derived in the next proposition.

Proposition 4.1 At the m-th iteration of the rational Krylov subspace method, the Frobenius norm of the residual matrix R_m can be computed in terms of the matrix H̲_m ∈ R^{(m+1)(p+s)×m(p+s)} collecting the orthonormalization coefficients stemming from the "rational" Arnoldi procedure and of its principal square submatrix H_m ∈ R^{m(p+s)×m(p+s)}.
Proof If V_{m+1} spans the rational Krylov subspace (4.4), then an Arnoldi-like relation holds; see, e.g., [15,41]. Since the Arnoldi procedure is employed in the basis construction, H̲_m is a block upper Hessenberg matrix with blocks of size p + s, and by collecting the matrix V_{m+1} E_{m+1}^T H̲_m H_m^{−1} we get the result.
Proposition 4.1 shows that the convergence check requires computing the Frobenius norm of an n × m(p + s) matrix when the rational Krylov subspace is employed. This operation can be carried out in O(nm(p + s)) flops by exploiting the cyclic property of the trace operator.

Structured Space Operators
In this section we show how the projection scheme presented above can largely benefit from the possible Laplace-like structure of the stiffness matrix K d and the tensorized nature of the spatial domain Ω.
In principle, one can apply the strategy proposed in Sect. 4.1 and build the space with K_d also when this matrix is in Kronecker form. However, if u_0, f and g in (1.2) are separable functions in the space variables, the Kronecker structure of K_2 and K_3 can be exploited in the basis construction. More precisely, if n = n̄^d, only d subspaces of R^n̄ can be computed instead of one subspace of R^{n̄^d}, leading to remarkable reductions in both the computational cost and the storage demand of the overall solution process. See, e.g., [25]. The Laplace-like structure we exploit in this section is at the basis of the tensorized Krylov approach presented in [25], but it has also been exploited in [30] to derive an ADI iteration tailored to certain high-dimensional problems. We first assume d = 2 and then extend the approach to the case d = 3. Moreover, for the sake of simplicity, we assume that the backward Euler scheme is employed for the time integration and the extended Krylov subspace is selected as approximation space. The same machinery can be used for BDFs of larger order and the rational Krylov subspace.
If Ω_h consists of n̄ equidistant points (x_i, y_j), i, j = 1, …, n̄, in each direction, and u_0 = φ_{u_0}(x)ψ_{u_0}(y), then the initial condition admits a rank-one Kronecker representation and (2.3) can be written accordingly. We further assume that the low-rank representation F_1 F_2^T is such that the separability features of the functions f and g are somehow preserved; in other words, we assume that the columns of F_1 can themselves be written as Kronecker products of vectors of length n̄. Notice that this assumption is not hard to meet in practice. See, e.g., Sect. 7. With the assumptions above, it has been shown in [25] that the construction of a tensorized Krylov subspace is very convenient. In particular, we can implicitly construct a tensor product of two extended Krylov subspaces generated with the n̄ × n̄ matrix K_1, which is very advantageous in terms of both number of operations and memory requirements compared to the computation of EK_m(K_2, [u_0, F_1]). For instance, only multiplications and solves with the n̄ × n̄ matrix K_1 are necessary, while the orthogonalization procedures only involve vectors of length n̄. Moreover, at iteration m, we need to store only the two basis matrices W_m and Q_m, so that 2m(q + r + 2) vectors of length n̄ are allocated instead of the 2m(p + 1) vectors of length n̄² that the storage of V_m requires. The employment of the tensorized subspace may also prevent some delay in the convergence of the adopted projection technique, as shown in [38] for the case of the polynomial block Krylov subspace method.
Even if we construct the matrices W_m and Q_m instead of V_m, the main framework of the extended Krylov subspace method remains the same. We look for an approximate solution of the form U_m = (W_m ⊗ Q_m)Y_m, where the 4m²(q + 1)(r + 1) × ℓ matrix Y_m is computed by imposing a Galerkin condition on the residual matrix. Such a Galerkin condition can be written as before, and Y_m is the solution of a reduced Sylvester equation. The cheap residual norm computation (4.3) does not have a straightforward counterpart in our current setting. A different though still cheap procedure for computing the residual norm is derived in the next proposition.

Proposition 4.2 At the m-th iteration of the extended Krylov subspace method with tensorized basis, the residual norm can be computed at low cost. The proof follows by direct computation, exploiting the orthogonality of the bases W_m and Q_m.
With Q_m, W_m and Y_m at hand, we can compute the approximation to the solution. For 3-space-dimensional problems with separable data we can follow the same approach, computing one extended Krylov subspace per spatial direction. The derivation of the method follows the same exact steps as before, along with straightforward technicalities, and we thus omit it here.
As already mentioned, the same machinery can be used in case of BDFs of larger order and/or when the rational Krylov subspace is selected as approximation space.

Exploiting the Circulant-Plus-Low-Rank Structure of the Time Operator
One of the computational bottlenecks of Algorithm 1 is the solution of the inner problems (4.1). For large ℓ, this becomes the most expensive step of the overall solution process. Therefore, especially for problems that require a fine time grid, a more computationally appealing alternative to the naive application of decomposition-based methods for Sylvester equations must be sought.
In principle, one may think of generating a second approximation space in order to reduce also the time component of the discrete operator in (2.3), in agreement with standard procedures for Sylvester equations. See, e.g., [48, Section 4.4.1]. However, no extended Krylov subspace can be generated by Σ_{j=1}^s α_j Σ_j due to its singularity. A different option may be to generate the polynomial Krylov subspace with Σ_{j=1}^s α_j Σ_j. Nevertheless, this space is not very informative, as the action of Σ_{j=1}^s α_j Σ_j on a vector v = (v_1, …, v_ℓ)^T ∈ R^ℓ only consists in a permutation - and scaling - of its components. Alternatively, one can try to apply an ADI iteration tailored to Sylvester equations [8]. However, the shift selection for the right coefficient matrix Σ_{j=1}^s α_j Σ_j may be tricky. In the next section we propose a novel strategy that fully exploits the structure of Σ_{j=1}^s α_j Σ_j. Such a procedure, combined with the projection framework presented in Sect. 4, leads to a very successful solution scheme which is able to efficiently solve equations of very large dimensions in both space and time. We start by illustrating our algorithm for the backward Euler scheme and we then generalize the approach to BDFs of larger order.

The Backward Euler Scheme
The matrix Σ_1 possesses a circulant-plus-low-rank structure. Indeed, it can be written as

Σ_1 = C_1 − e_1 e_ℓ^T, (5.1)

where C_1 ∈ R^{ℓ×ℓ} is circulant. This relation has been exploited in [32] to design an effective preconditioner for (2.2) by dropping the low-rank term e_1 e_ℓ^T. We can use (5.1) to transform equation (4.1) into a generalized Sylvester equation of the form (5.2). If m denotes the dimension of the current approximation space K_m, by assuming m to be small, we can compute the eigendecomposition of the coefficient matrix I + τT_m, namely I + τT_m = S_m Λ S_m^{−1}, Λ = diag(λ_1, …, λ_m), whereas, thanks to its circulant structure, C_1 can be diagonalized by the fast Fourier transform (FFT), i.e., C_1 = F^{−1} diag(F(C_1 e_1)) F, where F denotes the discrete Fourier transform matrix. See, e.g., [20, Equation (4.7.10)].
Pre- and post-multiplying Eq. (5.2) by S_m^{−1} and F^T respectively, we obtain the transformed equation (5.3), whose Kronecker form (5.4) involves a diagonal term L and a low-rank correction. Denoting by H ∈ R^{m×ℓ} the matrix whose (i, j)-th element is given by 1/(λ_i − e_j^T(F(C_1 e_1))), i = 1, …, m, j = 1, …, ℓ, since L is diagonal, its inverse can be applied entry-wise through H. We now have a closer look at the matrix N^T L^{−1} M in (5.4). Its (i, j)-th entry can be written as in (5.5). Note the abuse of notation in the derivation: e_i, e_j denote the canonical basis vectors of R^m, whereas e_1, e_ℓ the ones of R^ℓ. An important property of the Hadamard product says that for any real vectors x, y and matrices A, B of conforming dimensions, we can write x^T(A ∘ B)y = trace(diag(x) A diag(y) B^T). By applying this result to (5.5), we get (5.6), where δ_{i,j} denotes the Kronecker delta, i.e., δ_{i,i} = 1 and δ_{i,j} = 0 otherwise. Equation (5.6) says that N^T L^{−1} M is diagonal, so the solution of (5.3) can be computed cheaply. The linear solve L^{−1}w can still be carried out by exploiting the Hadamard product and the matrix H. To conclude, the matrix Y_m can be computed by (5.7), where no Kronecker products are involved in the computation. The computation of Y_m by (5.7) is very advantageous. Indeed, its asymptotic cost amounts to O(m³ + ℓ(log ℓ + m²)) floating point operations. Moreover, all the computations involving the FFT in the construction of Z and W can be performed once and for all at the beginning of the iterative process.
The discrete Fourier transform matrix F is never explicitly assembled, and in all the experiments reported in Sect. 7 its action and the action of its inverse have been performed by means of the Matlab functions fft and ifft, respectively.
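The two structural facts used above, the splitting (5.1) and the FFT diagonalization of C_1, can be checked in a few lines (Python/NumPy here in place of Matlab, purely for illustration):

```python
import numpy as np

ell = 8
Sigma1 = np.eye(ell, k=-1)                 # ones on the first subdiagonal
e1 = np.eye(ell)[:, [0]]
el = np.eye(ell)[:, [-1]]

# Circulant-plus-low-rank splitting (5.1): Sigma_1 = C_1 - e_1 e_ell^T,
# where C_1 is the circulant downshift matrix (extra 1 in the top-right corner).
C1 = Sigma1 + e1 @ el.T
assert np.allclose(Sigma1, C1 - e1 @ el.T)

# C_1 is diagonalized by the FFT: its eigenvalues are the entries of F(C_1 e_1),
# and C_1 x can be applied as ifft(fft(c) * fft(x)), with c the first column of C_1.
x = np.random.default_rng(2).standard_normal(ell)
c = C1[:, 0]
assert np.allclose(C1 @ x, np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real)
```

This is exactly why the ℓ-dependent part of (5.7) reduces to FFTs and entry-wise (Hadamard) operations, giving the O(ℓ log ℓ) contribution to the cost.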
We would like to point out that the novel strategy presented in this section can be applied as a direct solver to Eq. (2.3) whenever the eigendecomposition of I + τ K d can be computed, e.g., if (1.2) is discretized on a coarse spatial grid or if this matrix can be cheaply diagonalized by, e.g., sine transforms as considered in [32].

s > 1
Similarly to what we did for s = 1, for a generic s we can write Σ_{j=1}^s α_j Σ_j as the sum of a circulant matrix C_s ∈ R^{ℓ×ℓ}, which can thus be diagonalized by the FFT, and a low-rank term. As before, the action of L^{−1} can be carried out by exploiting the matrix H and the Hadamard product. The inspection of the entries of the matrix N^T L^{−1} M ∈ R^{sm×sm} is a bit more involved than before. With abuse of notation, we start by recalling that the vector e_j ∈ R^{sm}, j = 1, …, sm, can be written as e_j = vec(e_k e_h^T). Notice that in the second step of the derivation we have e_h^T[e_1, …, e_s]^T = e_h^T and, differently from the left-hand side where e_h ∈ R^s, the vector e_h in the right-hand side denotes the h-th canonical basis vector of R^ℓ, h = 1, …, s.
By exploiting the same property of the Hadamard product used in the derivation presented in Sect. 5.1, we obtain (5.9). Recalling that the indices in the above expression are such that i = r + m(q − 1) and j = k + m(h − 1), the relation in (5.9) means that N^T L^{−1} M is an s × s block matrix with blocks of size m which are all diagonal, the (q, h)-th block being a diagonal matrix expressed in terms of H, F^{−T}[e_{ℓ−s+1}, …, e_ℓ], the BDF coefficients α, and Fe_h.

The Convection-Diffusion Equation
In this section we briefly discuss the case of time-dependent convection-diffusion equations as an example of non-Laplace-like operators. We consider the equation (6.1), where ε > 0 is the viscosity parameter, and the convection vector w = w(x) is assumed to be incompressible, i.e., div(w) = 0.
As already mentioned, if K^cd_d ∈ R^{n×n} denotes the matrix stemming from the discretization of the convection-diffusion operator L(u) = −εΔu + w · ∇u on Ω, the discrete problem can again be recast in terms of a Sylvester matrix equation of the form (2.3) when a BDF of order s is employed in the time integration. If d = 1 and w = φ(x), the matrix K^cd_1 can be written as K^cd_1 = εK_1 + Φ B_1, where K_1 denotes the discrete negative Laplacian, B_1 represents the discrete first derivative, and the diagonal matrix Φ collects the nodal values φ(x_i) on its diagonal.
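A sketch of this 1D assembly, with hypothetical choices for the grid and for the convection coefficient (here φ(x) = sin(πx), centered second-order stencils), reads:

```python
import numpy as np

n, eps = 50, 0.1
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)                 # interior nodes of (0, 1)

# K_1: discrete negative Laplacian; B_1: centered discrete first derivative.
K1 = (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
B1 = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2*h)
Phi = np.diag(np.sin(np.pi * x))               # nodal values of a hypothetical phi(x)

# K^cd_1 = eps*K_1 + Phi*B_1: symmetric diffusion plus non-symmetric convection.
Kcd1 = eps * K1 + Phi @ B1
assert Kcd1.shape == (n, n)
assert not np.allclose(Kcd1, Kcd1.T)           # the convection term breaks symmetry
```

The loss of symmetry caused by the convection term is precisely what prevents the Laplace-like machinery of Sect. 4.2 from being applied, as discussed below.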
In [36], it has been shown that the 2- and 3D discrete convection-diffusion operators possess a Kronecker structure if the components of w are separable functions in the space variables and Ω = (0, 1)^d.
If w = (φ_1(x)ψ_1(y), φ_2(x)ψ_2(y)) and Φ_i, Ψ_i are diagonal matrices collecting on their diagonals the nodal values of the corresponding functions φ_i, ψ_i, i = 1, 2, then K^cd_2 has the Kronecker form (6.2). In this case, we can take advantage of the Kronecker structure of K^cd_d to automatically include the boundary conditions in the matrix equation formulation of the time-dependent convection-diffusion equation. This can be done by combining the arguments of Sect. 3.1 with the strategy presented in [36, Section 3].
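A small NumPy sketch of this Kronecker structure (the precise ordering of the Kronecker factors below is our assumption and depends on the vectorization convention; the wind is an arbitrary separable toy example):

```python
import numpy as np

n = 20
h = 1.0 / (n + 1)
xg = np.linspace(h, 1 - h, n)          # interior nodes in one direction
I = np.eye(n)
eps = 0.1

# 1D discrete negative Laplacian K1 and centered first derivative B1.
K1 = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
B1 = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)

# Separable wind w = (phi1(x) psi1(y), phi2(x) psi2(y)).
Phi1, Psi1 = np.diag(xg), np.diag(1 - 2 * xg)
Phi2, Psi2 = np.diag(np.sin(np.pi * xg)), np.diag(xg**2)

Kcd2 = (eps * (np.kron(K1, I) + np.kron(I, K1))
        + np.kron(Psi1, Phi1 @ B1) + np.kron(Psi2 @ B1, Phi2))

# Kronecker algebra: (A (x) B)(u (x) v) = (A u) (x) (B v), so the action on a
# separable vector splits into small n-dimensional operations.
u, v = np.cos(xg), xg * (1 - xg)
lhs = Kcd2 @ np.kron(u, v)
rhs = (eps * (np.kron(K1 @ u, v) + np.kron(u, K1 @ v))
       + np.kron(Psi1 @ u, Phi1 @ B1 @ v) + np.kron(Psi2 @ B1 @ u, Phi2 @ v))
assert np.allclose(lhs, rhs)
```

The final identity is what allows boundary conditions and matrix-vector products to be handled through the small 1D factors rather than the full n² × n² matrix.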
Even though K^cd_d still has a Kronecker form, this is not a Laplace-like structure, due to the presence of the extra terms containing B_1 in the definition (6.2) of K^cd_d. Therefore, the tensorized Krylov approach presented in [25] and adapted to our purposes in Sect. 4.2 cannot be employed. This difficulty is strictly related to the fact that efficient projection methods for generic generalized Sylvester equations of the form (6.3) have not been developed so far. The available methods work well if the coefficient matrices A_i and B_i fulfill certain assumptions which may be difficult to meet in the case of discrete convection-diffusion operators. See, e.g., [6,23,40,44] for more details about solvers for generalized matrix equations. Further research in this direction is necessary and worth pursuing, since many different problems can be represented in terms of (6.3).

Numerical Results
In this section we compare our new matrix equation approach with state-of-the-art procedures for the solution of the algebraic problem arising from the discretization of time-dependent PDEs. Different solvers can be applied to (2.2) depending on how one interprets the underlying structure of the linear operator A. We reformulate (2.2) as a matrix equation, but clearly A can also be seen as a large structured matrix, and well-known iterative techniques, e.g., GMRES [43], can be employed in the solution of the linear system (2.2). The matrix A does not need to be explicitly assembled, and its Kronecker structure can be exploited to perform "matrix-vector" products. Moreover, one should take advantage of the low rank of the right-hand side vec([Σ_{j=1}^s α_j u_{1−j}, Σ_{j=2}^s α_j u_{2−j}, …, α_s u_0, F_1][e_1, …, e_s, τβF_2]^T) to reduce the memory consumption of the procedure. Indeed, if n is very large, we would like to avoid the allocation of any long nℓ-dimensional vectors, and this can be done by rewriting the Krylov iteration in matrix form and equipping the Arnoldi procedure with a couple of low-rank truncations. These variants of Krylov schemes are usually referred to as low-rank Krylov methods, and in the following we will apply low-rank GMRES (LR-GMRES) to the solution of (2.2). See, e.g., [6,9,22,51] for some low-rank Krylov procedures applied to the solution of linear matrix equations and [28] for details about how to preserve the convergence properties of the Krylov routines when low-rank truncations are performed.
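The low-rank truncation at the heart of such schemes can be sketched as follows (a NumPy illustration of the general idea, not the paper's implementation): the sum of two matrices stored in factored form is recompressed via QR factors and a truncated SVD, so that no full n × ℓ matrix is ever formed.

```python
import numpy as np

def trunc_add(A1, B1, A2, B2, tol=1e-10):
    """Recompress A1 B1^T + A2 B2^T: returns factors A, B with A B^T accurate
    to tol (relative, Frobenius-like), touching only tall skinny matrices."""
    A = np.hstack([A1, A2])
    B = np.hstack([B1, B2])
    QA, RA = np.linalg.qr(A)
    QB, RB = np.linalg.qr(B)
    U, s, Vt = np.linalg.svd(RA @ RB.T)
    r = max(1, int(np.sum(s > tol * s[0])))   # truncation rank
    return QA @ U[:, :r] * s[:r], QB @ Vt[:r].T

rng = np.random.default_rng(1)
n, ell = 500, 300
A1, B1 = rng.standard_normal((n, 2)), rng.standard_normal((ell, 2))
A2, B2 = rng.standard_normal((n, 3)), rng.standard_normal((ell, 3))
A, B = trunc_add(A1, B1, A2, B2)

assert A.shape[1] <= 5
assert np.allclose(A @ B.T, A1 @ B1.T + A2 @ B2.T)
```

Equipping each Arnoldi step with such a recompression is what keeps the storage of LR-GMRES proportional to the ranks r_i rather than to nℓ.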
Both of the aforementioned variants of GMRES need to be preconditioned to achieve fast convergence in terms of number of iterations. In [32], it has been shown that the operator P is a good preconditioner for (2.2). If right preconditioning is adopted, at each iteration of the selected Krylov procedure we have to solve an equation of the form P v = v_m, where v_m denotes the last basis vector that has been computed. Again, many different procedures can be employed for this task. In the case of GMRES, we proceed as follows: we solve the block diagonal linear system with coefficient matrix I_ℓ ⊗ (I_n + τβK_d) − Π_s ⊗ I_n by applying block-wise the algebraic multigrid method AGMG developed by Notay and coauthors [33,34,35].
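The block-wise solve can be illustrated as follows (a NumPy toy sketch; the exact form of P, the circulant coupling Π_1, and all sizes below are our assumptions, and the dense per-block solves merely stand in for AGMG): an FFT in the time direction decouples the preconditioner into ℓ independent shifted spatial systems.

```python
import numpy as np

rng = np.random.default_rng(2)
n, ell, tau = 30, 16, 0.05

# 1D stiffness matrix and a circulant time coupling Pi (cyclic subdiagonal).
h = 1.0 / (n + 1)
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
pi = np.zeros(ell); pi[1] = 1.0
Pi = np.column_stack([np.roll(pi, k) for k in range(ell)])

A_spatial = np.eye(n) + tau * K
P = np.kron(np.eye(ell), A_spatial) - np.kron(Pi, np.eye(n))

b = rng.standard_normal(n * ell)
V = b.reshape(ell, n)                      # one time block per row

# FFT across time diagonalizes Pi, decoupling P into ell shifted solves.
Vhat = np.fft.fft(V, axis=0)
d = np.fft.fft(pi)                         # eigenvalues of Pi
Xhat = np.empty_like(Vhat)
for k in range(ell):
    Xhat[k] = np.linalg.solve(A_spatial - d[k] * np.eye(n), Vhat[k])
X = np.fft.ifft(Xhat, axis=0).real

assert np.allclose(P @ X.reshape(-1), b)
```

Each of the ℓ shifted systems has the size of the spatial problem only, which is why a spatial solver such as AGMG can be applied block-wise.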
In the low-rank Krylov technique framework, the allocation of the full basis vector v_m ∈ R^(nℓ) is not allowed, as we would lose all the benefits coming from the low-rank truncations. Since P v = v_m can be recast in terms of a matrix equation, in the case of LR-GMRES we can inexactly invert P by applying a few iterations of Algorithm 1. Notice that in this case, due to the definition of P, the solution of the inner equations in Algorithm 1 is easier. Indeed, with the notation of Sect. 5, we have Y_m = S_m Z F^(-T) at each iteration m. However, since the extra computational effort of computing Y_m by (5.7) turned out to be very moderate with respect to the cost of performing Y_m = S_m Z F^(-T), we decided to run a few iterations of Algorithm 1 with the original operator instead of the preconditioner P. This procedure can be seen as an inner-outer Krylov scheme [49].
The preconditioning techniques adopted within GMRES and LR-GMRES are all nonlinear. We thus have to employ flexible variants of the outer Krylov routines, namely FGMRES [42] and LR-FGMRES.
We would like to underline that the concept of preconditioning does not really exist in the context of matrix equations. See, e.g., [48, Section 4.4]. The efficiency of our novel approach mainly relies on the effectiveness of the selected approximation space.
In the following we will denote our matrix equation approach by either EKSM, when the extended Krylov subspace is adopted, or RKSM, if the rational Krylov subspace is employed as approximation space. The construction of both the extended and rational Krylov subspaces requires the solution of linear systems with the coefficient matrix K_d (or a shifted version of it). Except for Example 7.5, these linear solves are carried out by means of the Matlab sparse direct solver backslash. In particular, for EKSM, the LU factors of K_d are computed once and for all at the beginning of the iterative procedure, so that only triangular systems are solved during the basis construction. The time for such LU decomposition is always included in the reported results.
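The factor-once/solve-many pattern can be sketched as follows (a NumPy/SciPy illustration of the idea; orthonormalization of the basis is omitted, and the matrix and sizes are our own toy choices — the paper's experiments use MATLAB):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
n = 200
h = 1.0 / (n + 1)
Kd = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Factor K_d once; every "inverse" step of the extended Krylov basis
# construction then costs only two triangular solves.
lu = lu_factor(Kd)

v0 = rng.standard_normal(n)
V_fwd, V_inv = [v0], [v0]
for _ in range(3):
    V_fwd.append(Kd @ V_fwd[-1])           # powers K_d^j v0
    V_inv.append(lu_solve(lu, V_inv[-1]))  # powers K_d^{-j} v0

assert np.allclose(Kd @ V_inv[1], V_inv[0])
```

Amortizing one factorization over all iterations is what makes the extended Krylov basis construction cheap relative to repeated sparse solves from scratch.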
To sum up, we are going to compare EKSM and RKSM with FGMRES preconditioned by AGMG (FGMRES+AGMG) and LR-FGMRES preconditioned by EKSM (LR-FGMRES+EKSM). The performance of the different algorithms is compared in terms of both computational time and memory requirements. In particular, since all the methods we compare need to allocate the basis of a certain Krylov subspace, the storage demand of each algorithm consists of the dimension of the computed subspace. The memory requirements of the adopted schemes are summarized in Table 2, where m indicates the number of performed iterations.
For LR-FGMRES, r_i and z_i denote the ranks of the low-rank matrices representing the i-th vector of the unpreconditioned and preconditioned basis, respectively.
Notice that for separable problems, where the strategy presented in Sect. 4.2 can be applied and the n spatial degrees of freedom factorize over the d directions, the memory requirements of EKSM and RKSM can be reduced to 2(m + 1) Σ_{i=1}^d p_i and (m + 1) Σ_{i=1}^d p_i, respectively, where p_i denotes the rank of the initial block used in the construction of the i-th Krylov subspace, i = 1, …, d.
If not stated otherwise, the tolerance on the final relative residual norm is always set to 10^(-6).
All results were obtained by running MATLAB R2017b [31] on a standard node of the Linux cluster Mechthild hosted at the Max Planck Institute for Dynamics of Complex Technical Systems in Magdeburg, Germany. We would like to mention that the operator A in (2.2) can also be seen as a tensor. In this case, the algebraic problem stemming from the discretization scheme amounts to a tensor equation, for which different solvers have been proposed in the recent literature. See, e.g., [1,3,12,13]. To the best of our knowledge, all the routines for tensor equations available in the literature include a rank truncation step to reduce the storage demand of the overall procedure. Most of the time, a user-specified, constant rank r is employed in such truncations; determining the value of r which provides the best trade-off between accuracy and memory reduction is a tricky task, and the performance of the adopted scheme strongly depends on this selection. See, e.g., [1, Section 4]. This drawback does not affect our matrix equation schemes, where no rank truncation is performed while moderate memory requirements are still achieved, as illustrated in the following examples. Moreover, tensor techniques are specifically designed for solving high dimensional PDEs, and we believe they are one of the few multilinear algebra tools able to deal with the peculiar issues of such problems. However, here we consider problems whose dimensionality is at most 4 (d = 3 in space and one dimension in time). Due to the aspects outlined above, we refrain from comparing our matrix equation schemes with tensor approaches, as a fair numerical comparison is difficult to perform.
This is a toy problem, as the exact solution is known in closed form: u(x, t) = sin(x)e^(−t). With u at hand, we are able to compute the discretization error of our solution process. Equation (7.1) is discretized by means of second order centered finite differences in space and a BDF of order s, s ≤ 6, in time.
In the following we denote by U_m ∈ R^(n×ℓ) the approximate solution computed by EKSM, by U_exact the n × ℓ matrix whose i-th column represents the exact solution evaluated at the spatial nodes at time t_i, and by U_backslash ∈ R^(n×ℓ) the matrix collecting the vectors computed by sequentially solving by backslash the linear systems involved in the standard implementation of BDFs. In particular, U_backslash e_k = (I + τβK_1)^(-1)(τβ f_k + Σ_{j=1}^s α_j U_backslash e_{k−j}), where U_backslash e_{k−j} = u_{k−j} for k = 1 and j = 1, …, s. We first solve the algebraic problem by EKSM with two different tolerances, 10^(-12) and 10^(-14), and we compare the obtained U_m with U_backslash. In Table 3 we report the results for n = 4096, s = 1, and different values of ℓ.
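For s = 1 the recursion above reduces to backward Euler. A NumPy sketch of this sequential computation of U_backslash for the toy problem (discretization sizes are our own choices):

```python
import numpy as np

# Backward Euler (BDF of order s = 1) for u_t = u_xx on (0, pi),
# u(x, 0) = sin(x), homogeneous Dirichlet BCs; exact solution sin(x) e^{-t}.
n, ell, T = 200, 200, 1.0
h, tau = np.pi / (n + 1), T / ell
x = np.linspace(h, np.pi - h, n)
K1 = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

U = np.empty((n, ell))
u = np.sin(x)                          # u_0
A = np.eye(n) + tau * K1               # (I + tau*beta*K1), beta = 1 for BDF(1)
for k in range(ell):
    u = np.linalg.solve(A, u)          # zero source term: f_k = 0
    U[:, k] = u

t = tau * np.arange(1, ell + 1)
U_exact = np.outer(np.sin(x), np.exp(-t))
rel_err = np.linalg.norm(U - U_exact) / np.linalg.norm(U_exact)
assert rel_err < 1e-2
```

With these grid sizes the O(τ) error of backward Euler dominates the O(h²) spatial error, which is consistent with the convergence behavior discussed below.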
Looking at the timings reported in Table 3, since EKSM requires an (almost) constant number of iterations to converge for a given threshold and all the tested values of ℓ, we can readily appreciate that the computational cost of our novel approach depends only mildly on ℓ, while the time for the calculation of U_backslash grows linearly with the number of time steps.
Moreover, we see how, for this example, we can obtain a very small algebraic relative error ‖U_m − U_backslash‖_F / ‖U_backslash‖_F by setting a strict tolerance on the relative residual norm computed by EKSM. This means that, when we compare U_m with U_exact, the discretization error is the quantity that contributes the most to ‖U_m − U_exact‖_F / ‖U_exact‖_F. In Fig. 1 we plot ‖U_m − U_exact‖_F / ‖U_exact‖_F for different values of n, ℓ, and s. Here U_m is computed by setting the tolerance to 10^(-12). In particular, in the picture on the left we plot the relative error for ℓ = 16384 and s = 1 while varying n. On the right, we fix n = 32,768 and vary s. From the plots in Fig. 1 we can recognize that the convergence order of the tested discretization schemes is always preserved. Similar results are obtained for larger values of s, namely s = 4, 5, 6, provided either a larger n or a space discretization scheme with a higher convergence order is employed.

Example 7.2
The goal of the second example is to show that the framework we propose in this paper is not restricted to problems posed on tensorized spatial domains Ω whose discretization is carried out by a finite difference scheme. To this end, we consider the following 2D problem, where Ω is an L-shaped domain obtained as the complement in [−1, 1]² of the quadrant (−1, 0] × (−1, 0]. The finite element method with Q_2 elements is employed in the space discretization of (7.2), and the stiffness matrix K_2 ∈ R^(n×n), the mass matrix M ∈ R^(n×n), and the source term F_1 ∈ R^n are obtained by running the Matlab function diff_testproblem, problem 2, of the IFISS package [45]. The backward Euler scheme is used for the time integration.
The employment of the finite element method in the discretization step leads to a discrete problem that can be represented in terms of a generalized Sylvester equation. As already mentioned at the end of Sect. 2, if M = LL^T is the Cholesky factorization of the mass matrix M, we can solve the transformed equation instead. Notice that the coefficient matrix L^(-1)K_2 L^(-T) does not need to be explicitly constructed. See, e.g., [46, Example 5.4]. If Ỹ_m denotes the approximation computed by EKSM, we retrieve U_m = L^(-T) Ỹ_m ≈ U. The goal of this example is to check whether our matrix equation approach is able to compute a meaningful solution also for the considered problem setting. To this end, we compare the EKSM solution U_m with the sequential solution of the linear systems involved in the backward Euler scheme. In Table 4 we collect the results for different values of n and ℓ. In particular, Rel. Err. refers to the relative error ‖U_m − U_backslash‖_F / ‖U_backslash‖_F.
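The change of variables can be sketched as follows (a NumPy toy with stand-in matrices; in practice L^(-1)K_2 L^(-T) is applied through triangular solves rather than formed explicitly as below): the transformed matrix is symmetric and shares its spectrum with the pencil (K_2, M).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
K2 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # stand-in stiffness matrix
R = rng.standard_normal((n, n))
M = R @ R.T + n * np.eye(n)                             # stand-in spd mass matrix

L = np.linalg.cholesky(M)               # M = L L^T
Linv = np.linalg.inv(L)                 # explicit inverse only in this toy sketch
A = Linv @ K2 @ Linv.T                  # L^{-1} K2 L^{-T}, symmetric

# The transformed matrix has the generalized eigenvalues of the pencil (K2, M),
# so no spectral information is lost in the change of variables.
w_gen = np.sort(np.linalg.eigvals(np.linalg.solve(M, K2)).real)
w_std = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(w_std, w_gen)
```

This is why projection methods designed for the standard Sylvester equation can be applied unchanged after the Cholesky transformation.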
We can notice that, for this example, EKSM needs a sizable number of iterations to attain the desired accuracy. Nevertheless, it is still very competitive, and a reasonable computational time is needed to achieve the prescribed accuracy in terms of relative residual norm. More remarkably, the relative error between the solution computed by EKSM and U_backslash is always very small. This confirms that our novel solution scheme is able to handle differential problems of the form (1.2) where the spatial domain Ω has a complex geometry and more sophisticated discretization schemes than finite differences are adopted.
Equation (7.4) is discretized by means of second order centered finite differences in space, with n nodes in each spatial direction, and the backward Euler scheme in time.
Since the initial condition is a separable function in the space variables, and both the source term and the boundary conditions are zero, the strategy presented in Sect. 4.2 can be adopted. In particular, if u_0 denotes the vector of length n² collecting the values of u_0 at all the nodes (x_i, y_j), then we can write u_0 = φ_{u_0} ⊗ ψ_{u_0}, where φ_{u_0} = [x_1(x_1 − 1), …, x_n(x_n − 1)]^T and ψ_{u_0} = [y_1(y_1 − 1), …, y_n(y_n − 1)]^T ∈ R^n. Therefore, the two extended Krylov subspaces EK_m(K_1, φ_{u_0}) and EK_m(K_1, ψ_{u_0}) can be constructed in place of EK_m(K_2, u_0). Similarly for the rational Krylov subspace method.
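The algebraic identity that makes the two small subspaces sufficient is the action of the Kronecker sum on separable vectors, sketched below (NumPy, with our own toy sizes):

```python
import numpy as np

nbar = 30
h = 1.0 / (nbar + 1)
xg = np.linspace(h, 1 - h, nbar)
K1 = (2 * np.eye(nbar) - np.eye(nbar, k=1) - np.eye(nbar, k=-1)) / h**2
I = np.eye(nbar)
K2 = np.kron(K1, I) + np.kron(I, K1)   # 2D Laplacian as a Kronecker sum

phi = xg * (xg - 1)                    # factors of the separable u_0
psi = xg * (xg - 1)
u0 = np.kron(phi, psi)

# (K1 (+) K1)(phi (x) psi) = (K1 phi) (x) psi + phi (x) (K1 psi):
# every Krylov vector for (K2, u0) is assembled from Krylov vectors of the
# two nbar-dimensional problems, which is what Sect. 4.2 exploits.
assert np.allclose(K2 @ u0, np.kron(K1 @ phi, psi) + np.kron(phi, K1 @ psi))
```

Hence the large Krylov space for K_2 is contained in the tensor product of two small spaces for K_1, which is what drives the memory savings reported below.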
In Table 5 we report the results for different values of n and ℓ. As outlined in [32], the preconditioner P is very effective in reducing the total iteration count in FGMRES+AGMG, and one FGMRES iteration is sufficient for reaching the desired accuracy for every value of n and ℓ we tested. However, the preconditioning step is very costly in terms of computational time, which grows almost linearly with ℓ. FGMRES+AGMG may benefit from a parallel implementation of the inversion of the block diagonal matrix I_ℓ ⊗ (I_{n²} + τK_2) − Π_1 ⊗ I_{n²}. Moreover, for the largest problem dimension we tested, the system returned an Out of Memory (OoM) message, as we were not able to allocate any n²ℓ-dimensional vectors.
LR-FGMRES+EKSM performs quite well in terms of computational time, especially for small n, and the number of iterations needed to converge is rather independent of both n and ℓ, confirming the quality of the inner-outer preconditioning technique.
Our new algorithms, EKSM and RKSM, are very fast. We would like to remind the reader that, for this example, the number of degrees of freedom (DoF) is equal to n²ℓ. This means that, for the finest refinement of the space and time grids we tested, our routines are able to solve a problem with O(4 · 10^9) DoF in a few seconds while reaching the desired accuracy.
The number of iterations performed by EKSM and RKSM turns out to be very robust with respect to ℓ, and the (almost) constant iteration count we obtain for a fixed n lets us appreciate once again how modestly the computational cost of our procedures grows with ℓ.
The robustness of our routines with respect to ℓ is not surprising. Roughly speaking, the time component is solved exactly thanks to the strategy presented in Sect. 5, whereas the projection procedure we perform only involves the spatial component of the overall operator, namely I − τK_2. Therefore, the effectiveness of the overall procedure in terms of number of iterations strictly depends on the spectral properties of I − τK_2, which are essentially fixed for a given n, apart from a mild dependence on ℓ due to the presence of the scalar τ.
Thanks to the separability of Eq. (7.4) and the employment of the strategy presented in Sect. 4.2, EKSM and RKSM are very competitive also in terms of storage demand, as illustrated in Table 5.
Once again, Eq. (7.5) is discretized by means of second order centered finite differences in space with n nodes in each spatial direction, and the backward Euler scheme in time.
Thanks to the separability of w, the spatial discrete operator K^cd_2 has a Kronecker structure and can be written as in (6.2). However, the presence of the extra terms containing the discrete first order derivative operator does not allow for the memory-saving strategy described in Sect. 4.2. Nevertheless, the structure of K^cd_2 can be exploited to easily include the boundary conditions in the matrix equation formulation. Moreover, since the initial condition is equal to the boundary conditions on the boundary nodes and zero otherwise, the boundary conditions do not depend on time, and the source term is zero everywhere, the right-hand side of Eq. (2.3) can be written as [u_0, F_1][e_1, τ[0, 1_{ℓ−1}^T]^T]^T where, with a notation similar to the one used in Sect. 3, F_1 collects the boundary data on the boundary nodes and is zero otherwise.
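The right-hand side written above is (at most) rank two, which is what keeps the matrix equation formulation cheap. A small NumPy sketch with generic stand-ins (u0 and F1 below are random placeholders, not the actual discrete data):

```python
import numpy as np

rng = np.random.default_rng(5)
n, ell, tau = 400, 100, 0.01
u0 = rng.standard_normal(n)            # stand-in for the initial condition
F1 = rng.standard_normal(n)            # stand-in for the boundary-data vector

e1 = np.zeros(ell); e1[0] = 1.0
ones_tail = np.concatenate(([0.0], np.ones(ell - 1)))
RHS = np.column_stack([u0, F1]) @ np.vstack([e1, tau * ones_tail])

# The n x ell right-hand side never needs to be formed in practice:
# its rank is at most two regardless of n and ell.
assert np.linalg.matrix_rank(RHS) <= 2
```

Only the two n-vectors and the two ℓ-vectors need to be stored, independently of the grid sizes.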
In Table 6 we report the results for different values of n, ℓ, and the viscosity parameter ε.
We can notice that the preconditioner P within the FGMRES+AGMG procedure is still effective in reducing the outer iteration count. However, its performance seems to depend on the viscosity parameter ε. Moreover, also for this example, the preconditioning step leads to an overall computational time of FGMRES+AGMG that is not competitive with the one achieved by the other solvers. As in Example 7.3, an OoM message is returned whenever we try to allocate vectors of length n²ℓ for n² = ℓ = 65,536. However, for this example, the same error message is also returned for n² = 16,384, ℓ = 65,536, and for n² = 65,536, ℓ = 16,384, with viscosity parameter ε = 0.01. Indeed, while the system is able to allocate a moderate number of n²ℓ-dimensional vectors, FGMRES+AGMG needs a sizable number of iterations to converge, so that the computed basis cannot be stored. A restarted procedure may alleviate this shortcoming.
LR-FGMRES+EKSM is very competitive in terms of running time as long as very few outer iterations are needed to converge. Indeed, its computational cost per iteration is not fixed but grows quite remarkably as the outer iterations proceed. This is mainly due to the preconditioning step. At each LR-FGMRES iteration k, EKSM is applied to an equation whose right-hand side is given by the low-rank matrix that represents the k-th basis vector of the computed space, and the rank of such a matrix grows with k. This significantly increases the computational effort needed to perform the 10 EKSM iterations prescribed as preconditioning step, worsening the performance of the overall solution procedure.
Also for this example, the new routines we propose in this paper perform quite well, and the number of iterations depends only mildly on ℓ.
The performance of our solvers is also pretty robust with respect to ε and, especially for RKSM, it turns out that the number of iterations needed to converge gets smaller as the value of ε is reduced. In the steady-state setting this phenomenon is well understood. See, e.g., [17, Section 4.2.2]. In our framework, we can explain such a trend by adapting convergence results for RKSM applied to Lyapunov equations. Indeed, in [14, Theorem 4.2] it is shown how the convergence of RKSM for Lyapunov equations is guided by the maximum value of a certain rational function over the field of values W(A) := {z*Az : z ∈ C^n, ‖z‖ = 1} of the matrix A used to define the employed rational Krylov subspace. Roughly speaking, the smaller W(A), the better. In our context, even though we use K^cd_2 to build K_m(K^cd_2, [u_0, F_1], ξ), the projection technique involves the whole coefficient matrix I − τK^cd_2, and we thus believe it is reasonable to think that the success of RKSM relies on the field of values of such a matrix. In Fig. 2 we plot the field of values of I − τK^cd_2 for n² = 65,536, ℓ = 1024, and different values of ε, and we can appreciate how these sets are nested and get smaller when decreasing ε. This may intuitively explain the relation between the RKSM iteration count and ε, but further studies in this direction are necessary.
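The field of values can be estimated numerically by the classical rotation idea; the sketch below is our own generic illustration (not the computation behind Fig. 2): for each angle θ, the extreme eigenvector of the Hermitian part of e^(iθ)A yields a point z*Az on the boundary of W(A).

```python
import numpy as np

def fov_boundary(A, num=60):
    """Points on the boundary of the field of values W(A): for each angle,
    take the eigenvector of the largest eigenvalue of the Hermitian part of
    e^{i theta} A and evaluate the Rayleigh quotient z* A z."""
    pts = []
    for theta in np.linspace(0, 2 * np.pi, num, endpoint=False):
        B = np.exp(1j * theta) * A
        H = (B + B.conj().T) / 2       # Hermitian part
        w, V = np.linalg.eigh(H)
        z = V[:, -1]                   # eigenvector of the largest eigenvalue
        pts.append(z.conj() @ A @ z)
    return np.array(pts)

# Sanity check: for a symmetric matrix, W(A) is the real interval
# [lambda_min, lambda_max], so all points are (numerically) real and inside it.
rng = np.random.default_rng(6)
B = rng.standard_normal((40, 40))
A = (B + B.T) / 2
pts = fov_boundary(A)
lam = np.linalg.eigvalsh(A)
assert np.all(np.abs(pts.imag) < 1e-8)
assert pts.real.min() >= lam[0] - 1e-8 and pts.real.max() <= lam[-1] + 1e-8
```

Plotting such points for I − τK^cd_2 at different values of ε produces nested sets of the kind shown in Fig. 2.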
Even though the approach presented in Sect. 4.2 cannot be adopted in this example, EKSM and RKSM are still very competitive also in terms of storage demand, as illustrated in Table 6. We conclude this example by showing that our routines are also able to identify the physical properties of the continuous solution we want to approximate. In Fig. 3 we report the solution computed by EKSM for the case n² = 65,536 and ℓ = 1024. In particular, we report the solution at different time steps t_1, t_{ℓ/2}, t_ℓ (left to right) and for different values of ε (top to bottom). We remind the reader that our solution represents the temperature distribution in a cavity with a constant, hot external wall. Looking at Fig. 3, we can appreciate how the temperature distributes quite evenly in our domain for ε = 1. The smaller ε, the more convection-dominated the regime in which our temperature spreads. Therefore, for ε = 0.1, 0.01, the temperature is different from zero only in a very restricted area of our domain, close to the hot wall. Notice that for ε = 0.01 and t_1, the part of the domain where the temperature is nonzero is so narrow that it is difficult to appreciate with the resolution of Fig. 3. For ε = 0.1, 0.01 we can also see how the temperature stops being evenly distributed, as it is for ε = 1, and instead follows the circulating flow defined by the convection vector w.
Example 7.5 For the last example, we take inspiration from [36, Example 5] and consider the following 3D time-dependent convection-diffusion equation: u_t − Δu + w · ∇u = 0 in Ω × (0, 1], Ω := (0, 1)³, u = 0 on ∂Ω, u_0 = g, (7.6) where w = (x sin x, y cos y, e^(z²−1)) and g is such that −Δg + w · ∇g = 1 in Ω, g = 0 on ∂Ω. (7.7) Both (7.6) and (7.7) are discretized by centered finite differences in space with n nodes in each spatial direction, and the backward Euler scheme is used for the time integration of (7.6). Once (7.7) is discretized, we compute a numerical solution g ∈ R^(n³) by applying the strategy presented in, e.g., [36], and then set u_0 = g. Also in this example the convection vector w is separable in the space variables, and the stiffness matrix K^cd_3 ∈ R^(n³×n³) can be written in terms of a Kronecker sum, as illustrated in Sect. 6. However, the initial value u_0 is not separable in general, and we have to employ EK_m(K^cd_3, u_0) and K_m(K^cd_3, u_0, ξ) as approximation spaces. It is well known that sparse direct routines are not well suited for solving linear systems with a coefficient matrix that stems from the discretization of a 3D differential operator, and iterative methods perform better most of the time. Therefore, the inner-outer GMRES method is employed to solve the linear systems involved in the basis construction of both EK_m(K^cd_3, u_0) and K_m(K^cd_3, u_0, ξ). We set the tolerance on the relative residual norm for such linear systems equal to 10^(-8), i.e., two orders of magnitude smaller than the outer tolerance. The novel results about inexact procedures in the basis construction of the rational and extended Krylov subspaces presented in [27] may be adopted to further reduce the computational cost of our schemes.
Due to the very large number n³ℓ of DoFs we employ, in Table 7 we report only the results for EKSM and RKSM.
We can appreciate how our routines need a very reasonable time to meet the prescribed accuracy while maintaining moderate storage consumption. For instance, the finest space and time grids we consider lead to a problem with O(10^11) DoFs, and RKSM manages to converge in a few minutes by constructing a very low dimensional subspace.
It is interesting to notice how the computational time of RKSM is always much smaller than the one achieved by EKSM. This is due to the difference in the time devoted to the solution of the linear systems during the basis construction. Indeed, in RKSM, shifted linear systems of the form K^cd_3 − ξ_j I have to be solved and, in this example, it turns out that GMRES is able to achieve the prescribed accuracy in terms of relative residual norm in much fewer iterations than when solving linear systems with K^cd_3 alone, as is done in EKSM. The reported timings are in seconds.

Conclusions
In this paper we have shown how the discrete operator stemming from the discretization of time-dependent PDEs can be described in terms of a single matrix equation. Our strategy can be applied to any PDE of the form u_t + L(u) = f whenever L(u) is a linear differential operator involving only spatial derivatives, provided certain assumptions on the source term f and the boundary conditions are fulfilled. On the other hand, no particular hypotheses on the structure of the spatial domain Ω are needed. The matrix equation formulation of the discrete problem naturally encodes the separability of the spatial and time derivatives of the underlying differential operator. This lets us employ different strategies to deal with the spatial and time components of the algebraic problem and combine them in a very efficient solution procedure. In particular, state-of-the-art projection techniques have been proposed to tackle the spatial operator, while the circulant-plus-low-rank structure of the time discrete operator has been exploited to derive effective solution schemes.
We have shown how to fully exploit the possible Kronecker structure of the stiffness matrix. Very good results are obtained also when this structure is not capitalized on in the solution process. Moreover, in Example 7.2 our method has been able to compute accurate numerical solutions for a heat equation on an L-shaped spatial domain whose discretization has been carried out by Q_2 finite elements. This means that our approach can be successfully applied also to problems which do not lead to a stiffness matrix possessing a Kronecker form, as, e.g., in the case of spatial domains Ω with a complex geometry or when sophisticated discretization methods (in space) are employed. We believe that also elaborate space-time adaptive techniques [11,29] can benefit from our novel approach. In particular, our routines can be employed to efficiently address the linear algebra phase within adaptive schemes for fixed time and space grids. Once the grids have been modified, our solvers can deal with the discrete operator defined on the newly generated time-space meshes. Both EKSM and

With Ỹ_m at hand, we can recover Y_m by simply performing Y_m = S_m Ỹ_m F^(-T). We are thus left with deriving a strategy for the computation of Ỹ_m that, to be efficient, should not require the explicit construction of L, M, and N. In what follows, ∘ denotes the Hadamard (element-wise) product.
If S := I + N^T L^(-1) M and Z := H ∘ (S_m^(-1) G_m (F[e_1, …, e_s, τβF_2])^T), then we denote by P the m × s matrix such that vec(P) = S^(-1) vec(Z F^(-T)[e_(ℓ−s+1), …, e_ℓ] α_s^T) and, to conclude, the solution Ỹ_m of the reduced problem (4.1) can be computed.

Table 1
Coefficients for the BDF of order s, s ≤ 6
[26, Lemma 2.2] and [26, Corollary 2.3] can be adapted to demonstrate an exponential (superexponential in the case of an entire function) decay in the singular values of [f_1, …, f_ℓ]. This can be done by simply transforming the interval [−1, 1] used in [26, Lemma 2.2] to the interval [0, T].

Table 2
Storage demand of the compared methods

Table 3
Example 7.1: results for different values of ℓ
Before comparing EKSM and RKSM with other solvers, we would like to show first how our novel reformulation of the algebraic problem in terms of a Sylvester matrix equation is able to maintain the convergence order of the adopted discretization schemes. In particular, we present only the results obtained by EKSM, as the ones achieved by applying RKSM are very similar. We consider the following 1D problem

Table 4
Example 7.2: results for different values of n and ℓ

Table 5
Example 7.3: results for different values of n and ℓ

Table 6
Example 7.4: results for different values of n, ℓ, and ε

Table 7
Example 7.5: results for different values of n and ℓ