UvA-DARE (Digital Academic Repository)

A wavelet-in-time, finite element-in-space adaptive method for parabolic evolution equations

In this work, an r-linearly converging adaptive solver is constructed for parabolic evolution equations in a simultaneous space-time variational formulation. Exploiting the product structure of the space-time cylinder, the trial spaces that we consider are given as the spans of wavelets-in-time tensorized with (locally refined) finite element spaces-in-space. Numerical results illustrate our theoretical findings.


INTRODUCTION
This paper is about the adaptive numerical solution of parabolic evolution equations written in a simultaneous space-time variational formulation. In comparison to the usually applied time-marching schemes, simultaneous space-time solvers offer the following potential advantages:
• local, adaptive refinements simultaneously in space and time ([SY18, RS19, GS19]);
• quasi-best approximation from the selected trial space ('Céa's lemma') ([And13, LM17, SW21]), being a necessary requirement for proving optimal rates for adaptive routines ([CS11, KSU16, RS19]);
• superior parallel performance ([DGvdZ18, NS19, HLNS19, vVW21a]);
• using the product structure of the space-time cylinder, sparse tensor product approximation ([GO07, CS11, KSU16, RS19]), which allows one to solve the whole time evolution at a complexity of solving the corresponding stationary problem.
Unless sparse tensor product approximation is applied, a disadvantage of the space-time approach is the larger memory consumption: instead of solving a sequence of PDEs on a d-dimensional spatial domain, one has to solve one PDE on a (d + 1)-dimensional domain. This disadvantage disappears, however, when one needs the whole time evolution simultaneously, as for example with problems of optimal control ([GK11, BRU20]) or data assimilation ([DSW21]).
An application of a variational formulation of the PDE over space and time leads to an equation where, with X := L²(I; V) ∩ H¹(I; V′) and Y := L²(I; V), the operator at the left-hand side is boundedly invertible X → Y′ × H.
1.2. Our previous work. In [CS11, RS19] we equipped X and Y with Riesz bases being tensor products of wavelet bases in space and in time, and H with some spatial Riesz basis. Consequently, the equation (1.1) got an equivalent formulation as a bi-infinite well-posed matrix-vector equation (B; γ₀)u = (g; u₀) (actually, in [RS19] we considered a formulation of first order, and in [CS11] we used a variational formulation with essentially interchanged roles of X and Y, which however is irrelevant for the current discussion). To get a coercive bilinear form, we formed normal equations, to which we applied an adaptive wavelet scheme ([CDD01]).
With such a scheme, the norm of a sufficiently accurate approximation of the (infinite) residual vector of a current approximation is used as an a posteriori error estimator. The coefficients in modulus of this vector are applied as local error indicators in a bulk chasing (or Dörfler marking) procedure. The resulting adaptive algorithm converges at the best possible rate in linear computational complexity.
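The bulk chasing (Dörfler marking) step mentioned above can be sketched in a few lines (a generic illustration, not the paper's implementation; `doerfler_marking` and the toy residual values are hypothetical):

```python
import numpy as np

def doerfler_marking(indicators, theta=0.5):
    """Bulk chasing: return the smallest set of indices J such that the
    marked indicators capture a theta-fraction of the total (squared)
    estimator, i.e. sum_{j in J} e_j^2 >= theta^2 * sum_j e_j^2."""
    e2 = np.asarray(indicators, dtype=float) ** 2
    order = np.argsort(e2)[::-1]          # largest indicators first
    cumulative = np.cumsum(e2[order])
    # first position where the bulk criterion is met
    k = int(np.searchsorted(cumulative, theta**2 * e2.sum())) + 1
    return order[:k]

# toy residual coefficients playing the role of the |r_lambda|
residual = [0.1, 0.8, 0.05, 0.4, 0.3]
marked = doerfler_marking(residual, theta=0.9)
```

Sorting makes the marked set minimal; in an optimally complexity-preserving implementation the sort is typically replaced by an (approximate) binning procedure.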
The goal of the current work is to investigate to what extent similar optimal theoretical results can be shown when we replace the wavelets-in-space by finite element spaces. In any case for locally refined partitions, a disadvantage will be that these spaces cannot be equipped with bases that are uniformly 'stable' w.r.t. a range of Sobolev norms. On the other hand, the application of finite elements has quantitative advantages, as provided by a very localized single-scale basis and efficient multigrid preconditioners, and it avoids a cumbersome construction of wavelet bases on domains that are not hyperrectangles.

1.3. Least squares minimization. Without having Riesz bases for X and Y, already the step of first discretizing and then forming normal equations does not apply, and we reverse their order. A problem equivalent to (1.1) is to compute

(1.2) u = argmin_{w∈X} ‖Bw − g‖²_{Y′} + ‖γ₀w − u₀‖²_H.

An obvious approach for the numerical approximation is to consider the minimization over finite dimensional subspaces X^δ of X, which however is not feasible because of the presence of the dual norm.
For trial spaces X^δ that are 'full' (or 'sparse') tensor products of finite element spaces in space and time, in [And13] it was shown how to construct corresponding test spaces Y^δ ⊂ Y of similar type and dimension such that (X^δ, Y^δ) is uniformly inf-sup stable, meaning that when the continuous dual norm ‖·‖_{Y′} is replaced by the discrete dual norm ‖·‖_{(Y^δ)′}, a minimization over X^δ yields a quasi-best approximation to u from X^δ. Such a family of trial spaces, however, does not allow one to create a nested sequence of trial spaces by adaptive local refinements.
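Spelled out (a reconstruction of the standard discrete least squares setup, cf. [And13, SW21], in the notation of the surrounding text), replacing the continuous dual norm by the discrete one yields the computable minimization

```latex
u^\delta := \operatorname*{argmin}_{w \in X^\delta}
  \|Bw - g\|_{(Y^\delta)'}^2 + \|\gamma_0 w - u_0\|_H^2,
\qquad
\|f\|_{(Y^\delta)'} := \sup_{0 \neq v \in Y^\delta} \frac{f(v)}{\|v\|_Y},
```

and uniform inf-sup stability of the pairs (X^δ, Y^δ) guarantees that u^δ is a quasi-best approximation to u from X^δ.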

1.4. Family of inf-sup stable pairs of trial and test spaces.
To construct an alternative, considerably larger family, let Σ be a wavelet Riesz basis for L²(0, T) that, after renormalization, is also a Riesz basis for H¹(0, T). We equip this basis with a tree structure in which every wavelet that is not on the coarsest level has a parent on the next coarser level. In space, we consider the collection of all linear finite element spaces that can be generated by conforming newest vertex bisection starting from an initial conforming partition of a polytopal Ω into d-simplices. The restriction to linear finite elements is not essential and is made for simplicity only. Now we consider trial spaces X^δ that are spanned by a number of wavelets, each of them tensorized with a finite element space from the aforementioned collection. In order to be able to apply the arising system matrices in linear complexity ([KS14, vVW21b]), we impose the condition that if a wavelet tensorized with a finite element space is in the spanning set, then so is its parent wavelet, tensorized with a finite element space that includes the former one.
The infinite collection of finite element spaces can be associated with a hierarchical 'basis' that can be equipped with a tree structure. Each hierarchical basis function, except those on the coarsest level, is associated with a node ν that was inserted as the midpoint of an edge connecting two nodes on the next coarser level, which nodes we call the parents of ν. With this definition, there is a one-to-one correspondence between the finite element spaces from our collection and the spans of the sets of hierarchical basis functions that form trees. Consequently, our collection of trial spaces X^δ consists of the spans of sets of tensor products of wavelets-in-time and hierarchical basis functions-in-space which sets are downward closed, also known as lower, in the sense that if a pair of a wavelet and a hierarchical basis function is in the set, then so are all its parents in time and space. Spaces from this collection can be 'locally' expanded by adding the span of a tensor product of a wavelet and a hierarchical basis function one by one.
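The downward-closedness of such index sets is easy to verify algorithmically (a minimal illustration with a hypothetical encoding: each wavelet or hierarchical basis function is a node in a tree given by a `parents` map, and an index is a (wavelet, node) pair):

```python
def is_lower_set(indices, parents):
    """Check that a set of (time, space) index pairs is downward closed:
    whenever a pair is present, so are all pairs obtained by replacing
    either coordinate with one of its parents in the respective tree.

    `parents` maps an index to the list of its parents (empty on the
    coarsest level)."""
    index_set = set(indices)
    for lam, nu in index_set:
        # all parents of the wavelet-in-time, same spatial index
        for plam in parents[lam]:
            if (plam, nu) not in index_set:
                return False
        # all parents of the hierarchical basis function, same time index
        for pnu in parents[nu]:
            if (lam, pnu) not in index_set:
                return False
    return True

# tiny hypothetical trees: roots have no parent, children point back to them
parents = {"s0": [], "s1": ["s0"], "w0": [], "w1": ["w0"]}
assert is_lower_set({("w0", "s0"), ("w1", "s0")}, parents)
assert not is_lower_set({("w1", "s1")}, parents)
```

Expanding a lower set by one pair whose parents are already present keeps it lower, which is exactly the 'local expansion' used in the adaptive loop.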
For this family of spaces X^δ we construct a corresponding family of spaces Y^δ ⊂ Y of similar type such that each pair (X^δ, Y^δ) is uniformly inf-sup stable, with the dimension of Y^δ being proportional to that of X^δ. Furthermore, using the properties of the wavelets in time and by applying multigrid preconditioners in space, we construct optimal preconditioners at the X- and Y-side, which allow a fast solution of the discrete problems

(1.3) argmin_{w∈X^δ} ‖Bw − g‖²_{(Y^δ)′} + ‖γ₀w − u₀‖²_H.

1.5. Adaptive algorithm. Having fixed the family of trial spaces, it remains to develop an algorithm that selects a suitable, preferably quasi-optimal, nested sequence of spaces from the family adapted to the solution u of (1.2). The theory on adaptive (Ritz-)Galerkin approximations for such quadratic minimization problems is in a mature state. As noticed before, however, straightforward Galerkin approximations for (1.2), i.e., approximations obtained by replacing X by X^δ, are not computable due to the presence of the dual norm.
Therefore, given X^δ, let X^{δ̃} ⊃ X^δ be such that saturation holds, i.e., for some constant ζ < 1, inf_{w∈X^{δ̃}} ‖u − w‖_X ≤ ζ inf_{w∈X^δ} ‖u − w‖_X. We now replace problem (1.2) by the corresponding minimization with the discrete dual norm associated to Y^{δ̃}, whose solution over X^{δ̃} we denote by u^{δ̃δ̃}; in this notation, the first instance of δ̃ refers to the test space Y^{δ̃} and the second to the trial space X^{δ̃}. Its (computable) Galerkin approximation from X^δ is u^{δ̃δ}. By a standard adaptive procedure, described below, we expand X^δ to some X^{δ̄} ⊆ X^{δ̃} such that u^{δ̃δ̄} is nearer to u^{δ̃δ̃} than u^{δ̃δ} is. Next, we replace Y^{δ̃} by the test space corresponding to X^{δ̄} and repeat, i.e., we consider (1.3) with (δ̃, δ) reading as the corresponding pair for δ̄, and improve its Galerkin approximation from X^{δ̄} by an adaptive enlargement of the latter space.
The adaptive expansion of the trial space X^δ to X^{δ̄} will be by the application of the usual solve-estimate-mark-refine paradigm, where the error indicators are given by the coefficients of the residual vector w.r.t. (modified) tensor product basis functions that were added to X^δ to create X^{δ̃}. In order for this collection of additional tensor product basis functions to be stable in X-norm, for this step we modify the hierarchical basis functions such that they get a vanishing moment, and therefore become closer to 'real' wavelets.
Under the aforementioned saturation assumption, we prove that the overall adaptive procedure produces an r-linearly converging sequence to the solution.
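The solve-estimate-mark-refine loop driving this procedure can be mimicked on a toy problem (an analogue, not the paper's algorithm: a diagonal SPD system stands in for the Schur complement, active coordinates for tensor-product basis functions, and residual coefficients for the error indicators):

```python
import numpy as np

def adaptive_galerkin(diag, rhs, theta=0.5, tol=1e-8, max_iter=100):
    """Toy solve-estimate-mark-refine loop for a diagonal SPD system
    A u = f with A = diag(diag): the 'trial space' is a set of active
    coordinates, and bulk chasing on the residual coefficients decides
    which coordinates to add."""
    active = set()
    u = np.zeros_like(rhs)
    for _ in range(max_iter):
        # solve: Galerkin solution on the active coordinates
        for i in active:
            u[i] = rhs[i] / diag[i]
        # estimate: residual of the current approximation
        r = rhs - diag * u
        if np.linalg.norm(r) <= tol:
            break
        # mark: bulk chasing on the squared residual coefficients
        order = np.argsort(r**2)[::-1]
        cum = np.cumsum(r[order]**2)
        k = int(np.searchsorted(cum, theta**2 * (r**2).sum())) + 1
        # refine: enlarge the trial space
        active.update(order[:k].tolist())
    return u, sorted(active)

diag = np.array([4.0, 1.0, 2.0, 8.0])
rhs = np.array([4.0, 2.0, 0.0, 8.0])
u, active = adaptive_galerkin(diag, rhs)
```

Because the system is diagonal, the Galerkin solution per coordinate is exact, so the loop isolates the marking logic; in the paper the 'solve' step is itself inexact and the data are approximated, which is what the analysis in Sect. 4 accounts for.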
1.6. Numerical results. We tested the adaptive algorithm in several examples with a two-dimensional spatial domain. In all but one case, we observed a convergence rate equal to 1/2, being the best that can be expected in view of the piecewise polynomial degree of the trial functions and the tensor product construction, and for non-smooth solutions improving upon the usual non-adaptive approximation. Only for the case where u₀ = 1 and homogeneous Dirichlet boundary conditions are prescribed, the observed rate was reduced to 0.4. It is unknown whether or not this is the best non-linear approximation rate for our family of trial spaces.
Thanks to the use of optimal preconditioners and that of a carefully designed matrix-vector multiplication routine, which generalizes such a routine for sparse grids introduced in [BZ96] to adaptive settings, we observe that throughout the whole execution of the adaptive loop the total runtime remains proportional to the current number of unknowns.

1.7. Organization. This paper is organized as follows: In Sect. 2 the well-posed space-time variational formulation of the parabolic problem is discussed, and in Sect. 3 we discuss its inf-sup stable discretisation. The adaptive solution procedure is presented in Sect. 4, and its convergence is proven. The construction of the trial and test spaces is detailed in Sect. 5, and optimal preconditioners are presented. In Sect. 6, the definition of the enlarged space X^{δ̃} is given, and the construction of a stable basis of a stable complement space of X^δ in X^{δ̃} is outlined. Numerical results are presented in Sect. 7, and a conclusion is formulated in Sect. 8.

For normed linear spaces E and F, by L(E, F) we will denote the normed linear space of bounded linear mappings E → F, and by Lis(E, F) its subset of boundedly invertible linear mappings E → F. We write E ↪ F to denote that E is continuously embedded into F. For simplicity only, we exclusively consider linear spaces over the scalar field R.

SPACE-TIME FORMULATIONS OF A PARABOLIC EVOLUTION PROBLEM
Let V, H be separable Hilbert spaces of functions on some "spatial domain" such that V ↪ H with dense embedding. Identifying H with its dual, we obtain the Gelfand triple V ↪ H ≃ H′ ↪ V′.
For a.e. t ∈ I := (0, T), let a(t; ·, ·) denote a bilinear form on V × V, measurable in t, and bounded and coercive on V × V uniformly in t. With (A(t)η)(ζ) := a(t; η, ζ), given a forcing function g and an initial value u₀, we are interested in solving the parabolic initial value problem of finding u such that

(2.3) du/dt(t) + A(t)u(t) = g(t) (t ∈ I), u(0) = u₀.

In a simultaneous space-time variational formulation, the parabolic PDE reads as finding u from a suitable space of functions of time and space such that

(Bu)(v) := ∫_I ⟨du/dt(t), v(t)⟩ + a(t; u(t), v(t)) dt = ∫_I ⟨g(t), v(t)⟩ dt

for all v from another suitable space of functions of time and space. One possibility to enforce the initial condition is by testing it against additional test functions. A proof of the following result can be found in [SS09], cf. [DL92, Ch. XVIII, §3] and [Wlo82, Ch. IV, §26] for slightly different statements. With X := L²(I; V) ∩ H¹(I; V′) and Y := L²(I; V), it holds that u ↦ (Bu, γ₀u) ∈ Lis(X, Y′ × H), where for t ∈ Ī, γ_t : u ↦ u(t, ·) denotes the trace map. That is, assuming g ∈ Y′ and u₀ ∈ H, finding u ∈ X such that (Bu, γ₀u) = (g, u₀) is a well-posed simultaneous space-time variational formulation of (2.3).
For simplicity, additionally we assume that a(t; ·, ·) is symmetric, and define A ∈ Lis(Y, Y′) by (Au)(v) := ∫_I a(t; u(t), v(t)) dt, which is self-adjoint and coercive.

Because diag(A, Id) ∈ Lis(Y × H, Y′ × H), an equivalent formulation of (2.4) as a self-adjoint saddle point equation reads as finding (µ, σ, u) ∈ Y × H × X that solves (2.5)-(2.7). We equip Y and X with 'energy' norms ‖v‖²_Y := (Av)(v) and ‖w‖²_X := ‖Bw‖²_{Y′} + ‖γ₀w‖²_H, which are equivalent to the canonical norms on Y and X. Notice that (2.5)-(2.7) are the Euler-Lagrange equations that result from the least squares problem (1.2).

DISCRETIZATIONS
3.1. Galerkin discretization of the Schur complement equation. Let (X^δ)_{δ∈∆} be a collection of closed, e.g., finite-dimensional, subspaces of X, so equipped with ‖·‖_X. We will specify such a family in Sects. 5-6.1. We define a partial order on ∆ by δ ⪯ δ̃ when X^δ ⊆ X^{δ̃}. For δ ∈ ∆, let u^δ ∈ X^δ denote the Galerkin approximation to the solution u of (2.7), i.e., the solution of

(3.1) (Su^δ)(v) = f(v) (v ∈ X^δ),

being the best approximation to u from X^δ w.r.t. ‖·‖_X. This u^δ is the solution of (1.2) with the minimization being performed over X^δ instead of over X.
For proving convergence of an adaptive solution routine, as well as for a posteriori error estimation, we shall make the following assumption.

Assumption 3.1 (Saturation). There exists a collection of subspaces (X^{δ̃})_{δ∈∆}, where δ ⪯ δ̃, and some fixed constant ζ < 1 such that for all δ ∈ ∆,

(3.2) ‖u − u^{δ̃}‖_X ≤ ζ ‖u − u^δ‖_X,

assuming that (g, u₀) ∈ ^δG × ^δU₀.

Notice that the above assumption cannot be valid without restrictions on the right-hand side f = B′A⁻¹g + γ₀′u₀ ∈ X′. Indeed, for any X^δ ⊆ X^{δ̃} ⊊ X, consider a non-zero f ∈ X′ that vanishes on X^{δ̃}. Then u^δ = u^{δ̃} = 0 ≠ u, violating (3.2). For now, we will operate under the restrictive assumption that whenever we apply (3.2) (visible by the appearance of the constant ζ), we simply assume that (g, u₀) ∈ ^δG × ^δU₀. Later, in Sect. 4.3, we will remove this assumption.
The discretized problem (3.1) only serves theoretical purposes. Indeed, since the Schur complement operator S contains the inverse of A, there is no way to determine u^δ exactly. The reason to introduce (3.1) is that S is an elliptic operator, so that for δ ⪯ δ̃ we can make use of the Galerkin orthogonality ‖u^{δ̃} − u^δ‖²_X = ‖u − u^δ‖²_X − ‖u − u^{δ̃}‖²_X, being a crucial tool for proving convergence of adaptive algorithms.

3.2. Uniformly stable Galerkin discretization of the saddle-point formulation.
Our numerical approximations will be based on Galerkin discretizations of the saddle-point formulation (2.6). Let (Y^δ)_{δ∈∆} be a collection of closed subspaces of Y, so equipped with ‖·‖_Y, that satisfies the uniform inf-sup conditions (3.3)-(3.4). Notice that 1 − γ_∆ can be made arbitrarily small by selecting, for each δ ∈ ∆, Y^δ sufficiently large in relation to X^δ. Below we show that (3.5)-(3.6) are uniquely solvable. In 'operator language', (3.5) is the Galerkin discretization of (2.6) on the closed subspace Y^δ × H × X^δ. As we will see, however, for Y^{δ̃}, and thus Y^δ, 'large' in relation to X^δ, u^{δδ} will be 'close' to u^δ. This will allow us to show that (r-linear) convergence of a sequence of Galerkin solutions u^δ of (3.1) implies that of the corresponding sequence u^{δδ}.
We equip X^δ with a family of 'energy' norms ‖·‖_{X^{δδ}} := ((S^{δδ}·)(·))^{1/2}. By the definition of γ_∆, these norms are uniformly equivalent to ‖·‖_X on X^δ. Similar to Lemma 2.2, we have the following result.
By additionally using (3.4), this result shows that (S^{δδ}·)(·) is coercive on X^δ × X^δ, so that (3.6), and thus (3.5), has a unique solution. Moreover, we have the following result.
3.3. Modified discretized saddle-point. In view of obtaining an efficient implementation, in the definition of (µ^{δδ}, u^{δδ}) in (3.5), and so in that of S^{δδ} and f^{δδ}, we replace the exact inverse of (E^δ_Y)′AE^δ_Y by a K^δ_Y that satisfies (3.9) for some constant κ_∆ ≥ 1 (i.e., K^δ_Y is an optimal (self-adjoint and coercive) preconditioner for (E^δ_Y)′AE^δ_Y), and which can be applied at linear cost. The resulting system (3.6) is now amenable to the application of the (preconditioned) conjugate gradient iteration. Despite this modification, we keep using the old notations for µ^{δδ}, u^{δδ}, S^{δδ}, ‖·‖_{X^{δδ}} := ((S^{δδ}·)(·))^{1/2}, and f^{δδ}. As shown in [SW21, Remark 3.8], instead of (3.8) it now holds that (3.10), whereas one deduces that (3.7) now should be read as (3.11). For our forthcoming analysis, we will need κ_∆γ_∆ − 1 to be sufficiently small.
Remark 3.5. Later, in the proof of Proposition 4.5, we will temporarily consider the system (3.5) with δ̃ = δ (i.e., Y^{δ̃} = Y^δ), but with X^δ replaced by X; as in the current subsection, the correspondingly modified Schur complement operator (including the term γ₀′γ₀) will be denoted as S^{δ∞}. Since the exact solution u solves this system on X, we find the Galerkin orthogonality (S^{δ∞}(u − u^{δδ}))(X^δ) = 0. It holds that ((S^{δ∞}·)(·))^{1/2} is only a semi-norm on X, which is equal to ‖·‖_{X^{δδ}} on X^δ, and (3.12) holds, taking δ̃ := δ̃ from Assumption 3.1. So for a given 'trial space' X^δ, we employ Y^{δ̃} as 'test space', which is known to be sufficiently large to give stability even when employed with trial space X^{δ̃} ⊋ X^δ. We will use this room to (adaptively) expand X^δ to some X^{δ̄} ⊂ X^{δ̃} while keeping Y^{δ̃} fixed. Then, in a second step, we adapt the test space to the new trial space, i.e., replace Y^{δ̃} by the test space corresponding to X^{δ̄}. By doing so we will construct a sequence of increasingly accurate approximations.

A CONVERGENT ADAPTIVE SOLUTION METHOD
As a first step, in the next lemma it is shown that if one constructs from w ∈ X^δ a v ∈ X^{δ̃} that is closer to the best approximation u^{δ̃} to u from X^{δ̃}, then, thanks to Assumption 3.1, v is also closer to u.

Lemma 4.1. Let w ∈ X^δ and v ∈ X^{δ̃} be such that, for some ρ ≤ 1, ‖u^{δ̃} − v‖_X ≤ ρ‖u^{δ̃} − w‖_X. Then ‖u − v‖_X ≤ (ρ + ζ(1 + ρ))‖u − w‖_X, where we used Assumption 3.1 and ‖u − u^{δ̃}‖_X ≤ ‖u − w‖_X.
Notice that u^{δ̃δ} is the Galerkin approximation from X^δ to the solution u^{δ̃δ̃} ∈ X^{δ̃} of the system S^{δ̃δ̃}u^{δ̃δ̃} = f^{δ̃δ̃}, i.e., it is its best approximation from X^δ w.r.t. ‖·‖_{X^{δ̃δ̃}}. In the next proposition it is shown that an improved Galerkin approximation from an intermediate space X^{δ̃} ⊇ X^{δ̄} ⊇ X^δ, i.e., the function u^{δ̃δ̄}, is, for κ_∆γ_∆ − 1 sufficiently small, also an improved approximation to u, and that this holds true also for the Galerkin approximation associated with the updated test space for X^{δ̄}. The latter function will be the successor of u^{δ̃δ} in our converging sequence.

Bulk chasing and a posteriori error estimation.
To realize (4.1), i.e., to construct from the Galerkin approximation u^{δ̃δ} to u^{δ̃δ̃} an improved Galerkin approximation u^{δ̃δ̄}, we apply the concept of bulk chasing (or Dörfler marking) on a collection of a posteriori error indicators that constitute an efficient and reliable error estimator. We will apply an estimator of 'hierarchical basis' type ([ZMD+11]). A suitable collection Θ^δ will be constructed in Sect. 6.1.

Proposition 4.3. Assume (4.2). Let r^{δ̃δ} := (f^{δ̃δ} − S^{δ̃δ}u^{δ̃δ})(Θ^δ) be the residual vector of u^{δ̃δ}. Let J ⊆ J^δ be such that ‖r^{δ̃δ}|_J‖ ≥ ϑ‖r^{δ̃δ}‖ for some constant ϑ ∈ (0, 1], and, for some δ̄ ⪰ δ, let X^δ + span Θ^δ|_J ⊆ X^{δ̄}. Then (4.1) is valid, and so, when κ_∆γ_∆ − 1 is sufficiently small, u^{δ̃δ̄} is also an improved approximation to u.

Proof. As a consequence of (4.2) and (3.11), we have the stated bounds. We infer (4.1), which completes the proof of that statement. The final statement follows from an application of Proposition 4.2.
Moreover, r^{δ̃δ} is an efficient and reliable a posteriori estimator for ‖u − u^{δ̃δ}‖_X:

Proposition 4.4. As we already noted in the proof of Proposition 4.2, (3.10) is equivalent to the norm equivalence on X^{δ̃}. The proof is completed by (3.11) and the bounds involving the constants m and M, where the latter inequalities follow from (4.3) by reading (J, ϑ, δ̄) as (J^δ, 1, δ̃).
Next we present an alternative a posteriori error estimator that does not rely on (4.2), that we expect to be more accurate, and that is computable at the cost of one additional inner product.
It provides a computable, efficient, and reliable a posteriori error estimator. Indeed:

Proof. The equality follows directly from (Bu, γ₀u) = (g, u₀) and (4.5). From (4.6), the triangle inequality, (3.11), and (4.5), we obtain the upper bound for v ∈ X^{δ̃}. Conversely, we obtain the lower bound by again applying Assumption 3.1.
Notice that the estimator from Proposition 4.5 is exact when ζ = 0 and κ_∆ = 1 = γ_∆, whereas the one from Proposition 4.3, for v = u^{δ̃δ}, is exact only when additionally m = 1 = M.

Data oscillation.
In view of the discussion following Assumption 3.1, notice that all results obtained so far that depend on the 'saturation constant' ζ, i.e., Lemma 4.1 and Propositions 4.2, 4.3, and 4.4, are only valid if (g, u₀) ∈ ^δG × ^δU₀. Let us now consider the situation that the solutions and residuals in these statements refer to solutions and residuals with the true data (g, u₀) ∈ Y′ × H being replaced by an approximation (^δg, ^δu₀) ∈ ^δG × ^δU₀. In the following we denote such solutions and residuals with an additional left superscript δ, or more generally a left superscript δ' when a right-hand side (^{δ'}g, ^{δ'}u₀) ∈ ^{δ'}G × ^{δ'}U₀ has been used for their computation.

Proposition 4.6. Assume (4.2), and let ϑ ∈ (0, 1] be a constant. Then for κ_∆γ_∆ − 1 and a constant ω > 0 both being sufficiently small dependent on ϑ, with max(κ_∆γ_∆ − 1, ω) ↓ 0 when ϑ ↓ 0, there exists a constant ρ < 1 such that the stated contraction holds for J ⊆ J^δ with ‖^δr^{δ̃δ}|_J‖ ≥ ϑ‖^δr^{δ̃δ}‖ and X^δ + span Θ^δ|_J ⊆ X^{δ̄}.

Proof. In the introduced notations, the statements of Propositions 4.3 and 4.4 apply to the perturbed solutions and residuals. The proof is easily completed by the triangle inequality.

In view of the latter proposition, we make the following assumption.
Assumption 4.7. We assume to have maps of the following types available. Notice that this in particular means that for any ε > 0 we are able to find a δ ∈ ∆ and (^δg, ^δu₀) ∈ ^δG × ^δU₀ within distance ε of (g, u₀). A specification of a suitable family (^δG, ^δU₀)_{δ∈∆} will be given in Sect. 6.4.

A convergent algorithm.
In view of the statement of Proposition 4.6, in the following we will use the short-hand notations (cf. (3.6) and Sect. 3.3) and (4.9). Instead of solving (4.8) exactly, we will allow it to be solved approximately, with a sufficiently small relative tolerance, by the application of an iterative method. To that end, we assume to have available a K^δ_X = (K^δ_X)′ ∈ Lis((X^δ)′, X^δ) for which both bounds in (4.10) hold (i.e., K^δ_X is an optimal (self-adjoint and coercive) preconditioner for S^{δδ}), and which can be applied at linear cost. Besides for an efficient iterative solving of (4.8), we will use this preconditioner to compute a quantity t^δ that is equivalent to the X-norm of the algebraic error in an approximation ũ^δ ∈ X^δ to u^δ. The residual vector r̃^δ corresponding to ũ^δ is defined as in (4.9) by replacing u^δ by ũ^δ.
When passing the until-clause, it holds that t^δ ≤ ξ(‖r̃^δ‖ + η(δ) + t^δ), and so, by using ξ < 1 and kicking back t^δ, t^δ ≲ ξ(‖r̃^δ‖ + η(δ)). Taking ξ small enough and kicking back t^δ, we obtain t^δ ≲ ξ(‖u − u^δ‖_X + η(δ)), and similarly ‖u − ũ^δ‖_X ≲ ‖u − u^δ‖_X + η(δ). When passing the until-clause, furthermore (4.16) holds. Denoting the value of δ at the subsequent passing of the until-clause as δ̄, the same bounds hold with δ read as δ̄. By using t^{δ̄} ≲ ξ(‖u − ũ^{δ̄}‖_X + η(δ̄)) ((4.14)) and kicking back ‖u − ũ^{δ̄}‖_X, we infer that, for ξ sufficiently small, (4.17) holds.

First consider the case that ‖r^δ − r̃^δ‖ ≤ C(η(δ) + t^δ) for some constant C > 0. Now let ϑ > 0 be such that 2ϑC + 1/2 < 1. Given a constant ω (which later will be selected such that ω/ϑ is sufficiently small), let (κ_∆γ_∆ − 1)/ω, (1 − ρ₁)/ω, and ξ/ω be sufficiently small such that the resulting expression is a contraction factor below 1.

Next we consider the other case. By taking ξ sufficiently small and kicking back ‖r^δ − r̃^δ‖, we obtain ‖r̃^δ‖ ≳ ‖r^δ‖; by taking ξ sufficiently small and kicking back ‖r^δ‖, we conclude that there exists a constant θ > 0 such that ‖r^δ|_J‖ ≥ θ‖r^δ‖. Assuming that κ_∆γ_∆ − 1 and ω are small enough, using (4.24) an application of Proposition 4.6 shows that there exists a constant ρ₃ < 1 such that the error is reduced up to a term ≲ ‖u − ũ^δ‖_X + η(δ), and so, from (4.20), by kicking back η(δ) and taking ω sufficiently small, (4.26) holds. So for ω/ϑ and ξ/ϑ sufficiently small, also in this case we established a reduction of ϑ‖u − ũ^{δ̄}‖_X + η(δ̄) by at least a constant factor less than 1.
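The inexact inner solves referred to above are preconditioned conjugate gradient iterations, with sqrt(r·Kr) playing the role of the algebraic error quantity t^δ. A generic textbook PCG (an illustration, not the paper's implementation; the matrices and the Jacobi preconditioner are stand-ins) looks as follows:

```python
import numpy as np

def pcg(A, b, apply_K, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for SPD A; apply_K applies the
    (SPD) preconditioner. The quantity sqrt(r . K r) serves as stopping
    criterion, mimicking the algebraic error functional t."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_K(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        t = np.sqrt(rz)               # preconditioned residual norm
        if t <= tol:
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_K(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
K_diag = 1.0 / np.diag(A)             # Jacobi preconditioner as a stand-in
x = pcg(A, b, lambda r: K_diag * r)
```

With an optimal preconditioner the number of iterations to reach a fixed relative tolerance is bounded uniformly in the problem size, which is what makes the overall cost per adaptive step linear.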

WAVELETS-IN-TIME TENSORIZED WITH FINITE-ELEMENTS-IN-SPACE
We specify the parabolic problem at hand, as well as the type of families (X^δ)_{δ∈∆} and (Y^δ)_{δ∈∆} of 'trial' and 'test' spaces. The specification of a likely harmless minor further restriction to these families that will be needed for the construction of an X-stable collection Θ^δ that spans an X-stable complement space of X^δ in X^{δ̃}, specifically condition (4.2), will be postponed to Sect. 6.1.

Continuous problem.
For some bounded domain Ω ⊂ R^d, we take H = L²(Ω) and, for some closed Γ_D ⊆ ∂Ω, V := {v ∈ H¹(Ω) : v|_{Γ_D} = 0}, the bilinear form being of reaction-diffusion type with reaction coefficient c, with ess inf c ≥ 0, and |Γ_D| > 0 when the latter ess inf is zero. W.l.o.g. we take T = 1.

Wavelets in time.
We will construct the trial and test spaces as the span of wavelets-in-time tensorized with finite element spaces-in-space.In this subsection we collect some assumptions on the wavelets.
The definitions of parents and children give rise to obvious notions of ancestors and descendants.
For some wavelet bases, e.g., Alpert wavelets ([Alp93]), it suffices to take S_Σ(λ) := supp σ_λ. We assume that Σ is a Riesz basis for L²(I) and, when renormalized in H¹(I)-norm, a Riesz basis for H¹(I). Although not essential, thinking of wavelets as being (essentially) constructed by means of dilation, we assume a corresponding scaling of levels and supports. At the 'test side' we consider a similar collection Ψ = {ψ_µ : µ ∈ ∨_Ψ} of wavelets, with the difference, though, that this one has to be even an orthonormal basis for L²(I), whilst these wavelets do not need to be in H¹(I).
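A concrete instance of such a collection Ψ (an assumption for illustration; the paper allows any suitable wavelet family) is the L²(0, 1)-orthonormal Haar basis, whose members are piecewise constant and hence not in H¹(I):

```python
import numpy as np

def haar(level, k, x):
    """Evaluate the L2(0,1)-orthonormalized Haar wavelet psi_{level,k},
    supported on [k 2^-level, (k+1) 2^-level], at points x."""
    x = np.asarray(x, dtype=float)
    y = (2.0**level) * x - k            # rescale the support to [0, 1]
    return 2.0**(level / 2) * (
        np.where((0 <= y) & (y < 0.5), 1.0, 0.0)
        - np.where((0.5 <= y) & (y < 1.0), 1.0, 0.0)
    )

# check orthonormality with a midpoint rule on a dyadic grid, which is
# exact for these piecewise constant functions
n = 1 << 10
x = (np.arange(n) + 0.5) / n
ip = lambda f, g: (f * g).sum() / n     # approximate L2(0,1) inner product
g00 = ip(haar(0, 0, x), haar(0, 0, x))  # ~1: normalized
g01 = ip(haar(0, 0, x), haar(1, 1, x))  # ~0: orthogonal across levels
```

Each Haar wavelet on level ℓ > 0 has one parent on level ℓ − 1 (the wavelet whose support contains its own), matching the tree structure imposed on Σ and Ψ.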
We will assume that Σ and Ψ are selected such that, for any ℓ ∈ N₀, span{σ_λ′ : |λ| ≤ ℓ} ⊆ span{ψ_µ : |µ| ≤ ℓ}.

5.3. Uniform stability. In the following proposition, we further specify the type of families of trial and test spaces that we consider, and formulate sufficient conditions for the requirements (3.3)-(3.4), which implied uniform stability of the Galerkin discretizations of our saddle-point problem (2.6).
Proof. For w_λ ∈ W^δ_λ and w := ∑_{λ∈∨_Σ} σ_λ ⊗ w_λ, the first assumption applies. Similarly, ∂_t w = ∑_{µ∈∨_Ψ} ψ_µ ⊗ ṽ_µ, where ṽ_µ := ∑_{λ∈∨_Σ} ⟨σ_λ′, ψ_µ⟩_{L²(I)} w_λ. For any ε > 0, the second assumption shows that for any µ ∈ ∨_Ψ there exists a sufficiently accurate approximation to ṽ_µ.

In order to be able to apply at linear cost the arising linear operators in (4.8)-(4.9), we will restrict the type of trial spaces X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ by imposing the following tree condition: if λ is not on the coarsest level, then W^δ_λ ⊆ W^δ_{parent(λ)}. For the same reason, the analogous condition will be needed for Y^δ. For X^δ that satisfies (5.4), the latter will be verified below, and sufficient, more easily verifiable conditions for (5.2)-(5.3) are derived.
for some constant γ ∆ > 0. Then the conditions (5.2) and (5.3) from Proposition 5.1 for uniform stability are satisfied.
When dim V^δ_µ ≂ dim W̆^δ_µ, then dim Y^δ ≂ dim X^δ, and under the natural condition that a larger (more precisely, not smaller) W̆^δ_µ gives rise to a larger V^δ_µ, the constructed Y^δ satisfies the tree condition, so that (5.2) and (5.3) are guaranteed by the selection of the V^δ_µ, and the fact that for any λ ∈ ∨_Σ, the number of µ ∈ ∨_Ψ with |µ| = |λ| and |S_Ψ(µ) ∩ S_Σ(λ)| > 0 is uniformly bounded.
As follows from [DSW21, Thm. 3.11] (taking B to be the Riesz map H → H′), condition (5.5) has the following equivalent formulation.

Proposition 5.3. Condition (5.5) is equivalent to the existence of uniformly bounded projectors.

5.4. Selection of the spatial approximation spaces as finite element spaces. We will select the spaces W^δ_λ from a collection O of finite element spaces in V, which collection is closed under taking (finite) sums, and for which (5.7) holds. Consequently, the stability conditions (3.3)-(3.4) are satisfied for some γ_∆ > 0 by simply taking, in Proposition 5.2,

(5.8) V^δ_µ := W̆^δ_µ ∈ O.

Proposition 5.3 shows that (5.7) is equivalent to uniform boundedness w.r.t. the V-norm of the H-orthogonal projector onto W ∈ O. An example of a collection O for which the latter is true is given by the set of all finite element spaces w.r.t. quasi-uniform, uniformly shape regular conforming partitions of Ω into, say, d-simplices.
It is known that uniform boundedness w.r.t. the V-norm of the H-orthogonal projector also holds true for finite element spaces w.r.t. locally refined partitions, as long as the grading of the partitions is sufficiently mild. Recently, in [DST20], it was shown that in d = 2 spatial dimensions, and for any polynomial order, the collection of all conforming partitions that can be generated by newest vertex bisection (NVB), starting from a fixed conforming initial partition T_⊥ with an assignment of the newest vertices that satisfies a so-called matching condition, is sufficiently mildly graded in the above sense. Since the overlay of two conforming NVB partitions is a conforming NVB partition, this collection is closed under taking (finite) sums. In other words, with this collection of finite element spaces, which we will employ in our experiments, again the choice (5.8) guarantees uniform stability.
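Conforming newest vertex bisection itself fits in a few lines (a minimal sketch under simplifying assumptions, not the paper's implementation: triangles are stored as vertex tuples with the newest vertex first, and exact rational coordinates avoid round-off when comparing midpoints):

```python
from fractions import Fraction as F

def midpoint(a, b):
    return ((a[0] + b[0]) / F(2), (a[1] + b[1]) / F(2))

def refinement_edge(tri):
    # vertex order is (newest vertex, v1, v2); the refinement edge is v1-v2
    return frozenset(tri[1:])

def bisect(tri):
    v0, v1, v2 = tri
    m = midpoint(v1, v2)
    # the midpoint m of the refinement edge is the newest vertex of both children
    return [(m, v0, v1), (m, v2, v0)]

def refine(mesh, marked):
    """Conforming NVB: bisect every marked triangle and, recursively, any
    neighbour whose refinement edge must be created first, so that no
    hanging nodes remain."""
    mesh = list(mesh)
    for tri in marked:
        _refine_one(mesh, tri)
    return mesh

def _refine_one(mesh, tri):
    if tri not in mesh:                  # removed by an earlier refinement
        return
    edge = refinement_edge(tri)
    # neighbours sharing `edge` but with a different refinement edge
    for nbr in [t for t in mesh if t != tri and edge <= set(t)
                and refinement_edge(t) != edge]:
        _refine_one(mesh, nbr)           # make the neighbour compatible first
    # bisect the compatible patch sharing `edge` as refinement edge
    for t in [t for t in mesh if edge <= set(t) and refinement_edge(t) == edge]:
        mesh.remove(t)
        mesh.extend(bisect(t))

def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / F(2)

# unit square split along the diagonal a-c; a matching newest-vertex assignment
a, b, c, d = (0, 0), (1, 0), (1, 1), (0, 1)
mesh = [(b, a, c), (d, a, c)]
mesh = refine(mesh, [(b, a, c)])         # also bisects (d, a, c): shared edge
m, p = midpoint(a, c), midpoint(b, a)
mesh = refine(mesh, [(m, b, a)])         # bisection along a boundary edge
mesh = refine(mesh, [(p, a, m)])         # forces recursive refinement of (m, d, a)
```

The recursive-closure step is what keeps the partitions conforming; the overlay of two such partitions is obtained by refining each until it contains the breakpoints of the other.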
Also in [DST20], a similar result was shown in d = 2 for red-blue-green refinement with any polynomial order.Unfortunately, for d > 2 such results seem not yet to be available.
Remark 5.4 (Getting γ_∆ close to 1). We discussed uniform boundedness w.r.t. the V-norm of the H-orthogonal projectors onto a family of finite element spaces, which, by taking V^δ_µ := W̆^δ_µ in Proposition 5.2, yields the uniform inf-sup condition (3.4) for some value γ_∆ > 0, and so uniform stability of the Galerkin discretizations of the saddle-point problem (2.6).
For proving convergence of our adaptive routine Algorithm 4.8, however, we needed a value of γ_∆ > 0 that is sufficiently close to 1. Although in our numerical experiments, reported on in Sect. 7, with continuous piecewise linear finite element spaces generated by conforming NVB and V^δ_µ = W̆^δ_µ, the adaptive routine is r-linearly converging, there is no guarantee that 1 − γ_∆ is sufficiently small.
Restricting to quasi-uniform partitions, it can be shown that 1 − γ_∆ can be made arbitrarily small by taking the mesh underlying V^δ_µ to be a sufficiently deep, but fixed, refinement of the mesh underlying W̆^δ_µ. One may conjecture that the same result holds true for sufficiently mildly graded locally refined meshes.

5.5. Best possible rates. Although we have not proved it, we expect that the sequence of approximations generated by our adaptive Algorithm 4.8 is not only r-linearly converging, but, ignoring data oscillation, that it is a sequence of approximations from the family (X^δ)_{δ∈∆} that converges with the best possible rate. In this subsection, under some (mild) smoothness conditions on the solution u, we show that for our selection of the (X^δ)_{δ∈∆} this best possible rate equals the rate of best approximation to the solution of the corresponding stationary problem from the spatial finite element spaces w.r.t. the V-norm.
Consider a family of spaces X^δ = ∑_{λ∈∨_Σ} σ_λ ⊗ W^δ_λ that satisfies (5.4), with the W^δ_λ selected from a collection of finite element spaces O that in any case contains all spaces that correspond to uniform refinements of some initial partition of Ω. Let Σ be a collection of wavelets of order d_t, and assume that the finite element spaces are of order d_x. When for each X^δ the space Y^δ is selected as in Proposition 5.2, then the combination of (3.10) and the analysis from [SS09, Sect. 7.1] shows that if the exact solution u of our parabolic problem satisfies the mixed regularity condition u ∈ H^{d_t}(I) ⊗ H^{d_x}(Ω), then a suitable (non-adaptive) choice of the spaces W^δ_λ yields a sequence of solutions u^{δ̃δ} ∈ X^δ (for arbitrary Y^{δ̃} ⊃ Y^δ) of the modified discretized saddle-point from Sect. 3.3 that converges with rate min(d_t − 1, (d_x − 1)/d). Note that for d_t − 1 ≥ (d_x − 1)/d, the rate (d_x − 1)/d equals the best rate in the V-norm when the finite element spaces are employed for solving the corresponding stationary problem, which is posed on the d-dimensional domain Ω. For an optimal adaptive choice of the finite element spaces W^δ_λ from a sufficiently 'rich' collection that contains locally refined partitions, such as the collection of all conforming NVB partitions, it can be expected that the rate min(d_t − 1, (d_x − 1)/d) is realized under much milder regularity conditions on u.
5.6. Preconditioners. Our adaptive algorithm for the parabolic problem requires optimal preconditioners for (E^δ_Y)′AE^δ_Y and S^{δδ}, see (3.9) and (4.10). That is, for both Z = Y and Z = X and δ ∈ ∆, we need operators K^δ_Z = (K^δ_Z)′ ∈ Lis((Z^δ)′, Z^δ), which moreover should be applicable at linear cost. To construct these preconditioners, for Z ∈ {Y, X} we will select a symmetric, bounded, and coercive bilinear form on Z × Z, and, after selecting some basis for Z^δ, we will construct a matrix K^δ_Z = (K^δ_Z)⊤ that can be applied in linear complexity, and that is uniformly spectrally equivalent to the inverse of the stiffness matrix corresponding to this bilinear form (being the matrix representation of the linear mapping Z^δ → (Z^δ)′ defined by the bilinear form w.r.t. the basis for Z^δ and the corresponding dual basis for (Z^δ)′). Then K^δ_Z ∈ L((Z^δ)′, Z^δ), defined as the operator whose matrix representation is K^δ_Z w.r.t. the aforementioned bases of (Z^δ)′ and Z^δ, is the preconditioner that satisfies our needs.
Notice that the choice of the basis for Z δ is irrelevant: denoting the aforementioned stiffness matrix as C δ Z , a change of basis transforms C δ Z and K δ Z accordingly, so the spectral equivalence is unaffected. With a basis of type ∪ µ∈∨ Ψ ψ µ ⊗ Φ δ µ , the resulting stiffness matrix reads as blockdiag[A δ µ ] µ∈∨ Ψ , and the matrix representation of the optimal preconditioner reads as blockdiag[K δ µ ] µ∈∨ Ψ with each K δ µ uniformly spectrally equivalent to the inverse of A δ µ . It is well-known that when V δ µ is a finite element space, possibly w.r.t. a locally refined partition, suitable K δ µ of multi-grid type are available. These K δ µ can be applied in linear complexity, and so can K δ Y .

Remark 5.5. To show, in Theorem 4.9, r-linear convergence of our adaptive Algorithm 4.8, we require the constant κ ∆ from (3.9) to be sufficiently small, i.e., the eigenvalues of K δ Y E δ Y ′ AE δ Y to be sufficiently close to 1. Given an initial optimal, self-adjoint, coercive preconditioner K δ Y , and some upper and lower bounds on the spectrum of the preconditioned system, one can satisfy the latter condition by polynomial acceleration using Chebyshev polynomials of sufficiently high degree. In our numerical experiments, this 'acceleration' turned out to be unnecessary.

5.6.2. Preconditioner at the 'trial side'. Let Z = X. The preconditioner presented in this section is inspired by constructions of preconditioners in [And16, NS19] for parabolic problems discretized on a tensor product of temporal and spatial spaces.
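The polynomial acceleration mentioned in Remark 5.5 can be quantified with a few lines of code. The sketch below is our own illustration, not code from the paper: it only assumes that the spectrum of the preconditioned operator lies in a known interval [a, b] (the sample values of a and b are made up). The accelerated preconditioner then has spectrum in [1 − ρ, 1 + ρ] with ρ = 1/T k ((b + a)/(b − a)), which tends to 1 rapidly as the degree k grows.

```python
def cheb_T(k, x):
    """Chebyshev polynomial T_k(x) via the three-term recurrence
    (valid for all real x, in particular for x > 1 as needed here)."""
    t0, t1 = 1.0, x
    if k == 0:
        return t0
    for _ in range(k - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1

def accelerated_spectrum_bound(a, b, k):
    """Half-width rho of the spectral interval [1 - rho, 1 + rho] of the
    degree-k Chebyshev-accelerated preconditioned operator, given that the
    spectrum of the original preconditioned operator lies in [a, b],
    0 < a <= b.  rho = 1 / T_k((b + a)/(b - a))."""
    return 1.0 / cheb_T(k, (b + a) / (b - a))

# illustrative spectral bounds (assumed, not measured) of the
# preconditioned operator at the test side:
a, b = 0.5, 2.0
for k in [1, 2, 3, 5]:
    print(k, accelerated_spectrum_bound(a, b, k))
```

Already for moderate k the eigenvalues cluster tightly around 1, which is the mechanism by which κ ∆ can be made as small as required.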
With u denoting the representation of u w.r.t. this tensor product basis, the matrix representation of an optimal preconditioner reads as blockdiag[K δ λ ] λ∈∨ Σ , where each K δ λ is an optimal preconditioner for the reaction-diffusion matrix A δ λ + 2 |λ| M δ λ , with A δ λ and M δ λ being stiffness- and mass-matrices.
Remark 5.6. Notice that (5.9) requires an optimal preconditioner of a discretized reaction-diffusion equation that is robust w.r.t. the size of the (constant) reaction term. In [OR00] it was shown that, under a 'full-regularity' assumption, for quasi-uniform meshes multiplicative multi-grid yields such a preconditioner, whose application can even be performed at linear cost. Although we expect that this assumption can be avoided using the theory of subspace correction methods, and furthermore that the optimality, robustness, and linear complexity results extend to locally refined meshes, proofs of such extensions seem not to be available.

A CONCRETE REALIZATION
6.1. The collection O of finite element spaces, and the mapping δ → δ̄. We further specify the collection O of finite element spaces, construct a linearly independent set in H 1 0,Γ D (Ω) known as the hierarchical basis, and equip it with a tree structure such that there exists a 1-1 correspondence between the finite element spaces in O and the spans of subsets of the hierarchical basis that form trees.
With this specification of O, there will be a 1-1 correspondence between the spaces X δ = ∑ λ∈∨ Σ σ λ ⊗ W δ λ with W δ λ ∈ O that satisfy (5.4), and the spans of collections of tensor products of wavelets σ λ and hierarchical basis functions whose sets of index pairs are lower, also known as downward closed. Given such an X δ , we will define X δ̄ by a certain enlargement of the lower set.
For d ≥ 2, let T be the family of all conforming partitions of a polytope Ω ⊂ R d into (closed) d-simplices that can be created by newest vertex bisection (NVB) starting from some given conforming initial partition T ⊥ with an assignment of the newest vertices that satisfies the matching condition, see [Ste08]. We define a partial order on T by writing T ⪯ T̄ when T̄ is a refinement of T . With some small adaptations that we leave to the reader, in the following the case d = 1 can be included by letting T be the family of partitions of Ω into (closed) subintervals constructed by bisections from T ⊥ = {Ω}, such that the generations of any two neighbouring subintervals in any T ∈ T differ by not more than one.
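For the d = 1 case just described, the grading condition on neighbouring subintervals is easy to enforce algorithmically. The following sketch is our own illustration (the interval representation and function names are assumptions, not notation from the paper): it bisects a marked subinterval and then restores, by recursive closure, the condition that the generations of neighbours differ by at most one.

```python
def bisect_graded(intervals, i):
    """Bisect subinterval i of a partition of (0, 1), stored as a list of
    (left, right, generation) triples ordered left to right, and restore the
    grading condition |gen(neighbour) - gen| <= 1 by further bisections."""
    a, b, g = intervals[i]
    intervals[i:i + 1] = [(a, (a + b) / 2, g + 1), ((a + b) / 2, b, g + 1)]
    changed = True
    while changed:                         # closure sweep
        changed = False
        for j in range(len(intervals) - 1):
            gl, gr = intervals[j][2], intervals[j + 1][2]
            if abs(gl - gr) > 1:           # bisect the coarser neighbour
                k = j if gl < gr else j + 1
                a, b, g = intervals[k]
                intervals[k:k + 1] = [(a, (a + b) / 2, g + 1),
                                      ((a + b) / 2, b, g + 1)]
                changed = True
                break
    return intervals

# repeatedly refine the leftmost subinterval: a mesh graded towards 0
mesh = [(0.0, 1.0, 0)]
for _ in range(6):
    mesh = bisect_graded(mesh, 0)
print(mesh)
```

The resulting partition is graded towards the origin, with every pair of neighbouring subintervals differing by at most one generation.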
The collection O that we will consider is formed by the spaces W = W T of continuous piecewise linears w.r.t. T ∈ T, zero on a possible Dirichlet boundary Γ D being the union of ∂T ∩ ∂Ω for some T ∈ T ⊥ . We expect that generalizations to finite element spaces of higher order do not impose essential difficulties.
For a simplex T ∈ T := ∪ T ∈T {T : T ∈ T }, we set gen(T) to be the number of bisections needed to create T from its 'ancestor' T ∈ T ⊥ . With N being the set of all vertices (or nodes) of all T ∈ T, for ν ∈ N we set gen(ν) := min{gen(T) : T ∈ T, ν ∈ T}.
Any ν ∈ N with gen(ν) > 0 is the midpoint of an edge of one or more T ∈ T with gen(T) = gen(ν) − 1. The vertices ν̃ of these T with gen(ν̃) = gen(ν) − 1 are defined as the parents of ν. We denote this relation between a parent ν̃ and a child ν by ν̃ ⪯ N ν, see Figure 1. Vertices ν ∈ N with gen(ν) = 0 have no parents.
An (essentially) non-overlapping partition T of Ω into T ∈ T is in T if and only if the set N T of vertices of all T ∈ T forms a tree, meaning that it contains all ν ∈ N with gen(ν) = 0 as well as all parents of any ν ∈ N T with gen(ν) > 0, cf. [DKS15] for the d = 2 case.

Definition 6.1. For any T ∈ T, we define T d+ ∈ T (denoted as T ++ in [DKS15] for the d = 2 case) by replacing any T ∈ T by its 2 d 'descendants' of the dth generation, see Figure 1. Since this refinement adds exactly one vertex at the midpoint of any edge of all T ∈ T , one infers that indeed T d+ ∈ T. The corresponding tree N T d+ is created from N T by the addition of all descendants up to generation d of all ν ∈ N T .

For ν ∈ N, we set φ ν as the continuous piecewise linear function w.r.t. the uniform partition {T ∈ T : gen(T) = gen(ν)} ∈ T that is 1 at ν and 0 at all other vertices of this partition. Setting N 0 := N \ Γ D and, for any T ∈ T, N T ,0 := N T \ Γ D , the collection {φ ν : ν ∈ N 0 } is known as the hierarchical basis, and for any T ∈ T, it holds that W T = span{φ ν : ν ∈ N T ,0 }.
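In one dimension, the coefficients of a function w.r.t. the hierarchical basis are the classical hierarchical surpluses: the coefficient attached to a new vertex ν of generation g is the interpolation error at ν of the piecewise linear interpolant on the coarser grid. A minimal sketch, as our own 1D illustration on dyadic grids of [0, 1] (not code from the paper):

```python
def hierarchical_coefficients(f, levels):
    """Hierarchical-basis coefficients of the piecewise linear interpolant
    of f on [0, 1] for dyadic grids up to the given generation (d = 1 model:
    the coefficient of phi_nu at a new vertex nu of generation g is
    f(nu) minus the value at nu of the generation-(g-1) interpolant,
    which at the midpoint is the average of the two neighbouring values)."""
    coeffs = {0.0: f(0.0), 1.0: f(1.0)}          # generation-0 vertices
    for g in range(1, levels + 1):
        h = 0.5 ** g
        for k in range(1, 2 ** g, 2):            # new vertices of generation g
            x = k * h
            coeffs[x] = f(x) - 0.5 * (f(x - h) + f(x + h))
    return coeffs

# for f(x) = x^2 every surplus of generation g equals -h^2 with h = 2^{-g}
print(hierarchical_coefficients(lambda x: x * x, 2))
```

Truncating this dictionary to a tree of vertices (every parent present) gives exactly the correspondence between trees and finite element spaces described above, here in its simplest 1D form.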
With the above specification of the collection O of finite element spaces, there exists a 1-1 correspondence between the spaces ∑ λ∈∨ Σ σ λ ⊗ W δ λ with W δ λ ∈ O that satisfy (5.4), and the spaces of the form (6.1) for some finite I δ ⊂ ∨ Σ × N that is a lower set, in the sense that with any pair (λ, ν) ∈ I δ it contains all pairs (λ̃, ν) and (λ, ν̃) with λ̃ a parent of λ and ν̃ a parent of ν. For the above specification of X δ , from Proposition 5.2 with the specification (5.8) one infers that the corresponding space Y δ is given by the span of tensor products ψ µ ⊗ φ ν over a related lower set I Y δ,0 ⊂ ∨ Ψ × N.

Remark 6.2 (Complexity of matrix-vector multiplications). The fact that the index sets of the bases for X δ and Y δ are lower sets is the key to computing residuals of the system S δδ̄ u δ = f δ ((4.8)) in O(dim X δ ) operations. The algorithm that realizes this complexity makes clever use of multi-to-single-scale transformations alternately in time and space. In a 'uniform' sparse-grid setting, i.e., without 'local refinements', this algorithm was introduced in [BZ96], and it was later extended to general lower sets in [KS14]. The definition of a lower set in [KS14], there called multi-tree, is more restrictive than our current definition that allows more localized refinements; details about the matrix-vector multiplication and a proof of its optimal computational complexity can be found in [vVW21b].

Definition 6.3. Given X δ = span{σ λ ⊗ φ ν : (λ, ν) ∈ I δ,0 } for some lower set I δ ⊂ ∨ Σ × N, we define the lower set I δ̄ , and with that X δ̄ , by adding, for each (λ, ν) ∈ I δ , for any child λ̃ of λ and any descendant ν̃ of ν up to generation d, all pairs (λ̃, ν) and (λ, ν̃).

6.2. The collection Θ δ such that X δ̄ = X δ ⊕ Θ δ . Recall that for the bulk chasing process we need an 'X-stable' basis Θ δ that spans an 'X-stable' complement space of X δ in X δ̄ , i.e., a collection that satisfies (4.2). For that goal we define a modified hierarchical basis { φ̃ ν : ν ∈ N 0 } by φ̃ ν := φ ν when gen(ν) = 0, and by a suitable modification of φ ν involving its parents otherwise. Notice that for those ν with gen(ν) > 0 that have all their parents not on Γ D it holds that ∫ Ω φ̃ ν dx = 0, i.e., φ̃ ν has a vanishing moment, and furthermore that
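Whether a finite double-index set is lower can be checked directly from the two parent relations. The sketch below is a deliberately simplified model of ours in which both temporal and spatial indices are plain levels with a single parent (the paper's indices live on genuine trees); all names are assumptions for illustration.

```python
def is_lower_set(indices, parents_time, parents_space):
    """A set of pairs (lam, nu) is lower (downward closed) iff for every
    member, replacing lam by any of its parents, or nu by any of its
    parents, again yields a member."""
    s = set(indices)
    return all(
        all((p, nu) in s for p in parents_time(lam)) and
        all((lam, q) in s for q in parents_space(nu))
        for (lam, nu) in s)

# simplified model: an index is its level; its only parent is one level down
parents = lambda l: [l - 1] if l > 0 else []
sparse = {(l, g) for l in range(5) for g in range(9) if 2 * l + g <= 8}
print(is_lower_set(sparse, parents, parents))             # True
print(is_lower_set(sparse - {(1, 0)}, parents, parents))  # False
```

Removing the single pair (1, 0) destroys the lower-set property, since e.g. (2, 0) loses its temporal parent; this is exactly the constraint the matrix-vector algorithm of Remark 6.2 relies on.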
for any T ∈ T, and thus for any lower set. Moreover, for any T ∈ T, the basis transformation from the modified to the unmodified hierarchical basis for W T can be applied in linear complexity by traversing from the leaves to the roots.
Given δ, the collection Θ δ will be the set of properly normalized functions σ λ ⊗ φ̃ ν for (λ, ν) ∈ I δ̄,0 \ I δ,0 . In order to demonstrate (4.2), we have to impose some gradedness assumption on the lower sets I δ .

Definition 6.4. The gradedness constant of a lower set I δ ⊂ ∨ Σ × N is the smallest L δ ∈ N such that for all (λ, ν) ∈ I δ for which ν has an ancestor ν̃ ∈ N with gen(ν) − gen(ν̃) = L δ , it holds that (λ̃, ν̃) ∈ I δ for any child λ̃ ∈ ∨ Σ of λ.

Remark 6.5 (Uniform boundedness of the gradedness constants). Under the (unproven) assumption that our adaptive method creates a sequence of spaces X δ that are quasi-optimal for the approximation of the solution of the parabolic PDE, one may hope that these spaces have a uniformly bounded gradedness constant, unless (locally) the solution u is much smoother as a function of t than as a function of the spatial variables.
To see this, for some constant L ∈ N, consider the non-adaptive sparse grid index sets of the form {(λ, ν) ∈ ∨ Σ × N : L|λ| + gen(ν) ≤ N}, which are appropriate when the behaviour of u as a function of t and of the spatial variables is globally similar. Their gradedness constant is L. In the corresponding trial spaces, the smallest spatial resolution equals the smallest temporal resolution to the power L/d. So a uniformly bounded gradedness constant can only fail to hold when such a polynomial decay of the spatial resolution as a function of the temporal resolution does not suffice for a proper approximation of u.

Proposition 6.6. When the gradedness constants of the lower sets I δ are uniformly bounded, the collection Θ δ satisfies (4.2).
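The claim that the sparse grid sets have gradedness constant L can be verified computationally. The sketch below is our own level model of Definition 6.4 (each λ has one child per level and the ancestor of ν at a given generation distance is unique, which flattens the trees to integers):

```python
def gradedness(index_set):
    """Smallest L0 >= 1 such that for every (l, g) in the set with g >= L0,
    the pair (l + 1, g - L0) is also in the set.  Level model of
    Definition 6.4: l = wavelet level |lam|, g = vertex generation gen(nu).
    Terminates since for L0 exceeding all generations the condition is vacuous."""
    s = set(index_set)
    L0 = 0
    while True:
        L0 += 1
        if all((l + 1, g - L0) in s for (l, g) in s if g >= L0):
            return L0

# sparse grid sets {(l, g) : L*l + g <= N} have gradedness constant L
for L in [1, 2, 3]:
    s = [(l, g) for l in range(13) for g in range(13) if L * l + g <= 12]
    print(L, gradedness(s))
```

Indeed, replacing g by g − L frees exactly the budget L needed to increment the temporal level by one, so the computed constant equals L for each of these sets.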
The proof of Proposition 6.6 is based on the following lemma.
Lemma 6.7. For T̄ ∈ T, and either T ∈ T with T ⪯ T̄ and v ∈ W T , or T = ∅, N T ,0 := ∅, and v = 0, and scalars (d ν ) ν∈N T̄ ,0 \N T ,0 , the norm equivalences (6.5) and (6.6) hold, with the constants hidden in the ≂-symbols only dependent on M T T̄ := max{gen(T̃) − gen(T) : T̃ ⊆ T ∈ T , T̃ ∈ T̄ }, or M T T̄ := max{gen(T̃) : T̃ ∈ T̄ } for T = ∅.
Proof. Once the equivalences are shown uniformly in any T ⪯ T̄ with M T T̄ = 1, repeated application shows them for the general case, with constants only dependent on M T T̄ . So in the following, it suffices to consider the case that M T T̄ = 1. The case T = ∅ is easy, so we will consider the case that T ∈ T.
Let Φ T̄ = {φ T̄ ,ν : ν ∈ N T̄ ,0 } denote the standard nodal basis for W T̄ . For any weight function 0 < w T̄ ∈ ∏ T∈ T̄ P 0 (T), with ‖ • ‖ L 2,w T̄ (Ω) := ‖w T̄ 1/2 • ‖ L 2 (Ω) , this basis is stable with constants only dependent on the spectrum of the element mass matrix on a reference element, i.e., on the space dimension d, so independent of the weight function w T̄ . We refer to this equivalence by saying that Φ T̄ is (uniformly) stable w.r.t. ‖ • ‖ L 2,w T̄ (Ω) .
Notice that for ν ∈ N T̄ ,0 \ N T ,0 , it holds that φ T̄ ,ν = φ ν . W.r.t. the splitting of N T̄ ,0 into N T ,0 and N T̄ ,0 \ N T ,0 , the basis transformation has a simple triangular form. To show (6.5), let P T : W T̄ → W T̄ be the projector with ran P T = W T and ran(Id − P T ) = span{ φ̃ ν : ν ∈ N T̄ ,0 \ N T ,0 }. Using the form of the basis transform, Id − P T can be expressed in terms of the nodal interpolator I T onto W T and the mapping J T defined by J T φ ν = φ̃ ν . Since both { φ̃ ν : ν ∈ N T̄ ,0 \ N T ,0 } and {φ ν : ν ∈ N T̄ ,0 \ N T ,0 } are uniformly stable w.r.t. ‖h T̄ −1 • ‖ L 2 (Ω) , with the common inverse inequality on W T̄ we see that Id − P T is uniformly bounded in H 1 (Ω)-norm, and that the required equivalence holds on ran(Id − P T ). The proof of (6.5) is completed by the uniform stability of the involved bases. Moving to (6.6), either by ∫ Ω φ̃ ν dx = 0, or otherwise using the proximity of the Dirichlet boundary Γ D by an application of Poincaré's inequality, one obtains the required bound for each φ̃ ν . By using that for T ∈ T̄ the number of ν ∈ N T̄ ,0 \ N T ,0 for which supp φ̃ ν has non-empty intersection with T is uniformly bounded, and furthermore that Φ T̄ ∪ { φ̃ ν : ν ∈ N T̄ ,0 \ N T ,0 } is uniformly stable w.r.t. ‖h T̄ • ‖ L 2 (Ω) , we infer the required estimates.

FIGURE 2. Three-point hierarchical basis Σ. On level 0 there are two wavelets, and on level 1 there is one wavelet, whose parents are both wavelets on level 0. On each level ℓ > 1 there are 2 ℓ−1 wavelets, among them near each boundary one boundary-adapted wavelet, where each wavelet has one parent, being the wavelet on level ℓ − 1 whose support includes the support of its child (so S Σ (λ) can be taken equal to supp σ λ ). All but one wavelet have one (boundary wavelet) or two vanishing moments.
The last inequality follows by application of a less common inverse inequality, whose proof can be found in [SvV20, Lemma 3.4] for general dimensions d. From (6.8) it follows that Id − P T is uniformly bounded in the H 1 0,Γ D (Ω)-norm; with this, the proof of (6.6) is completed.
6.3. The wavelet collections Σ and Ψ. As wavelet basis Σ = {σ λ : λ ∈ ∨ Σ } we select the three-point hierarchical basis illustrated in Figure 2. This basis is known to be a Riesz basis for L 2 (I), and, after re-normalization, for H 1 (I) (see [Ste96]). It also satisfies the other assumptions made in Sect. 5.2. The wavelets up to level ℓ span the space of continuous piecewise linear functions on I w.r.t. the uniform partition into 2 ℓ subintervals.
As wavelet basis Ψ = {ψ µ : µ ∈ ∨ Ψ } we take the orthonormal (discontinuous) piecewise linear wavelets, see Figure 3. The wavelets up to level ℓ span the space of (discontinuous) piecewise linear functions on I w.r.t. the uniform partition into 2 ℓ subintervals. A collection { φ̃ ν : ν ∈ N } is constructed that is biorthogonal to {φ ν : ν ∈ N}. With { φ̃ λ : λ ∈ ∨ Σ } ⊂ C(I) defined analogously for the temporal axis, for g ∈ C(I × Ω) and u 0 ∈ C(Ω) we define corresponding interpolants. Since we expect that for sufficiently smooth g and u 0 the corresponding interpolation errors, measured in the natural norms, are of higher order than the approximation error inf w∈X δ ‖u − w‖ X , for our convenience in the adaptive Algorithm 4.8 we ignore errors caused by data oscillation by setting η(•) ≡ 0.
Notice that setting up the matrix-vector formulation of the system (4.8) that defines our approximation u δ requires computing certain load vectors, which can be performed in O(dim X δ ) operations because I δ and I Y δ,0 are lower sets (and #I Y δ,0 ≂ #I δ ).
For a given level N ∈ N, span{σ λ : |λ| ≤ N} coincides with the span of the continuous piecewise linears on an N-times recursive dyadic refinement of I, and span{φ ν : gen(ν) ≤ 2N} coincides with that of the continuous piecewise linears, zero at ∂Ω, on a 2N-times recursive bisection refinement of an initial partition T ⊥ . Therefore, the span of the 'full' tensor product {σ λ : |λ| ≤ N} ⊗ {φ ν : gen(ν) ≤ 2N} equals a space of lowest order continuous finite elements w.r.t. a quasi-uniform shape regular product mesh into prismatic elements.
Taking only those index pairs (λ, ν) for which 2|λ| + gen(ν) ≤ 2N produces a 'sparse' tensor product on level N. Sparse tensor products overcome the curse of dimensionality in the sense that for smooth solutions they achieve a rate in X-norm equal to the best rate in H 1 (Ω)-norm that can be expected for the corresponding stationary problem on the spatial domain, here the Poisson equation; cf. Sect. 5.5.
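The gain of the sparse index set can be made concrete by counting index pairs. In the model below, the number of temporal wavelets per level follows Figure 2 (two on level 0, one on level 1, 2 ℓ−1 on level ℓ ≥ 2), while the number of new spatial hierarchical functions per generation is modelled as 2 g−1 , an assumption of ours for illustration; the true count depends on T ⊥ and d.

```python
def n_time(l):
    """number of temporal wavelets on level l (three-point basis, Figure 2)"""
    return 2 if l == 0 else (1 if l == 1 else 2 ** (l - 1))

def n_space(g):
    """modelled number of new spatial hierarchical functions of generation g
    (assumption: one function at generation 0, then doubling per generation)"""
    return 1 if g == 0 else 2 ** (g - 1)

def dims(N):
    """dimensions of the full and the sparse tensor product space on level N"""
    full = (sum(n_time(l) for l in range(N + 1))
            * sum(n_space(g) for g in range(2 * N + 1)))
    sparse = sum(n_time(l) * n_space(g)
                 for l in range(N + 1) for g in range(2 * N + 1)
                 if 2 * l + g <= 2 * N)
    return full, sparse

for N in [2, 4, 6, 8]:
    full, sparse = dims(N)
    print(N, full, sparse)
```

In this model the full tensor product grows like 8 N while the sparse one grows like 4 N , i.e., like the finite element space for the stationary spatial problem alone, which is the announced 'whole time evolution at stationary cost'.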
NUMERICAL EXPERIMENTS

We run our adaptive Algorithm 4.8 with θ = 0.5 and ξ = 1.

Remark 7.1. Ideally, we would have applied an algorithm that produces an I δ̄ such that I δ̄ \ I δ is guaranteed to have an, up to a multiplicative factor, smallest cardinality among all lower sets I δ̄ ⊃ I δ that realize the bulk criterion. Such an algorithm was introduced in [BD04, BFV19] for 'single-tree' approximation, but seems not to be available for the 'double-tree' (i.e., lower set) constraint that we need here.
We compare adaptive refinement with non-adaptive full and sparse tensor products, and monitor the X-norm error estimator from Proposition 4.5, the residual error estimator from Proposition 4.3, and the L 2 (Ω) trace error at t = 0.

7.1. Condition numbers of the preconditioners. For the calibration of our preconditioners, we consider Ω := (0, 1) 2 , and compare uniformly refined space-time meshes with locally refined meshes with refinements towards {0} × ∂Ω.
The replacement of the nonlocal operator (E δ̄ Y ′ AE δ̄ Y ) −1 in the forward application of S δδ̄ by the block-diagonal preconditioner K δ̄ Y from Sect. 5.6.1 is only guaranteed to result in a convergent algorithm when the eigenvalues of K δ̄ Y E δ̄ Y ′ AE δ̄ Y are sufficiently close to one.
In Table 1 we display the resulting condition numbers κ δ̄ , with preconditioners K δ̄ µ corresponding to n V-cycles. In each V-cycle we applied one pre- and one post-smoothing step of Gauss-Seidel type. In case of a locally refined spatial mesh, on each level these Gauss-Seidel updates were restricted to the vertices whose generation is equal to that level, as well as both endpoints of the edge on which these vertices were inserted ([WZ17]). We see that for both uniform and locally refined space-time meshes, κ δ̄ converges to 1 rapidly in n, and is essentially independent of dim X δ . In our examples, κ δ̄ is close enough to one already for n = 1.
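The multigrid cycle used here can be sketched in a few dozen lines in the simplest setting. The code below is our own 1D, uniform-grid analogue, not the paper's implementation (the experiments use multigrid on locally refined 2D meshes with level-restricted smoothing): a V(1,1)-cycle with one Gauss-Seidel pre- and post-smoothing step for −u′′ = f on (0, 1), whose error reduction per cycle is essentially independent of the mesh size, mirroring the behaviour of κ δ̄ in n.

```python
import math

def gauss_seidel(u, f, h):
    """one forward Gauss-Seidel sweep for the 1D Poisson stencil (-1, 2, -1)/h^2"""
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = (h * h * f[i] + left + right) / 2.0

def residual(u, f, h):
    """r = f - A u for the same stencil, homogeneous Dirichlet b.c."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def v_cycle(u, f, h):
    """V(1,1)-cycle on n = 2^k - 1 interior points."""
    n = len(u)
    if n == 1:                        # coarsest grid: solve exactly
        u[0] = h * h * f[0] / 2.0
        return u
    gauss_seidel(u, f, h)             # pre-smoothing
    r = residual(u, f, h)
    m = (n - 1) // 2                  # coarse grid: every second interior point
    rc = [(r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]) / 4.0
          for j in range(m)]          # full-weighting restriction
    ec = v_cycle([0.0] * m, rc, 2.0 * h)
    for j in range(m):                # prolongate coarse correction and add
        u[2 * j + 1] += ec[j]
    for i in range(m + 1):
        left = ec[i - 1] if i > 0 else 0.0
        right = ec[i] if i < m else 0.0
        u[2 * i] += 0.5 * (left + right)
    gauss_seidel(u, f, h)             # post-smoothing
    return u

# error reduction per cycle for f = 0 (so u itself is the error)
for k in [5, 6, 7, 8]:
    n = 2 ** k - 1
    h = 1.0 / (n + 1)
    u = [math.sin(math.pi * (i + 1) * h) + 0.5 * (-1) ** i for i in range(n)]
    before = max(abs(x) for x in u)
    v_cycle(u, [0.0] * n, h)
    print(k, max(abs(x) for x in u) / before)
```

The printed contraction factors are small and nearly the same for all k, which is the level independence that makes one V-cycle per application suffice in the experiments.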
Fixing n = 1 for the forward application of S δδ̄ , we want to precondition S δδ̄ itself as well. Following Sect. 5.6.2, we build a block-diagonal preconditioner taking K δ λ to correspond to m V-cycles of the aforementioned multigrid method, now applied to A δ λ + 2 |λ| M δ λ with A δ λ and M δ λ being stiffness- and mass-matrices. Table 2 shows the condition numbers of the preconditioned matrix. We again see fast stabilization in m as well as in dim X δ . We fix m = 3 in the sequel. Most interestingly, in each of our example problems, only one or two PCG iterations suffice to fulfill the stopping criterion on the algebraic error in Algorithm 4.8.

7.2. Smooth problem. We consider the square domain Ω := (0, 1) 2 and prescribe a smooth solution with derived data u 0 and g. For this smooth solution, full and sparse tensor products are expected to yield the best possible error decays proportional to (dim X δ ) −1/3 and (dim X δ ) −1/2 , respectively.
The left side of Figure 5 shows the error progressions for the smooth problem. We plot the error estimator from Proposition 4.5, the residual error estimator r δ , and the trace error ‖γ 0 (u − ũ δ )‖ L 2 (Ω) . We see that the error progressions are as expected. For this solution, adaptive refinement yields no advantage over sparse grid refinement. We observe a higher order of convergence for the trace at t = 0 measured in L 2 (Ω).

7.3. Moving peak problem. The solution is smooth, and almost zero everywhere except on a small strip near the diagonal from (0, 0, 0) to (1, 1, 1) of the space-time cylinder. As u is smooth, we expect sparse grid refinements to asymptotically yield the optimal error decay proportional to (dim X δ ) −1/2 , albeit with a terrible constant. Adaptive refinement should be able to achieve the same rate at quantitatively smaller double-trees. The right of Figure 5 shows that the sparse grid rate is not (yet) optimal, while our adaptive routine is able to find the optimal rate from dim X δ ≈ 10 3 onwards. Figure 6 shows the number of basis functions σ λ ⊗ φ ν whose supports intersect given points in the space-time cylinder. We see the adaptation to the moving peak.

7.4. Cylinder problem. Selecting the L-shaped domain Ω := (−1, 1) 2 \ (−1, 0] 2 with data u 0 = 0 and g(t, x, y) := t • 1 {x 2 +y 2 <1/4} , the true solution is known to be singular at the re-entrant corner and at the wall of the cylinder {(t, x, y) : x 2 + y 2 = 1/4}. We took this example from [FK21]. The left side of Figure 7 shows the error progression for this cylinder problem. We see that the full grid error decay proportional to (dim X δ ) −1/4 is improved to (dim X δ ) −1/3 by considering sparse grids. Adaptive refinement, however, achieves the best possible error decay proportional to (dim X δ ) −1/2 , recovering the rate for a smooth solution.

7.5. Singular problem. Again for Ω := (−1, 1) 2 \ (−1, 0] 2 , we select data u 0 = 1 and g = 0. The solution has a strong singularity along {0} × ∂Ω due to the
incompatibility of initial and boundary conditions, in addition to the singularity at the re-entrant corner (0, 0). At the right of Figure 7, for uniform refinement, we see the extremely slow error decay proportional to (dim X δ ) −1/11 , already found in [FK21]. Interestingly, sparse grid refinement offers no rate improvement over full grid refinement. The adaptive algorithm yields a much better error decay proportional to (dim X δ ) −2/5 . We observed that increasing the Dörfler marking parameter to θ = 0.7 decreases the convergence rate to −1/3, whereas a θ smaller than 0.5 did not improve the rate beyond −2/5. Looking at Figure 8, we see strong adaptivity towards {0} × ∂Ω and I × {(0, 0)}, even seeing basis functions σ λ ⊗ φ ν of X δ whose barycenter is at t = 2 −14 ≈ 10 −4 .

7.6. Gradedness and error reduction. In Sect. 4 we used (4.2) to demonstrate proportionality of r δ and ‖u − u δ ‖ X , as well as a constant error reduction in each iteration of the adaptive algorithm. In Proposition 6.6, we showed that (4.2) holds when the gradedness constants L δ of Definition 6.4 are uniformly bounded.
In the left picture of Figure 9, we see however a stronger than expected increase in gradedness; in particular for the singular problem we observe a logarithmic increase in terms of dim X δ . However, this turns out not to be a problem in practice: Figures 5 and 7 demonstrate that the residual error r δ and the estimated X-norm error are very close, even for the singular problem. Moreover, in the right picture of Figure 9, we see a constant error reduction of ρ ≈ 0.89 at every step of the adaptive algorithm, and hence that the conclusion of Theorem 4.9 holds in practice.

7.7. Total runtime and memory consumption. Figure 10 shows the total runtime and peak memory consumption after every iteration of the adaptive algorithm. The top row shows absolute values, and the bottom row values relative to dim X δ .
The left of the figure shows that the adaptive algorithm runs in optimal linear time in the dimension of the current trial space. The right of the figure shows that the peak memory consumption is linear as well, stabilizing at around 15 kB per degree of freedom. This is relatively high, mainly because our implementation uses trees rather than hash maps to represent vectors, to ensure a linear-time implementation of the matrix-vector products (cf. Rem. 6.2).

CONCLUSION
We constructed an adaptive solver for a space-time variational formulation of parabolic evolution problems. The collection of trial spaces is given by the spans of sets of tensor products of wavelets-in-time and hierarchical basis functions-in-space. Compared to our previous works [CS11, RS19], where we employed 'true' wavelets also in space, the theoretical results are weaker: we demonstrated r-linear convergence of the adaptive routine, but have not shown optimal rates at linear complexity. On the other hand, the runtimes we obtained with the current approach are much better.

1.8. Notations. In this work, by C ≲ D we will mean that C can be bounded by a multiple of D, independently of parameters on which C and D may depend. Obviously, C ≳ D is defined as D ≲ C, and C ≂ D as C ≲ D and C ≳ D.

FIGURE 5. Error progressions for (left) the smooth problem and (right) the moving peak problem. Shown: estimated X-norm error (solid line), residual norm (dashed), and t = 0 trace error (dotted) as a function of dim X δ for adaptive (black), sparse grid (red), and full grid refinement (orange).

FIGURE 7. Error progressions for (left) the cylinder problem and (right) the singular problem. Shown: estimated X-norm error (solid line), residual norm (dashed), and t = 0 trace error (dotted) as a function of dim X δ for adaptive (black), sparse grid (red), and full grid refinement (orange).

FIGURE 8. Barycenters of supports of basis functions σ λ ⊗ φ ν spanning X δ generated by Algorithm 4.8, of dimension 81 074, for the singular problem. Left: a top-down view, with a 10× zoom to the origin; right: centers in space-time, logarithmic in time.

FIGURE 9. Gradedness constants (left) and error reduction per iteration (right) of the adaptive algorithm, as a function of dim X δ .

FIGURE 10. Total runtime and peak memory consumption as a function of dim X δ , measured after every iteration of the adaptive loop, for the four different model problems.

TABLE 2. Spectral condition numbers of K δ X S δδ̄ , using spatial multigrid with m V-cycles.