Improved rates for a space-time FOSLS of parabolic PDEs

We consider the first-order system space-time formulation of the heat equation introduced in [Bochev, Gunzburger, Springer, New York (2009)], and analyzed in [F\"uhrer, Karkulik, Comput. Math. Appl. 92 (2021)] and [Gantner, Stevenson, ESAIM Math. Model. Numer. Anal.} 55 (2021)], with solution components $(u_1,{\bf u}_2)=(u,-\nabla_{\bf x} u)$. The corresponding operator is boundedly invertible between a Hilbert space $U$ and a Cartesian product of $L_2$-type spaces, which facilitates easy first-order system least-squares (FOSLS) discretizations. Besides $L_2$-norms of $\nabla_{\bf x} u_1$ and ${\bf u}_2$, the (graph) norm of $U$ contains the $L_2$-norm of $\partial_t u_1 +{\rm div}_{\bf x} {\bf u}_2$. When applying standard finite elements w.r.t. simplicial partitions of the space-time cylinder, estimates of the approximation error w.r.t. the latter norm require higher-order smoothness of ${\bf u}_2$. In experiments for both uniform and adaptively refined partitions, this manifested itself in disappointingly low convergence rates for non-smooth solutions $u$. In this paper, we construct finite element spaces w.r.t. prismatic partitions. They come with a quasi-interpolant that satisfies a near commuting diagram in the sense that, apart from some harmless term, the aforementioned error depends exclusively on the smoothness of $\partial_t u_1 +{\rm div}_{\bf x} {\bf u}_2$, i.e., of the forcing term $f=(\partial_t-\Delta_x)u$. Numerical results show significantly improved convergence rates.


INTRODUCTION
1.1.Space-time variational formulation of the heat equation.This paper is about the numerical solution of second-order parabolic evolution equations in a simultaneous space-time variational formulation.For convenience only we restrict the discussion to the model problem of the heat equation with homogeneous Dirichlet (lateral) boundary conditions on the time-space cylinder Q := I × Ω, where I := (0, T), and Ω ⊂ R d is a bounded Lipschitz domain.The common space-time variational formulation of finding u that, for given data f , u 0 , satisfies for all 'test functions' v, w, induces a boundedly invertible operator from X to Y ′ × L 2 (Ω), where X := L 2 (I; H 1 0 (Ω)) ∩ H 1 (I; H −1 (Ω)), Y := L 2 (I; H 1 0 (Ω)) (see, e.g., [DL92] or [Wlo82]).Already because X = Y × L 2 (Ω), the corresponding bilinear form is non-coercive, so that one cannot resort to simple conforming Galerkin discretizations.See, however, [Ste15,LMN16] for methods that apply in the case of a homogeneous initial condition.1.2.Minimal residual discretization.The approach introduced by R. Andreev in [And13] is to consider minimal residual (or least squares) discretizations.They amount to minimizing the residual over a selected finite-dimensional trial space X δ ⊂ X.Since the residual of the PDE lives in the dual space Y ′ , however, its norm cannot be evaluated exactly and so has to be approximated by the dual norm over a suitable finite dimensional test subspace Y δ ⊂ Y.The resulting minimal residual solution from X δ is a quasi-best approximation from X δ when the pair (X δ , Y δ ) satisfies a Ladyshenskaja-Babuska-Brezzi (LBB) condition.In view of the application with finite element subspaces w.r.t.partitions of Q that preferably are as general as possible, it is a drawback that so far the verification of this LBB condition has been restricted to pairs of finite element spaces w.r.t.partitions of Q that permit a decomposition into time-slabs ( [SW21b,SW21a]).
Remark 1.1.In [SvVW22] uniform LBB-stability was demonstrated for trial and test spaces that are spanned by (adaptively generated) sets of tensor products of temporal wavelets and spatial finite element spaces.1.3.FOSLS.By writing the forcing function f ∈ Y ′ (non-uniquely) as f 1 + div x f 2 for some f 1 ∈ Y ′ and f 2 ∈ L 2 (Q) d , the heat equation for u = u 1 can equivalently be written as the first-order system From the well-posedness of the space-time variational formulation of the secondorder equation, and the boundedness of div x : L 2 (Q) d → Y ′ and ∇ x : X → L 2 (Q) d , one can infer that G is a boundedly invertible operator from [RS18,Lem. 2.3 & Rem. 2.4]).Because of the presence of the dual space Y ′ , however, least-squares finite element discretizations of this first-order system suffer from the same restrictions imposed by an LBB condition.
As was first suggested in [BG09, §9.1.4],a solution for this problem is to restrict to functions u = (u 1 , u 2 ) for which div u := ∂ t u 1 + div x u 2 ∈ L 2 (Q).Doing so, in [FK21] it was proven that G is a boundedly invertible operator from equipped with the graph norm, to its range in i.e., Gu L u U (u ∈ L) 1 .Consequently, for any f = ( f 1 , f 2 , u 0 ) ∈ ran G, and any closed subspace U δ ⊂ U, finding u δ := argmin v∈U δ f − Gv L is a wellposed first order system least-squares (FOSLS) discretisation of Gu = f.Because the L-norm can be efficiently exactly evaluated, in [BG09]  discretisation is referred to as being 'practical'.The resulting u δ is a quasi-best approximation from U δ to the solution u in the norm on U, i.e., u − u δ U inf v∈U δ u − v U .
Adding to the results from [FK21], in [GS21] the aforementioned range of G was shown to be the full space L. Furthermore, it was shown that which always exist by Riesz' representation theorem, it was shown that the firstorder system formulation with spaces U and L applies whenever the space-time variational formulation of second order from Section 1.1 does.We also mention our recent generalization to the instationary Stokes problem with so-called slip boundary conditions [GS22b].
A main advantage of the FOSLS approach is that it corresponds to a bilinear form that is symmetric and positive definite which is beneficial in several applications (cf.[FK22,GS22a]).Being of least-squares type, the method comes with a built-in efficient and reliable a posteriori estimator for the error in any approximation, being the L-norm of its residual.The square of this norm naturally splits into contributions associated to the individual elements which can be used to drive an adaptive solution routine.Plain convergence of this routine has been demonstrated in [GS21], where in practice one observes that this adaptive routine selects a sequence of partitions that results in the best possible convergence rate.1.4.Low rates for non-smooth solutions.For smooth solutions, this FOSLS works fully satisfactorily.When applying (vectorial) continuous piecewise polynomial approximation of degree k w.r.t. a quasi-uniform partition of Q into (d + 1)-simplices the error in U-norm is of the optimal order DoF − k d+1 .Obviously for solutions that are insufficiently smooth, reduced rates are obtained.The experiments reported in [FK21], however, show that for several problems adaptivity hardly improves these rates, which remain disappointingly low.For example, for the heat equation on Ω = (0, 1), with forcing function f (= f 1 ) = 2 and initial condition u 0 = 1, with a quasi-uniform partition of Q = I × Ω into triangles and k = 1, in [FK21] a rate equal to 0.08 was reported.A low rate was to be expected due to the singularities in the solution at the corners (0, 0) and (0, 1) of Q as a consequence of the discontinuity between initial and boundary conditions.But even with adaptivity this rate improved to only 0.17.The observed rate for Ω = (0, 1) 2 , f = 0, and u 0 = 1 and a quasi-uniform partition of Q into tetrahedra was 0.07, which even did not improve at all by adaptive refinements.
The disappointingly low rates for non-smooth solutions with the FOSLS method whilst applying adaptive refinements can be seen as a prize to be paid for the inclusion of the condition div u ∈ L 2 (Q) in the definition of the space U. Indeed, with for example linear trial spaces for u 1 and u 2 , the (local) approximation error in u measured in div • L 2 (Q) , being part of the graph norm on U, is of the order of the (local) mesh-size times the maximum of the (local) H 2 semi-norms of u 1 and u 2 .Realizing that u 1 = u, and, for f 2 = 0, u 2 = −∇ x u 1 , we conclude that this (local) approximation error is of the order of the (local) mesh-size times the maximum of the (local) L 2 -norm of second-and thirdorder derivatives of u.
Similar bounds on the (local) approximation errors in u 1 and u 2 in L 2 (I; H 1 0 (Ω))or L 2 (Q) d -norms involve at most second order derivatives of u.Obviously, similar considerations apply to finite element spaces of higher order.1.5.Towards a remedy.To cure this problem, an idea is to apply finite element spaces that are specially designed for the space H(div; Q), as e.g. the Raviart-Thomas finite elements.Thanks to their 'commuting diagram property', for, say, RT 0 -elements the (local) approximation error in u measured in div • L 2 (Q) is of the order of the (local) mesh-size times the (local) H 1 semi-norm of div u, the latter being equal to f 1 .In many cases, the condition that f 1 has some smoothness is much milder than the corresponding condition on the smoothness of both terms ∂ t u 1 and div x u 2 separately.Unfortunately, because of the condition u 1 ∈ L 2 (I; H 1 0 (Ω)) incorporated in the definition of U, H(div; Q)-conformity of the finite element space does not suffice, and therefore the Raviart-Thomas elements are not applicable.
In [CH18] the space of vectorial continuous piecewise linears was enriched with bubbles associated to the faces of the simplices such that the range of the divergence operator applied to this trial space equals the space of piecewise constants.This trial space, which is H 1 (Q)and thus U-conforming, comes with a quasi-interpolator (cf.also [BCH20]) that satisfies a commuting diagram so that the (local) approximation error in u measured in div • L 2 (Q) is of the order of the (local) mesh-size times the local H 1 semi-norm of div u.Unfortunately, in our setting, where u = (u 1 , u 2 ) and u 2 = −(f 2 + ∇ x u 1 ), it turns out that with this space the problem of the appearance of higher order derivatives of u in the local error bounds manifests itself at another place.Due to the fact that not all faces of a (d + 1)-simplex inside Q are either parallel or perpendicular to the bottom {0} × Ω of Q, the local quasi-interpolation error in u 1 in the L 2 ⊗ H 1 -norm depends on derivatives of u 2 , and thus on derivatives of u that are of one order higher than desirable.Numerical experiments with the FOSLS method for this trial space do not show better rates than without the inclusion of these bubbles 1.6.The approach from this work.We consider finite element subspaces of U w.r.t.prismatic partitions of the time-space cylinder Q.The space of local shape functions on each prism will be a Cartesian product of tensor products of finite element spaces in time and space.In Section 2, we construct a local quasi-interpolant that satisfies the desired commuting diagram.
Defining the global quasi-interpolant piecewise as this local quasi-interpolant would correspond to a non-conforming finite element space, as the u 1 -component would be discontinuous over element interfaces in the spatial direction and therefore not in L 2 (I; H 1 0 (Ω)).By constructing a global quasi-interpolant by averaging the local quasi-interpolants over these interfaces, the resulting global finite element space is conforming.Although consequently a truly commuting diagram is lost, in Section 3.1, for locally conforming prismatic partitions, we show a local upper bound for the global interpolation error that is satisfactory in the following sense: For the same convergence rate, it requires less local smoothness of the solution than a corresponding interpolation error bound with simplicial finite element spaces.
Having the aim to allow for local (adaptive) refinements, appropriate for nonsmooth solutions, nonconforming prismatic partitions have to be considered.In Section 3.2, we demonstrate that also on local patches which are nonconforming, the local upper bound for the global interpolation error has the right order, albeit under stronger smoothness requirements on the solution.
The numerical experiments for an adaptive solver presented in Section 4 demonstrate the advantage of the use of prismatic elements for various singular solutions compared to the use of standard simplicial elements.
We also mention that prismatic partitions provide the possibility to use different (local) mesh-sizes in time and space.For an ultra-weak DPG reformulation of the system (1.2) introduced in [DS22], a so-called parabolic scaling of the temporal and spatial mesh-sizes turns out to be beneficial for certain singular solutions, demonstrating the potential usefulness of anisotropic prisms.However, while some theoretical results in this work apply to anisotropic prismatic meshes as well, the main results and the numerical experiments are restricted to isotropic prismatic partitions.

THE FINITE ELEMENT
As element domain we consider a prism P = J × K, where J is a (closed) interval and K a (closed) d-simplex.We will use the notations h J := diam J and h K := diam K.
For ℓ, k ∈ N 0 , we consider the space of shape functions given by S ℓ,k (P) with RT k (K) being the Raviart-Thomas element of order k, see, e.g., [BF91, III.3].For d = 1, this space should be read as P k+1 (K).
The definition of the finite element is completed by the specification of the degrees of freedom (DoFs) for both P ℓ+1 (J) ⊗ P k (K) and P ℓ (J) ⊗ RT k (K).As DoFs for P ℓ+1 (J) we take (2.1) p(t) (t ∈ ∂J) and, when ℓ ≥ 1, J pq dt (q ∈ P ℓ−1 (J)), and as DoFs for P k (K), K pq dx (q ∈ P k (K)).
The DoFs for P ℓ+1 (J) ⊗ P k (K) are the tensor products of the DoFs for both factors.
As DoFs for P ℓ (J) we take (2.2) and as DoFs for where R k (∂K) = {q ∈ L 2 (∂K) : q| e ∈ P k (e) for all facets e of K}.Again, the DoFs for the tensor product P ℓ (J) ⊗ RT k (K) are the tensor products of the DoFs for both factors.
Remark 2.1.Notice that the DoFs for P ℓ+1 (J) were chosen so that in a certain sense this space plays the role of a one-dimensional Raviart-Thomas space of order ℓ.A difference, however, is that, for d > 1, RT k (K) P k+1 (K) d .

The local interpolant and a commuting diagram.
Above DoFs for S ℓ,k (P) are well-defined on On such a space they define a quasi-interpolator I P ℓ,k = I P,1 ℓ,k × I P,2 ℓ,k : v → I P ℓ,k v ∈ S ℓ,k (P) by imposing that I P ℓ,k v has the same DoFs as v.By construction, this I P ℓ,k is thus a projector onto S ℓ,k (P).Proposition 2.2 (Commuting diagram property).With Q P ℓ,k denoting the L 2 (P)orthogonal projector onto P ℓ (J) ⊗ P k (K), it holds that For p ∈ P ℓ (J), q ∈ P k (K) , and v = (v 1 , v 2 ), and with n = (n t , n x ) the outward pointing normal vector on ∂P, integration by parts gives by the definition of the DoFs.This completes the proof.
with hidden constants depending only on the shape regularity of K, the space dimension d, and the polynomial degrees ℓ and k.
Proof.Let ρ J ℓ , ρ K k denote the projectors on P ℓ+1 (J) and RT k (K) defined by the DoFs from (2.1) and (2.3), respectively, and let Q J ℓ and Q K k denote the L 2 -orthogonal projectors onto P ℓ (J) and P k (K), respectively, so that Similarly, we have (Id 3 The case s = 2 will be relevant later.
standard estimates for the orthogonal projectors give the results.
Remark 2.4.For simplicity Corollary 2.3 is restricted to L 2 -based Sobolev norms with integer smoothness indices.Error bounds in terms of general L p -based Sobolevnorms, possibly with fractional smoothness indices can be derived as well.
In the isotropic case of h J h K , the best choice for the polynomial degrees ℓ and k is ℓ + 1 = k.Indeed, increasing either ℓ or k will not improve the estimate of minimal order among the upper bounds of Corollary 2.3.By taking the bounds (i), (ii), and (iii), respectively, which minimizes the regularity conditions without affecting the minimal order, one arrives at the following upper bound for the interpolation error in the local U-norm: In particular, for u = (u 1 , u 2 ) being the solution of (1.2), i.e., u 1 = u being the solution of the heat equation, it holds that Remark 2.6.For comparison, to obtain a local error bound of order k with a trial space on a (d + 1)-simplex T, one needs a trial space that contains P k (T) d+1 .Then, with h T := |T| 1 d+1 , the bound on the local interpolation error in the • U,P -norm reads as h k T |v| H k+1 (T) d+1 , and so for v = u, as which requires more smoothness of u than the bound (2.6).As was already mentioned in the introduction, for these simplicial elements the addition of bubbles in order to realize a commuting diagram for the divergence operator does not give rise to better results.

GLOBAL FINITE ELEMENT SPACE
3.1.Conforming partition into prisms.Let P be an (essentially) disjoint partition of the time-space cylinder Q into prisms P = J × K for closed intervals J ⊂ I and uniformly shape-regular closed d-simplices K ⊂ Ω.In this subsection we assume that P is conforming in the sense that if the intersection of two different prisms from P has a positive d-dimensional measure, then these prisms share a common facet.This means that P has to be a Cartesian product of an essentially disjoint partition J of I into subintervals J and an essentially disjoint conforming partition we define the global finite element space S ℓ,k (P ) = {v ∈ U : v| P ∈ S ℓ,k (P) (P ∈ P )}.
Our aim is to construct a suitable interpolation operator into S ℓ,k (P ).For some s > 2 (s ≥ 2 when d = 1), let v ∈ C(I; L 1 (Ω)) × L 1 I; L s (Ω) d ∩ H(div; Ω) .The piecewise interpolation operator with ȊP,1 ℓ,k being defined by the replacement of the piecewise orthogonal projector w → (Q K k w| K ) K by the operator Qk defined as this piecewise orthogonal projector followed by averaging at the nodal points4 shared by multiple d-simplices K in the partition of Ω, or setting the value to zero at nodal points on ∂Ω.Since a piecewise polynomial on Ω is in H 1 0 (Ω) if (and only if) it is continuous and vanishes at ∂Ω, the mapping ȊP ℓ,k is a projector onto S ℓ,k (P ).
w L 2 (ω K ) , and for n ∈ {0, 1}, m ∈ {1, . . ., k + 1}, and w that vanishes at ∂Ω, The latter inequality follows by the usual homogeneity and polynomial reproduction arguments after realizing that, when applied to w with w| ∂Ω = 0, the definition of Qk does not change when the zero DOFs associated to nodal points on ∂Ω are replaced by the usual Scott-Zhang functionals in terms of a weighted integrals of w| ∂Ω (see [GL01,Appendix]).A comprehensive analysis of quasi-interpolators constructed by averaging can be found in [EG17].
Proof.The third estimate follows directly from Corollary 2.3(iii).
, the second estimate follows from the estimate for Id − ρ J ℓ that was already used in the proof of Corollary 2.3, and, for w that vanishes on ∂Ω, | Qk w| H 1 (K) |w| H 1 (ω K ) , and w ′ L 2 (J) , and, for w that vanishes on ∂Ω, By taking ℓ + 1 = k, and ℓ ′ = k = k ′′ , ℓ ′ + 1 = k = k ′ , and ℓ ′ = k = k ′ + 1 in the first, second, and third bound of Proposition 3.1, respectively, we arrive at the following upper bound for the interpolation error in the local U-norm (2.5) in the isotropic case: Corollary 3.2.For h P := h J h K and k ∈ N, it holds that In particular, for u being the solution of (1.2), it holds that mean- ing that the global interpolant ȊP ℓ,k does not give rise to a (truly) commuting diagram.Considering the case that ℓ + 1 = k, this upper bound for ℓ ′ = k = k ′′ of order h k J + h k K involves an L 2 -norm of a partial derivative of v 1 of total order k + 1, of which one order is in time and the other k orders are in space.
Still for ℓ norms of partial derivatives of v 1 of total order k + 1, one that has k orders in time and one in space, and the other that has all k + 1 orders in space.For v 1 = u being a (nonsmooth) solution of the heat equation with a (relatively) smooth forcing term f , temporal derivatives of u can be expected to grow faster towards a singularity than spatial derivatives of the same order do.In view of this, the additional term in the upper bound of the interpolation error as a consequence of the absence of a truly commuting diagram is harmless.Because of this, in Corollary 3.2, we have bounded the three (local) L 2 -norms of partial derivatives of v 1 from the upper bounds of Proposition 3.1 by the single term

General partition into prisms. When having the aim to allow local (adaptive)
refinements, it is needed to consider prismatic partitions that are nonconforming.In this section, we therefore consider (essentially) disjoint partitions P of the timespace cylinder Q into prisms P = J × K, for closed intervals J ⊂ I and uniformly shape-regular closed d-simplices K ⊂ Ω, such that for any two different P, P ′ ∈ P, where P ′ = J ′ × K ′ , if meas d (P ∩ P ′ ) > 0, then |J| |J ′ | and |K| |K ′ |, and either P and P ′ share a common facet, or P ∩ P ′ is a complete facet F of one the two prisms, say P, and a proper subset of a facet F ′ of P ′ .In the latter case we refer to P ′ (or F ′ ) as the master of the slave P (or F).We assume that P is 1-irregular in the sense that if F ′ ⊂ P ′ is a master facet, then it has non-empty intersection with possible slave facets of P ′ .A suitable refinement strategy generating partitions that satisfy all given requirements will be presented in Section 4.1.Facets that are parallel to the bottom {0} × Ω of the time-space cylinder will be referred to as horizontal facets, and the other ones as lateral facets.As in the case of having a conforming prismatic partition, we set S ℓ,k (P ) := {v ∈ U : v| P ∈ S ℓ,k (P) (P ∈ P )}.
We have to adapt the definition of the quasi-interpolant ȊP ℓ,k = ȊP,1 ℓ,k × I P,2 ℓ,k from (3.1) to generally nonconforming partitions.We will construct an adapted quasiinterpolant, denoted by " I P ℓ,k = " I P,1 ℓ,k × " I P,2 ℓ,k , that is a projector onto S ℓ,k (P ) such that, for h J h K and ℓ + 1 = k (ℓ = k ≥ 1 when d = 1), an estimate like (3.3) from Corollary 3.2 is valid, albeit with stronger norms on v = (v 1 , v 2 ) in the upper bound.
The definition " I P ℓ,k will show that for all P ∈ P for which the local patch ω P = ω P P := ∪{P ′ ∈ P : P ′ ∩ P = ∅} is conforming, it holds that ( " I P ℓ,k v)| P = ( ȊP ℓ,k v)| P so that the favourable estimate (3.3) from Corollary 3.2 remains valid.
Although for the other P ∈ P we will assume locally sufficient additional regularity of v, a proof of the interpolation error in local • U,P -norm being of order h k P is nevertheless not immediate.Indeed, because P k (J × K) d+1 is not included in our local finite element space S k−1,k (P), it is not a consequence of the Bramble-Hilbert lemma. 63.2.1.Definition and analyis of " I P,1 ℓ,k .Given ℓ ∈ N 0 , k ∈ N, we define the set of nodal points in P ∈ P as the Cartesian product of the principal lattice of order ℓ + 1 on J and the principal lattice of order k on K.A nodal point ν in P that is not on a slave facet of P is called a free node in P. Notice that a function in S ℓ,k (P) is uniquely determined by its values in the nodal points in P, and that the restriction of such a function to a facet of P is determined by its values in the nodal points on that facet.
Let v 1 ∈ C(I; L 1 (Ω)), and let ν be a node of P. For ν ∈ P \ ∂P, we set ( " I P,1 ℓ,k v 1 )(ν) := (I P,1 ℓ,k v 1 )(ν), and for ν ∈ I × ∂Ω, we set ( " I P,1 ℓ,k v 1 )(ν) := 0. For ν ∈ ∂P \ ∂Ω being a free node in P, we define ( " I P,1 ℓ,k v 1 )(ν) as the average of (I P ′ ,1 ℓ,k v 1 )(ν) over all P ′ ∈ P in which ν is a free node.In the remaining case of ν being on a slave facet F of P with master facet F ′ of P ′ , we set ( " I P,1 ℓ,k v 1 )(ν) equal to the value at ν of ( " I P,1 ℓ,k v 1 )| F ′ .Note that the condition of P being 1-irregular ensures that all nodes of P ′ that are on F ′ are free so that the latter term is well-defined.
By construction, " I P,1 ℓ,k v 1 is continuous and vanishes at I × ∂Ω.The same arguments used to demonstrate (3.2) together with standard inverse inequalities show 6 Literature on Raviart-Thomas or similar H(div)-elements w.r.t.nonconforming partitions seems scarce.In [AP02], RT k -elements w.r.t.nonconforming quadrilateral partitions are considered.Interpolation error estimates of optimal order w.r.t.L 2 -norms are given (even with an explicit dependence of constants on k).Such estimates for H(div)-norms are however missing (with the exception of [AP02, Thm.15] based on a commuting diagram which, however, is not valid for nonconforming partitions).
that for P = J × K, j ∈ {0, . . ., min(ℓ + 1, k)}, and v 1 that vanishes at I × ∂Ω, Because we are going to estimate div(v − " here we notice that similarly to (3.4), for v 1 that vanishes at I × ∂Ω it holds that 3.2.2.Definition and analysis of " for F being a lateral slave facet of P with corresponding master P ′ , in the DoFs of type F v 2 (t, x) • n x q(x)p(t) dx dt that define Below, we derive local bounds for the quasi-interpolation error.
The DoFs for RT k (K) given in (2.3) define a local quasi-interpolator as the sum over these DoFs of the product of the DoF and the associated dual basis function of RT k (K).If we restrict this sum to those DoFs that are associated to a facet e of K, then the resulting operator, which we denote by I e , satisfies as shown in [EG21,Ch. 17,(17.20)].Since I e w depends on the normal trace of w on e only, equally well it holds that where K ′′ is another uniformly shape-regular d-simplex that has e as a face.By an application of an inverse inequality, it also shows that By extending this to the local quasi-interpolator on P = J × K, with P ℓ (J) ⊗ RT k (K) equipped with tensor DoFs built from (2.2) and (2.3), and lateral facet F = J × e, we obtain for the restriction of this quasi-interpolator to the tensor DoF's associated to F, which we denote by I F , (3.9) To proceed, it suffices to consider the situation that P = J × K has one lateral slave facet J × e, where P ′ = J ′ × K ′ denotes the corresponding master.Then we can select a uniformly shape-regular d-simplex K ′′ ⊂ K ′ that has facet e. Recalling the definition of " I P,2 ℓ,k , the application of (3.9) with w = (Id − I P ′ ,2 ℓ,k )v 2 yields (3.11) We estimate both terms in the upper bound separately.
3.2.4.The case d = 1.So far we have excluded the case d = 1 from the definition and analysis of " I P,2 ℓ,k in Section 3.2.2, and so consequently from our conclusions presented in Section 3.2.3.Everything that was presented in Section 3.2.2,however, is also valid for d = 1, where the condition s > 2 can even be read as s ≥ 2. Different from the d ≥ 2 case, however, for d = 1 the value of s ∈ [2, ∞] cannot be chosen such that the exponent d( 1 2 − 1 s ) − 1 in (3.8) is equal to 0. Via (3.10), (3.14), and (3.15), it means that in (3.19) we do not obtain an upper bound of order h k P unless we can take ℓ ′ = k + 1 (more precisely ℓ ′ = k + 1 2 + 1 s ) in (3.13).The latter however requires ℓ = k instead of ℓ + 1 = k.
Taking therefore ℓ = k ≥ 1, i.e., S k,k (P ) instead of S k−1,k (P ) that was found most appropriate for the case of conforming partitions, and s = 2, similar to (3.17) and (3.18), we obtain whereas instead of (3.19) we have div(v − " In our numerical experiments, however, we did not observe better results for S k,k (P ) than for S k−1,k (P ), and therefore we present only results for the latter option.

NUMERICAL RESULTS
4.1.Adaptive algorithm.In the following numerical examples, we consider the least-squares formulation (1.2) of the heat equation (1.1) for d = 1 and d = 2. On prismatic partitions P δ of the space-time cylinder Q = I × Ω consisting of isotropic elements of the form P = J × K for closed intervals J ⊂ I and closed d-simplices K ⊂ Ω, we compute the corresponding least-squares approximation u δ := argmin v∈S 0,1 (P δ ) f − Gv L (i.e., ℓ + 1 = k as suggested in Remark 3.3, and the lowest order case k = 1).For convenience of the reader, we recall the definition of the discrete space S 0,1 (P δ ) = {v ∈ U : v| P ∈ P 1 (J) ⊗ P 1 (K) × P 0 (J) ⊗ RT 1 (K) (P ∈ P δ )}, where RT 1 (K) should be read as P 2 (K) in the case that d = 1.Note that u δ can simply be computed as the solution of the finite-dimensional linear system To measure the error, we use the reliable and efficient a posteriori error estimator η(f, v) := f − Gv L u − v U (v ∈ U), with corresponding local error indicators η(P; f, v) := f − Gv L(P) (P ∈ P δ ), where While the a priori estimates of Section 3 only guarantee the convergence rate O(dofs − 1 d+1 ) for a sequence of quasi-uniform meshes and sufficiently smooth solution u, we will also investigate adaptive refinement of elements M δ ⊆ P δ with "large" error indicator for singular solutions u.We use Dörfler marking θη(f, u δ ) 2 ≤ ∑ P∈M δ η(P; f, u δ ) 2 with some 0 < θ ≤ 1 to determine the marked elements.
Starting from a conforming initial partition of Q into (closed) prisms, we consider the following refinement procedure: In the spatial direction a d-simplex to be refined is decomposed into 2 d -simplices by some rule such that a uniform refinement of a conforming partition of Ω results in a conforming decomposition, and a repeated application of this rule produces uniformly shape-regular simplices, and we combine this rule with bisection in the temporal direction.Now the level of each prism that is produced is defined as the number of these product decompositions needed to create the prism from a prism in the initial partition, whose level is set to zero.All prisms that can be produced in this way are uniformly shape-regular, and so in particular (uniformly) isotropic.
In our experiments for spatial domains of dimension d = 1 or d = 2, as the rule to refine a d-simplex we applied bisection or 'red'-refinement, respectively.
In the case of local refinements, apart from the prisms marked for refinement, a minimal number of other prisms are refined such that the difference in levels of any two prisms in the new partition that have non-empty intersection is less or equal to one.This condition guarantees that all partitions that can be created are 1-irregular.
Overall, we thus consider the following adaptive algorithm.
Output: Refined meshes P j , corresponding exact discrete solutions u j , and error estimators η(f, u j ) for all j ∈ N 0 .
For comparison, we also consider the adaptive algorithm of [FK21], i.e., Algorithm 4.1 for conforming simplicial meshes T δ instead of prismatic meshes P δ , refined by newest vertex bisection [Ste08], where, as in [FK21], we refine marked elements into 2 d+1 elements by repeated bisection, with associated finite element space Although, we present numerical comparisons only for some of the test examples of [FK21] (plus some new ones), we emphasize that we computed numerical results for all these examples and observed that in each case our prismatic finite element space S 0,1 (P δ ) gave either better or the same convergence rates of the estimator, for both uniform refinement, where in each step all elements are marked for refinement, and adaptive refinement.
Remark 4.2.In [GS21], we proved plain convergence of the adaptive algorithm from [FK21], even for arbitrary polynomial degree of the employed discrete functions.Exploiting the local approximation properties derived in Section 3.2, the proof easily extends to Algorithm 4.1 (with prismatic meshes), even for arbitrary polynomial degrees ℓ ∈ N 0 , k ∈ N in the discrete space S ℓ,k (P j ) (instead of S 0,1 (P j )).Convergence is even satisfied if the marking step (iii) is replaced by determining a (not necessarily minimal) set of marked elements M j ⊆ P j with the property max P∈P j \M j η(P; f, u j ) ≤ M( max P∈M j η(P; f, u j )) for some fixed marking function M : [0, ∞) → [0, ∞) that is continuous at 0 with M(0) = 0.

Experiments in 1+1D.
For the following experiments, we fix the domain Ω := (0, 1) and the end time point T := 1.As initial prismatic mesh of the resulting space-time cylinder Q, we take P 0 = {Q}.For the initial triangular mesh T 0 , we split Q into four triangles.We consider uniform refinement, where in each step all elements are marked for refinement, and adaptive refinement with Dörfler parameter θ = 0.5.

4.2.1.
Non-matching initial datum.We consider [FK21, Example 4], i.e., Note that the initial datum does not match with the homogenous Dirichlet boundary conditions so that the solution u of the heat equation and the associated u = (u, −∇ x u) are singular for (t, x) ∈ {(0, 0), (0, 1)}. Figure 1 displays the resulting error estimators η for simplicial and prismatic meshes, and uniform and adaptive refinement vs. the degrees of freedom.The rates 0.08 and 0.17 for simplicial meshes have been already reported in [FK21].For prismatic meshes, we observe significantly improved rates 0.13 and 0.43.As mentioned before, the optimal rate for smooth solutions would be 1/2.FIGURE 2. Convergence plot of Section 4.2.2 for spatial domain Ω = (0, 1), right-hand side ( f 1 , f 2 , u 0 ) = 1, 0, x → 1 − 2 x − 1 2 , simplicial and prismatic meshes, and uniform and adaptive refinement.4.2.2.Initial datum with a "singularity" in the interior.We consider [FK21, Example 2], i.e., ( f 1 , f 2 , u 0 ) = 1, 0, x → 1 − 2 x − 1 2 .Clearly, the (at x = 1 2 ) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated u = (u, −∇ x u). Figure 2 displays the resulting error estimators η for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom.The rates 0.25 and 0.42 for simplicial meshes have been already observed in [FK21].Only adaptive refinement for prismatic meshes leads to the optimal rate 1/2, while uniform refinement yields the rate 0.38.4.2.3.Initial datum with a "singularity" at the boundary.We consider Clearly, the (at x = 0) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated u = (u, −∇ x u). Figure 3 displays the resulting error estimators η for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom.As in the previous example, only adaptive refinement for prismatic meshes leads to the optimal rate 1/2, while uniform refinement yields the rate 0.26.For simplicial meshes, we observe the rates 0.19 and 0.33.

Experiments in 2+1D.
For the following experiments, we fix the domain Ω := (0, 1) 2 and the end time point T := 1.The initial prismatic mesh of the resulting space-time cylinder Q is obtained by splitting Q into two prisms with triangular base.For the initial tetrahedral mesh T 0 , we split Q into twelve tetrahedra.
Note that the initial datum does not match with the homogenous Dirichlet boundary conditions so that the solution u of the heat equation and the associated u = (u, −∇ x u) are singular for (t, x) ∈ {0} × ∂Ω. Figure 4 displays the resulting error estimators η for simplicial and prismatic meshes, and uniform and adaptive refinement vs. the degrees of freedom.The rates 0.07 and 0.08 for simplicial meshes have been already reported in [FK21].For prismatic meshes, we observe somewhat improved rates 0.09 and 0.14.As mentioned before, the optimal rate for FIGURE 5. Convergence plot of Section 4.3.2 for spatial domain Ω = (0, 1) 2 , right-hand side ( f 1 , f 2 , u 0 ) = 0, 0, x → (x 1 − 1 2 ) 2 + (x 2 − 1 2 ) 2 sin(πx 1 ) sin(πx 2 ) , simplicial and prismatic meshes, and uniform and adaptive refinement.
Clearly, the (at x 1 = 0) irregular initial datum yields a reduced regularity of the unknown solution u of the heat equation and the associated u = (u, −∇ x u). Figure 6 displays the resulting error estimators η for simplicial and prismatic meshes, and uniform and adaptive refinement over the degrees of freedom.As in the previous example, only adaptive refinement for prismatic meshes leads to the optimal rate 1/3, while uniform refinement yields the rate 0.3.For simplicial meshes, we observe the rates 0.23 and 0.27.
such a least squares 1 In this work, by C D we will mean that C ≥ 0 can be bounded by a multiple of D ≥ 0, independently of parameters on which C and D may depend.Obviously, C D is defined as C D, and C D as C D and C D.