Variational Time Discretizations of Higher Order and Higher Regularity

We consider a family of variational time discretizations that generalizes discontinuous Galerkin (dG) and continuous Galerkin–Petrov (cGP) methods. The family is characterized by two parameters: one describes the polynomial ansatz order, while the other is associated with the global smoothness that is ensured by higher order collocation conditions at both ends of the subintervals. The presented methods provide the same stability properties as dG or cGP. Provided that suitable quadrature rules of Hermite type are used to evaluate the integrals in the variational conditions, the variational time discretization methods are connected to special collocation methods. For this case, we present error estimates, numerical experiments, and a computationally cheap postprocessing that allows one to increase both the accuracy and the global smoothness by one order.


Introduction
One way to solve parabolic partial differential equations starts with a semi-discretization in space, which yields a large system of ordinary differential equations that is then handled by a suitable temporal discretization. As the spatial discretization is refined, the system of ordinary differential equations becomes stiffer. Hence, implicit methods are preferable in order to avoid upper bounds on the time step length. Moreover, the time discretization used should be at least A-stable to ensure suitable stability properties. This restriction means that the order of BDF methods can be at most two. In order to obtain A-stable temporal discretizations of higher order, implicit Runge–Kutta methods, discontinuous Galerkin (dG) methods, or continuous Galerkin–Petrov (cGP) schemes can be applied.
This paper deals with a family of variational time discretizations that generalizes dG and cGP methods. In addition to variational equations, collocation conditions are used. The considered family is characterized by two parameters: the first is the local polynomial ansatz order, while the second is related to the global smoothness of the numerical solution, which is ensured by higher order collocation conditions at both ends of the subintervals. With respect to their stability behavior, the family of variational time discretizations can be divided into two groups. While the first group shares its stability properties with the cGP method, the second group behaves like the dG method. These observations, considered in [9], suggest that the whole family of methods is appropriate for handling stiff problems. Since the test space of each family member is allowed to be discontinuous with respect to the time mesh, the discrete problem can be solved in a time-marching process, i.e., by a sequence of local problems on the subintervals.
Our studies build on a broad knowledge base covering various aspects of dG and cGP time discretization methods in the literature. The a priori and a posteriori analysis of dG methods is well understood, see [12,25]. Moreover, dG methods are known to be strongly A-stable. Investigations of the cGP(r) method for r = 1 and systems of linear ordinary differential equations can be found in [12, Sect. 9.3]. Optimal error estimates and superconvergence results for the fully discretized heat equation, based on a finite element method in space and a cGP(r) method in time, are given in [7]. The cGP(r) method was analyzed in [24] for the affine linear case in an abstract Hilbert-space setting and for a general nonlinear system of ordinary differential equations in the d-dimensional Euclidean space. Using energy arguments, the A-stability of cGP(r) methods was shown. In addition, cGP(r) provides an energy decreasing property for the gradient flow equation of an energy functional, see [24].
Postprocessing techniques for dG and cGP methods applied to systems of ordinary differential equations have been given in [21]. They allow one to improve the convergence by one order in integral-based norms. Furthermore, the postprocessed dG solution is continuous, while the postprocessing of cGP solutions leads to continuously differentiable trajectories. This paper transfers and generalizes these postprocessing ideas to the whole family of variational time discretizations. The postprocessing creates an improved solution whose global smoothness is increased by one differentiation order. Moreover, the postprocessing lifts the originally obtained numerical solution on each time subinterval to the polynomial space of one degree higher. This results in an increased accuracy in integral-based norms, see [1,3,4,8,13,18,19,21]. Note that the postprocessing comes at almost no computational cost since only jumps of derivatives of the discrete solution are needed. Besides the improvements in accuracy and global smoothness, the postprocessing can be used to drive an efficient adaptive time step control, see [2].
As mentioned above, the variational time discretizations analyzed in this paper use collocation conditions at the end points of the time subintervals. We will show connections between pure collocation methods and numerically integrated variational time discretizations, provided that a suitable quadrature rule of Hermite type is applied. Based on these connections, the existence and uniqueness of discrete solutions will be shown. Moreover, optimal error estimates follow. The connection between collocation methods and postprocessed numerically integrated discontinuous Galerkin methods using the right-sided Gauss–Radau quadrature formulas was considered in [26]. Moreover, connections between collocation methods and numerically integrated continuous Galerkin–Petrov methods (using interpolatory quadrature formulas with as many quadrature points as there are independent variational conditions) are shown in [16,17].
For affine linear systems of ordinary differential equations with time-independent coefficients, an interpolation cascade is presented that allows multiple postprocessing steps, leading to very accurate solutions at low computational cost. Moreover, temporal derivatives of discrete solutions to affine linear systems with time-independent coefficients are themselves solutions of variational time discretization schemes. This relation was used to prove optimal error estimates for stabilized finite element methods for linear first-order partial differential equations [13] and for parabolic wave equations [6,8].
The paper is organized as follows. Section 2 provides some notation and formulates the family of variational time discretizations. The general postprocessing technique is considered in Section 3. The connection between collocation methods and numerically integrated variational time discretizations is given in Section 4; this connection is exploited to provide results on the existence of unique solutions and to obtain error estimates. In Section 5 we present, for affine linear ODE systems, an interpolation cascade that allows multiple postprocessing steps. Properties of derivatives of solutions to variational time discretization methods are discussed in Section 6. Numerical experiments supporting the theoretical results are presented in Section 7.

Notation and formulation of the methods
We consider the initial value problem
\[
  M u'(t) = F\bigl(t, u(t)\bigr), \quad t \in I, \qquad u(t_0) = u_0, \tag{2.1}
\]
where $M \in \mathbb{R}^{d \times d}$ is a regular matrix and $F$, sufficiently smooth, satisfies a Lipschitz condition with respect to its second variable. Furthermore, let $I = (t_0, t_0 + T]$ be an arbitrary but fixed time interval with positive length $T$. The value $u_0$ at $t = t_0$ is called the initial value in the following. If the ODE system (2.1) originates from a finite element semi-discretization in space of a parabolic partial differential equation, then $M$ is the time-constant mass matrix. The explicit appearance of $M$ makes it easy to see whether, for certain ideas, systems of linear equations have to be solved and additional effort is necessary.
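This setting can be made concrete by a small sketch. The following Python snippet (illustrative only, not taken from the paper; the helper names are ours) assembles the mass matrix $M$ and the stiffness matrix $A$ of a piecewise linear finite element semi-discretization of the one-dimensional heat equation, so that (2.1) takes the affine linear form $F(t,u) = f(t) - Au$:

```python
import numpy as np

def fem_heat_system(m):
    """Assemble the mass matrix M and stiffness matrix A of a piecewise
    linear finite element semi-discretization of u_t = u_xx on (0, 1)
    with homogeneous Dirichlet boundary conditions; m interior nodes,
    mesh width h = 1/(m+1). Both matrices are tridiagonal."""
    h = 1.0 / (m + 1)
    M = h / 6.0 * (4.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1))
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
    return M, A

def F(t, u, A, f):
    """Right-hand side of M u'(t) = F(t, u) in the affine linear case
    F(t, u) = f(t) - A u."""
    return f(t) - A @ u
```

Here $M$ is symmetric positive definite, hence regular, so (2.1) is a well-posed (stiff) ODE system.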
To describe the vector-valued case ($d > 1$) in a simple way, let $(\cdot,\cdot)$ be the standard inner product and $\|\cdot\|$ the Euclidean norm on $\mathbb{R}^d$, $d \in \mathbb{N}$. Besides, let $e_j$ be the $j$th standard unit vector in $\mathbb{R}^d$, $1 \le j \le d$.
For an arbitrary interval $J$ and $q \in \mathbb{N}$, the spaces of continuous and $p$ times continuously differentiable $\mathbb{R}^q$-valued functions on $J$ are written as $C(J,\mathbb{R}^q)$ and $C^p(J,\mathbb{R}^q)$, respectively. Furthermore, the space of square-integrable $\mathbb{R}^q$-valued functions is denoted by $L^2(J,\mathbb{R}^q)$ or, for convenience, sometimes also by $C^{-1}(J,\mathbb{R}^q)$. For $s \in \mathbb{Z}$, $s \ge 0$, we write $P_s(J,\mathbb{R}^q)$ for the space of $\mathbb{R}^q$-valued polynomials on $J$ of degree less than or equal to $s$. Further notation will be introduced at the beginning of the sections where it is needed.
In order to describe the methods, we need a time mesh. Therefore, the interval $I$ is decomposed as
\[
  \bar I = [t_0, t_0 + T], \qquad I_n = (t_{n-1}, t_n], \quad n = 1, \dots, N,
\]
with $t_0 < t_1 < \dots < t_N = t_0 + T$; we set $\tau_n = t_n - t_{n-1}$ and $\tau = \max_n \tau_n$. For any piecewise continuous function $v$ we define the one-sided limits $v(t_n^\pm) = \lim_{t \to t_n \pm 0} v(t)$ and the jump $[\![v]\!]_n = v(t_n^+) - v(t_n^-)$ of $v$ at $t_n$. In this paper, $C$ denotes a generic constant independent of the time mesh parameter $\tau$.

Local formulation
Let $r, k \in \mathbb{Z}$ with $0 \le k \le r$. In order to numerically solve the initial value problem (2.1), we introduce special variational time discretization methods $\mathrm{VTD}_k^r$ with parameters $r$ and $k$. Using a standard time-marching strategy, the discrete solution is successively determined on $I_n$, $n = 1, \dots, N$, by local problems of the form: Find $U|_{I_n} \in P_r(I_n, \mathbb{R}^d)$ such that
\[
  U(t_{n-1}^+) = U(t_{n-1}^-) \quad \text{if } k \ge 1, \tag{2.2a}
\]
\[
  M U^{(i+1)}(t_n^-) = \frac{d^i}{dt^i} F\bigl(t, U(t)\bigr)\Big|_{t = t_n^-}, \quad i = 0, \dots, \lfloor k/2 \rfloor - 1, \tag{2.2b}
\]
\[
  M U^{(i+1)}(t_{n-1}^+) = \frac{d^i}{dt^i} F\bigl(t, U(t)\bigr)\Big|_{t = t_{n-1}^+}, \quad i = 0, \dots, \lfloor (k-1)/2 \rfloor - 1, \tag{2.2c}
\]
and
\[
  \mathcal{I}_n\bigl[\bigl(M U' - F(\cdot, U), \varphi\bigr)\bigr] + \delta_{k,0}\bigl(M [\![U]\!]_{n-1}, \varphi(t_{n-1}^+)\bigr) = 0 \quad \forall\, \varphi \in P_{r-k}(I_n, \mathbb{R}^d), \tag{2.2d}
\]
where $U(t_0^-) = u_0$ and $\delta_{i,j}$ denotes the Kronecker delta. Moreover, the integrator $\mathcal{I}_n$ represents either the integral over $I_n$ or the application of a quadrature formula for approximate integration. Details will be described later on. Note that the formulation easily extends to the case $k = r + 1$: then the variational condition (2.2d) has to hold formally for all $\varphi \in P_{-1}(I_n, \mathbb{R}^d)$, which should be interpreted as "there is no variational condition". Hence, only conditions at both ends of the interval $I_n$ are used.
The method $\mathrm{VTD}_k^r$ can be briefly described by: trial space $P_r(I_n, \mathbb{R}^d)$; if $k \ge 1$: initial condition (2.2a); point conditions $\mathrm{ODE}^{(i)}$ at $t_n^-$ for $i = 0, \dots, \lfloor k/2 \rfloor - 1$ and at $t_{n-1}^+$ for $i = 0, \dots, \lfloor (k-1)/2 \rfloor - 1$; test space $P_{r-k}(I_n, \mathbb{R}^d)$. The notation $\mathrm{ODE}^{(i)}$ means that the discrete solution fulfills the $i$th derivative of the system of ordinary differential equations. Counting the number of conditions leads for $k \ge 1$ to
\[
  1 + \lfloor k/2 \rfloor + \lfloor (k-1)/2 \rfloor + (r - k + 1) = r + 1,
\]
while we have $\dim P_r = r + 1$ conditions if $k = 0$. The number of degrees of freedom equals $\dim P_r = r + 1$ for all $k$. Hence, the number of conditions coincides in all cases with the number of degrees of freedom.
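The counting argument above can be checked mechanically. A small Python sketch (the function name `n_conditions` is ours) verifies that the number of conditions always matches the $r+1$ degrees of freedom:

```python
import math

def n_conditions(r, k):
    """Number of conditions of VTD_k^r for k >= 1: one initial condition,
    collocation conditions at the right and left interval ends, and
    r - k + 1 variational conditions (test space P_{r-k})."""
    return 1 + math.floor(k / 2) + math.floor((k - 1) / 2) + (r - k + 1)

# the count matches dim P_r = r + 1 for every admissible pair 1 <= k <= r
assert all(n_conditions(r, k) == r + 1
           for r in range(1, 25) for k in range(1, r + 1))
```

The identity holds since $\lfloor k/2 \rfloor + \lfloor (k-1)/2 \rfloor = k - 1$ for every integer $k \ge 1$.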

Remark 2.1
The $\mathrm{VTD}_k^r$ framework generalizes two well-known types of variational time discretization methods. The method $\mathrm{VTD}_0^r$ is the discontinuous Galerkin method dG($r$), whereas the method $\mathrm{VTD}_1^r$ equates to the continuous Galerkin–Petrov method cGP($r$).
On closer inspection we see that the methods $\mathrm{VTD}_k^r$ with even $k$ are dG-like, since there are point conditions involving the $(k/2)$th derivative of the discrete solution, but this derivative may be discontinuous. The methods $\mathrm{VTD}_k^r$ with odd $k$ are cGP-like, since there are point conditions up to the $\bigl((k-1)/2\bigr)$th derivative of the discrete solution and this derivative is continuous. Here we used and generalized the definitions and notation of [21]. Note that there is also another reason for naming the methods in this way: all methods with odd $k$ share their A-stability with the cGP method, while methods with even $k$ are strongly A-stable like the dG method. ♣

In order to obtain a fully computable discrete problem, usually a quadrature formula $Q_n$ is chosen as integrator, i.e., $\mathcal{I}_n = Q_n$. To indicate this choice, we simply write $Q_n$-$\mathrm{VTD}_k^r$. Moreover, we agree that integration over $I_n$ is used if no quadrature rule is specified. We shall mostly use quadrature rules that are exact for polynomials of degree up to $2r - k$. This ensures, in the case of an affine linear right-hand side $F(t,u) = f(t) - Au$ with time-independent $A$, that all $u$-dependent terms in (2.2d) are integrated exactly.
The special structure of the method (2.2) motivates the use of an assigned interpolation operator that preserves derivatives at the end points of the interval up to a certain order. In detail, we define on the reference interval $[-1,1]$ the interpolation operator $\widehat{I}_k^r$ into $P_r([-1,1], \mathbb{R}^d)$ that uses the interpolation points
at the left end: derivatives up to order $\lfloor (k-1)/2 \rfloor$ at $-1^+$,
at the right end: derivatives up to order $\lfloor k/2 \rfloor$ at $1^-$,
in the interior: the zeros $\hat t_i \in (-1,1)$ of the $(r-k)$th Jacobi polynomial with respect to the weight $(1-t)^{\lfloor k/2 \rfloor + 1} (1+t)^{\lfloor (k-1)/2 \rfloor + 1}$.
Note that for $k = 0$ there is not even a single point value at the left end. Counting derivatives with their multiplicities, a total number of
\[
  \bigl(\lfloor (k-1)/2 \rfloor + 1\bigr) + \bigl(\lfloor k/2 \rfloor + 1\bigr) + (r - k) = r + 1
\]
interpolation conditions is obtained. Hence, the number of conditions coincides with the dimension of $P_r$. The interpolation operator $\widehat{I}_k^r$ is of Hermite type and provides the standard error estimates for Hermite interpolation.
In addition, we define the quadrature rule
\[
  \widehat{Q}_k^r(v) := \int_{-1}^{1} \bigl(\widehat{I}_k^r v\bigr)(t) \, dt,
\]
which is in a natural way assigned to the method $\mathrm{VTD}_k^r$. The quadrature rules $\widehat{Q}_k^r$ are known in the literature as generalized Gauss–Radau or Gauss–Lobatto formulas, respectively, see e.g. [14,20]. The weights of the quadrature rule $\widehat{Q}_k^r$ can be calculated by integrating the corresponding Hermite basis functions on $[-1,1]$. The quadrature rule $\widehat{Q}_k^r$ is exact for polynomials up to degree $2r - k$. It can be shown that all quadrature weights are different from zero and that even their signs are known, see [20]. Note that in general (for $k \ge 2$) not all weights are positive. Semi-explicit and recursive formulas for the weights of these methods can be found in [23].
Transferring the quadrature rule $\widehat{Q}_k^r$ and the interpolation operator $\widehat{I}_k^r$ from $[-1,1]$ to the interval $I_n$, we obtain $Q_k^r$ and $I_k^r$, where we usually do not indicate the interval $I_n$ since it is clear from the context. The quadrature points are given by
\[
  t_{n,i} = \tfrac{1}{2}\bigl(t_{n-1} + t_n + \tau_n \hat t_i\bigr) \in I_n, \qquad i = 1, \dots, r - k.
\]
The factors $\bigl(\tfrac{\tau_n}{2}\bigr)^i$ appearing in the weights of the derivative terms originate from the chain rule.

Remark 2.2
The quadrature rule $Q_0^r$ is the well-known right-sided Gauss–Radau quadrature formula with $r+1$ points, which is typically used for the discontinuous Galerkin method dG($r$). $Q_1^r$ is the Gauss–Lobatto quadrature rule with $r+1$ points, which is often used together with the continuous Galerkin–Petrov method cGP($r$). ♣
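As an illustration of how such weights arise, the following Python sketch (illustrative; the helper `interpolatory_weights` is ours) computes the weights of the 2-point right-sided Gauss–Radau rule, i.e., the case $r = 1$, $k = 0$ with nodes $-1/3$ (the zero of the first Jacobi polynomial for the weight $1-t$) and $1$, by integrating the Lagrange basis polynomials, and checks exactness up to degree $2r - k = 2$:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def interpolatory_weights(nodes):
    """Weights of the interpolatory quadrature rule on [-1, 1] with the
    given (simple) nodes, obtained by integrating the Lagrange basis."""
    nodes = np.asarray(nodes, dtype=float)
    w = []
    for i, xi in enumerate(nodes):
        li = np.array([1.0])                    # Lagrange basis l_i, low-to-high coeffs
        for j, xj in enumerate(nodes):
            if j != i:
                li = P.polymul(li, np.array([-xj, 1.0]) / (xi - xj))
        anti = P.polyint(li)                    # antiderivative of l_i
        w.append(P.polyval(1.0, anti) - P.polyval(-1.0, anti))
    return np.array(w)

# right-sided Gauss-Radau rule with r + 1 = 2 points (case k = 0, r = 1)
nodes = np.array([-1.0 / 3.0, 1.0])
weights = interpolatory_weights(nodes)          # -> [1.5, 0.5]

# the rule is exact for all polynomials up to degree 2r - k = 2
for p in range(3):
    exact = (1.0 - (-1.0) ** (p + 1)) / (p + 1)  # integral of t^p over [-1, 1]
    assert abs(weights @ nodes ** p - exact) < 1e-12
```

For $k \ge 2$ the rules additionally carry derivative weights at the end points, so this simple Lagrange construction no longer suffices; the Hermite basis mentioned above is needed instead.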

Remark 2.3
For $1 \le k \le r$, the $\mathrm{VTD}_k^r$ method with exact integration can also be analyzed in a generalization of the unified framework of [5], as we shall show below, see (2.5). Note that the dG method ($k = 0$) has already been fitted into this framework there.
We define for $k \ge 1$ a projection operator $P_n$ into $P_{r-1}(I_n, \mathbb{R}^d)$ by the conditions
\[
  (P_n v)^{(i)}(t_n^-) = v^{(i)}(t_n^-), \quad i = 0, \dots, \lfloor k/2 \rfloor - 1, \tag{2.4a}
\]
\[
  (P_n v)^{(i)}(t_{n-1}^+) = v^{(i)}(t_{n-1}^+), \quad i = 0, \dots, \lfloor (k-1)/2 \rfloor - 1, \tag{2.4b}
\]
and
\[
  \int_{I_n} \bigl(P_n v - v, \varphi\bigr) \, dt = 0 \quad \text{for all } \varphi \in P_{r-k}(I_n, \mathbb{R}^d). \tag{2.4c}
\]
Then an equivalent formulation of (2.2) with $1 \le k \le r$ and exact integration reads
\[
  M U' = P_n F(\cdot, U) \quad \text{on } I_n, \qquad U(t_{n-1}^+) = U(t_{n-1}^-), \tag{2.5}
\]
where $U(t_0) = u_0$. Indeed, if $U$ solves (2.2), then $U' \in P_{r-1}(I_n, \mathbb{R}^d)$, and $M U'$ obviously satisfies all conditions of (2.4) with $v = F(\cdot, U(\cdot))$. Since $P_n v$ is uniquely defined, we directly get (2.5).
Conversely, let $U$ solve (2.5). Since there are polynomials on both sides, we can differentiate the equation to any order. With (2.4a) and (2.4b) we have, at $t = t_n^-$ for all $i = 0, \dots, \lfloor k/2 \rfloor - 1$ if $k \ge 2$, as well as at $t = t_{n-1}^+$ for all $i = 0, \dots, \lfloor (k-1)/2 \rfloor - 1$ if $k \ge 3$, respectively,
\[
  M U^{(i+1)}(t) = \bigl(P_n F(\cdot, U)\bigr)^{(i)}(t) = \frac{d^i}{dt^i} F\bigl(t, U(t)\bigr).
\]
Hence, the conditions (2.2b) and (2.2c) hold. Taking the inner product of (2.5) with an arbitrary $\varphi \in P_{r-k}(I_n, \mathbb{R}^d)$ and integrating over $I_n$ yields, together with (2.4c), the variational condition (2.2d). Note that certain numerically integrated versions of $\mathrm{VTD}_k^r$ can also be written in the form (2.5) if appropriate projections are applied. ♣

Global formulation
For $s \in \mathbb{Z}$, $s \ge 0$, we define the space $Y^s$ of vector-valued piecewise polynomials of maximal degree $s$ by
\[
  Y^s := \bigl\{ v \in L^2(I, \mathbb{R}^d) : v|_{I_n} \in P_s(I_n, \mathbb{R}^d), \; n = 1, \dots, N \bigr\}.
\]
Studying the conditions (2.2a), (2.2b), and (2.2c), we see that for $k \ge 1$ the solution satisfies $U \in Y^r \cap C^{\lfloor (k-1)/2 \rfloor}(\bar I, \mathbb{R}^d)$. Consequently, the method can be reformulated globally for all $n = 1, \dots, N$, which includes the initial value $u_0$ in the problem formulation. We agree on defining $u^{(j)}(t_0)$ recursively using the differential equation, i.e.,
\[
  u^{(j)}(t_0) := M^{-1} \, \frac{d^{j-1}}{dt^{j-1}} F\bigl(t, u(t)\bigr)\Big|_{t = t_0}, \qquad j \ge 1. \tag{2.7}
\]
The term $\frac{d^{j-1}}{dt^{j-1}} F(t, u(t))\big|_{t = t_0}$ depends only on $u(t_0), \dots, u^{(j-1)}(t_0)$ and can be calculated using a generalization of Faà di Bruno's formula, see e.g. [11,22]. If $F$ is affine linear in $u$, the derivatives follow from Leibniz' rule for the $(j-1)$th derivative. Note that, since the test space $Y^{r-k}$ in (2.6c) allows discontinuities at the boundaries of the subintervals, the problem can be decoupled by choosing test functions $\varphi$ supported on a single time interval $I_n$ only. Moreover, exploiting for $k \ge 1$ the property $U \in C^{\lfloor (k-1)/2 \rfloor}$ as well as (2.6a) and (2.6b), we also obtain (2.2a) and (2.2c). Therefore, the global problem (2.6) can be converted back into a sequence of local problems (2.2) on the subintervals $I_n$, $n = 1, \dots, N$.
On each subinterval $I_n$, the local solution $U|_{I_n}$ belongs to $P_r(I_n, \mathbb{R}^d)$. Hence, it is completely described by $r+1$ vector coefficients from $\mathbb{R}^d$. However, $\lfloor (k-1)/2 \rfloor + 1 = \lfloor (k+1)/2 \rfloor$ of them are already fixed by data inherited from the previous subinterval $I_{n-1}$ or by the initial conditions, since $U \in C^{\lfloor (k-1)/2 \rfloor}$. This means that the size of the system to be solved on each subinterval is $r + 1 - \lfloor (k+1)/2 \rfloor$. The extreme case $k = r$ leads to a system size of $\lfloor r/2 \rfloor + 1$, which is roughly half of the size $r + 1$ obtained for $k = 0$.

Postprocessing
We present a simple postprocessing in this section. Recall (2.3) for the definition of the quadrature points of the quadrature rule $Q_k^r$, which is exact for polynomials up to degree $2r - k$.
Let $U$ solve $Q_k^r$-$\mathrm{VTD}_k^r$ and define the postprocessed solution interval-wise by $\widetilde U|_{I_n} := U|_{I_n} + a_n \vartheta_n$ with the correction vector $a_n \in \mathbb{R}^d$ given by (3.1), where $\vartheta_n \in P_{r+1}(I_n, \mathbb{R})$ vanishes in all $r+1$ quadrature points of $Q_k^r$ and additionally satisfies a suitable normalization condition. Then $\widetilde U$ solves $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$.

Proof: We have to verify that $\widetilde U$ satisfies all conditions of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$, where $Q_k^r$ is the quadrature rule associated to $\mathrm{VTD}_k^r$, which is exact for polynomials up to degree $2r-k$. First of all, we show an identity (3.2) needed later. It follows from the special form of $\vartheta_n$, the exactness of $Q_k^r$, and integration by parts. Precisely, we used that both $\vartheta_n' \varphi$ and $\vartheta_n \varphi'$ are polynomials of maximal degree $2r-k$ and that $\vartheta_n$ vanishes in all quadrature points, in particular in $t_n^-$ and, for $k \ge 1$, also in $t_{n-1}^+$; for details see (iii) below. The remaining conditions can be verified as follows.
Just like above we get, additionally using the definition of $a_n$, the variational condition for all test functions $\varphi \in P_{r-k-1}(I_n, \mathbb{R}^d)$. Actually, we can even test with functions $\varphi \in P_{r-k}(I_n, \mathbb{R}^d)$.
We first study the case $k \ge 1$. By the definitions of $U$ and $\widetilde U$, the identity (3.2), and the fact that $U$ and $\widetilde U$ coincide at all quadrature points, we obtain the desired variational condition. Now let $k = 0$. The same arguments as for $k \ge 1$ yield, for all $\varphi \in P_r(I_n, \mathbb{R}^d)$, the identity (3.4), and we study its last two terms. Using the definitions of the jump $[\![\widetilde U]\!]_{n-1}$ and of $\widetilde U$, and exploiting that $\vartheta_{n-1}(t_{n-1}^-) = 0$, we can rewrite these terms. Choosing the special test functions $\varphi_j \in P_r(I_n, \mathbb{R}^d)$, $j = 1, \dots, d$, that vanish in the $r$ inner quadrature points of $Q_0^r$ and satisfy $\varphi_j(t_{n-1}^+) = e_j$, and having (ii) in mind, we find component-wise that $[\![\widetilde U]\!]_{n-1} = 0$. Thereby, we have at once proven the initial condition and verified the needed variational condition, since now the jump term in (3.4) can also be dropped.
(iv) Conditions at $t_{n-1}^+$ for $i = 0, \dots, \lfloor (k+1)/2 \rfloor - 1$: With an argumentation similar to that in (i), we gain the point conditions at $t_{n-1}^+$. We use the variational condition for $U$ with specially chosen test functions $\varphi_j \in P_{r-k}(I_n, \mathbb{R}^d)$, $j = 1, \dots, d$, that vanish at all inner quadrature points of $Q_k^r$, i.e., $\varphi_j(t_{n,i}) = 0$, $i = 1, \dots, r-k$, and satisfy suitable end point conditions, as shown in (iii). The special choices of $\varphi_j$, the definition of the quadrature rule, and the already known identities from (i), (ii), and (iv) yield, after a short calculation using Leibniz' rule for the $i$th derivative, the remaining conditions.
Collecting the above arguments completes the proof. From the definition (3.1), it seems that a linear system with the mass matrix $M$ has to be solved in every time step in order to obtain the correction vector $a_n$. However, the computational costs for calculating $a_n$ can be reduced significantly, as we shall show now.

Proposition 3.2
The correction vectors $a_n \in \mathbb{R}^d$ defined in (3.1) for the postprocessing presented in Theorem 3.1 can alternatively be calculated from jumps of derivatives of the discrete solution, see (3.5).
Proof: For $k = 0$, we obtain the assertion from (3.3) combined with $[\![\widetilde U]\!]_{n-1} = 0$, which was shown just below (3.4). Otherwise, for $k \ge 1$, using the definition of the postprocessing and (v) of the proof of Theorem 3.1, we obtain a corresponding identity.
Furthermore, we have $\vartheta_{n-1}^{(i)}(t_{n-1}^-) = 0$ for $i = 0, \dots, \lfloor k/2 \rfloor$, where also (i) or (ii) of the proof of Theorem 3.1 were used. Altogether, exploiting that $M$ is regular, an easy manipulation of the identities yields the assertion. A similar formula can also be established for $n = 1$: since $U$ satisfies (2.2c) and $\widetilde U(t_0) := u_0$, we obviously obtain, recalling the definition (2.7) of $u^{(i)}(t_0)$, the corresponding identity in (3.5) for $n = 1$. Hence, the alternative calculation provides the same correction vector.
Note that $a_n$ can be calculated in this way without solving a system of linear equations. From the structure of $a_n$, we see that the postprocessing can be interpreted as a correction of the jump in the lowest order derivative of the discrete solution that is not continuous by construction.
Since dividing $\vartheta_n$ by a nonzero constant changes only its normalization, we conclude the following alternative formulation, in which $\tilde\vartheta_n \in P_{r+1}(I_n, \mathbb{R})$ is the correspondingly rescaled version of $\vartheta_n$ from Theorem 3.1, i.e., $\tilde\vartheta_n$ vanishes in all $r+1$ quadrature points of $Q_k^r$ and additionally satisfies a modified normalization condition. A direct proof for the alternative postprocessing is given in Appendix A.

Connections between numerically integrated variational time discretization methods and collocation methods
We shall prove that the (local) solution of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$ can be characterized as the solution of the (local) collocation problem with respect to the quadrature points of $Q_k^r$, i.e., of problem (4.1), where $U(t_0) = u_0$. Here, the interior collocation points $t_{n,i}$ are the transferred zeros of the $(r-k)$th Jacobi polynomial with respect to the weight $(1-t)^{\lfloor k/2 \rfloor + 1}(1+t)^{\lfloor (k-1)/2 \rfloor + 1}$, see also (2.3). Methods similar to (4.1) are known as collocation methods with multiple nodes, see e.g. [15, p. 275]. Unfortunately, existing results in the literature often neglect to study the unique solvability, or conditions on $F$ are not explicitly given. However, the connections mentioned above directly imply that all these methods are equivalent, as we will prove now.
A function $U \in P_{r+1}(I_n, \mathbb{R}^d)$ solves $Q_k^r$-$\mathrm{VTD}_l^{r+1}$ if and only if $U$ solves the collocation method (4.1) with respect to the quadrature points of $Q_k^r$.

Proof: For clarity and convenience, we recall that the conditions of the $Q_k^r$-$\mathrm{VTD}_l^{r+1}$ method with $0 \le k \le r$ and $1 \le l \le k+2$ consist of the point conditions (4.2a), (4.2b) and the variational condition (4.2c). First of all, assume that $U$ solves (4.1). Then, because of (4.1a) and (4.1b), $U$ obviously satisfies the conditions (4.2a) and (4.2b) since $l \le k+2$. In order to gain a better understanding of the numerically integrated variational condition, we look at its detailed definition: for $\varphi \in C^{\lfloor k/2 \rfloor}(I_n, \mathbb{R}^d)$, the quadrature rule applied to $\bigl(M U' - F(\cdot, U), \varphi\bigr)$ is, by definition, a weighted sum of derivative values at both end points and of values at the interior quadrature points. Applying Leibniz' rule for the $i$th derivative and using the collocation conditions (4.1b), (4.1c), and (4.1a), we see that all three sums equal 0. Hence, the numerically integrated variational condition holds, which immediately gives (4.2c).

Now, we study the other direction and assume that $U$ solves (4.2). In order to verify (4.1a), for example, we need to prove that (4.2a) also holds for the derivative orders not covered by $l$. For this purpose, we use a special test function $\psi$, built from the inner quadrature points $t_{n,i}$, $i = 1, \dots, r-k$, of $Q_k^r$ and multiplied by an arbitrary vector $c \in \mathbb{R}^d$. Then (4.2c), the special construction of $\psi$, (4.2a), and (4.2b) yield an identity that simplifies, and since the vector $c \in \mathbb{R}^d$ can be chosen arbitrarily, the first missing point condition follows. Using further test functions of the same type with an arbitrary vector $c \in \mathbb{R}^d$, we iteratively prove the remaining point conditions at $t_n^-$. A similar argument can also be used for the missing point conditions at $t_{n-1}^+$. Note that for the above implications we needed that the weights of $Q_k^r$ do not vanish, which has been proven in [20].
It remains to verify (4.1c). Since we already know that $U$ satisfies the collocation conditions (4.1a) and (4.1b), the variational condition (4.2c) reduces to a weighted sum over the inner quadrature points,
\[
  \sum_{j=1}^{r-k} w_j^I \bigl(M U'(t_{n,j}) - F(t_{n,j}, U(t_{n,j})), \varphi(t_{n,j})\bigr) = 0. \tag{4.4}
\]
Also recall that $w_j^I > 0$ for all $1 \le j \le r-k$. Thus, choosing in (4.4) test functions that vanish in all inner quadrature points except $t_{n,j}$, multiplied by an arbitrary vector $c \in \mathbb{R}^d$, we get the collocation condition (4.1c) in $t_{n,j}$. Hence, the stated equivalence is proven.
Summarizing, we have shown that a solution of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$ also solves $Q_k^r$-$\mathrm{VTD}_l^{r+1}$ with $1 \le l \le k+2$ as well as a collocation method with respect to the quadrature points of $Q_k^r$, and vice versa. In short, we have
\[
  Q_k^r\text{-}\mathrm{VTD}_{k+2}^{r+1} \;=\; Q_k^r\text{-}\mathrm{VTD}_l^{r+1} \;=\; \text{collocation with respect to the quadrature points of } Q_k^r.
\]
Remark 4.2. Independently of the above findings, the connection between collocation methods and (postprocessed) numerically integrated discontinuous Galerkin methods (using the right-sided Gauss–Radau quadrature), i.e., Theorem 4.1 for $k = 0 \le r$ and $l = 2$, was already observed in [26]. Moreover, connections between collocation methods and numerically integrated continuous Galerkin–Petrov methods (using interpolatory quadrature formulas with as many quadrature points as there are independent variational conditions) are shown in [16,17]. ♣

Error estimates for collocation methods
The method defined in (4.1) is a collocation method with multiple nodes as considered, for example, in [15, p. 275]. Thus, the usual error analysis for collocation methods also applies here, provided that $F$ and $u$ are sufficiently smooth. According to [15, p. 276], we have the following error estimate.

Proposition 4.3
The collocation method (4.1) possesses the same order $2r - k + 1$ as the underlying quadrature formula $Q_k^r$. Hence, it holds
\[
  \max_{1 \le n \le N} \bigl\| u(t_n) - \widetilde U(t_n) \bigr\| \le C \, \tau^{\min\{2r-k+1,\; r+2\}},
\]
where $\widetilde U$ and $u$ denote the solutions of the collocation method (4.1) and of the initial value problem (2.1), respectively. Moreover, corresponding global bounds for the error and its derivatives hold, where the derivatives are understood in an interval-wise sense.
The term $2r - k + 1$ inside the minimum is due to the fact that the convergence order of these collocation methods is limited by the accuracy of the underlying quadrature formula $Q_k^r$, which is exactly $2r - k + 1$. Note that this limitation is active only for $r = k$.
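A classical Runge–Kutta realization of collocation at the right-sided Gauss–Radau points is the two-stage Radau IIA method (the case $r = 1$, $k = 0$, with predicted order $2r - k + 1 = 3$). The following sketch (illustrative, not taken from the paper) checks this order on the linear test problem $u' = -u$:

```python
import numpy as np

# Butcher tableau of the two-stage Radau IIA method: collocation at the
# right-sided Gauss-Radau points c = (1/3, 1); classical order 3
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0,   1.0 / 4.0]])
b = np.array([3.0 / 4.0, 1.0 / 4.0])

def radau_step(y, h, lam):
    """One implicit Runge-Kutta step for y' = lam * y; for this linear
    problem the stage values g solve (I - h*lam*A) g = lam * y * 1."""
    g = np.linalg.solve(np.eye(2) - h * lam * A, lam * y * np.ones(2))
    return y + h * b @ g

def solve(lam, T, n):
    y, h = 1.0, T / n
    for _ in range(n):
        y = radau_step(y, h, lam)
    return y

errs = [abs(solve(-1.0, 1.0, n) - np.exp(-1.0)) for n in (20, 40)]
rate = np.log2(errs[0] / errs[1])   # experimental order, close to 3
```

Halving the step size reduces the error by roughly a factor of $2^3 = 8$, in line with the order $2r - k + 1 = 3$ predicted by Proposition 4.3.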

Reversed postprocessing
In Section 3 we studied a postprocessing for $Q_k^r$-$\mathrm{VTD}_k^r$. We have seen that, starting from a solution $U$ of $Q_k^r$-$\mathrm{VTD}_k^r$, we can easily construct a solution $\widetilde U$ of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$. This already implies uniqueness of $U$ provided that solutions of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$ or (4.1) are unique. Indeed, if $U_1$ and $U_2$ solve $Q_k^r$-$\mathrm{VTD}_k^r$ and their postprocessed solutions are identical, then by construction of the postprocessing $U_1$ and $U_2$ coincide in the $r+1$ quadrature points of $Q_k^r$. Thus, since both are polynomials of degree $r$, it follows that $U_1 = U_2$. We now ask whether this postprocessing step can be reversed for $0 \le k \le r$, which would imply the solvability of $Q_k^r$-$\mathrm{VTD}_k^r$. Since $I_k^r$ preserves the derivatives up to order $\lfloor k/2 \rfloor$ at $t_n^-$ and up to order $\lfloor (k-1)/2 \rfloor$ at $t_{n-1}^+$, respectively, the interpolant $I_k^r \widetilde U$ satisfies the corresponding point conditions, which are in fact (2.2b) and (2.2c).
It remains to prove that $I_k^r \widetilde U$ also satisfies the variational condition (2.2d). Note that, by the definition of $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}$, the variational condition is originally postulated for test functions from $P_{r-k-1}(I_n, \mathbb{R}^d)$ and for $\tilde\vartheta_n \in P_{r+1}(I_n, \mathbb{R})$ as defined in Corollary 3.3. Hence, using (3.2) we conclude the variational condition; for $k = 0$, an additional argument for the jump term completes the proof.

Consequences for existence, uniqueness, and error estimates
The connections between numerically integrated variational time discretization methods and collocation methods with multiple nodes observed in the previous subsections can now be used to obtain results on the existence and uniqueness of solutions of $Q_k^r$-$\mathrm{VTD}_k^r$, as well as to derive global error estimates and superconvergence estimates in the time mesh points.

Corollary 4.6 (Existence and uniqueness)
If there is a solution $\widetilde U \in P_{r+1}(I_n, \mathbb{R}^d)$ of the collocation method with multiple nodes defined by (4.1), then $U = I_k^r \widetilde U$ solves $Q_k^r$-$\mathrm{VTD}_k^r$. Furthermore, if $\widetilde U$ is uniquely defined as solution of (4.1), then so is $U$ as solution of $Q_k^r$-$\mathrm{VTD}_k^r$.

Corollary 4.7 (Global error estimates)
Let the error estimates of Proposition 4.4 hold for the solution $\widetilde U$ of (4.1) and the exact solution $u$ of (2.1). Then corresponding global error estimates hold for the solution $U = I_k^r \widetilde U$ of $Q_k^r$-$\mathrm{VTD}_k^r$, where the derivatives are understood in an interval-wise sense.
Proof: Since $U = I_k^r \widetilde U$, Proposition 4.4 and standard interpolation error estimates yield the first bound once $\sup_{t \in I} \|\widetilde U^{(r+1)}(t)\|$ is controlled, for which we again use Proposition 4.4. Because of $0 \le k \le r$, this completes the proof of the first statement. The second estimate can be proven similarly.

Corollary 4.8 (Superconvergence in time mesh points)
Let the error estimates of Proposition 4.3 hold for the solution $\widetilde U$ of (4.1) and the exact solution $u$ of (2.1). Then the same superconvergence estimate in the time mesh points holds for the solution $U$ of $Q_k^r$-$\mathrm{VTD}_k^r$.

Proof: Recall that $U = I_k^r \widetilde U$ and that $I_k^r$ in particular preserves the function value at $t_n^-$. Hence, $U(t_n^-) = \widetilde U(t_n^-)$, and the estimate follows immediately from Proposition 4.3.
Remark 4.9 (Superconvergence in quadrature points). Under the assumptions of Corollary 4.7, we also obtain a (lower order) superconvergence estimate for the solution $U$ of $Q_k^r$-$\mathrm{VTD}_k^r$ in the quadrature points. In fact, let $t_{n,i}$, $i = 1, \dots, r-k$, denote the local quadrature points of $Q_k^r$ in the interior of $I_n$. Then superconvergence estimates hold in these points as well as for certain derivatives at the end points. These estimates in particular provide an extra order of convergence. ♣

Remark 4.10 (Superconvergence of derivative(s) in time mesh points). From the point conditions (2.2b) and the bound of Corollary 4.8, we also gain superconvergence estimates for derivatives up to order $\lfloor k/2 \rfloor$ of the solution $U$ of $Q_k^r$-$\mathrm{VTD}_k^r$ at $t_n^-$, provided that $F$ satisfies certain Lipschitz conditions. Indeed, these estimates follow by iteration over $i = 0, \dots, \lfloor k/2 \rfloor - 1$. ♣

Interpolation cascade
This section is restricted to the study of affine linear problems of the form: Find $u : I \to \mathbb{R}^d$ such that
\[
  M u'(t) = f(t) - A u(t), \quad t \in I, \qquad u(t_0) = u_0, \tag{5.1}
\]
where $M, A \in \mathbb{R}^{d \times d}$ are time-independent matrices and $M$ is regular, i.e., in the general setting we have $F(t,u) = f(t) - Au$.

A slight modification of the method
Let $0 \le k \le r$. In order to solve (5.1) numerically, we define the modified method $\mathrm{VTD}_k^r(g)$ by the conditions of (2.2) with $f$ replaced by a function $g$, where $U(t_0^-) = u_0$ and $g$ will be chosen later on depending on $f$. As before, $\mathcal{I}_n$ denotes an integrator, typically the integral over $I_n$ or a quadrature formula.
Recall that $Q_k^r$ denotes the quadrature rule associated to $\mathrm{VTD}_k^r$, determined by (2.3); this quadrature rule is exact for polynomials up to degree $2r - k$. Furthermore, $I_k^r$ is the Hermite interpolation associated to the quadrature rule $Q_k^r$. In a first step, we consider $Q_k^r$-$\mathrm{VTD}_k^r\bigl(I_{k+2}^{r+1} f\bigr)$ for $0 \le k \le r-1$. Note that the case $r = k$ needs to be excluded since otherwise $I_{k+2}^{r+1} f$ would not be well defined. In view of the postprocessing of Section 3, the modified method has some interesting properties, as we show now.
Proof: First of all, note that we could also use $I_{k+2}^{r+1} f$ instead of $f$ in the definition of the correction vector $a_n \in \mathbb{R}^d$ for the postprocessing, see (3.1), since $I_{k+2}^{r+1}$ preserves all occurring point values. Hence, postprocessing $U$ as in Section 3 yields a function $\widetilde U$ satisfying (5.3), where $\widetilde U(t_0^-) = u_0$. Also in (5.3b) and (5.3c) the interpolation operator $I_{k+2}^{r+1}$ preserves all occurring point values and could therefore be dropped. Moreover, we see that only polynomials of maximal degree $2r-k$ appear in (5.3d). Since both quadrature rules $Q_k^r$ and $Q_{k+2}^{r+1}$ are exact in this case and the interpolation operator $I_{k+2}^{r+1}$ uses the quadrature points of $Q_{k+2}^{r+1}$, we obtain the claimed identity. Summarizing, we have seen that the postprocessed solution $\widetilde U$ solves $Q_{k+2}^{r+1}$-$\mathrm{VTD}_{k+2}^{r+1}(f)$. Within the above argument we proved that the method $Q_{k+2}^{r+1}$-$\mathrm{VTD}_{k+2}^{r+1}(f)$ and the method $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}\bigl(I_{k+2}^{r+1} f\bigr)$ are equivalent for $0 \le k \le r-1$. Similarly, one can show that the method $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}(f)$ and the method $Q_{k+2}^{r+1}$-$\mathrm{VTD}_{k+2}^{r+1}\bigl(I_k^r f\bigr)$ are equivalent for $0 \le k \le r-1$. Note that $I_k^r$ also preserves all derivatives that appear in the point conditions at both ends of the interval. ♣

Interpolation cascade
Taking a closer look at the result of Theorem 5.1, we see that the postprocessed solution of the modified discrete problem also solves a numerically integrated variational time discretization method, but with the "right" associated quadrature rule. This enables one further postprocessing step.
For $1 \le \ell \le r - k$, using an interpolation cascade we can even enable up to $\ell + 1$ additional postprocessing steps. More concretely, we obtain a chain of methods connected by postprocessing steps, denoted by ❀. Note that $f$ itself can be used in each postprocessing step to calculate the correction vector $a_n \in \mathbb{R}^d$ (cf. Theorem 3.1), since in each step the occurring derivative of $f$ at $t_n^-$ is preserved by the respective interpolation cascade.

Remark 5.3. For Dahlquist's stability equation, i.e., $d = 1$, $M = 1$, $A = -\lambda$, and $f = 0$ in (5.1), we easily see that the cascade applies for all $\ell = 0, \dots, \lfloor k/2 \rfloor$. Thus, $\ell$ postprocessing steps can be applied for this equation. Since the postprocessing does not change the function value at the end points of the intervals, the stability function does not change either. Therefore, $\mathrm{VTD}_k^r$ as well as $Q_k^r$-$\mathrm{VTD}_k^r$ provide the same stability function as $\mathrm{VTD}_{k-2\ell}^{r-\ell}$. With the special choice $\ell = \lfloor k/2 \rfloor$, we immediately find that $\mathrm{VTD}_k^r$ shares its stability properties with the dG method (even $k$) or the cGP method (odd $k$). Hence, all methods with even $k$ share their strong A-stability with the dG method, while methods with odd $k$ are A-stable like the cGP method, cf. Remark 2.1 and [6]. ♣

Remark 5.4. Analogously to Theorem 3.1, a postprocessing from $Q_k^r$-$\mathrm{VTD}_k^r(g)$ to $Q_k^r$-$\mathrm{VTD}_{k+2}^{r+1}(g)$ can also be proven when, instead of (3.1), the correction vector is determined by a modified residual. However, when $g$ and its derivatives are not globally continuous up to a sufficiently high order, the discrete solution of $Q_k^r$-$\mathrm{VTD}_k^r(g)$ is in general not $\lfloor (k-1)/2 \rfloor$-times continuously differentiable. Therefore, the postprocessing by jumps and the postprocessing by (modified) residuals will not provide the same correction anymore.
A more detailed analysis shows that applying two postprocessing steps based on residuals to the solution of Q^r_k-VTD^r_k(f) yields the solution of Q^{r+2}_{k+4}-VTD^{r+2}_{k+4}(I^{r+1,*} f), where I^{r+1,*} f interpolates f in the quadrature points of Q^r_k and additionally preserves its (⌊k/2⌋+1)-th derivative at t_n^-. Similarly, for dG-like methods (characterized by even k) we find that applying two postprocessing steps based on jumps to the solution of Q^r_k-VTD^r_k(f) gives the solution of Q^{r+2}_{k+4}-VTD^{r+2}_{k+4}(I^{r+1}_* f), where I^{r+1}_* f interpolates f in the quadrature points of Q^r_k and additionally preserves its (⌊(k−1)/2⌋+1)-th derivative at t_{n−1}^+. ♣

Derivatives of solutions
As in Section 5 we consider affine linear problems of the form (5.1) with time-independent coefficients. Since the quadrature formula Q^r_k is exact for polynomials up to degree 2r − k and the associated interpolation operator I^r_k yields polynomials of degree r, we can write both VTD^r_k(f) and Q^r_k-VTD^r_k(f) for 0 ≤ k ≤ r in a common form.

Example 7.1
We consider an initial value problem on t ∈ (0, 32) with initial value u(0) = (1/2, 0)^T, a system of nonlinear ordinary differential equations which has

u_1(t) = cos t / (2 + sin t),   u_2(t) = sin t / (2 + sin t)

as solution.
The appearing nonlinear systems within each time step were solved by Newton's method, where we applied a Taylor expansion of the inherited data from the previous time interval to calculate an initial guess for all unknowns on the current interval. If higher order derivatives were needed at initial time t = 0, the ODE system and its temporal derivatives were used, see (2.7). The postprocessing used the jumps of the derivatives, as given in Corollary 3.3.
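Obtaining higher derivatives at the initial time from the ODE itself amounts to differentiating u′ = f(t, u) along the solution: u″ = f_t + f_u f, and so on. A minimal sketch for a hypothetical scalar right-hand side (not the system of Example 7.1):

```python
import math

lam = 2.0                                  # hypothetical decay parameter

def f(t, u):                               # hypothetical right-hand side of u' = f(t, u)
    return -lam * u + math.sin(t)

def u_second(t, u):                        # u'' = f_t + f_u * f along the solution
    return math.cos(t) + (-lam) * f(t, u)

def u_exact(t):                            # exact solution for u(0) = 4/5
    return math.exp(-lam * t) + (2.0 * math.sin(t) - math.cos(t)) / 5.0

# cross-check: a central finite difference of u'(t) = f(t, u(t)) approximates u''
t0, h = 0.3, 1e-5
fd = (f(t0 + h, u_exact(t0 + h)) - f(t0 - h, u_exact(t0 - h))) / (2.0 * h)
assert abs(fd - u_second(t0, u_exact(t0))) < 1e-8
```

Iterating the same chain rule yields u‴ and higher derivatives, which is what formula (2.7) provides for the system at t = 0.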
We denote by e := u − U the error of the discrete solution U and by ẽ := u − Ū the error of the postprocessed solution Ū. Errors were measured in norms based on the Euclidean norm ‖·‖ in R^d. Table 1 presents the results for Q^6_0-VTD^6_0, which is just dG(6) with numerical quadrature by the right-sided Gauss–Radau formula with 7 points. We show norms of the error between the solution u and the discrete solution U as well as of the error between u and the postprocessed discrete solution Ū. Using the results for N = 4096 and N = 8192, the experimental order of convergence (eoc) is calculated. In addition, the theoretically predicted convergence orders (theo) are given. We see clearly from Table 1 that the experimental orders of convergence coincide with the theoretical predictions. This holds for the function itself and its time derivative. Moreover, the order of convergence increases by 1 if one postprocessing step is applied. It is noteworthy that the error norm ‖ẽ′‖_{ℓ∞} shows the same high superconvergence order as ‖e‖_{ℓ∞}. This behavior is due to the collocation conditions satisfied by the postprocessed solution Ū.
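The experimental order of convergence used throughout the tables is computed from two successive error values: for errors e_N and e_{2N} obtained with N and 2N time steps, eoc = log(e_N / e_{2N}) / log 2. A minimal sketch with hypothetical error values:

```python
import math

def eoc(err_coarse, err_fine, ratio=2.0):
    """Experimental order of convergence from errors on two meshes
    whose step sizes differ by the given refinement ratio."""
    return math.log(err_coarse / err_fine) / math.log(ratio)

# hypothetical error pair: a 7th-order quantity reduces its error by 2^7
# when the number of time steps is doubled
e_N, e_2N = 3.2e-9, 3.2e-9 / 2 ** 7
assert abs(eoc(e_N, e_2N) - 7.0) < 1e-12
```

The eoc columns in Tables 1–6 are obtained in exactly this way from consecutive refinement levels.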
The results of our calculations using the variational time discretization Q^6_5-VTD^6_5 are collected in Table 2. Again we present the results in different norms, for both the error itself and the error obtained after postprocessing the discrete solution. Also for this temporal discretization, all theoretically predicted orders of convergence are met by our numerical experiments. Compared to the results of Q^6_0-VTD^6_0, the superconvergence order measured in ‖·‖_{ℓ∞} is much smaller, which is in agreement with our theory. In addition, the order of convergence of ‖ẽ′‖_{ℓ∞} is the same as the order of convergence of ‖e′‖_{ℓ∞} since collocation conditions are already fulfilled by the discrete solution U. Hence, an improvement of this quantity by applying the postprocessing is not possible.

Table 3 shows the results for calculations using Q^6_6-VTD^6_6 as discretization in time. The presented error norms indicate that the experimental orders of convergence are in agreement with our theory. Please note that the postprocessing does not lead to an improvement of the error itself. However, there is an improvement in the L²-norm of the time derivative: we clearly see that the order of convergence is increased from 6 to 7, which is in agreement with Proposition 4.4. Moreover, there is no superconvergence at the discrete time points, as predicted by our theory.

Example 7.2
We consider the affine linear initial value problem where f_1 and f_2 are chosen such that u_1(t) = (t + t^2)e^t, u_2(t) = −t e^t are the solution components.
Table 4 presents the results for Q^7_0-VTD^7_0, where the cascadic interpolation has been applied to the function f = (f_1, f_2) on the right-hand side, see Section 5. We show norms of the error PP^s e after s postprocessing steps using 50 time steps. The given experimental orders of convergence were calculated from the results with 25 and 50 time steps. Looking at the convergence orders in the L²-like norms, we clearly see that each postprocessing step increases the experimental order of convergence by 1 if at most 7 postprocessing steps are applied. The eighth postprocessing step leads to an improvement of the convergence order only for the temporal derivative since the function itself already converges with the optimal order 15. The postprocessing has no influence on the ℓ∞ norm of the error itself, while the very first postprocessing step improves the results for the derivative of the error in the ℓ∞ norm. This is caused by the fact that the postprocessed solution fulfills a collocation condition at the discrete time points.

Table 5 presents the experimental orders of convergence of ‖(PP^s e)′‖_{L²} for Q^7_k-VTD^7_k, k = 0, …, 7, after s postprocessing steps, where at most r + 1 − k = 8 − k steps have been applied. The cascadic interpolation of the right-hand side function f is used for all considered methods. It can be clearly seen that each additional postprocessing step increases the convergence by one order. Using the same number of postprocessing steps, the obtained convergence orders do not depend on the particular method. Since each postprocessing step is covered by our theory and postprocessing by jumps and postprocessing by residuals are equivalent for a single step, both types of postprocessing lead to identical results if the cascadic interpolation of the right-hand side function f is used.
The behavior changes if just f and not its cascadic interpolation is used. Table 6 shows, for the methods Q^9_k-VTD^9_k, k = 0, …, 9, the experimental convergence orders of ‖(PP^s e)′‖_{L²} after s postprocessing steps based on jumps, where at most r + 1 − k = 10 − k steps have been carried out. The column s = 1 shows, as predicted by our theory, that the convergence order increases by 1 for all methods. The behavior for at least two postprocessing steps depends strongly on the parameter k of the variational time discretization. For dG-like methods (characterized by even k), an additional improvement by one order is obtained independent of the number of postprocessing steps. The situation is completely different for cGP-like methods (corresponding to odd k). For k ≡ 3 (mod 4), the second postprocessing step does not lead to an improvement of the convergence order compared to a single postprocessing step. If k ≡ 1 (mod 4), then the second postprocessing step provides an increased convergence order. However, for all cGP-like methods, the obtained convergence orders do not increase with every further postprocessing step.

We will show that Ū satisfies all conditions for Q^r_k-VTD^{r+1}_{k+2}.
(a) Similar to the proof of Theorem 3.1, we prove the initial condition Ū(t_{n−1}^+) = U(t_{n−1}^−), the point conditions at t_n^− up to order ⌊k/2⌋ − 1 and at t_{n−1}^+ up to order ⌊(k−1)/2⌋ − 1, as well as the variational condition. Note that for k = 0 the proof of the variational condition is even easier since the initial condition Ū(t_{n−1}^+) = U(t_{n−1}^−) is immediately clear from the alternative definition of the postprocessing. Recalling the definition of u^{(i)}(t_0), we iteratively obtain the point conditions at the initial time. From this we conclude the case n = 1, using the definitions of Ū, ã_n, and θ̃_n. Now let n > 1. We assume that Ū solves the I_{n−1}-problem, which will finally be shown once the last condition is proved, see (c). Then the conditions follow by construction, where we also used that we already know Ū^{(i)}(t_{n−1}^+) = U^{(i)}(t_{n−1}^−) for 0 ≤ i ≤ ⌊(k−1)/2⌋.

(b) Conditions at t_{n−1}^+: This can be done similar to the proof of Theorem 3.1.
By (a3) we have the variational condition tested with the functions ϕ_j. The special definition of ϕ_j, the definition of the quadrature rule, and the already known identities from (a1), (a2), and (b) (for t_{n−1}^+ and k ≥ 1) yield, after a short calculation using Leibniz' rule for the i-th derivative, the desired identity. Hence, Ū solves Q^r_k-VTD^{r+1}_{k+2}.

(c) Condition at t_n^− for i = ⌊(k+2)/2⌋ − 1 = ⌊k/2⌋:
We have to show the collocation condition of this order at t = t_n^−, where we exploit that w_R^{(⌊k/2⌋)} = 0. Note that (b) is only needed for t_{n−1}^+ and is already completely proven for t_0^+. Hence, (b) and (c) can be shown iteratively for all n.

Table 4:
Example 7.2: Results for Q^7_0-VTD^7_0 = dG(7) with cascadic interpolation of f and s postprocessing steps, measured in ‖PP^s e‖_{L²} and ‖(PP^s e)′‖_{L²}.