Unified analysis for variational time discretizations of higher order and higher regularity applied to non-stiff ODEs

We present a unified analysis for a family of variational time discretization methods, including discontinuous Galerkin methods and continuous Galerkin–Petrov methods, applied to non-stiff initial value problems. Besides the well-definedness of the methods, the global error and superconvergence properties are analyzed under rather weak abstract assumptions which also allow considerations of a wide variety of quadrature formulas. Numerical experiments illustrate and support the theoretical results.


Introduction
We study variational time discretization methods applied to non-stiff initial value problems of the form u′(t) = f(t, u(t)), u(t_0) = u_0 ∈ R^d, (1.1), where f is supposed to be sufficiently smooth and to satisfy a Lipschitz condition in the second variable with constant L > 0. Here, an initial value u_0 is given at t = t_0. The system of ordinary differential equations (ODEs) is considered on a time interval I = (t_0, t_0 + T] with arbitrary but fixed length T > 0. Probably the most popular variational discretization schemes for solving (1.1) numerically are discontinuous Galerkin (dG) and continuous Galerkin–Petrov (cGP) methods. Both have been studied in the literature for many decades. However, an important part of the design of a then fully computable discretization, namely the use of quadrature methods for approximate integration, is often only marginally considered, or the assumptions on the quadrature formulas are quite restrictive and thus allow only a small variability of quadrature formulas. For example, the number of quadrature points is supposed to equal the dimension of the local ansatz space of the discrete solution, cf. [6, 5], or the number of independent variational conditions, cf. [13, 12]. Often the investigations are also restricted to special Gauss, Gauss–Radau, or Gauss–Lobatto formulas. A quite general setting is considered in [8], where so-called internal and external quadrature formulas are studied. External quadrature means that integrals over products of f with test functions are approximated by quadrature rules. If f is replaced by a polynomial approximation before (numerical) integration, then internal quadrature formulas appear. Since in [8] the dynamical behavior of the schemes is analyzed, it is assumed that, at least for Dahlquist's test problem u′ = λu, all integrals are integrated exactly, which again is quite restrictive.
In this paper we try to keep the requirements on the approximation operators involved in the definition of the discrete method as low as possible. Our scope is to figure out how the approximation properties of these (external and internal) operators affect the error behavior of the numerical solution. Thereby our studies are not restricted to the dG or cGP method but cover the whole family of variational discretization methods recently introduced in [3, 2]. The assumptions used are quite general and abstract in order to enable the study of a wide variety of methods. In particular, we allow that the right-hand side is approximated by a composition of various interpolation operators (interpolation cascade) or that the quadrature formulas also use derivative values. Therefore, all variational methods considered in [2] and their convergence results can now be verified by a rigorous and unified error analysis in the non-stiff case.
The paper is structured as follows. In Section 2 the variational time discretization methods are formulated. The well-definedness of the method formulation is studied in Section 3, where the existence of unique discrete solutions is proven under appropriate conditions. Section 4 is devoted to the error analysis. Here, at first, the global error is bounded by a sum of rather abstract approximation error terms. Then, in order to give a better insight into the error behavior, the convergence orders of these approximation errors are estimated in terms of basic quantities describing the approximation properties of the involved operators. Thereafter, superconvergence results in the time mesh nodes are derived in Section 5. Finally, in Section 6 we use numerical experiments to illustrate the convergence behavior of the variational time discretization methods and show that the proven estimates are sharp.
Notation: In this paper C denotes a generic constant independent of the time mesh interval length. To describe the vector-valued case (d > 1) in an easy way, let (·, ·) be the standard inner product and ∥·∥ the Euclidean norm on R^d, d ∈ N. Besides, let e_j, 1 ≤ j ≤ d, be the jth standard unit vector in R^d.
For an arbitrary interval J and positive integers q, the spaces of continuous and k times continuously differentiable R^q-valued functions on J are written as C(J, R^q) and C^k(J, R^q), respectively. Furthermore, the space of square-integrable R^q-valued functions shall be denoted by L^2(J, R^q) or, for convenience, sometimes also by C^{−1}(J, R^q). For non-negative integers s, we write P_s(J, R^q) for the space of R^q-valued polynomials of degree less than or equal to s. For q = 1, we sometimes omit R^q. Further notation will be introduced later at the beginning of the sections where it is needed.

Formulation of the methods
The interval I is decomposed into subintervals I_n := (t_{n−1}, t_n], where n = 1, ..., N and t_0 < t_1 < ... < t_N = t_0 + T. Furthermore, we set τ_n := t_n − t_{n−1} and τ := max_{1≤n≤N} τ_n. For convenience and to simplify the notation in some proofs, we assume τ_n ≤ 1, which is not really a restriction since we are anyway interested in the asymptotic error behavior for τ → 0. For any piecewise continuous function v we define by v(t_n^±) := lim_{t→t_n^±} v(t) the one-sided limits and by [v]_n := v(t_n^+) − v(t_n^−) the jump of v at t_n. We now present a slight generalization of the variational time discretization methods VTD_r^k investigated in [2]. Let r, k ∈ Z, 0 ≤ k ≤ r. Then the local version (I_n-problem) of the numerical method reads as follows: Find U|_{I_n} ∈ P_r(I_n, R^d) such that … and … where U(t_0^−) = u_0 and δ_{i,j} is the Kronecker symbol.
Here, I_n denotes an integrator that typically represents either the integral over I_n or the application of a quadrature formula for approximate integration. Details will be described later on. Moreover, I_n could be used to model some projection/interpolation of f or the usage of some special (internal) quadrature rules, even if I_n is just the integral.
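To make the abstract I_n-problem more tangible, recall that the lowest-order members of the family reduce, for suitable choices of the integrator, to classical one-step schemes: dG(0) collapses to the implicit Euler method, and cGP(1) with one-point Gauss (midpoint) quadrature is the implicit midpoint rule. The following sketch is our own illustration for the scalar linear test problem u′ = au; it is not the general method, only its simplest instances.

```python
import math

def dg0_step(a, u, tau):
    # dG(0) for u' = a*u collapses to implicit Euler: (1 - tau*a) u_new = u_old
    return u / (1.0 - tau * a)

def cgp1_step(a, u, tau):
    # cGP(1) with midpoint quadrature gives the implicit midpoint rule
    return u * (1.0 + 0.5 * tau * a) / (1.0 - 0.5 * tau * a)

def run(step, a, u0, T, N):
    u, tau = u0, T / N
    for _ in range(N):
        u = step(a, u, tau)
    return u

a, u0, T = -1.0, 1.0, 1.0
err_dg0 = abs(run(dg0_step, a, u0, T, 100) - math.exp(a * T))
err_cgp1 = abs(run(cgp1_step, a, u0, T, 100) - math.exp(a * T))
# halving the number of steps roughly doubles (order 1) resp. quadruples (order 2) the error
rate_dg0 = abs(run(dg0_step, a, u0, T, 50) - math.exp(a * T)) / err_dg0
rate_cgp1 = abs(run(cgp1_step, a, u0, T, 50) - math.exp(a * T)) / err_cgp1
```

The observed error ratios near 2 and 4 reflect the first- and second-order convergence of these two instances.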

Existence and Uniqueness
First of all, we agree that both I_n and I_n are local versions (obtained by transformation) of appropriate linear operators I and I given on the closure of the reference interval (−1, 1]. However, I_n is an approximation of the integral operator while I_n approximates the identity operator. Thus, the transformations scale quite differently. More precisely, let T_n denote the affine transformation that maps the reference interval (−1, 1] onto an arbitrary mesh interval I_n = (t_{n−1}, t_n]. Furthermore, let k_I and k_I be the smallest non-negative integers such that I and I are well-defined for functions in C^{k_I}([−1, 1]) and C^{k_I}([−1, 1]), respectively. Then we have for all ϕ ∈ C^{k_I}(I_n, R^d) and for all ψ ∈ C^{k_I}(I_n, R^d) that … and … Of course, these operators act component-wise when applied to vector-valued functions. Moreover, we suppose for all non-negative integers l that Iv ∈ C^l([−1, 1]) holds for all v ∈ C^{max{k_I, l}}([−1, 1]), i.e., Iv is at least as smooth as v.
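The affine transformation and the derivative scaling it induces can be checked directly. The sketch below (function names are ours) maps the reference interval (−1, 1] onto a mesh interval and verifies numerically the chain-rule factor 2/τ_n that distinguishes the scalings of the two operators.

```python
def make_Tn(t_prev, t_cur):
    """Affine map T_n : (-1, 1] -> (t_{n-1}, t_n] and its inverse.
    Differentiating phi(t) = phi_hat(T_n^{-1}(t)) brings in a factor 2/tau_n."""
    tau = t_cur - t_prev
    Tn = lambda s_hat: 0.5 * (t_prev + t_cur) + 0.5 * tau * s_hat
    Tn_inv = lambda t: (2.0 * t - (t_prev + t_cur)) / tau
    return Tn, Tn_inv, tau

Tn, Tn_inv, tau = make_Tn(2.0, 2.5)

# reference function and its exact derivative
phi_hat = lambda s: s ** 3
dphi_hat = lambda s: 3.0 * s ** 2

# transported function phi = phi_hat o T_n^{-1}; chain rule gives (2/tau) * phi_hat'
phi = lambda t: phi_hat(Tn_inv(t))
t, h = 2.2, 1e-6
num_deriv = (phi(t + h) - phi(t - h)) / (2.0 * h)   # central difference
exact_deriv = (2.0 / tau) * dphi_hat(Tn_inv(t))
```

The endpoint checks Tn(−1) = t_{n−1} and Tn(1) = t_n confirm the mapping, and the finite-difference quotient matches the chain-rule formula.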
The study of existence and uniqueness of solutions to (2.1) as well as the later error analysis is strongly connected with the following operator.
Let J_{I,I} : …, where k_J := max{⌈k/2⌉ − 1, k_I, k_I}, be defined by … As before, let r, k ∈ Z, 0 ≤ k ≤ r, be the parameters of the method.

Assumption 1

The integrator I is such that … and … imply ψ ≡ 0. Note that the absolute value in the exponent is needed only for k = 0.
Lemma 3.1

Suppose that Assumption 1 holds. Then J_{I,I} given by (3.2) is well-defined. If I preserves polynomials up to degree r, then J_{I,I} preserves polynomials up to degree min{r + 1, r}.

Proof:
We have to study existence and uniqueness of the approximation. We need to determine r + 1 coefficients of a polynomial in P_r([−1, 1]). To this end we have linear conditions. So, it remains to prove that for v ≡ 0 we obtain J_{I,I}v ≡ 0 as the uniquely defined approximation. The conditions with v ≡ 0 read as follows: … Otherwise, for k ≥ 1 we see from (3.3a), (3.3b) that … Because of Assumption 1 the variational condition (3.3c) implies that ψ ≡ 0. Thus, (J_{I,I}v)′ ≡ 0. Additionally using (3.3a) for i = 0, it again follows that J_{I,I}v ≡ 0. Hence, either way J_{I,I}v ≡ 0 is the unique approximation of v ≡ 0, and so in general the approximation is always uniquely determined.
Using the uniqueness of the approximation, the second statement can be easily verified.

Lemma 3.2
Let r ∈ N and k ∈ Z such that 0 ≤ k ≤ r. Suppose that Assumption 1 holds. Then the mapping (3.4) defines a norm.

Proof: The absolute homogeneity and the triangle inequality follow from the linearity of the integrator and of the derivatives along with the respective properties of the norm ∥·∥. Finally, to show the positive definiteness, we can adapt the arguments used in the proof of Lemma 3.1 to show (J_{I,Id}v)′ ≡ 0 if v ≡ 0. Note that precisely those r conditions of (3.2) that uniquely define (J_{I,Id}v)′ also appear on the right-hand side of the mapping (3.4).

Lemma 3.3
Let r ∈ N_0. Suppose that Assumption 1 holds. Then the mapping … defines a norm.

Proof: As in the proof of Lemma 3.2, the absolute homogeneity and the triangle inequality are clear. Moreover, by an adaptation of the k = 0 case of the proof of Lemma 3.1 we get the positive definiteness.
Recall from (3.1) the affine transformation T_n mapping the reference interval (−1, 1] onto an arbitrary mesh interval I_n. The chain rule yields … where ϕ̂ and T_n are differentiated with respect to t̂ and ϕ is differentiated with respect to t. The inverse of T_n and its derivative are given by …

Lemma 3.4

Suppose that Assumption 1 holds. Then there is a positive constant κ_{r,k}, independent of τ_n, such that …

Proof: Since the statement is obviously true for r = 0, we can assume r ≥ 1 in the following. By the fundamental theorem of calculus we have for t ∈ I_n that … where κ_{r,k} is a positive constant that is independent of τ_n. So, using the transformation T_n given in (3.1), the last estimate together with integral transformations (which analogously also hold for the integrator) and the chain rule imply that … for all ϕ ∈ P_r(I_n, R^d). This completes the proof.
Lemma 3.5 Let r ∈ N_0. Suppose that Assumption 1 holds. Then there is a positive constant κ_r, independent of τ_n, such that … for all ϕ ∈ P_r(I_n, R^d).
Proof: Similar to the proof of Lemma 3.4, the statement follows from Lemma 3.3 using the norm equivalence on the finite dimensional space P_r((−1, 1], R^d) along with transformations via T_n given in (3.1) between the reference interval (−1, 1] and the actual interval I_n.
In the following, we assume that the inverse inequalities (3.7) hold true, where C_inv > 0 is independent of τ_n but may depend on l and r. The proof is standard and uses transformations to the reference interval together with norm equivalences on finite dimensional spaces.
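As noted, the inverse inequality reduces to a norm equivalence on the reference interval; for the first derivative the sharp reference constant is the Markov constant r^2, attained by the Chebyshev polynomial. The following numerical check is our own illustration of this mechanism and of the 2/τ_n scaling obtained after transformation.

```python
import numpy as np

# On [-1, 1]: sup|q'| <= r^2 * sup|q| for q in P_r (Markov's inequality),
# with equality for the Chebyshev polynomial T_r. Mapping to an interval of
# length tau multiplies the constant by 2/tau, as in the inverse inequality.
r = 4
T4 = np.array([8.0, 0.0, -8.0, 0.0, 1.0])   # T_4 in the power basis (highest first)
s = np.linspace(-1.0, 1.0, 2001)
ratio_ref = np.abs(np.polyval(np.polyder(T4), s)).max() / np.abs(np.polyval(T4, s)).max()

tau = 0.01
ratio_In = (2.0 / tau) * ratio_ref          # constant after mapping to |I_n| = tau
```

For r = 4 the reference ratio is exactly r^2 = 16, and on an interval of length 0.01 the constant grows to 3200, reflecting the τ_n^{-1} factor.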
For the proof of the existence and uniqueness of solutions of (2.1) we need some more assumptions.

Assumption 2
The reference integrator I satisfies … where, as before, k_I ≥ 0 is the smallest integer such that I is well-defined on C^{k_I}([−1, 1]). This means by transformation that … holds for the local integrators I_n, 1 ≤ n ≤ N.
Assumption 3

The reference operator I satisfies … where, as before, k_I ≥ 0 is the smallest integer such that I is well-defined on C^{k_I}([−1, 1]). This means by transformation that sup_{t∈I_n} … holds for the local approximation operators I_n, 1 ≤ n ≤ N.

Assumption 4
For 0 ≤ i ≤ k_J = max{⌈k/2⌉ − 1, k_I, k_I}, it holds for sufficiently smooth functions v, w that … Here C_2 depends on k_J and f.

Remark 3.6
Sufficient conditions for Assumption 4 would be, e.g.:

(i) for k_J = 0: f satisfies a Lipschitz condition in the second variable with constant L > 0,

(ii) for k_J ≥ 1: f is affine linear in u, i.e., f(t, u(t)) = A(t)u(t) + b(t), and ∥A(·)∥_{C^{k_J}} < ∞. Then the inequality follows from Leibniz' rule for the ith derivative.

(iii) In the literature, see e.g. [11, p. 74], there also appear conditions of the form sup …, where f^{(i)} denotes the ith total derivative of f with respect to t in the sense of [11, p. 65]. These conditions may be weaker in some cases.

Since in general the constant C_2 is somewhat connected to the Lipschitz constant and, thus, to the stiffness of the ODE system, the dependence on this constant shall be studied very thoroughly in the analysis. ♣

Now, we shall investigate the solvability of the local problem (2.1).
Theorem 3.7 (Existence and uniqueness) Let r, k ∈ Z, 0 ≤ k ≤ r. We suppose that Assumptions 1, 2, 3, and 4 hold. Then there is a constant γ_{r,k} > 0, multiplicatively depending on C_2^{−1} but independent of n, such that problem (2.1) has a unique solution for all 1 ≤ n ≤ N when τ_n ≤ γ_{r,k}.
Proof: Since the single local problems can be solved one after another, it suffices to prove that (2.1) determines a unique solution for a given initial value U_{n−1}^−. In order to show this, we use the auxiliary mapping g: … and … Since all these conditions are linear, g(U) is uniquely determined by (3.8) if and only if we have the unique solvability of the corresponding homogeneous problem, i.e., find … and … Analogously to the proof of Lemma 3.1, we obtain that the conditions of (3.9) allow the trivial solution V ≡ 0 only. So the homogeneous problem is uniquely solvable and, thus, the mapping g is well-defined by (3.8).
It can be easily seen that for given U_{n−1}^− every fixed point of g is a solution of the local problem (2.1) and vice versa. To get the existence of a unique fixed point, Banach's fixed point theorem shall be applied. Therefore, we will prove for τ_n ≤ γ_{r,k}, with γ_{r,k} > 0 to be defined, that g is a contraction on P_r(I_n, R^d). So, let V, W ∈ P_r(I_n, R^d) and additionally assume … where κ̃_{r,k} := κ_r if k = 0 and κ̃_{r,k} := κ_{r,k} if k > 0. Here, for k ≥ 1 we also used that (g(V) − g(W))(t_{n−1}^+) = (V − W)(t_{n−1}^+) = 0 by (3.8a). Involving (3.8b) and (3.8c), the sums (I) and (II) can be rewritten as … Thus, by Assumption 4 we conclude … The appearing sums can be bounded by the inverse inequalities (3.7) in the following way: … The argumentation for the second sum is analogous. Altogether, we obtain … We now consider the third sum of (3.10). Exploiting (3.8d), we gain … where we used test functions of the form ϕ(t) = (1 + T_n^{−1}(t))^i e_j, j = 1, ..., d, in order to derive the component-wise identity needed here. Applying the estimate of Assumption 2, it follows … where Leibniz' rule for the jth derivative was exploited for the last inequality. Obviously, (1 + T_n^{−1}(t))^i is a polynomial of degree i in t, and one easily shows … Moreover, by construction 1 + T_n^{−1}(t) ∈ (0, 2] for all t ∈ I_n. Hence, we get … Now, involving Assumption 3, we conclude further that …
The backmost double sum C_{Σ,i} can be simplified and estimated, for example, as follows: … By Assumption 4 we then obtain that … So, applying the inverse inequalities (3.7), we conclude similarly to the estimation of (I) and (II) that … Now, combining (3.10), (3.11), and (3.13), we get … Hence, for sufficiently small τ_n ≤ γ_{r,k}, where γ_{r,k} multiplicatively depends on C_2^{−1} but is independent of n, the mapping g is a contraction on P_r(I_n, R^d) with respect to the supremum norm. By Banach's fixed point theorem we have the existence of a unique fixed point, which is just the unique solution of the local problem (2.1).
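The contraction mechanism of the proof can be experienced numerically in the simplest setting: for dG(0), i.e. implicit Euler, the local problem is the scalar fixed-point equation U = U_{n−1}^− + τ_n f(t_n, U), and Picard iteration converges whenever τ_n L < 1. The sketch below is our own illustration with a hand-picked f; the general method solves a polynomial system instead.

```python
def local_fixed_point(f, t_n, u_prev, tau, iters=60):
    """Picard iteration for the lowest-order local problem
    U = u_prev + tau * f(t_n, U); a contraction if tau * L < 1."""
    U = u_prev
    for _ in range(iters):
        U = u_prev + tau * f(t_n, U)
    return U

L = 5.0                       # Lipschitz constant of the f chosen below
f = lambda t, u: -L * u
tau = 0.1                     # tau * L = 0.5 < 1: contraction
U = local_fixed_point(f, t_n=0.1, u_prev=1.0, tau=tau)
exact = 1.0 / (1.0 + L * tau)  # closed-form solution of the fixed-point equation
```

With contraction factor τL = 0.5, sixty iterations reduce the initial error far below machine precision, mirroring the role of the smallness condition τ_n ≤ γ_{r,k}.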

Error analysis
In order to prove an error estimate we shall reuse the operator J_{I,I} introduced in the previous section. It is used to define a local approximation on the interval I_n.

Lemma 4.1 (Approximation operator)
Let r, k ∈ Z with 0 ≤ k ≤ r. Moreover, suppose that Assumption 1 holds. Then, the operator (4.1) is well-defined.
Proof: The existence and uniqueness of the approximation is a direct consequence of Lemma 3.1.
For convenience, we define for v ∈ C^{k_J+1}(I, R^d) a global approximation operator J_{I,I} by combining the local approximations on the mesh intervals, i.e., (J_{I,I}v)|_{I_n} := J_{I,I}^n v.

Lemma 4.2 Let …, and suppose that Assumptions 1, 2, and 3 hold. Then for 1 ≤ n ≤ N and 0 ≤ l ≤ r we have for all …

Proof: To derive the estimate, an inverse inequality, which yields the factor (τ_n/2)^{−l}, is applied first. The remaining term, sup_{t∈I_n} ∥J_{I,I}^n v(t)∥, can be bounded on the reference interval independent of τ_n. The factors (τ_n/2)^j then result from transformation.
For the argument used to prove the error estimate we need additional assumptions. In detail, compared to Theorem 3.7 we replace Assumption 3 by Assumption 5a or 5b, since derivatives given in certain points can be handled, but not their supremum. Furthermore, we exploit in the error analysis an auxiliary interpolation operator I_app defined below, see Definition 4.4, which amongst others is based on these assumptions.
For brevity, Assumptions 5a and 5b below shall be stated directly for the local operators I_n and I_n. However, appropriate properties of I and I guarantee these assumptions by transformation, cf. Assumptions 2 and 3.

Assumption 5b
For 1 ≤ n ≤ N, there are distinct points t_m^I, m = 0, ..., K_I, in the reference interval [−1, 1] such that … Note that then typically k_I = max{K_m^I : m = 0, ..., K_I}. Moreover, for 1 ≤ n ≤ N assume that there are distinct points t_m^I, m = 0, ..., K_I, in the reference interval [−1, 1] such that …

Remark 4.3 Assumption 5a is satisfied if I_n is a polynomial approximation operator whose defining degrees of freedom only use derivatives in certain points, as for example Hermite interpolation operators.
Together with Assumption 2, the term I_n[I_n ϕ] can be estimated by the supremum of ϕ and certain point values of derivatives of ϕ. However, Assumption 5a is not satisfied if I_n = Id and k_I > 0. In order to enable a similar estimate for I_n[I_n ϕ] also in this case, Assumption 5b is formulated. Here the requirements on the integrator I_n are increased. Of course, the defining degrees of freedom for the integrator now should use derivatives in certain points only. In return, the requirements on I_n can be weakened such that they are met, for example, also by I_n = Id. ♣
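Since Remark 4.3 points to Hermite interpolation operators, a minimal concrete instance may help: the cubic matching values and first derivatives at both interval endpoints. The helper below is our own sketch, not the general operator of the paper.

```python
import numpy as np

def hermite_cubic(t0, t1, v0, d0, v1, d1):
    """Coefficients (ascending powers) of the cubic p with
    p(t0)=v0, p'(t0)=d0, p(t1)=v1, p'(t1)=d1."""
    # rows: value at t0, derivative at t0, value at t1, derivative at t1
    M = np.array([[1.0, t0, t0**2, t0**3],
                  [0.0, 1.0, 2*t0, 3*t0**2],
                  [1.0, t1, t1**2, t1**3],
                  [0.0, 1.0, 2*t1, 3*t1**2]])
    return np.linalg.solve(M, np.array([v0, d0, v1, d1], dtype=float))

c = hermite_cubic(0.0, 1.0, 1.0, 0.0, 2.0, -1.0)
p = np.polynomial.Polynomial(c)
dp = p.deriv()
```

The four interpolation conditions (two values, two derivatives) uniquely determine the four coefficients, the same counting argument used for the operator I_app below.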

Definition 4.4 (Auxiliary interpolation operator)
For the error estimation we introduce a special Hermite interpolation operator I_app^n. Concretely, the operator should satisfy the following conditions: I_app^n preserves derivatives up to order ⌈k/2⌉ − 1 in t_n^− and up to order ⌈(k−1)/2⌉ − 1 in t_{n−1}^+ … Moreover, we demand that … where the points t_m^I are those of Assumption 5a or 5b, respectively. If (4.2) and (4.3a) provide r_app independent interpolation conditions and r_app < r + 1, then we choose r + 1 − r_app further points t_m^I ∈ (−1, 1) \ { t_j^I : j = 0, ..., K_I }, m = K_I + 1, ..., K_I + r + 1 − r_app, and demand … where again t_{n,m}^I := (t_n + t_{n−1})/2 + (τ_n/2) t_m^I. We agree that I_app^n is applied component-wise to vector-valued functions. So, overall the conditions (4.2) and (4.3) uniquely define an Hermite-type interpolation operator of ansatz order max{r_app − 1, r}. Now, we are able to prove an abstract error estimate.
Theorem 4.5

We suppose that Assumptions 1, 2, and 4 hold. Moreover, let Assumption 5a or 5b be satisfied. Denote by u and U the solutions of (1.1) and (2.1), respectively. Then we have for 1 ≤ n ≤ N, sufficiently small τ, and l = 0, 1 that … where the constants C in general depend exponentially on the product of T and C_2.
Proof: In order to shorten the notation of the proof, we set E_n := sup_{t∈I_n} ∥u(t) − U(t)∥. Using the approximation J_{I,I}u, which has been introduced directly below Lemma 4.1, we split the error into two parts … Then the triangle inequality yields … The first term on the right-hand side is the approximation error of the operator J_{I,I} and shall be analyzed separately later on. It remains to study the second term.
Since ζ is continuously differentiable on each time interval I_ν, ν = 1, ..., N, we have for … Here, the integrals and the modulus have to be applied component-wise. Furthermore, … We start analyzing the integral term of (4.5). For 1 ≤ n ≤ N and because of ζ|_{I_n} ∈ P_r(I_n, R^d), we can apply Lemma 3.4 to get …
The right-hand side terms are now studied separately. The summands of (I) are basically of the form … The definitions of J_{I,I}^n u and U, especially (4.1a), (4.1b) and (2.1b), (2.1c), together with the ith derivative of the ODE system (1.1), yield for the appearing summands … where … denotes the decremented upper limit of the sums in (I) depending on t. By Assumption 4 we obtain … Exploiting the Hermite interpolation operator I_app^n, which especially preserves derivatives up to order ⌈k/2⌉ − 1 in t_n^− and up to order ⌈(k−1)/2⌉ − 1 in t_{n−1}^+, see (4.2), as well as invoking the inverse inequality (3.7), we find for 0 ≤ … that … Overall this implies … In order to bound (II) in (4.6), we first of all note that, by definition of J_{I,I}^n u, especially (4.1c), we gain for all ϕ ∈ P_{r−k}(I_n, R^d) that … where the initial value problem (1.1) and the continuity of u are used in the last identity. So, subtracting the variational equation of the local problem (2.1d), we obtain for all ϕ ∈ P_{r−k}(I_n, R^d) … Using test functions of the form ϕ(t) = (1 + T_n^{−1}(t))^i e_j, j = 1, ..., d, we get by a component-wise derivation that … For the summands on the right-hand side, we consider two different cases. If Assumption 5a holds, we first conclude by Assumption 2 and Leibniz' rule for the jth derivative that … where we refer, for details regarding the derivation of the last identity, to the proof of Theorem 3.7.
Then, involving the estimate of Assumption 5a, we obtain that … with C_{Σ,i} from (3.12). A quite similar argumentation also works when only Assumption 5b holds.
In this case we gain … So, either way, applying Assumption 4 gives … The terms that include derivatives can be estimated similarly to (4.7), since by definition the interpolation operator I_app^n also preserves the particular derivatives at the particular points t_{n,m}^I, see (4.3a). Therefore, altogether this results in … where also (4.1a) and the continuity of u were used. If k = 0, we rewrite the jump term as follows: … where we exploited (4.9). The integrator terms on the right-hand side can be bounded using already known estimates. Indeed, (4.11) with i = 0 yields … Furthermore, combining Assumption 2, the inverse inequality (3.7), and (4.13), we gain … Hence, in both cases we have … Summarizing, we get from (4.4), (4.5), (4.13), and (4.14) … Note that C is independent of T but especially depends multiplicatively on the Lipschitz constant of f (hidden in C_2). For τ_n sufficiently small (Cτ_n/2 < 1), the E_n term on the right-hand side can be absorbed on the left. After that, the application of a variant of the discrete Gronwall lemma, see [15, Lemma 1.4.2, p. 14], applied with … where we also used that u(t_0^−) = u_0 = J_{I,I}u(t_0^−). This already is the wanted statement for l = 0. Especially note the exponential dependency on T and the Lipschitz constant of f (hidden in C, C_2).
Finally, we shall derive a bound for the first derivative of the error. Recalling the estimate for ζ′ in (4.13), it follows … Using the already known estimate for E_n, the statement for l = 1 follows.
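The absorption and discrete Gronwall step at the end of the proof can be mimicked numerically: iterating a recursion of the form E_n ≤ (1 + Cτ)E_{n−1} + τA and comparing with the closed-form exponential bound. The sketch below is a sanity check of this mechanism in our own notation, not the lemma cited from [15].

```python
import math

def gronwall_check(C, A, tau, N, E0=0.0):
    """Iterate E_n = (1 + C*tau) * E_{n-1} + tau * A and compare with the
    discrete Gronwall-type bound exp(C*N*tau) * (E0 + N*tau*A)."""
    E = E0
    for _ in range(N):
        E = (1.0 + C * tau) * E + tau * A
    bound = math.exp(C * N * tau) * (E0 + N * tau * A)
    return E, bound

E, bound = gronwall_check(C=2.0, A=1.0, tau=0.01, N=100, E0=0.5)
```

Since (1 + Cτ)^N ≤ e^{CNτ}, the iterated value always stays below the exponential bound, which is exactly the source of the e^{CT}-type constants in the theorem.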

Remark 4.6
Based on Theorem 4.5 we can also prove abstract estimates for higher order derivatives of the error between the solution u of (1.1) and the discrete solution U of (2.1). Of course, we gain that … where an inverse inequality was used. However, since we only have a non-local error estimate for sup_{t∈I_n} ∥(u − U)(t)∥, we cannot expect that the inverse of the local time step length can be compensated in general. So, usually we additionally need to assume that τ_ν ≤ τ_{ν+1} for all ν or, alternatively, that the mesh is quasi-uniform (τ/τ_ν ≤ C for all ν) to obtain a proper estimate. ♣

Remark 4.7 In the proof of Theorem 4.5, stiffness of the problem would be critical at several points. Indeed, for large Lipschitz constants (hidden in C_2 and so in C) the needed inequality Cτ_n/2 < 1 would force very small time step lengths. For semidiscretizations in space of time-space problems, where the Lipschitz constant is typically proportional to h^{−2} with h denoting the spatial mesh parameter, this would cause upper bounds on the time step length with respect to h similar to CFL conditions. Moreover, since the error constant C depends exponentially on C_2, this constant would be excessively large for stiff problems, such that the error estimate would be useless in this case. ♣

Of course, Theorem 4.5 provides an abstract bound for the error of the variational time discretization method. However, the order of convergence still is not clear. Since I_app is an Hermite-type interpolator of polynomial ansatz order larger than or equal to r, its approximation order is at least r + 1. It remains to prove suitable bounds on the error of the approximation operator J_{I,I}.
Here, P_{−1}(I_n) is interpreted as {0}, in which case the respective operator does not provide the corresponding approximation property. For convenience, set r^I_I := r^I_{I,r−k}. Note that r^I_ex ≥ r_{I,i} ≥ r_I and r^I_{I,i} ≥ r_I hold by definition.
Lemma 4.9 … holds with a constant C independent of τ_n.
Proof: The error estimate follows from standard approximation theory, since J_{I,I}^n preserves, under the given assumptions, polynomials up to degree min{r, r^I_I + 1}. Here, also note the stability estimate for J_{I,I}^n, cf. Lemma 4.2, which motivates the upper summation bound j_{max,ř}.
As we shall see below, compared to the pointwise estimate of Lemma 4.9, the estimate for the approximation error of J_{I,I} in the mesh points t_n^− can even be improved in some cases. However, to this end we need some further knowledge on the approximation property connected with the quantity r^I_{I,i}. Besides, the respective result presented in the following lemma will also be used later in the superconvergence analysis.

Lemma 4.10 Let r, k ∈ Z, 0 ≤ k ≤ r. Suppose that Assumptions 2 and 3 hold. Moreover, assume that … for all j ∈ N_0. Let ř ∈ N_0 and define j*_{min,i,ř} := min{ř, r^I_{I,i} + 1}, j*_{max,i,ř} := max{k_I, k_I, j*_{min,i,ř}}. Then, we have for ϕ ∈ C^{j*_{max,i,ř}}(I_n, R^d) the bound … with a constant C independent of τ_n.
Proof: Let ϕ̃ ∈ P_{min{ř−1, r^I_{I,i}}}(I_n, R^d) be arbitrarily chosen. We start rewriting the left-hand side of the wanted inequality as follows: … Since the second summand on the right-hand side vanishes by definition of r^I_{I,i}, we find … Exploiting Assumption 2, the Leibniz rule for the jth derivative, and the given bound for ψ_i^{(j)}, we gain … Further, using Assumption 3, which yields …, we conclude that … for all ϕ̃ ∈ P_{min{ř−1, r^I_{I,i}}}(I_n, R^d). Choosing ϕ̃ as the Taylor polynomial of ϕ at (t_{n−1} + t_n)/2 of degree min{ř − 1, r^I_{I,i}}, we have that … So, overall it follows …, which completes the proof.

Now, we can state and prove the improved estimate for the approximation error of J_{I,I} in the mesh points t_n^−.
Proof: We start rewriting the term to be estimated. From the fundamental theorem of calculus and exploiting the definition (4.1) of J_{I,I}^n, we gain … The first difference on the right-hand side can be estimated by (4.15). We obtain … In order to bound the second difference, we apply Lemma 4.10 with i = 0, ψ_0 ≡ 1. It follows … Finally, if r^I_ex ≥ r − 1, the third difference vanishes since then … If, e.g., J_{I,I}^n preserves polynomials up to degree r − 1, we obtain from combining (4.15) and (4.17) that … So, overall we have shown that … Therefore, recalling (4.17), which in some cases may provide a better estimate, we conclude the wanted statement. Here, also note that always …

Finally, summarizing the above results, we now want to list the proven convergence orders. … for the W^{1,∞} seminorm. However, this gives the same convergence order as (4.19). Since the quantity r^I_I = r^I_{I,r−k} used in the lemmas and the corollary above is quite abstract, we want to provide some lower bound for r^I_{I,i} based on the more familiar quantities r_I, r^I_ex, and r^I_ex. However, for this result, we need to formulate some further assumptions.

Assumption 6
The operator I_n is a projection onto the space of polynomials of maximal degree r_I < ∞, i.e., I_n : C^{k_I}(I_n) → P_{r_I}(I_n) and I_n ϕ = ϕ for all ϕ ∈ P_{r_I}(I_n), or I_n is the identity, characterized by setting r_I = ∞.

Assumption 7
Given ℓ ∈ N_0, the identity … holds for all ϕ ∈ C^{k_I}(I_n). Note that Assumption 6 implies (4.21) for ℓ = 0. If I_n is either the identity or an Hermite-type interpolation operator, then condition (4.21) holds for any ℓ ∈ N_0.
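Exactness orders such as r^I_ex are easy to determine for a concrete candidate integrator. The helper below (our own construction) computes the degree of exactness of a quadrature rule on the reference interval [−1, 1] by testing monomials.

```python
import numpy as np

def exactness_degree(nodes, weights, tol=1e-10, max_deg=20):
    """Largest m such that the rule integrates all monomials s^j, j <= m,
    exactly over [-1, 1]; illustrates the quantity r_ex for an integrator."""
    for m in range(max_deg + 1):
        exact = 0.0 if m % 2 else 2.0 / (m + 1)   # integral of s^m over [-1, 1]
        if abs(np.dot(weights, np.asarray(nodes) ** m) - exact) > tol:
            return m - 1
    return max_deg

# two-point Gauss rule and three-point Simpson (Lobatto) rule on [-1, 1]
g = 1.0 / np.sqrt(3.0)
deg_gauss = exactness_degree([-g, g], [1.0, 1.0])
deg_simpson = exactness_degree([-1.0, 0.0, 1.0], [1.0/3.0, 4.0/3.0, 1.0/3.0])
```

Both rules have degree of exactness 3, although Simpson's rule needs one node more; this is the kind of quantitative information the abstract analysis consumes.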
We now give the lower bounds for r^I_{I,i}.

Lemma 4.13 Let r, k ∈ Z, 0 ≤ k ≤ r, and i ∈ N_0. Then it always holds r^I_{I,i} ≥ r_I. Supposing that Assumption 6 is fulfilled, we even get r^I_{I,i} ≥ max{r_I, min{r^I_ex − i, r_{I,i}}}. If I_n additionally satisfies Assumption 7 (with parameter ℓ = i), we simply have r^I_{I,i} ≥ max{r_I, min{r^I_ex, r^I_ex} − i}, since then r_{I,i} ≥ max{r_I, r^I_ex − i}. Furthermore, under the weaker assumption that I = I_1 ∘ ⋯ ∘ I_l is a composition of several operators I_j, 1 ≤ j ≤ l, that all by themselves satisfy Assumptions 6 and 7 (with parameter ℓ = i), we still find r_{I,i} ≥ min_{j ∈ M_i ∪ {l}} max{r_{I_j}, r^{I_j}_ex − i}, where M_i := { j ∈ N : 1 ≤ j ≤ l − 1, max{r_{I_j}, r^{I_j}_ex − i} < min_{j+1 ≤ m ≤ l} r_{I_m} }.

Proof: Since by definition ϕ = I_n ϕ for ϕ ∈ P_{r_I}(I_n), it also holds I_n[ϕψ] = I_n[(I_n ϕ)ψ] for ϕ ∈ P_{r_I}(I_n) and sufficiently smooth ψ. This always implies r^I_{I,i} ≥ r_I. Now, assume that min{r^I_ex − i, r_{I,i}} > r_I and let ϕ ∈ P_{min{r^I_ex − i, r_{I,i}}}(I_n). Then Assumption 6 yields that I_n ϕ ∈ P_{r_I}(I_n) ⊂ P_{min{r^I_ex − i, r_{I,i}}}(I_n). Therefore, since I_n is exact for polynomials up to degree r^I_ex and recalling the definition of r_{I,i}, it follows … Hence, r^I_{I,i} ≥ min{r^I_ex − i, r_{I,i}} if min{r^I_ex − i, r_{I,i}} > r_I. So, altogether, under Assumption 6 we have shown that r^I_{I,i} ≥ max{r_I, min{r^I_ex − i, r_{I,i}}}. Finally, we assume that I = I_1 ∘ ⋯ ∘ I_l, where each I_j satisfies Assumptions 6 and 7. Let ϕ ∈ P_ř(I_n) with ř ≥ 0 to be specified later and … Because of (4.21), both terms in the last line vanish. Here, note that actually (4.21) is only needed for I^j_n with j ∈ M_∞ ∪ {l} and ϕ ∈ P_{min{ř, min{r_{I_m} : j+1 ≤ m ≤ l}}}(I_n). Moreover, since … also those summands of the penultimate line with max{r_{I_j}, r^{I_j}_ex − i} ≥ min_{j+1 ≤ m ≤ l} r_{I_m} vanish. Hence, we obtain … From this identity, we easily find that r_{I,i} ≥ min_{j ∈ M_i ∪ {l}} max{r_{I_j}, r^{I_j}_ex − i}. Note that here Assumption 6 is exploited to guarantee that for a polynomial ϕ the degree of I^j_n ϕ is never greater than that of ϕ.
Superconvergence

Proof: In order to prove the wanted statement, we first derive an estimate for the error e(t) = (u − U)(t) at t_n^−, provided that we already have a suitable bound at t_{n−1}^−. To this end, we adapt some basic ideas known from the superconvergence proof for collocation methods, see e.g. [9, Theorem II.1.5, p. 28]. So, we consider the local discrete solution U|_{I_n} as solution U of the perturbed initial value problem …, where def(t) := U′(t) − f(t, U(t)) denotes the defect. Since u solves (1.1), we find for all …
Here f is interpreted as a function of (t, v) such that, for example, ∂f/∂v is the derivative of f with respect to the second (function) argument. Also note that we used for the last identity that … Shortly, we have … Therefore, the variation of constants formula, cf. e.g. …, yields …, where R(t, s) is the resolvent of the homogeneous differential equation y′(t) = (∂f/∂v)(t, u(t)) y(t) for initial values given at s.
Using that u is continuous, as well as U for k ≥ 1, we conclude … After splitting the right-hand side in an appropriate way, we shall study the single terms separately. First, for the term including e(t_{n−1}^−) we gain …, which implies … Furthermore, the term including the remainder term rem(·) can be bounded as follows: (5.8) … Here, for the last step we also exploited that u(s) − s̃e(s) is in a bounded neighborhood of u(s) for all s ∈ I_n, s̃ ∈ [0, 1], since by assumption ∥e(s)∥ ≤ C. Finally, the remaining terms are considered. A Taylor series expansion of R(t_n^−, s) with respect to s in t_{n−1}^+ gives … This formula motivates the following decomposition: …
We start with the second term and bound (II) as follows: … Because of (5.4), the last term on the right-hand side can be rewritten as an approximation error. Indeed, we have …, where we used the estimate of (5.3) with v(·) = f(·, U(·)). This results in … In order to estimate (I) for 0 ≤ i ≤ r − k, we split the term as …, where (2.1d) was used for the last identity. The first given difference on the right-hand side can be bounded by (4.15). We gain, additionally applying Leibniz' rule for the jth derivative, … Recalling (5.4), the last supremum can be further estimated by (5.3): … Note that, disregarding regularity aspects, the choice ř = min{r, r, r^I_I + 1} in Lemma 5.2 is allowed. Therefore, we have … Factoring out τ_n^i, the second term on the right-hand side of (5.10) can be analyzed by Lemma 4.10 with ϕ(t) = f(t, U(t)) and ψ_i(t) = ((t − t_{n−1})/τ_n)^i. Then we obtain … where for abbreviation we set r^⊲⊳_{i,r} := min{r + i, r^I_{I,i} + i + 1, max{min{r, r^I_ex + 1}, min{r, r^I_I + 1} + i}}, j^⊲⊳_{max,i,r} := max{k_J, min{r, r^I_{I,i} + 1}, min{r, r, r^I_I + 1}, min{r, r^I_ex + 1}}.
Combining (5.9) with the above estimates for (I) and (II), we gain, for r sufficiently large, the bound (5.11). Here, note that because of r_{I,I}^{var} ≤ r_{I,I,r−k} + r − k = r_{I,I} + r − k, the second term in the minimum on the right-hand side of (5.11) can be dropped. Therefore, incorporating (5.7) and (5.8), and using the well-known inequality (1 + x) ≤ e^x together with e(t_0^-) = 0, the assertion follows. It remains a small technical detail to verify that G_ν can be bounded uniformly, independently of τ_ν. The term depends on partial derivatives of f, on derivatives of R, and on derivatives of the discrete solution U, thus potentially also on the mesh parameter. However, U can be uniformly bounded by assumption. So, we are done.
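The concluding step is a standard discrete Gronwall argument. Writing the derived recursion generically as y_n ≤ (1 + aτ_n) y_{n−1} + b τ_n^{p+1}, with a, b generic constants and p standing in for the specific order obtained above, the inequality (1 + x) ≤ e^x yields:

```latex
y_n \le \prod_{\nu=1}^{n} (1 + a \tau_\nu)\, y_0
      + b \sum_{\nu=1}^{n} \tau_\nu^{p+1} \prod_{\mu=\nu+1}^{n} (1 + a \tau_\mu)
    \le e^{a \sum_{\nu=1}^{n} \tau_\nu}
       \Bigl( y_0 + b \sum_{\nu=1}^{n} \tau_\nu^{p+1} \Bigr)
    \le e^{aT} \bigl( y_0 + b\, T\, \tau^{p} \bigr),
```

with τ = max_ν τ_ν. Since y_0 = ‖e(t_0^-)‖ = 0, this gives convergence of order p; the exponent and constants here are generic placeholders, not the paper's exact quantities.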

Remark 5.4
Using an alternative argument (inspired by the proof of [10, Theorem II.7.9, pp. 212/213]) that is based on the application of the nonlinear variation-of-constants formula [10, Corollary I.14.6, p. 97], it can be shown that for 1 ≤ k ≤ r the term sup_{t∈I} ‖(u − U)(t)‖^2 in (5.5) is not necessary and can be dropped. However, for k = 0 the alternative proof is much more complicated and in general only guarantees a worse superconvergence estimate than Theorem 5.3. Moreover, for all k the notation gets more involved. ♣

Lemma 5.5 Suppose that Assumption 1 holds along with an estimate similar to (5.3) that at least guarantees approximation order r − 1 for P_n^{I,I} (e.g. if r_{I,I} ≥ r − 2). In addition, let the solutions u of (1.1) and U of (2.1) satisfy sup_{t∈I_n} ‖(u − U)(t)‖ ≤ C for some constant C independent of the mesh parameter. Then we have sup_{t∈I_n} ‖U^{(l)}(t)‖ ≤ C(f, u) for all 0 ≤ l ≤ r.
Proof: We first of all note that the assumed estimate for the local error implies a uniform bound for U′. Using this as basis, we can prove the wanted estimates by induction, exploiting the identity (5.4). For 0 ≤ l ≤ r − 1 we proceed with some non-negative integer constants r − 1 ≤ j^•_min ≤ j^•_max. Now, according to a generalization of Faà di Bruno's formula, see [14, Theorem 2.1] for details, the derivative (d^j/dt^j) f(t, U(t)) can be written as a sum over the non-negative integer solutions Q = (q_{im})_{1≤i≤j, 0≤m≤d} of the constraint (5.13). Note that this formula especially implies that derivatives of U appear up to maximal order j.
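For orientation, the classical scalar Faà di Bruno formula, of which [14, Theorem 2.1] is the generalization used here, already exhibits the claimed structure: derivatives of the inner function appear only up to order j.

```latex
\frac{\mathrm{d}^j}{\mathrm{d}t^j} f\bigl(g(t)\bigr)
  = \sum_{\substack{q_1,\dots,q_j \ge 0\\ 1q_1 + 2q_2 + \cdots + jq_j = j}}
    \frac{j!}{q_1!\cdots q_j!}\;
    f^{(q_1+\cdots+q_j)}\bigl(g(t)\bigr)\,
    \prod_{i=1}^{j} \Bigl(\frac{g^{(i)}(t)}{i!}\Bigr)^{q_i}.
```

The multi-index constraint 1q_1 + ⋯ + jq_j = j plays the role of (5.13) in the multivariate setting.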
Now, taking into account that (d^i/dt^i) t = δ_{1,i} for i ≥ 1, the above summands can be bounded, where the double product can be further estimated. So, applying an inverse inequality multiple times, more precisely at most j times because of (5.13), we obtain the desired bound. Note that this also implies that, after applying an inverse inequality at most j − l times, the right-hand side only depends on derivatives of U of order less than or equal to l. Therefore, the summands on the right-hand side of (5.12) can be estimated (independently of τ_n) by terms that contain derivatives of U up to maximal order l. Hence, again by the induction hypothesis, we gain the upper bound C(f, u).
Summarizing the above observations, our analysis guarantees the following estimates at the time mesh points.
where we refer to Corollary 4.12 for bounds on the right-hand side term.

Numerical experiments
We consider the initial value problem. The nonlinear systems arising within each time step were solved by Newton's method, where a Taylor expansion of the data inherited from the previous time interval was applied to calculate an initial guess for all unknowns on the current interval. If higher order derivatives were needed at the initial time t_0 = 0, we computed them from the ODE system (1.1) and its derivatives.
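The solution strategy per time step can be sketched as follows. This is a minimal illustration only: the true stage systems couple all unknowns of an interval through the variational conditions (replaced here by a simple implicit Euler step as a stand-in), and the forward-difference Jacobian is an assumption; the paper's actual implementation is in Julia.

```python
import numpy as np

def newton(F, x0, tol=1e-12, maxit=50, eps=1e-8):
    """Plain Newton iteration; the Jacobian is approximated by forward
    differences (the paper does not specify its Jacobian handling)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        n = x.size
        J = np.empty((n, n))
        for j in range(n):
            xp = x.copy()
            h = eps * max(1.0, abs(x[j]))
            xp[j] += h
            J[:, j] = (F(xp) - Fx) / h  # forward-difference Jacobian column
        x = x - np.linalg.solve(J, Fx)
    return x

# Toy stand-in for one time step: implicit Euler u_new = u_old + tau*f(u_new)
# applied to u' = -u.
def f(v):
    return -v

u_old = np.array([1.0])
tau = 0.1
F = lambda x: x - u_old - tau * f(x)

# Taylor-expansion initial guess from the previous interval's data:
# u(t_old + tau) ~ u_old + tau * u'(t_old) = u_old + tau * f(u_old)
x0 = u_old + tau * f(u_old)
u_new = newton(F, x0)
```

The predictor line mirrors the described strategy: extrapolate the inherited data before the Newton correction.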
By considering different choices for I_n and Î_n, we will show that our theory provides sharp bounds on the convergence order. Since I_n and Î_n are obtained from I and Î via transformation, only the reference operators will be specified.
Each integrator I that has been used in our calculations is based on Lagrangian interpolation with respect to a specific node set P_I; hence, we have k_I = 0. The interpolation operator Î is also of Lagrange type and uses the node set P_Î, which means that k_Î = 0. Both node sets will be given for each of our test cases. Since often the nodes of quadrature formulas are used, we will write for instance "left Gauss-Radau(k)" to indicate that the nodes of the left-sided Gauss-Radau formula with k points have been used. All upcoming settings fulfill Assumption 1.
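Since both operators are of Lagrange type, their action is determined entirely by the node sets. A minimal sketch of evaluating a Lagrange interpolant on a reference node set; the 2-point left Gauss-Radau nodes on [−1, 1], namely −1 and 1/3, serve as an example.

```python
import numpy as np

def lagrange_interp(nodes, values, t):
    """Evaluate the Lagrange interpolant through (nodes, values) at t."""
    nodes = np.asarray(nodes, dtype=float)
    total = 0.0
    for i, xi in enumerate(nodes):
        li = 1.0  # i-th Lagrange basis polynomial evaluated at t
        for j, xj in enumerate(nodes):
            if j != i:
                li *= (t - xj) / (xi - xj)
        total += values[i] * li
    return total

# nodes of the left-sided Gauss-Radau formula with 2 points on [-1, 1]
radau_left_2 = [-1.0, 1.0 / 3.0]
# a degree-1 polynomial is reproduced exactly by interpolation in 2 nodes
vals = [2.0 * x + 1.0 for x in radau_left_2]
result = lagrange_interp(radau_left_2, vals, 0.5)  # exact value: 2*0.5 + 1
```

For larger node counts a barycentric formulation would be preferable numerically; the naive product form above is kept for readability.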
For all test cases, the method VTD^6_3, which is cGP-C^1(6), was applied as discretization. All calculations were carried out with the software Julia [4] using the floating point data type BigFloat with 512 bits. Errors will be measured in the L∞ norm, the W^{1,∞} seminorm, and the ℓ∞ norm, where ‖·‖ denotes the Euclidean norm in R^d.

Case group 1
Case 1

Choosing integration and interpolation according to the node sets specified for this case, the theoretical predictions are met by the numerical experiments. Moreover, in accordance with Corollary 5.6, the ℓ∞ convergence order is also just 3. This means that the uniform boundedness of sup_{t∈I} ‖U^{(l)}(t)‖ required by Theorem 5.3, which cannot be guaranteed because of r_{I,I} + 1 < r − 1, is indeed violated, since otherwise (5.5) would yield order 4.
The condition max{r_I^ex, r_{I,I} + 1} ≥ r − 1 of Corollary 4.12 will be fulfilled in all following cases. Hence, the computations should show the convergence order given by (4.20) for the L∞ norm.

Case group 2
This group of cases provides choices for P_I and P_Î such that the L∞ convergence order is limited by the maximum expression inside the outer minimum in (4.20). In addition, the presented cases will show that each of the three terms occurring in the maximum can limit the convergence order. In the following, the limiting term is indicated in boldface.

Case 2a
The choices provide the L∞ convergence order min{7, 6, 6, max{4, min{6, 5}}} = 5, where the convergence order is limited by the second term inside the inner minimum. We see from Table 2, however, that the order of convergence is 6, i.e., one order higher than expected. This behavior can be explained by a closer look at Lemma 4.11. The proof there guarantees in this case that (v − J_n^{I,I} v)(t_n^-) = 0 for all v ∈ P_5(I_n). However, due to symmetry reasons, it holds that ∫_{I_n} (v − J_n^{I,I} v)′(t) dt = 0 for all v ∈ P_6(I_n), which implies that (v − J_n^{I,I} v)(t_n^-) = 0 even for v ∈ P_6(I_n). Thus, the convergence order of the limiting term is actually better than predicted.
Taking the same setting for P_I but using P_Î = {−5/6, −13/23, 1/10, 12/17, 4/5}, the convergence order predicted by (4.20) is again min{7, 6, 6, max{4, min{6, 5}}} = 5. The limitation is here also caused by the second argument of the inner minimum. Table 2 shows under case 2a* that this convergence order is obtained in the numerical experiments. Note that here the interpolation points were chosen just such that still r_{I,I,0} = 5 > 4 = r_{I,I}, i.e., especially Î((v − J^{I,Î} v)′) = Î(v′) − Î(v′) = 0 for v(t̂) = t̂^6, but ∫_{−1}^{1} (v − J^{I,Î} v)′(t̂) dt̂ ≠ 0 for v(t̂) = t̂^6, which ensures that the estimate of Lemma 4.11 is sharp.

Case group 3
This group of cases studies the convergence orders in the L∞ norm and the W^{1,∞} seminorm. The presented choices will show that each of the first three expressions in the outer minimum in (4.20) can bound the L∞ convergence order. Moreover, the cases will demonstrate that the convergence order in the W^{1,∞} seminorm can be limited by both terms occurring in (4.19). Again, the limiting numbers will be given in boldface.
Hence, the third argument in the outer minimum determines the convergence order for the L ∞ norm while the second argument of the minimum limits the convergence order of the W 1,∞ seminorm.
The numerical results in Table 3 provide the predicted convergence orders.
The convergence order in the W^{1,∞} seminorm is again determined by the second argument of the corresponding minimum. The limitation of the convergence order of the L∞ norm is caused by the second term. We clearly see from Table 3 that the expected convergence orders are obtained in the numerical simulations.

Case group 4
This group of cases studies the superconvergence. Hence, we restrict ourselves to cases where the convergence order in the ℓ∞ norm suggested by (5.14) is strictly greater than the convergence order in the L∞ norm given by (4.20). We will show for this situation that the first two arguments in the minimum in (5.14), as well as the first argument inside the maximum there, can limit the ℓ∞ convergence order. Recall that the limiting term is written in boldface.
However, since the uniform boundedness of sup_{t∈I} ‖U^{(l)}(t)‖ assumed in Theorem 5.3 cannot be ensured by Lemma 5.5 due to r_{I,I} + 1 = 3 < 5 = r − 1, we actually do not expect any superconvergence. These expectations are confirmed by the numerical results given in Table 5, which show that convergence order 4 is obtained for both the L∞ and the ℓ∞ norm.

Summary
The experimentally obtained and theoretically predicted convergence orders for all cases and all considered norms are collected in Table 6.The experimental orders of convergence were calculated using the results obtained for 256 and 512 time steps.
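The experimental orders of convergence follow the usual two-mesh formula EOC = ln(e_N / e_{2N}) / ln 2. A short sketch; the error values below are placeholders, not data from the paper.

```python
import math

def eoc(err_coarse, err_fine, refinement=2.0):
    """Experimental order of convergence from errors on two meshes,
    where the fine mesh has `refinement` times as many time steps."""
    return math.log(err_coarse / err_fine) / math.log(refinement)

# hypothetical errors for 256 and 512 time steps of a 4th-order method
e256, e512 = 3.2e-9, 2.0e-10
order = eoc(e256, e512)
```

With errors behaving like C·τ^p, halving τ reduces the error by 2^p, so the quotient of the two errors recovers p.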

C ( sup_{s∈I_n} ‖U^{(j+1)}(s)‖ + sup_{s∈I_n} ‖(d^j/ds^j) f(s, U(s))‖ ), with a constant C independent of τ_n, where j*_{min,i,r} = min{r, r_{I,I,i} + 1} and j*_{max,i,r} = max{k_I, k_Î, j*_{min,i,r}}.

≤ sup_{t∈I_n} ‖(d^l/dt^l) f(t, U(t))‖ + sup_{t∈I_n} ‖(d^l/dt^l)(Id − P_n^{I,I}) f(t, U(t))‖.

The first term on the right-hand side contains derivatives of U up to maximal order l only, so it can be bounded by C(f, u) due to the induction hypothesis. It remains to study the second term. Using (5.3) we obtain

sup_{t∈I_n} ‖(d^l/dt^l)(Id − P_n^{I,I}) f(t, U(t))‖ ≤ C Σ_j sup_{t∈I_n} ‖(d^j/dt^j) f(t, U(t))‖.   (5.12)

Corollary 5.6
Let r, k ∈ Z, 0 ≤ k ≤ r. Suppose that Assumptions 1, 2, 3, and 4 hold. Moreover, let Assumption 5a or 5b be satisfied. Let u and U denote the solutions of (1.1) and (2.1), respectively. Then, if r_{I,I} ≥ r − 2, we have for 1 ≤ n ≤ N

‖(u − U)(t_n^-)‖ ≤ C ( τ^{min{2r−k+1, r_{I,I}^{var}+1, max{r_I^{ex}+1, min{r, r_{I,I}+1}}}} + δ_{0,k} τ^{2 r_{I,I}+4} ),   (5.14)

with r_{I,I}^{var} := min_{0≤i≤r−k} {r_{I,I,i} + i}, r_{I,I} = r_{I,I,r−k}, and r_{I,I,i} as defined in Definition 4.8. If r_{I,I} < r − 2, we in general cannot ensure the uniform boundedness of U and its derivatives since Lemma 5.5 does not hold; then we only have the weaker estimate, see (2.1) and directly below Lemma 4.1. Hence, by the triangle inequality it follows for 2 ≤ n ≤ N. Next, we analyze the jump term of (4.5). First, we have a closer look at ζ_{n−1} for 1 ≤ n ≤ N. There are two cases. If k ≥ 1, the discrete solution U is globally continuous due to (2.1a). So, for l ∈ {0, 1}: Suppose that Assumptions 1, 2, 3, and 4 hold. Moreover, let Assumption 5a or 5b be satisfied. Denoting by u and U the solutions of (1.1) and (2.1), respectively, we have for 1 ≤ n ≤ N a bound C τ^{min{r+1, …}} with r_{I,I} as defined in Definition 4.8. If we furthermore suppose that max{r_I^ex, r_{I,I} + 1} ≥ r − 1, then we even have a bound for sup_{t∈I_n} ‖(u − U)(t)‖ and, formally, sup_{t∈I_n} ‖(u − U)′(t)‖ ≤ C τ_n^{min{r, r_{I,I}+1, r_{I,I,0}+1, max{r_I^ex+1, min{r, r_{I,I}+1}}}}.

Table 1: Errors and convergence orders in different norms for case 1.

Here r_I^ex = 3 and r_{I,I} = 2. Hence, the condition max{r_I^ex, r_{I,I} + 1} ≥ r − 1 in Corollary 4.12 is violated and the expected convergence orders for both the L∞ norm and the W^{1,∞} seminorm are given by min{r, r_{I,I} + 1} = 3, see (4.19). It can be seen from Table 1 that the theoretical predictions are met.

Table 2: Errors and convergence orders in the L∞ norm for the cases of group 2.

It can be seen from Table 2 that the experimental convergence orders confirm the theoretical predictions.