## 1 Introduction

The numerical solution of deterministic dynamical systems is an important task in many applications where the dynamical system is a spatiotemporal field that satisfies a partial differential equation (PDE). In this case, the field can be viewed as a function u taking values in an infinite-dimensional, real, separable Banach space $$(V,\left| \cdot \right| _V)$$, and the dynamical system is described by a deterministic operator differential equation initial value problem on a finite time interval [0, T] for some initial condition $$\vartheta$$:

\begin{aligned} u(0)=\vartheta ,\quad u'(t)=f(t,u(t)), \quad t\in [0,T]. \end{aligned}

Operator differential equations have been applied in peridynamics and elastic materials, e.g. [13, 25]. The purpose of this paper is to analyse the error of randomised time integration methods for solving such initial value problems. The methods are of the form

\begin{aligned} U_{k+1}:=\psi (h,U_{k})+\xi _k(h),\quad k\in \{0,\ldots ,N-1\}, \end{aligned}

where $$\psi (h,U_{k})$$ represents the output of a deterministic time integration method with time step h corresponding to the input $$U_{k}$$, and $$\xi _k(h)$$ is a V-valued random variable whose distribution depends on h. Our motivation for considering these methods comes from Bayesian inverse problems.

In many applications, the initial value problem depends on a parameter $$\theta ^*$$—for example, the initial condition $$\vartheta$$, or a parameter appearing in the vector field f—and it is of interest to infer the value of $$\theta ^*$$ given some observational data y, where y results from some fixed measurement process. Let $$\varTheta$$ and $${\mathcal {Y}}$$ denote the set of feasible parameter values and the set of feasible data values respectively. We assume that $$\varTheta$$ is a Banach space and $${\mathcal {Y}}$$ is a Hilbert space. Let S denote the solution operator that maps every $$\theta '\in \varTheta$$ to the solution of the corresponding initial value problem, and let O denote the observation operator that maps every continuous trajectory in V to the corresponding output $${\tilde{y}}\in {\mathcal {Y}}$$ of the fixed measurement process. Then the inference problem is to determine the value of the unknown true parameter $$\theta ^*$$ given noisy data of the form

\begin{aligned} y=O\circ S(\theta ^*)+\eta , \end{aligned}

where $$\eta$$ is often assumed to be a centred Gaussian random variable with known, positive-definite covariance operator $$\Gamma$$. In general, the inverse problem is ill-posed, and one can apply deterministic or statistical approaches to solving the inverse problem.

In the Bayesian approach to inverse problems, one assumes that $$\varTheta$$ can be equipped with a probability measure $$\mu _{0}$$, called the ‘prior’. Let $$G:=O\circ S:\varTheta \rightarrow {\mathcal {Y}}$$ denote the parameter-to-observable map. The Bayesian solution to the inverse problem is given by the ‘posterior’ probability measure $$\mu ^{y}$$ on $$\varTheta$$, which satisfies

\begin{aligned} \mu ^{y}(\mathrm {d}\theta ')=\frac{1}{Z(y)} \exp \left( -\frac{1}{2}\left\| y-G(\theta ') \right\| _{\Gamma }^{2}\right) \mu _{0}(\mathrm {d}\theta ') \end{aligned}

where $$\left\| x \right\| _{\Gamma }^{2}=\left\langle x,\Gamma ^{-1} x \right\rangle _{{\mathcal {Y}}}$$ and Z(y) is a normalisation constant. The posterior is important because one can use it to perform uncertainty quantification for the unknown parameter $$\theta ^*$$. See [34, Section 2.4] for a presentation of the Bayesian approach to inverse problems posed on vector spaces.

For many differential equations arising in applications, one must approximate the exact solution operator S using another operator $${\tilde{S}}$$ that results from a discretisation of the initial value problem. This leads to an approximation $${\tilde{G}}:=O\circ {\tilde{S}}$$ of the parameter-to-observable map, which in turn leads to an approximation $${\tilde{\mu }}^{y}$$ of the exact posterior $$\mu ^{y}$$ defined above. For a fixed data vector y and prior $$\mu _{0}$$, the error in $${\tilde{S}}$$ is propagated via Bayes’ theorem to an error in $${\tilde{\mu }}^{y}$$. Since the posterior is fundamental for performing inference on the unknown parameter $$\theta ^*$$, one seeks a principled way to take into account the discretisation error in $${\tilde{S}}$$.

Under some assumptions, a bound on the error $$G-{\tilde{G}}$$ with respect to some appropriate norm can be used to prove a bound on the error in the posterior, as measured by the Hellinger metric, e.g. [34, Corollary 4.9]. Stability bounds of this type ensure that the approximate posterior $${\tilde{\mu }}^{y}$$ converges in the Hellinger metric to the exact posterior $$\mu ^{y}$$, in the limit as the discretisation error vanishes. While this property ensures that we can ignore the error in the posterior in the limit of increasingly finer discretisations, it does not indicate how to treat the error in the posterior for a fixed discretisation.

One approach is to ignore the discretisation error. This approach is not ideal from the point of view of statistical inference, because the approximate posterior $${\tilde{\mu }}^{y}$$ can be tightly concentrated around the wrong parameter values, even in the small-noise limit. This phenomenon of ‘overconfidence’ is undesirable for uncertainty quantification. See Sect. 1.1 below.

One approach to accounting for the discretisation error applies the standard procedure of using random variables as proxies for unknown quantities. Let $$\psi (h,v)$$ denote the output of applying a time integration method with fixed time step $$h=T/N>0$$, $$N\in \mathbb {N}$$, to the state v. Consider the error $$u(h)-\psi (h,u(0))$$ between the exact solution and the numerical solution, incurred over one time step. Since the one-step error is unknown, we model it using a random variable $$\xi _0(h)$$. Thus,

\begin{aligned} u(h)\approx \psi (h,u(0))+\xi _0(h) =:U_1. \end{aligned}

If we model the one-step error for subsequent steps in a similar way, then this leads to the randomised time integration methods stated at the beginning of this section.
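As a concrete finite-dimensional sketch of such a method (our illustration, not part of the analysis below), take $$\psi$$ to be the explicit Euler map on $$\mathbb {R}$$ and draw $$\xi _k(h)\sim {\mathcal {N}}(0,(\sigma h^{p+1})^2)$$ i.i.d.; the function name, the scalar setting, and the noise scale $$\sigma$$ are illustrative choices:

```python
import numpy as np

def randomised_euler(f, theta, T, N, p, sigma, rng):
    """One realisation of U_{k+1} = psi(h, U_k) + xi_k(h), where psi is the
    explicit Euler map and xi_k(h) ~ N(0, (sigma * h^(p+1))^2) i.i.d.
    (an illustrative choice of randomisation sequence)."""
    h = T / N
    U = np.empty(N + 1)
    U[0] = theta
    for k in range(N):
        t_k = k * h
        # deterministic one-step map plus additive random perturbation
        U[k + 1] = U[k] + h * f(t_k, U[k]) + sigma * h ** (p + 1) * rng.standard_normal()
    return U

# u' = -u, u(0) = 1: one randomised trajectory on [0, 1]
rng = np.random.default_rng(0)
U = randomised_euler(lambda t, u: -u, 1.0, 1.0, 200, 1, 0.5, rng)
```

With $$\sigma =0$$ the scheme reduces to the deterministic Euler method; for $$\sigma >0$$ each run is a random perturbation of it, concentrating around the exact solution as $$h\rightarrow 0$$.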

### 1.1 Illustration of overconfidence phenomenon

Consider the standard heat equation on a bounded domain $$D\subset \mathbb {R}^{d}$$ with homogeneous Dirichlet boundary conditions, written as the operator differential equation

\begin{aligned} u(0)=\vartheta \in H,\quad u'(t)+A u(t)=0,\quad t\in [0,h], \end{aligned}

where A is the negative Laplacian, $$H=L^2(D)$$, and $$h>0$$. In [34, Section 3.5], one considers the inverse problem of inferring the initial condition $$\vartheta$$ from a noisy observation of the solution at a later time. We shall use the assumptions and the approach stated there. The parameter-to-observable map is $$G:H\rightarrow H$$, $$v\mapsto e^{-hA}v$$. The data y is a realisation of the random variable

\begin{aligned} Y=G(\vartheta )+\delta ^{1/2}\eta =G\vartheta +\delta ^{1/2}\eta \end{aligned}

where the noise $$\eta$$ is a Gaussian random variable with distribution $${\mathcal {N}}(0,\Gamma_{\textup{obs}})$$. The noise scaling $$\delta$$ is assumed to be known, and the small noise limit corresponds to $$\delta \rightarrow 0$$. For the unknown parameter $$\vartheta$$, we use the Gaussian prior $$\mu _0={\mathcal {N}}(m_0,\Gamma _{0})$$. The positive-definite covariance operators $$\Gamma_{\textup{obs}}$$ and $$\Gamma_{0}$$ are chosen so that 1) draws from $${\mathcal {N}}(0,\Gamma_{\textup{obs}})$$ and from $$\mu _{0}$$ are H-valued, almost surely; and 2) $$\Gamma_{0}$$ is an appropriate negative fractional power of A. Applying [34, Theorem 6.20] to the jointly Gaussian random variable $$(U,G(U)+\delta ^{1/2}\eta )$$ with $$U\sim \mu _0$$ yields the Gaussian posterior measure $$\mu ^{y}$$ with mean and covariance

\begin{aligned} m&= m_0+\Gamma_{0} G (\delta \Gamma_{\textup{obs} }+G\Gamma _{0} G )^{-1} (y-Gm_0) \\ {\mathcal {C}}&= \Gamma_{0}-\Gamma_{0}G (\delta \Gamma_{\textup{obs} }+G\Gamma_{0} G )^{-1}G\Gamma_{0}. \end{aligned}

In the $$\delta \rightarrow 0$$ limit, $$y\rightarrow G\vartheta$$. Using this fact and the assumptions on $$\Gamma_{0}$$, it follows that $${\mathcal {C}}\rightarrow 0$$ and $$m\rightarrow \vartheta$$ in the $$\delta \rightarrow 0$$ limit. Since Gaussian measures are completely characterised by their mean and covariance, the convergence of $${\mathcal {C}}$$ and m implies the weak convergence (in the sense of probability measures) of the posterior measure to the Dirac measure at the true initial condition $$\vartheta$$ as $$\delta \rightarrow 0$$. This convergence captures the concentration of the posterior $$\mu ^y$$ around the true unknown $$\vartheta$$, and validates the Bayesian approach to the inverse problem.

Now suppose we approximate G using the map $${\tilde{G}}$$ defined by a single step of the implicit Euler method, $${\tilde{G}}:H\rightarrow H$$, $$v\mapsto (I+hA)^{-1} v$$. Applying [34, Theorem 6.20] as we did earlier with $${\tilde{G}}$$ instead of G yields the associated approximate posterior $${\tilde{\mu }}^{y}$$, which is Gaussian with mean and covariance

\begin{aligned} {\tilde{m}}&= m_0+\Gamma_{0} {\tilde{G}} (\delta \Gamma_{\textup{obs} }+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1} (y-{\tilde{G}}m_0) \\ \tilde{ {\mathcal {C}}}&= \Gamma_{0}-\Gamma _{0}{\tilde{G}} (\delta \Gamma_{\textup{obs} }+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1}{\tilde{G}}\Gamma_{0}. \end{aligned}

In the $$\delta \rightarrow 0$$ limit, $$\tilde{{\mathcal {C}}}\rightarrow 0$$, but $${\tilde{m}}\rightarrow {\tilde{G}}^{-1}G \vartheta \ne \vartheta$$. Thus, the approximate posterior $${\tilde{\mu }}^{y}$$ converges weakly in the small noise limit to a biased Dirac measure. This demonstrates the overconfidence phenomenon. The bias $${\tilde{G}}^{-1}G\vartheta -\vartheta$$ in the limiting Dirac measure is the local truncation error of the implicit Euler method.

To address the overconfidence phenomenon, we use a random variable as a proxy for the unknown bias. Consider the randomised implicit Euler method given by $${\widehat{G}}(v):={\tilde{G}}v+h^{p+1}\zeta$$, where $$\zeta \sim {\mathcal {N}}(0,\Gamma_{1})$$ is independent of the observation noise $$\eta$$, and $$\Gamma_{1}$$ is chosen so that draws from $${\mathcal {N}}(0,\Gamma_{1})$$ are H-valued almost surely. By rewriting $${\widehat{G}}(U)+\delta ^{1/2}\eta ={\tilde{G}}U+(h^{p+1}\zeta +\delta ^{1/2}\eta )$$ and applying [34, Theorem 6.20], it follows that the associated deterministic posterior $${\widehat{\mu }}^{y}$$ is Gaussian, with mean and covariance

\begin{aligned} {\widehat{m}}&= m_0+\Gamma_{0} {\tilde{G}} (\delta \Gamma_{\textup{obs} }+h^{2p+2}\Gamma_{1}+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1} (y-{\tilde{G}}m_0) \\ \widehat{ {\mathcal {C}}}&= \Gamma_{0}-\Gamma _{0}{\tilde{G}} (\delta \Gamma_{\textup{obs} }+h^{2p+2}\Gamma_{1}+{\tilde{G}}\Gamma_{0} {\tilde{G}} )^{-1}{\tilde{G}}\Gamma_{0}. \end{aligned}

In the $$\delta \rightarrow 0$$ limit, $$\widehat{{\mathcal {C}}}$$ does not converge to zero, because of the additional $$h^{2p+2}\Gamma_{1}$$ term. However, in the limit as $$h,\delta \rightarrow 0$$, the bias $${\tilde{G}}^{-1}G\vartheta -\vartheta$$ associated to $${\tilde{G}}$$ vanishes. Hence $${\widehat{m}}\rightarrow \vartheta$$ and $$\widehat{{\mathcal {C}}}\rightarrow 0$$. For fixed $$h>0$$, the additional $$h^{2p+2}\Gamma_{1}$$ term ensures that the deterministic approximate posterior $${\widehat{\mu }}^{y}$$ associated to the randomised implicit Euler method $${\widehat{G}}$$ is more ‘spread out’ than the approximate posterior $${\tilde{\mu }}^{y}$$ associated to the non-randomised implicit Euler method $${\tilde{G}}$$. In this way, the problem of overconfidence is mitigated.
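The scalar case $$H=\mathbb {R}$$, $$A=a>0$$ already exhibits the phenomenon. The following sketch (ours; the numerical values for $$a$$, $$h$$, $$p$$, $$\vartheta$$, and the unit scalars standing in for $$\Gamma_{\textup{obs}}$$, $$\Gamma_{0}$$, $$\Gamma_{1}$$ are illustrative) evaluates the three posterior formulas above at nearly noise-free data:

```python
import math

# Scalar caricature: H = R, A = a > 0, so G = exp(-h a) and the implicit
# Euler approximation is Gt = 1 / (1 + h a). All covariances are scalars.
a, h, p = 1.0, 0.5, 1
theta, m0 = 2.0, 0.0
g_obs = g0 = g1 = 1.0                       # Gamma_obs, Gamma_0, Gamma_1

G = math.exp(-h * a)                        # exact parameter-to-observable map
Gt = 1.0 / (1.0 + h * a)                    # implicit Euler approximation

def post(Gmap, delta, extra=0.0):
    """Gaussian posterior (mean, variance); data y = G * theta is noise-free,
    mimicking the delta -> 0 limit. `extra` is the h^(2p+2) * Gamma_1 term
    of the randomised method."""
    y = G * theta
    d = delta * g_obs + extra + Gmap * g0 * Gmap
    return m0 + g0 * Gmap * (y - Gmap * m0) / d, g0 - (g0 * Gmap) ** 2 / d

delta = 1e-12
m_ex, C_ex = post(G, delta)                               # m -> theta, C -> 0
m_t, C_t = post(Gt, delta)                                # m~ -> (G/Gt) theta
m_h, C_h = post(Gt, delta, extra=h ** (2 * p + 2) * g1)   # C^ stays positive
```

For these values, `m_t` converges to $$(G/{\tilde{G}})\vartheta \approx 1.82\ne \vartheta =2$$ while `C_t` collapses to zero (overconfidence around the wrong value), whereas `C_h` stays bounded away from zero, as described above.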

### 1.2 Main contributions

In this paper, we rigorously prove strong forward error bounds for randomised one-step time integration methods applied to operator differential equations. Our work builds on the approach for proving the error bounds in $$L^2$$ of [10, Theorem 2.2] and the error bounds in $$L^R$$—for user-specified $$R\in \mathbb {N}$$—of [20, Theorem 3.5]. These bounds were stated for initial value problems formulated in $$\mathbb {R}^{d}$$, where the associated exact flow maps are globally Lipschitz, and where the randomised time integrators are generated using uniform time grids and numerical methods $$\psi$$ that satisfy a uniform local truncation error assumption.

The error bounds that we prove in this paper generalise the existing error bounds in multiple respects. Our bounds are valid for time-dependent vector fields, non-uniform time grids (i.e. variable time steps), and operator differential equations that are formulated on Banach spaces or on Gelfand triples. In Theorem 3.7, we show that one can obtain strong error bounds in $$L^R$$ for $$R>1$$, without the assumption of uniform local truncation error of the numerical method, and without the assumption that the flow map of the initial value problem is globally Lipschitz. In fact, we show that one can obtain strong error bounds in more general Orlicz norms. The bounds that we prove in this paper demonstrate that the paradigm of randomised time integration extends in a natural way to the time integration of PDEs with time-dependent coefficients. Moreover, the proofs we give for our main results are simpler than the proofs of the corresponding earlier results.

A related but distinct contribution that we make is to consider the setting where the random variables used in the randomisation are independent and centred. We generalise the $$L^2$$ uniform error bound [20, Theorem 3.4] for centred and independent randomisation—which was proven in the setting of ODEs in $$\mathbb {R}^{d}$$—to the setting of operator differential equations on Gelfand triples, under weaker assumptions on the time integration map $$\psi$$. We address the question of whether it is possible to obtain better error bounds under these additional assumptions. This question was implicit in the earlier analysis, but was not addressed there.

### 1.3 Related work

Randomised time integration methods for differential equations have been studied extensively in the context of ‘probabilistic numerics’. For some reviews of research in this area, see [9, 17, 27]. In probabilistic numerics, ODEs have been considered from many perspectives, including structure- or symmetry-preserving methods [1, 40], Bayesian modelling of the unknown solution with Gaussian processes [5, 10, 33, 36, 38, 40], data-based statistical estimation of discretisation error [24, 35], and filtering [19, 38]. The papers [10, 20] cited earlier also belong to this context. For PDEs, methods based on Bayesian inference and Gaussian processes [6, 8, 10, 28, 31, 39], multiscale techniques, and random meshes have been studied. The research area of ‘information field dynamics’ [11, 14] also considers probabilistic simulation schemes for PDEs by using Gaussian processes and information-theoretic ideas.

Random approximate posteriors arising from randomised solution operators for differential equations have been studied in [21, Section 5] under a strong assumption of exponentially integrable discretisation error $$S-{\tilde{S}}$$, and more recently under a weaker square-integrability hypothesis.

Two aspects differentiate the problem we consider from the problems considered in numerical methods for stochastic evolution equations. The most important aspect is that the operator differential equation of interest in this paper is deterministic. Thus, our context is fundamentally different from the context of numerical integration methods for stochastic differential equations and numerical integration methods for random differential equations. The second aspect is that the random variables used in the randomisation need not be constructed using i.i.d. copies of a Wiener process or Lévy process.

### 1.4 Overview

We introduce notation and some recurring objects in the next section. In Sect. 2, we consider the setting where the initial value problem is formulated on a Banach space. The main result is the strong error bound in Orlicz norm proven in Theorem 2.8 under the assumption of uniform local truncation error of the time integration method $$\psi$$.

In Sect. 3, we consider the setting where the initial value problem is formulated on a Gelfand triple, and where $$\psi$$ satisfies a weaker local truncation error assumption. This setting is considered in the variational approach to PDEs. We prove strong $$L^2$$ error bounds for mutually independent and centred randomisation in Sect. 3.1. In Sect. 3.2, we discuss the feasibility of obtaining $$L^R$$ bounds for $$R>2$$ that are of the same order in the time step h, under the same assumptions of independence and centredness. In Sect. 3.3, we state in Theorem 3.7 a strong error bound in Orlicz norm without assuming independence or centredness.

In Sect. 4, we show that the assumptions we make in Sect. 3 are reasonable for a class of operator differential equations that includes the heat equation on a $$C^2$$ bounded domain.

We conclude in Sect. 5. In the appendices, we collect material that is useful for the main part of the paper.

### 1.5 Notation and setup

Below, $$(V,\left| \cdot \right| _V)$$ and $$(H,\left\langle \cdot ,\cdot \right\rangle _H)$$ denote a real separable Banach space and a real separable Hilbert space respectively. We write $$\left| \cdot \right| _H$$ for the Hilbert space norm. All integrals are Bochner integrals unless otherwise stated. We define $$C^1([0,T];V) :=\{ u \in C([0,T];V) \, | \, u' \in C([0,T];V)\}$$ and equip it with the norm $$\left\| u \right\| _{1,\infty } = \left\| u \right\| _{\infty } + \left\| u' \right\| _{\infty }$$ where $$\left\| u \right\| _{\infty }:=\sup _{t\in [0,T]}\vert u(t) \vert _V$$. We define the space $$C^1([0,T]; H)$$ analogously.

All random variables will be defined on a common probability space $$(\varOmega ,{\mathcal {F}},\mathbb {P})$$. We denote expectation with respect to $$\mathbb {P}$$ by $$\mathbb {E}[\cdot ]$$ and write $$X\sim \mu$$ to mean that X has $$\mu$$ as its distribution. For a V-valued random variable X and $$R\ge 1$$, we shall write $$\left\| X \right\| _{L^R(\varOmega ;V)}:=\mathbb {E}[\vert X \vert _V^R]^{1/R}$$. Similarly, if X is H-valued, then $$\left\| X \right\| _{L^R(\varOmega ;H)}:=\mathbb {E}[\left| X\right| _H^R]^{1/R}$$. For a Young function $$\varPsi :\mathbb {R}_{\ge 0}\rightarrow \mathbb {R}_{\ge 0}$$, the corresponding Orlicz norm $$\left\| \cdot \right\| _{\varPsi }$$ of an $$\mathbb {R}$$-valued random variable Z is defined by

\begin{aligned} \left\| Z \right\| _{\varPsi }:=\inf \{k\in (0,\infty )\ :\ \mathbb {E}[\varPsi(\vert Z \vert /k)]\le 1\}. \end{aligned}

If Z is a V-valued (respectively, H-valued) random variable, then $$\left\| Z \right\| _{\varPsi (\varOmega ;V)}:=\left\| \vert Z \vert _V \right\| _{\varPsi }$$ (resp. $$\left\| Z \right\| _{\varPsi (\varOmega ;H)}:=\left\| \vert Z \vert _H \right\| _{\varPsi }$$). The $$\left\| \cdot \right\| _{\varPsi (\varOmega ;V)}$$ norm includes as a special case the $$\left\| \cdot \right\| _{L^R(\varOmega ;V)}$$ norm when $$R>1$$, but not when $$R=1$$. The analogous statement holds for the $$\left\| \cdot \right\| _{\varPsi (\varOmega ;H)}$$ norm. An important choice of Young function $$\varPsi$$ is given by $$\varPsi _2(z):=\exp (z^2)-1$$, because finiteness of $$\left\| X \right\| _{\varPsi _2}$$ implies that X is sub-Gaussian and hence exponentially square integrable.
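As a worked example (ours): for a real Gaussian $$Z\sim {\mathcal {N}}(0,\sigma ^2)$$, one has $$\mathbb {E}[\exp ((Z/k)^2)]=(1-2\sigma ^2/k^2)^{-1/2}$$ whenever $$k^2>2\sigma ^2$$, so the defining condition $$\mathbb {E}[\varPsi _2(\vert Z \vert /k)]\le 1$$ first holds at $$k=\sigma \sqrt{8/3}$$. The following sketch recovers this value numerically:

```python
import math

def orlicz_psi2_norm_gaussian(sigma, tol=1e-12):
    """Orlicz norm ||Z||_{Psi_2}, with Psi_2(z) = exp(z^2) - 1, of Z ~ N(0, sigma^2).
    Uses the closed form E[exp((Z/k)^2)] = (1 - 2 sigma^2 / k^2)^{-1/2},
    valid for k^2 > 2 sigma^2, and bisects for the smallest feasible k."""
    def expected_psi2(k):
        return (1.0 - 2.0 * sigma ** 2 / k ** 2) ** -0.5 - 1.0
    lo = math.sqrt(2.0) * sigma * (1.0 + 1e-9)   # expectation blows up below this
    hi = 10.0 * sigma                            # clearly feasible
    while hi - lo > tol * sigma:
        mid = 0.5 * (lo + hi)
        if expected_psi2(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Setting E[Psi_2(|Z|/k)] = 1 gives (1 - 2 sigma^2 / k^2)^{-1/2} = 2,
# i.e. k = sigma * sqrt(8/3).
```

Note that the result scales linearly in $$\sigma$$, reflecting the homogeneity of the Orlicz norm.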

We write $$p\wedge q=\min \{p,q\}$$ for $$p,q\in \mathbb {R}$$. For $$h>0$$, $$p\ge 0$$, and $$a=a(h)\in \mathbb {R}$$, we write $$a={\mathcal {O}}(h^p)$$ to mean that $$\vert a \vert \le Ch^p$$ for some h-independent constant $$C>0$$. Given $$N\in \mathbb {N}$$, $$[N]:=\{1,\ldots ,N\}$$ and $$[N]_0:=[N]\cup \{0\}=\{0,1,\ldots , N\}$$.

Throughout the paper, we consider the following initial value problem on a deterministic time interval [0, T],

\begin{aligned} u(0)=\vartheta ,\quad u'(t)=f(t,u(t)),\quad t\in [0,T] \end{aligned}
(1.1)

for fixed $$T>0$$ and suitable initial condition $$\vartheta$$. We specify the domain and codomain of f in the following sections. We denote by $$\varphi$$ the exact flow map associated to (1.1) as follows: for suitable $$h\in [0,T]$$, $$t\in [0,T-h]$$, and $$u_s$$,

\begin{aligned} \varphi (h,t,u_s)=u_s+\int _{t}^{t+h}f(\tau ,\varphi (\tau -t,t,u_s))\,\mathrm {d}\tau . \end{aligned}
(1.2)

We equip the time interval [0, T] in (1.1) with a time grid $$(t_k)_{k\in [N]_0}$$, where

\begin{aligned} 0=:t_0<t_1<\cdots <t_N:=T,\quad h_{k}:=t_{k+1}-t_k,\quad h:=\max _{k\in [N-1]_0}h_k. \end{aligned}
(1.3)

From (1.3) it follows that for any $$\tau \ge 0$$,

\begin{aligned} \sum _{\ell \in [N-1]_0}h_\ell ^{\tau +1}\le h^\tau \sum _{\ell \in [N-1]_0}h_\ell =h^\tau T. \end{aligned}
(1.4)

Given (1.2), the exact sequence $$(u(t_k))_{k\in [N]_0}$$ associated with the time grid satisfies

\begin{aligned} u(t_{k+1})=\varphi (h_k,t_k,u(t_k)),\quad k\in [N-1]_0. \end{aligned}
(1.5)

We denote by $$\psi$$ the approximate flow map associated to a time integration method, and define a deterministic approximating sequence $$(u_k)_{k\in [N]_0}$$ by

\begin{aligned} u_{k+1} :=\psi (h_k, t_k,u_k),\quad u_0=\vartheta . \end{aligned}

Let $$(\xi _k)_{k\in \mathbb {N}_0}$$ be a sequence of stochastic processes, where each $$\xi _k$$ is a stochastic process on $$[0,\infty )$$. In Sect. 2 (respectively, Sect. 3), each $$\xi _k$$ takes values in the Banach space V (resp. the Hilbert space H). Given the time grid in (1.3), we use $$(\xi _k(h_k))_{k\in [N-1]_0}$$ as a randomisation sequence in order to define the random approximating sequence $$(U_k)_{k\in [N]_0}$$ by

\begin{aligned} U_{k+1} :=\psi (h_k,t_k,U_k) + \xi _k(h_k),\quad k\in [N-1]_0 \end{aligned}
(1.6)

for a given random variable $$U_0$$. The sequence of errors $$(e_k)_{k\in [N]_0}$$ of the random approximating sequence (1.6) with respect to the exact sequence (1.5) is defined by

\begin{aligned} e_0=u(0)-U_0,\quad e_{k+1}:=u(t_{k+1})-U_{k+1},\quad k\in [N-1]_0. \end{aligned}

By (1.5) and (1.6), we obtain

\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}
(1.7)

The equation (1.7) shall be the starting point for our error analysis.

## 2 Classical setting

In this section, we prove the generalisation of [10, Theorem 2.2] and [20, Theorem 3.5] to the setting of a time-dependent vector field f on an infinite-dimensional, real, separable Banach space V. We assume that the vector field f in (1.1) satisfies $$f :[0,T] \times V \rightarrow V$$. In addition, we assume that for every initial condition $$\vartheta \in V$$, there exists a unique classical solution $$u \in C^1([0, T];V)$$. For example, if f is continuous and uniformly Lipschitz in the second argument, then this assumption is satisfied, and $$\varphi$$ exists [12, Satz 7.2.6].

We state the assumptions needed to prove the main result of this section. The first is a Lipschitz continuity assumption on the exact flow map.

### Assumption 2.1

The exact flow map $$\varphi$$ admits a constant $$L_\varphi >0$$ such that for any $$t\in [0,T]$$, for every $$h\ge 0$$ such that $$t+h\le T$$, and for every $$x,y\in V$$,

\begin{aligned} \left| \varphi (h,t,x)-\varphi (h,t,y)\right| _V \le (1+L_\varphi h) \left| x-y\right| _V. \end{aligned}

If f is uniformly Lipschitz in the second argument, then Assumption 2.1 is satisfied [12, Satz 7.3.4].

Ideally, the deterministic sequence $$(u_k)_k$$ approximates the exact sequence $$(u(t_k))_k$$ well. We make this precise by introducing the following uniform local truncation error assumption.

### Assumption 2.2

The approximate flow map $$\psi$$ admits constants $$0<h^*<\infty$$, $$0<C_{\varphi ,\psi }<\infty$$, and $$q\ge 0$$, such that for all $$0<h\le h^*$$,

\begin{aligned} \sup _{\begin{array}{c} v \in V\\ t \in [0,T-h] \end{array}} \left| \varphi (h,t,v) - \psi (h,t,v) \right| _V \le C_{\varphi ,\psi } h^{q+1} \; . \end{aligned}

The parameter $$h^*$$ is included in order to account for implicit time integration methods that provide a unique output whenever the time step is small enough. In order to achieve an order of $$q\ge 1$$ for the truncation error, one usually requires higher regularity of f or equivalently higher regularity for the solution u [16, Section III.2, Theorem 2.4]. For classical one-step methods, the corresponding analysis extends to infinite-dimensional Banach spaces; see Appendix B.
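As a scalar sanity check of the order parameter q (our illustration): for $$u'=-u$$ one has $$\varphi (h,t,v)=e^{-h}v$$ and, for the explicit Euler method, $$\psi (h,t,v)=(1-h)v$$, so the one-step error is $$(h^2/2+{\mathcal {O}}(h^3))\vert v \vert$$, i.e. $$q=1$$ for fixed v; the linear growth in $$\vert v \vert$$ also indicates why the uniformity over v in Assumption 2.2 can fail on unbounded sets:

```python
import math

# phi(h,t,v) = exp(-h) v and psi(h,t,v) = (1 - h) v, so the one-step error is
# |exp(-h) - (1 - h)| |v| = (h^2/2 + O(h^3)) |v|: order q = 1 for fixed v,
# but there is no bound uniform over all v in R (only over bounded sets).
for h in (0.1, 0.05, 0.025, 0.0125):
    err = abs(math.exp(-h) - (1.0 - h))
    print(f"h = {h:7.4f}   err / h^2 = {err / h ** 2:.4f}")
```

The printed ratios approach the leading coefficient 1/2 as h decreases.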

The assumptions above are similar to [10, Assumption 2] and [20, Assumptions 3.1 and 3.2]. Note that Assumption 2.2 is restrictive, because it requires uniformity in t and v. In the finite-dimensional setting, for example, the analogous assumption is justified when $$f :\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}$$ is sufficiently smooth and sufficiently many of its derivatives are uniformly bounded. However, Assumption 2.2 is not satisfied in general. For example, in the setting where the operator differential equation is given by $$u'(t)=Au(t)\in H$$ for a Hilbert space H and the infinitesimal generator A of an analytic semigroup with domain $$\textup{Dom} (A)$$, and $$\psi$$ is given by the implicit Euler method, there exists $$C>0$$ such that for all $$\vartheta \in \textup{Dom} (A)$$, $$n\in \mathbb {N}$$, and all sufficiently small $$h>0$$,

\begin{aligned} \left| \varphi (nh,0,\vartheta )-\psi (nh,0,\vartheta )\right| _H\le C h\left| A\vartheta \right| _H, \end{aligned}

see [37, Theorem 7.1].

For equations of the form (1.1) derived from PDEs and fixed time argument t, the right-hand side f is in many cases not Lipschitz from V to V. Furthermore, one cannot in general expect that (1.1) admits a classical solution $$u\in C^1([0,T];V)$$, because a classical solution requires regularity assumptions on the problem data that need not hold in general. In Sect. 3, we will consider vector fields f that do not satisfy the assumptions above. This will lead us to consider variational solutions of (1.1). There are other approaches to generalise the classical setting to problems with less regularity, e.g. mild solutions, but they are outside the scope of this paper.

### 2.1 Randomisation sequence

Recall the random approximating sequence $$(U_k)_k$$ defined in (1.6). In this section, we shall assume that each $$\xi _k$$ is a V-valued stochastic process indexed by $$[0,\infty )$$, and we shall assume $$U_0$$ is a V-valued random variable. Below, we shall impose the following regularity assumption on the $$(\xi _k)_{k\in \mathbb {N}_0}$$.

For the remainder of Sect. 2, we shall shorten notation and write $$\left\| Z \right\| _{\varPsi }$$ instead of $$\left\| Z \right\| _{\varPsi (\varOmega ;V)}$$ for any V-valued random variable Z.

### Assumption 2.3

The collection $$(\xi _k)_{k\in \mathbb {N}_0}$$ admits an Orlicz norm $$\left\| \cdot \right\| _{\varPsi }$$ and constants $$p\ge 0$$ and $$0<C_\xi <\infty$$, such that for all $$k\in \mathbb {N}_0$$ and $$t>0$$,

\begin{aligned} \left\| \xi _k(t) \right\| _{\varPsi }\le C_\xi t^{p+1}. \end{aligned}

The assumption allows the stochastic processes to be non-Gaussian, to be probabilistically dependent, and to have different distributions and nonzero means. Furthermore, Assumption 2.3 allows for $$\xi _k(t)$$ to have different orders of integrability. The rates at which the absolute moments decrease to zero as t decreases to zero may differ as well. The function $$\varPsi$$ quantifies the maximal common order of integrability, and the parameter p quantifies the maximal common decay rate with respect to $$\left\| \cdot \right\| _{\varPsi }$$.

Assumption 2.3 generalises [20, Assumption 3.3], which in turn generalised [10, Assumption 1]. The latter two assumptions considered the $$\left\| \cdot \right\| _{R}$$ norm for $$R\in \mathbb {N}$$ and the $$\left\| \cdot \right\| _{2}$$ norm of $$\mathbb {R}^{d}$$-valued random variables respectively.

We now recall the motivation for the additive random perturbation in (1.6), and in particular for Assumption 2.3. Comparing (1.5) and (1.6) yields

\begin{aligned} u(t_1) = u(0)+\int _{0}^{h_0} f(s,u(s))\,\mathrm {d}s\approx \psi (h_0,0,u(0))+\xi _0(h_0) = U_1. \end{aligned}

Thus, the random variable $$\xi _0(h_0)$$ models the uncertainty in the value of the integral term due to the fact that the value of the solution u over the time interval $$[0,h_0]$$ is known only at time 0, and not at every time s in the interval $$[0,h_0]$$.

It is desirable that the approximation above is good with high probability. Given that any reasonable choice of $$\psi$$ must satisfy $$\lim _{h_0\rightarrow 0}\psi (h_0,0,u_0)= u_0$$, a necessary condition for the approximation above to be good with high probability is that the law of $$\xi _0(h_0)$$ concentrates around 0 as $$h_0\rightarrow 0$$, because the integral term $$\int _{0}^{h_0}f(s,u(s))\,\mathrm {d}s\rightarrow 0$$ as $$h_0\rightarrow 0$$. Using Assumption 2.3 with Markov’s inequality yields that for every $$\varepsilon >0$$,

\begin{aligned} \mathbb {P}(\left| \xi _k(t)\right| _V\ge \varepsilon )\le \varPsi \left( \frac{\varepsilon }{C_\xi t^{p+1}}\right) ^{-1}. \end{aligned}

The inequality above shows that the parameter p quantifies the maximal common rate at which all the laws $$(\mathbb {P}\circ (\left| \xi _k(t)\right| _V)^{-1})_{k}$$ contract around the Dirac measure at zero, as t decreases to zero.

In [10, 20], the parameter p is chosen in order to ensure that the error of the random approximate solution sequence $$(U_k)_{k}$$ with respect to the exact sequence $$(u(t_k))_{k}$$ decreases with h at the same rate as the error of the deterministic approximate solution sequence $$(u_k)_{k}$$. This choice is motivated by the goal of showing that probabilistic integrators can have the same convergence rate as the underlying deterministic one-step method.

Recall that if V is a separable Banach space and $$\mu$$ is a Gaussian measure whose support equals V, then the Cameron–Martin space of $$\mu$$ is dense in V, and hence there exists a V-valued Wiener process $$(W(t))_{t\ge 0}$$ associated to $$\mu$$ such that $$W(1)\sim \mu$$ [4, Theorem 3.6.1, Proposition 7.2.3]. The next lemma shows that there exists a large class of Gaussian processes that satisfies Assumption 2.3.

### Lemma 2.4

Let $$\mu$$ be a Gaussian distribution with support equal to V, and let $$(W(t))_{t\ge 0}$$ be a Wiener process associated to $$\mu$$ such that $$W(1)\sim \mu$$. Let $$\xi$$ be a stochastic process on $$[0,\infty )$$ defined by $$t\mapsto \xi (t):=t^{p+1/2}W(t)$$, and let $$(\xi _k)_{k\in \mathbb {N}_0}$$ be i.i.d. copies of $$\xi$$. Then

\begin{aligned} \left\| \xi (t) \right\| _{\varPsi }= \left\| \xi (1) \right\| _{\varPsi }t^{p+1}, \end{aligned}
(2.1)

for $$\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{R}$$, $$R>1$$, or $$\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{\varPsi _2}$$, $$\varPsi _2(z):=\exp (z^2)-1$$.

### Proof

For $$t>0$$, we have $$\left\| \xi (t) \right\| _{\varPsi }=t^{p+1/2}\left\| W(t) \right\| _{\varPsi }=t^{p+1}\left\| W(1) \right\| _{\varPsi }$$. The first equation follows from the definition of $$\xi (t)$$, and the second equation follows from the scaling property of the Wiener process, i.e. that $$W(t)=t^{1/2}W(1)$$ in distribution for every $$t>0$$. The conclusion follows since $$W(1)=\xi (1)$$ as random variables, and because Gaussian random variables are exponentially square integrable by Fernique’s theorem. $$\square$$
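A Monte Carlo sanity check of (2.1) (ours), using the finite-dimensional stand-in $$V=\mathbb {R}^d$$ and the $$L^2$$ norm, for which the scaling can also be computed in closed form:

```python
import numpy as np

# Sanity check of (2.1) for the L^2 norm with the finite-dimensional stand-in
# V = R^d: xi(t) = t^{p+1/2} W(t) with W a standard d-dimensional Wiener
# process, so ||xi(t)||_{L^2} = t^{p+1/2} sqrt(d t) = t^{p+1} ||xi(1)||_{L^2}.
rng = np.random.default_rng(1)
d, p, t, M = 3, 1.0, 0.25, 200_000

W_t = np.sqrt(t) * rng.standard_normal((M, d))        # W(t) ~ N(0, t I_d)
xi_t = t ** (p + 0.5) * W_t
lhs = np.sqrt(np.mean(np.sum(xi_t ** 2, axis=1)))     # Monte Carlo ||xi(t)||_{L^2}
rhs = t ** (p + 1) * np.sqrt(d)                       # t^{p+1} ||xi(1)||_{L^2}, exact
```

The Monte Carlo estimate `lhs` agrees with the exact value `rhs` up to sampling error.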

### Remark 2.5

The preceding discussion shows that a collection of i.i.d. copies of the standard Wiener process W satisfies Assumption 2.3 with $$p=-1/2$$, in which case we may set $$\xi _k(h_k)$$ in (1.6) to be a centred Gaussian random variable with variance proportional to $$h_k$$. This choice yields a time integration method that resembles methods for stochastic differential equations. However, for the error bound in Theorem 2.8 below to imply convergence in probability of $$(U_k)_k$$ to the exact solution sequence $$(u(t_k))_{k}$$, we need $$p>0$$. This observation highlights an important difference between the type of time integration methods that we analyse in this paper and time integration methods for stochastic differential equations.
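The role of this threshold can be illustrated with the unpropagated noise alone (our sketch): on a uniform grid with $$h=T/N$$, the sum of the N perturbations $$\xi _k(h)\sim {\mathcal {N}}(0,h^{2(p+1)})$$ has standard deviation $$\sqrt{N}\,h^{p+1}=\sqrt{T}\,h^{p+1/2}$$, which stays at $$\sqrt{T}$$ for $$p=-1/2$$ but vanishes as $$h\rightarrow 0$$ for $$p>-1/2$$:

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 1.0, 50_000

def accumulated_noise_std(p, N):
    """Monte Carlo std of sum_k xi_k(h) with xi_k(h) ~ N(0, h^(2(p+1))) i.i.d.
    and h = T/N; the exact value is sqrt(T) * h^(p + 1/2)."""
    h = T / N
    return np.sum(h ** (p + 1) * rng.standard_normal((M, N)), axis=1).std()

# p = -1/2 (Wiener-like scaling): the total perturbation does not shrink as
# the grid is refined; p = 1/2: it vanishes like h.
print(accumulated_noise_std(-0.5, 64), accumulated_noise_std(0.5, 64))
```

This heuristic ignores how the perturbations are propagated by $$\psi$$, but it already shows why Wiener-like scaling cannot give a vanishing error bound.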

### 2.2 Error bounds

Recall from (1.7) that

\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}

The following bound is the generalisation of [10, Theorem 2] to our setting.

### Lemma 2.6

Suppose that

• Assumption 2.1 holds with parameter $$L_{\varphi }$$,

• Assumption 2.2 holds with parameters $$h^*$$, $$C_{\varphi ,\psi }$$ and q,

• Assumption 2.3 holds with parameters $$\left\| \cdot \right\| _{\varPsi }$$, p, and $$C_\xi$$, and

• the initial state $$U_0$$ satisfies $$\left\| U_0 \right\| _{\varPsi }<\infty$$.

Then for any time grid $$(t_k)_k$$ such that $$0<h\le h^*$$, the corresponding error sequence $$(e_k)_k$$ satisfies

\begin{aligned} \max _k\left\| e_k \right\| _{\varPsi }\le \exp (L_\varphi T)\left\| e_0 \right\| _{\varPsi }+\frac{C_{\varphi ,\psi }+C_\xi }{L_\varphi }\left( \exp (L_\varphi T)-1\right) h^{p\wedge q}. \end{aligned}

In particular, if $$\left\| e_0 \right\| _{\varPsi }=0$$, then $$\max _k\left\| e_k \right\| _{\varPsi }={\mathcal {O}}(h^{p\wedge q})$$.

### Proof

It suffices to prove the first statement. Let $$k\in [N-1]_0$$. From (1.7) we have

\begin{aligned} \left| e_{k+1}\right| _V&\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _V+\left| \xi _k(h_k)\right| _V \nonumber \\&\le \left| \varphi (h_k,t_k,u(t_k))-\varphi (h_k,t_k,U_k)\right| _V+\left| \varphi (h_k,t_k,U_k)-\psi (h_k,t_k,U_k)\right| _V \nonumber \\&\quad +\left| \xi _k(h_k)\right| _V \nonumber \\&\le (1+L_\varphi h_k)\left| e_k\right| _V+C_{\varphi ,\psi }h_k^{q+1}+\left| \xi _k(h_k)\right| _V \end{aligned}
(2.2)

where (2.2) follows from Assumptions 2.1 and 2.2. By taking the $$\left\| \cdot \right\| _{\varPsi }$$ norm of both sides of (2.2), using the triangle inequality, Assumption 2.3, and the bound $$h_k\le h$$ from (1.3), we obtain

\begin{aligned} \left\| e_{k+1} \right\| _{\varPsi }\le (1+L_\varphi h)\left\| e_k \right\| _{\varPsi }+(C_{\varphi ,\psi }+C_\xi )h^{(p\wedge q)+1}. \end{aligned}

Applying the discrete Gronwall inequality in Lemma C.1 completes the proof. $$\square$$
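The effect of the discrete Gronwall inequality can be illustrated numerically: iterating the per-step recursion from the proof on a uniform grid and comparing the result with the closed-form bound of Lemma 2.6. All constants below are illustrative; $$r$$ stands for $$p\wedge q$$ and $$C$$ for $$C_{\varphi ,\psi }+C_\xi$$.

```python
import numpy as np

L, C, T = 2.0, 1.5, 1.0   # illustrative L_phi and C_{phi,psi} + C_xi
r = 1.0                   # r = min(p, q), the local order
N = 100                   # uniform grid, h = T / N
h = T / N

# Iterate the per-step bound a_{k+1} = (1 + L h) a_k + C h^{r+1}, a_0 = 0.
a = 0.0
for _ in range(N):
    a = (1.0 + L * h) * a + C * h ** (r + 1.0)

# Closed-form Gronwall bound from Lemma 2.6 with ||e_0|| = 0.
bound = (C / L) * (np.exp(L * T) - 1.0) * h ** r
print(a, bound)
```

Geometric summation gives $$a_N=(C/L)\,h^r\,((1+Lh)^N-1)$$, which the bound dominates because $$(1+Lh)^N\le \exp (LT)$$.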

### Remark 2.7

In addition to bounds on the strong error $$\left\| e_k \right\| _{\varPsi }$$, one can prove bounds on the weak error, i.e. bounds of the form

\begin{aligned} \vert {\mathbb {E}}[\varPhi (U^h_n)]-\varPhi (u_n)\vert \le Ch^w, \end{aligned}

for all sufficiently smooth $$\mathbb {R}$$-valued functions $$\varPhi$$. Such bounds were proven in [10, Theorem 2.4] and [1, Section 3], for example. We focus on strong error bounds in this paper.

To prove Lemma 2.6, we took the $$\left\| \cdot \right\| _{\varPsi }$$ norm of the per-step bound before applying the discrete Gronwall inequality in Lemma C.1 to conclude. By reversing the order of these operations and by using a different discrete Gronwall inequality, we can bound $$\left\| \max _k\vert e_k \vert _V \right\| _{\varPsi }$$. This yields the result below, which extends [20, Theorem 3.5] to our setting. On the one hand, this bound has worse constants than the bound in Lemma 2.6. On the other hand, the bound is stronger, because

\begin{aligned} \max _k\left\| e_k \right\| _{\varPsi }\le \left\| \max _k\left| e_k\right| _V \right\| _{\varPsi }, \end{aligned}
(2.3)

and because the bound has the same order in h as Lemma 2.6.

### Theorem 2.8

Suppose the hypotheses of Lemma 2.6 hold. Then for any time grid $$(t_k)_k$$ with $$0<h\le h^*$$, the corresponding error sequence $$(e_k)_k$$ satisfies

\begin{aligned} \left\| \max _k \left| e_k\right| _V \right\| _{\varPsi }\le \left( \left\| e_0 \right\| _{\varPsi }+ C_{\varphi ,\psi }h^q T+C_\xi h^p T\right) \exp \left( L_\varphi T\right) . \end{aligned}

In particular, if $$\left\| e_0 \right\| _{\varPsi }=0$$, then $$\left\| \max _k\left| e_k\right| _V \right\| _{\varPsi }={\mathcal {O}}(h^{p\wedge q})$$.

### Remark 2.9

When $$\varPsi (z)=\exp (z^2)-1$$, the strong error bound given in Theorem 2.8 implies exponential integrability of the squared pathwise error $$\max _k\left| e_k\right| _V^2$$. This property was used in [21, Section 5] to establish local Lipschitz continuity of random approximate posteriors—measured in the Hellinger metric—with respect to the expected error of the randomised time integrator. There, exponential integrability was obtained by considering $$\left\| \max _k\left| e_k\right| _V \right\| _{R}$$ for all $$R\in \mathbb {N}$$ and using the series representation of the exponential function. The use of Orlicz norms allows us to exploit the fact that the random approximating sequence $$(U_k)_k$$ inherits the integrability properties of the collection $$(\xi _k)_k$$. This leads to a simpler proof of exponential integrability.

### Proof of Theorem 2.8

Using (2.2) and applying the discrete Gronwall inequality in Lemma C.3, we obtain for every $$k\in [N-1]_0$$ that

\begin{aligned} \left| e_{k+1}\right| _V\le \left( \left| e_0\right| _V+\sum _{j \in [k]_0} \left( C_{\varphi ,\psi }h_j^{q+1}+\left| \xi _j(h_j)\right| _V\right) \right) \exp \left( \sum _{0\le j\le k} L_\varphi h_j\right) . \end{aligned}

Since the right-hand side increases with k, setting $$k=N-1$$ above and using (1.3) to obtain $$\sum _{j\in [N-1]_0}h_j=T$$ yields the ‘pathwise’ bound

\begin{aligned} \max _k\left| e_k\right| _V\le \left( \left| e_0\right| _V+ C_{\varphi ,\psi }h^{q}T+\sum _{k\in [N-1]_0}\left| \xi _k(h_k)\right| _V\right) \exp \left( L_\varphi T\right) . \end{aligned}
(2.4)

By taking the $$\left\| \cdot \right\| _{\varPsi }$$ norm of both sides of (2.4) and applying the triangle inequality, Assumption 2.3, and (1.4), we obtain

\begin{aligned} \left\| \max _k \left| e_k\right| _V \right\| _{\varPsi } &\le \left( \left\| e_0 \right\| _{\varPsi }+C_{\varphi ,\psi }h^{q}T+\sum _{k\in [N-1]_0}\left\| \xi _k(h_k) \right\| _{\varPsi }\right) \exp \left( L_\varphi T\right) \\& \le \left( \left\| e_0 \right\| _{\varPsi }+C_{\varphi ,\psi }h^qT+C_\xi h^p T\right) \exp \left( L_\varphi T\right) , \end{aligned}

which completes the proof. $$\square$$

### Remark 2.10

Under the assumption that $$V=\mathbb {R}^{d}$$ and under the assumption that the randomisation sequence $$(\xi _k(h_k))_k$$ consists of centred, independent random variables, [10, Theorem 2] and [20, Theorem 3.4] consider the special case where $$\left\| \cdot \right\| _{\varPsi }=\left\| \cdot \right\| _{2}$$ in Lemma 2.6 and Theorem 2.8, and establish $${\mathcal {O}}(h^{q\wedge (p+1/2)})$$ bounds on the strong error respectively. The order in these bounds is better than that of the bounds we proved above. However, the proofs of both results exploit the inner product structure of $$\mathbb {R}^{d}$$ and the fact that linear functionals of the $$\xi _k$$ appear in the expansion of $$\vert e_{k+1} \vert ^2_{\mathbb {R}^d}$$. In the key inequality (2.2), we cannot exploit an inner product even if it were available, because we only consider $$\vert e_{k+1} \vert _{V}$$. In Sect. 3, we shall generalise [10, Theorem 2] and [20, Theorem 3.4] from $$\mathbb {R}^{d}$$ to general Hilbert spaces.

## 3 Variational setting

For evolution equations originating from PDEs with possibly non-smooth right-hand sides or non-smooth initial conditions, the classical solution theory that we considered in Sect. 2 might not apply, because the requirement that the operator f in (1.1) satisfies $$f(t,v)\in V$$ for every $$v\in V$$ and all suitable t might be too strong. For example, this requirement does not hold for the heat equation in Sobolev spaces $$W^{k,p}$$. There are several settings that extend the classical setting for such problems. In this section, we focus on the variational setting, because it is suitable for numerical time integration methods. In the variational setting, we consider a Gelfand triplet $$V \hookrightarrow H \simeq H' \hookrightarrow V'$$, which is a sequence of continuous embeddings of a Banach space V into a Hilbert space H that is identified with its dual space $$H'$$, which is then embedded in the dual space $$V'$$ of V [41, Proposition 23.13].

In this section, we further specify the operator differential equation (1.1) to be

\begin{aligned} u(0)=\vartheta \in H,\quad u'(t) + A(t,u(t)) = b(t)\in V',\quad t\in [0,T] \end{aligned}
(3.1)

for a given operator $$A :[0,T] \times V\rightarrow V'$$ and $$b \in L^{p'}(0,T;V')$$. The equation (3.1) is written in the form that is common in PDE theory instead of the form used in (1.1), where the right-hand side would be defined by $$f(t,u(t)) :=b(t) -A(t,u(t))$$. The solution of (3.1) belongs to the space

\begin{aligned} {\mathcal {W}}^p (0,T) :=\left\{ u \in L^p (0,T;V) \, \Big | \, u' \in L^{p'}(0,T;V') \, \text {with } \frac{1}{p} + \frac{1}{p'} = 1\right\} , \end{aligned}

which is continuously embedded into C([0, T]; H) [12, Satz 8.4.1]. We emphasise that a solution of (3.1) must satisfy the equation only for almost every $$t\in [0,T]$$, and not for every t.

There are several conditions—e.g. Lipschitz or one-sided Lipschitz conditions, strong positivity, monotonicity, or coercivity — that one can impose on A and b in order to guarantee the existence of a unique variational solution $$u \in {\mathcal {W}}^p(0,T) \hookrightarrow C([0,T];H)$$ [41, Prop. 23.23]. Under stronger assumptions, higher regularity of u can be achieved [12, Satz 8.5.1]. In some cases, the flow map is continuous and even Lipschitz; see [41, Theorem 23.A] for linear problems and [41, Corollary 23.26] for the time-dependent case.

Recall the definition (1.5) of the sequence $$(u(t_k))_{k\in [N]_0}$$ of states of the exact solution:

\begin{aligned} u(t_{k+1})=\varphi (h_k,t_k,u(t_k)),\quad k\in [N-1]_0, \end{aligned}

where $$\varphi$$ is the flow map associated to the differential equation of interest (3.1). In the variational setting, the flow map $$\varphi$$ maps $$(h,t,u_s)$$ with $$h\in [0,T]$$, $$t\in [0, T-h]$$, and $$u_s\in H$$ to a vector $$\varphi (h,t,u_s)\in H$$. Next, recall that $$\psi$$ is the approximate flow map associated to a time integration method, and that according to (1.6), we construct the random approximating sequence $$(U_k)_{k\in [N]_0}$$ according to

\begin{aligned} U_{k+1}=\psi (h_k,t_k,U_k)+\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}

In this section, we shall assume that the initial condition $$U_0$$ is an H-valued random variable, and that each $$\xi _k$$ is an H-valued stochastic process indexed by $$[0,\infty )$$.

We shall make the following assumptions on $$\psi$$.

### Assumption 3.1

Let $$h^*>0$$, and let $$\psi :[0,h^*]\times [0,T]\times H\rightarrow V$$ satisfy the following conditions:

1. 1.

There exists a scalar $$q\ge 0$$, a function $$C_{\varphi ,\psi } :[0,T]\times H\rightarrow (0,\infty )$$ that is bounded on bounded subsets, and a dense subset $${\mathcal {D}}\subset H$$, such that, for every $$h\in [0,h^*]$$ and for every $$(t,x)\in [0,T-h]\times H$$ with $$x=\varphi (s,0,\vartheta ')$$ for some $$s\ge 0$$ and $$\vartheta '\in {\mathcal {D}}$$,

\begin{aligned} \left| \varphi (h,t,x)-\psi (h,t,x)\right| _H\le C_{\varphi ,\psi }(t,x)h^{q+1}; \end{aligned}
(3.2)
2. 2.

There exists a constant $$L_{\psi }>0$$ such that for all $$(h,t)\in [0,h^*]\times [0,T]$$ such that $$h+t\leq T$$ and for any $$x,y\in H$$,

\begin{aligned} \left| \psi (h,t,x)-\psi (h,t,y)\right| _H\le (1+ L_{\psi }h)\left| x-y\right| _H. \end{aligned}
(3.3)

The first statement of Assumption 3.1 means that the one-step error bound (3.2) holds for any x that lies on some solution $$u\in C([0,T];H)$$ of (3.1), where the initial condition $$\vartheta '=u(0)$$ belongs to the dense subset $${\mathcal {D}}$$. We make the hypothesis of density in order to account for known results concerning error bounds for time integration of PDEs, see e.g. [37, Chapter 7].

The local truncation error (3.2) is a reasonable requirement for any deterministic time integration method $$\psi$$ and weakens the uniform local truncation error bound of Assumption 2.2. Given (3.2), we define

\begin{aligned} \left\| C_{\varphi ,\psi } \right\| _{\infty }:=\left\| C_{\varphi ,\psi } \right\| _{\infty }(\vartheta):=\sup _{t\in [0,T]}C_{\varphi ,\psi }(t,u(t)), \end{aligned}
(3.4)

for any solution u of (3.1) with initial condition $$\vartheta \in {\mathcal {D}}$$. Since the solution u of (3.1) belongs to C([0, T]; H), its trajectory $$\{u(t):t\in [0,T]\}$$ is a bounded subset of H. Hence, the first statement of Assumption 3.1 ensures the finiteness of $$\left\| C_{\varphi ,\psi } \right\| _{\infty }$$ for any $$\vartheta\in\mathcal{D}$$. The second statement of Assumption 3.1 describes a global Lipschitz continuity property of the approximate flow map $$\psi$$ with respect to the third argument of the map $$\psi$$. For the error bounds that we prove in this section, the bounds (3.2) and (3.3) shall play the roles of Assumptions 2.2 and 2.1 respectively in the error bounds of Sect. 2.2. Next, we formulate the analogue of Assumption 2.3 for the collection $$(\xi _k)_{k\in \mathbb {N}_0}$$ of stochastic processes. For the remainder of Sect. 3, we shall simplify notation and write $$\left\| Z \right\| _{\varPsi }$$ instead of $$\left\| Z \right\| _{\varPsi (\varOmega ;H)}$$ for any H-valued random variable Z.

### Assumption 3.2

The collection $$(\xi _k)_{k\in \mathbb {N}_0}$$ admits an Orlicz norm $$\left\| \cdot \right\| _{\varPsi }$$ and constants $$p\ge 0$$ and $$0<C_\xi <\infty$$, such that for all $$k\in \mathbb {N}_0$$ and $$t>0$$,

\begin{aligned} \left\| \xi _k(t) \right\| _{\varPsi }\le C_\xi t^{p+1}. \end{aligned}

The only difference between Assumption 3.2 and Assumption 2.3 is that the stochastic processes are H-valued instead of V-valued.

### 3.1 $$L^2$$-error bounds for independent and centred randomisation

In this section, we assume that the $$(\xi _k)_{k}$$ are mutually independent and centred stochastic processes. In particular, for any time grid (1.3), the corresponding random variables $$(\xi _k(h_k))_{k\in [N-1]_0}$$ are mutually independent and centred. We shall generalise the $$L^2$$-error bounds from [10, Theorem 2] and [20, Theorem 3.4] to the variational setting.
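For intuition, a centred H-valued randomisation satisfying Assumption 3.2 can be sampled in a discretised Hilbert space via a truncated spectral expansion. The sine eigenbasis, the eigenvalue decay $$j^{-4}$$, and the scaling $$h^{p+1}$$ below are illustrative assumptions, not constructions taken from the text.

```python
import numpy as np

def sample_xi(h, p, J, x, rng):
    """Centred Gaussian sample in discretised H = L^2(0,1): a truncated
    expansion sum_j sqrt(lam_j) z_j phi_j, scaled by h^{p+1} so that
    E||xi(h)||_H^2 = h^{2(p+1)} * sum_j lam_j (illustrative model)."""
    j = np.arange(1, J + 1)
    lam = j ** -4.0                                       # assumed decay
    z = rng.normal(size=J)                                # i.i.d. N(0,1)
    phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(j, x))   # L^2-orthonormal
    return h ** (p + 1.0) * (np.sqrt(lam) * z) @ phi

rng = np.random.default_rng(2)
h, p, J, m = 0.1, 1.0, 50, 5000
x = np.linspace(0.0, 1.0, 501)[1:-1]   # interior grid points
dx = x[1] - x[0]

# Monte Carlo estimate of E||xi(h)||_{L^2}^2 versus the exact trace.
mean_sq = np.mean([np.sum(sample_xi(h, p, J, x, rng) ** 2) * dx
                   for _ in range(m)])
trace = h ** (2 * p + 2) * np.sum(np.arange(1, J + 1) ** -4.0)
print(mean_sq, trace)
```

Independent copies of such samples, one per step, give mutually independent and centred randomisation variables $$\xi _k(h_k)$$ of the kind assumed in this section.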

For any time grid $$(t_k)_{k\in [N]_0}$$ and $$k\in [N-1]_0$$, let $${\mathcal {F}}_k:=\sigma (\xi _j(h_j): j\in [ k]_0)$$, i.e. $$({\mathcal {F}}_k)_{k\in [N-1]_0}$$ is the filtration generated by the randomisation sequence $$(\xi _j(h_j))_{j\in [N-1]_0}$$.

The following lemma only requires mutual independence of the $$(\xi _\ell )_{\ell }$$.

### Lemma 3.3

Suppose that Assumption 3.1 holds. Let $$(t_k)_{k\in [N]_0}$$ be an arbitrary time grid. Then for $$j\in [N-1]_0$$, $$U_{j+1}$$ is a measurable function of $$U_0$$ and $$\{\xi _\ell (h_\ell )\ :\ \ell \in [j]_0\}$$. In particular, if the $$(\xi _\ell )_{\ell }$$ are mutually independent, then for every $$j\in [N-1]$$, $$\xi _j(h_j)$$ and $$U_j$$ are independent, and $$\xi _j(h_j)$$ is independent of $${\mathcal {F}}_j$$.

### Proof

It follows from (3.3) in Assumption 3.1 that, for arbitrary (h, t), $$\psi (h,t,z)$$ is globally Lipschitz continuous with respect to $$z \in H$$. Hence, $$U_{j+1}$$ is a measurable function of $$U_{j}$$ and $$\xi _j(h_j)$$, for every $$j\in [N-1]_0$$. This proves the first statement. The second statement follows from the first and the definition of $${\mathcal {F}}_j$$. $$\square$$

The following result is the generalisation of [10, Theorem 2.2], which considered the case $$H=\mathbb {R}^{d}$$ for $$d\in \mathbb {N}$$.

### Lemma 3.4

Suppose the following statements are true:

• Assumption 3.1 holds with parameters $$h^*$$, q, $$C_{\varphi ,\psi }$$, $${\mathcal {D}}$$, and $$L_\psi$$,

• Assumption 3.2 holds with parameters $$\left\| \cdot \right\| _{\varPsi }:=\left\| \cdot \right\| _{2}$$, p, and $$C_\xi$$,

• the $$(\xi _j)_{j}$$ are mutually independent and centred, and

• the initial condition $$\vartheta$$ of (3.1) belongs to $${\mathcal {D}}$$, and $$\left\| U_0 \right\| _{2}<\infty$$.

Then there exists an $$L'_{\psi }>0$$ depending only on $$L_\psi$$, such that for any time grid $$(t_k)_k$$ satisfying $$0<h\le 1\wedge h^*$$, the associated error sequence $$(e_k)_k$$ satisfies

\begin{aligned} \max _{k}\left\| e_k \right\| ^2_{2}\le \left( \left\| e_0 \right\| ^2_{2}+ 3 \left\| C_{\varphi ,\psi } \right\| _\infty ^2 T h^{2q}+C_\xi ^2 Th^{2p+1}\right) \exp \left( L'_\psi T\right) . \end{aligned}

In particular, if $$\left\| e_0 \right\| _{2}=0$$, then $$\max _{k\in [N]_0}\left\| e_k \right\| _{2}={\mathcal {O}}(h^{q\wedge (p+1/2)})$$.

We state the proof below, even though it is very similar to the proof of [10, Theorem 2.2]. This is because the proof will be useful later in Sect. 3.2, where we discuss the feasibility of bounding $$\max _{k} \left\| e_k \right\| _{R}$$ for $$R> 2$$ under similar assumptions as Lemma 3.4. An important difference between our proof and the proof of [10, Theorem 2.2] is that the latter assumes uniform truncation error, e.g. as in Assumption 2.2. Instead, we use Assumption 3.1.

### Proof of Lemma 3.4

Let $$k\in [N-1]_0$$. By the definition (1.7) of the error sequence $$(e_k)_{k\in [N]_0}$$,

\begin{aligned} \left| e_{k+1}\right| _H^2&= \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2+\left| \xi _k(h_k)\right| _H^2 \nonumber \\&\quad + 2\left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k), \xi _k(h_k) \right\rangle _H. \end{aligned}
(3.5)

Recall the term $$\left\| C_{\varphi ,\psi } \right\| _{\infty }$$ from (3.4). We obtain

\begin{aligned}&\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2 \nonumber \\&\quad = \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))+\psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| ^2_H \nonumber \\&\quad \le \left( 1+\tfrac{2}{h_k}\right) C_{\varphi ,\psi }(t_k,u(t_k))^2 h_{k}^{2q+2}+(1+2h_k)(1+L_\psi h_k)^2\left| e_k\right| _H^2 \nonumber \\&\quad \le 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+2h_k)(1+L_\psi h_k)^2\left| e_k\right| _H^2. \end{aligned}
(3.6)

The first inequality follows from the hypothesis that the initial condition $$\vartheta$$ of (3.1) belongs to $${\mathcal {D}}$$, since we can then apply the local truncation error bound (3.2) of Assumption 3.1 and Young’s inequality. The second inequality follows from the fact that $$h_k\le h\le 1$$. By the same fact, there exists $$L'_\psi >0$$ that depends only on $$L_\psi$$ such that $$(1+2h_k)(1+L_\psi h_k)^2\le 1+L'_\psi h_k$$. Using this inequality in (3.6) yields

\begin{aligned} \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\le 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+L'_\psi h_k)\left| e_k\right| _H^2. \end{aligned}
(3.7)

Substituting (3.7) into the bound (3.5) on $$\vert e_{k+1} \vert _H^2$$ yields

\begin{aligned} \left| e_{k+1}\right| _H^2 & \le \left( 3 \left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h_{k}^{2q+1}+(1+L'_\psi h_k)\left| e_k\right| _H^2\right) +\left| \xi _k(h_k)\right| _H^2 \nonumber \\&\quad + 2\left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k), \xi _k(h_k) \right\rangle _H. \end{aligned}
(3.8)

By mutual independence of the $$(\xi _j(h_j))_{j \in [N-1]_0}$$, it follows from the second statement of Lemma 3.3 that the arguments of the inner product are independent. Taking expectations of (3.8), the expectation of the inner product vanishes by the centredness of the $$(\xi _j(h_j))_{j \in [N-1]_0}$$. By Assumption 3.2, we have

\begin{aligned} \left\| e_{k+1} \right\| ^2_2\le (1+L'_\psi h_k)\left\| e_k \right\| ^2_2+3\left\| C_{\varphi ,\psi } \right\| _\infty ^2h_k^{2q+1}+ C_\xi ^2 h_k^{2p+2}. \end{aligned}

Using the discrete Gronwall inequality in Lemma C.3 and (1.4) completes the proof. $$\square$$
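The improved order $$q\wedge (p+1/2)$$ under independent centred randomisation can be observed empirically. The sketch below estimates $$\max _k\left\| e_k \right\| _{2}$$ by Monte Carlo for a randomised explicit Euler method applied to the scalar test problem $$u'=-u$$ (so $$q=1$$); the problem and all parameter values are illustrative choices, not part of the text.

```python
import numpy as np

def max_l2_error(h, p, T, m, rng):
    """Estimate max_k ||e_k||_2 over m sample paths for randomised
    explicit Euler on u' = -u, u(0) = 1, with independent centred
    noise xi_k(h) ~ N(0, h^{2(p+1)})."""
    N = int(round(T / h))
    U = np.full(m, 1.0)
    worst = 0.0
    for k in range(N):
        U = U - h * U + rng.normal(0.0, h ** (p + 1.0), size=m)
        e = U - np.exp(-h * (k + 1))     # error against exact solution
        worst = max(worst, np.mean(e ** 2))
    return np.sqrt(worst)

rng = np.random.default_rng(3)
p = 0.25   # expected L^2 order: min(q, p + 1/2) = 0.75 with q = 1
errs = [max_l2_error(h, p, 1.0, 4000, rng) for h in (0.1, 0.05, 0.025)]
print(errs)   # should shrink roughly by a factor 2**0.75 per halving of h
```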

We shall use the next result, Lemma 3.5, to prove Proposition 3.6 below. A similar result to Lemma 3.5 was established in the proof of [20, Theorem 3.4], under the assumption that $$\psi$$ preserves square integrability of random variables, i.e. that $$\psi (Z)\in L^2(\varOmega ;\mathbb {R}^d)$$ for every $$Z\in L^2(\varOmega ;\mathbb {R}^d)$$. Lemma 3.5 removes this assumption, by using Lemma 3.4.

### Lemma 3.5

Suppose the hypotheses of Lemma 3.4 hold. Then for any time grid $$(t_j)_{j\in [N]_0}$$ with $$h>0$$, the stochastic process $$(M_k)_{k\in [N-1]_0}$$ defined by

\begin{aligned} M_k:=\sum _{j=0}^{k}\left\langle \varphi (h_j,t_j,u(t_j))-\psi (h_j,t_j,U_j),\xi _j(h_j) \right\rangle _H \end{aligned}
(3.9)

is an $$\mathbb {R}$$-valued, square-integrable martingale with respect to $$({\mathcal {F}}_k)_{k\in [N-1]_0}$$. If in addition the time grid $$(t_j)_{j\in [N]_0}$$ satisfies $$h\le 1\wedge h^*$$, then there exists a universal constant $$\kappa >0$$ such that for every $$k\in [N-1]_0$$,

\begin{aligned} \mathbb {E}\left[ \max _{j\in [k]_0}\left| M_j\right| \right] \le \left\| C_{\varphi ,\psi } \right\| _{\infty }^2h^{2q+1}+\frac{1}{4}\mathbb {E}\left[ \max _{j\in [k]_0}\left| e_j\right| ^2_H\right] +\kappa ^2(1+L'_\psi ) TC_\xi ^2 h^{2p+1}, \end{aligned}
(3.10)

for the same $$L'_\psi$$ given in Lemma 3.4.

### Proof

See Sect. D.1 for the proof. $$\square$$

Next, we use Lemma 3.5 to prove the following error bound, which is stronger than the bound given in Lemma 3.4 because of (2.3).

### Proposition 3.6

Suppose the hypotheses of Lemma 3.4 hold. Then for any time grid $$(t_k)_{k}$$ with $$0<h\le 1\wedge h^*$$, the corresponding error sequence $$(e_k)_k$$ satisfies

\begin{aligned}&\left\| \max _{k}\left| e_k\right| _H \right\| _{2}^2 \\&\quad \le 2\left( \left\| e_0 \right\| ^2_{2}+4\left\| C_{\varphi ,\psi } \right\| _{\infty }^2 h^{2q}T+ C_\xi ^2 Th^{2p+1} (1+\kappa ^2(1+L'_\psi ))\right) \exp \left( 2L'_\psi T\right) , \end{aligned}

for the universal constant $$\kappa$$ in (3.10) and the constant $$L'_\psi$$ given in Lemma 3.4. In particular, if $$\left\| e_0 \right\| _{2}=0$$, then $$\left\| \max _{k}\left| e_k\right| _H \right\| _{2}={\mathcal {O}}(h^{q\wedge (p+1/2)})$$.

### Proof

See Sect. D.2 for the proof. $$\square$$

### 3.2 Error bounds of higher integrability order for independent and centred randomisation

It is natural to ask if one can prove the analogues of Lemma 3.4 or Proposition 3.6 where we use $$\left\| \cdot \right\| _{R}$$, $$R>2$$, while keeping the same order in h. Suppose that we wish to prove the analogue of Lemma 3.4 for $$R=3$$. It follows from the triangle inequality and the definition (1.7) that

\begin{aligned} \left| e_{k+1}\right| _H\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H. \end{aligned}

Thus

\begin{aligned} \left| e_{k+1}\right| _H^3\le \left| e_{k+1}\right| _H^2\left( \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H\right) \end{aligned}

and substituting (3.5) results in an upper bound on $$\left| e_{k+1}\right| _H^3$$ containing the mixed product of an inner product term and a norm term,

\begin{aligned} \left\langle \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k),\xi _k(h_k) \right\rangle _H\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H. \end{aligned}

In general, this product will not vanish in expectation, because one can no longer exploit the commutativity of the inner product with the expectation operator. The same assertion is valid for $$R\ge 3$$. This is the important difference between the $$R=2$$ case that was proven in Lemma 3.4 and the case $$R\ge 3$$. This difference implies that we must use the Cauchy–Schwarz inequality to bound products. Using the Cauchy–Schwarz inequality yields

\begin{aligned} \left| e_{k+1}\right| _H^3\le \sum _{i=0}^{3}\binom{3}{i} \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^i\left| \xi _k(h_k)\right| _H^{3-i}. \end{aligned}

We can obtain the same bound by applying the binomial theorem to the bound $$\left| e_{k+1}\right| _H\le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H+\left| \xi _k(h_k)\right| _H$$.

If the stochastic processes $$(\xi _k)_k$$ are mutually independent, then we may use the second statement of Lemma 3.3. Assuming that $$e_0=0$$ almost surely and taking expectations of the summand for $$i=2$$ yields

\begin{aligned}&\mathbb {E}\left[ \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\left| \xi _k(h_k)\right| _H\right] \\&\quad = \mathbb {E}\left[ \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H^2\right] \mathbb {E}\left[ \left| \xi _k(h_k)\right| _H\right]&\qquad \text {by independence } \\&\quad \le \left( {\mathcal {O}}(h_{k}^{2q+1})+(1+L'_\psi h_k)\mathbb {E}\left[ \left| e_k\right| _H^2\right] \right) C_\xi h_k^{p+1}&\qquad \text {by}\, (3.7), \hbox { Assumption }3.2 \\&\quad \le \left( {\mathcal {O}}(h_{k}^{2q+1})+ {\mathcal {O}}(h^{2q})+{\mathcal {O}}(h^{2p+1})\right) C_\xi h_k^{p+1}&\text { by Lemma } 3.4. \end{aligned}

This yields a bound on $$\left\| e_{k+1} \right\| _3^{3}$$ by a term that is $${\mathcal {O}}(h^{(2q)\wedge (2p+1)+p+1})$$. Applying a discrete Gronwall inequality produces a bound on $$\max _{k\in [N]}\left\| e_k \right\| _{3}^{3}$$ that is $${\mathcal {O}}(h^{(2q)\wedge (2p+1)+p})$$. Since this upper bound on the exponent arises from the mixed product mentioned above, and since such mixed products will arise in any expansion of $$\vert e_{k+1} \vert _H^R$$, we cannot expect to prove that $$\max _{k}\left\| e_k \right\| _{R}={\mathcal {O}}(h^{q\wedge (p+1/2)})$$ for $$R>2$$ using the techniques that we applied earlier, even if the $$(\xi _k)_{k}$$ are mutually independent and centred.

For the $$L^3$$ analogue of Proposition 3.6, the fact that terms involving inner products do not vanish in expectation also poses a problem. This is because the proof of the $$L^2$$ case in Proposition 3.6 relies on the bound (3.10) in Lemma 3.5 on the martingale $$(M_k)_k$$. This bound in turn follows from the Burkholder–Davis–Gundy inequality for martingales [32, Chapter IV, §4, Theorem (4.1)]. For the $$L^3$$ case, the expectations of products containing an inner product term do not vanish, because one can no longer exploit commutativity of the inner product with the expectation operator, due to the mixed product. As a result, the martingale $$(M_k)_k$$ does not appear, and one cannot apply the Burkholder–Davis–Gundy inequality to prove a bound similar to (3.10). Instead, one must apply the Cauchy–Schwarz inequality or the binomial theorem, as we did above. This results in a bound on $$\left\| \max _k\left| e_k\right| _H \right\| _{3}$$ that is worse than $${\mathcal {O}}(h^{q\wedge (p+1/2)})$$.

### 3.3 Error bounds of higher integrability order without independence or centredness assumptions

In this section, we prove a strong error bound for a general Orlicz norm instead of for the $$\left\| \cdot \right\| _2$$-norm. We use the same hypotheses as for Lemma 3.4 and Proposition 3.6, except that we do not assume mutual independence or centredness of the stochastic processes $$(\xi _k)_{k\in \mathbb {N}_0}$$.

### Theorem 3.7

Suppose the following statements are true:

• Assumption 3.1 holds with parameters $$h^*$$, q, $$C_{\varphi ,\psi }$$, $${\mathcal {D}}$$, and $$L_\psi$$,

• Assumption 3.2 holds with parameters $$\left\| \cdot \right\| _{\varPsi }$$, p, and $$C_\xi$$, and

• the initial condition $$\vartheta$$ of (3.1) belongs to $${\mathcal {D}}$$, and $$\left\| U_0 \right\| _{\varPsi }<\infty$$.

Then for any time grid $$(t_k)_k$$ with $$0<h\le h^*$$, the corresponding error sequence $$(e_k)_k$$ satisfies

\begin{aligned} \left\| \max _k\left| e_k\right| _H \right\| _{\varPsi } \le \left( \left\| e_0 \right\| _{\varPsi }+\left\| C_{\varphi ,\psi } \right\| _{\infty }h^q T+C_\xi h^p T\right) \exp \left( L_\psi T\right) . \end{aligned}

In the results from Sect. 3.1, we required that the maximal time step h associated to the time grid satisfies $$h\le 1\wedge h^*$$. In Theorem 3.7, we only require that $$h\le h^*$$. The discussion of exponential integrability in Remark 2.9 also applies to Theorem 3.7.

### Proof of Theorem 3.7

Recall (1.7):

\begin{aligned} e_{k+1}=\varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)-\xi _k(h_k),\quad k\in [N-1]_0. \end{aligned}

By the triangle inequality, and by (3.2) and (3.3) from Assumption 3.1,

\begin{aligned}&\left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H \\&\quad \le \left| \varphi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))\right| _H+\left| \psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,U_k)\right| _H \\&\quad \le \left\| C_{\varphi ,\psi } \right\| _{\infty }h_k^{q+1}+(1+L_\psi h_k)\vert e_k \vert _H. \end{aligned}

From this it follows that

\begin{aligned} \left| e_{k+1}\right| _H\le \left\| C_{\varphi ,\psi } \right\| _{\infty }h_k^{q+1}+(1+L_\psi h_k)\left| e_k\right| _H+\left| \xi _k(h_k)\right| _H. \end{aligned}
(3.11)

Applying Lemma C.3 and using the same arguments that yielded (2.4), we obtain the analogous pathwise bound

\begin{aligned} \max _k\left| e_k\right| _H\le \left( \left| e_0\right| _H+\left\| C_{\varphi ,\psi } \right\| _{\infty }h^{q}T+\sum _{k\in [N-1]_0}\left| \xi _k(h_k)\right| _H\right) \exp \left( L_\psi T\right) . \end{aligned}

Taking the $$\left\| \cdot \right\| _{\varPsi }$$ norm of both sides and applying Assumption 3.2 completes the proof. $$\square$$

### Remark 3.8

The inequality (3.11) in the proof of Theorem 3.7 closely resembles the inequality (2.2), which we used to prove Theorem 2.8. The key difference results from adding $$0=\psi (h_k,t_k,u(t_k))-\psi (h_k,t_k,u(t_k))$$ before applying the triangle inequality to derive (3.11); for (2.2), we added $$0=\varphi (h_k,t_k,U_k)-\varphi (h_k,t_k,U_k)$$ instead. The decomposition we use for (3.11) enables us to exploit the weaker local truncation error bound (3.2) in Assumption 3.1 instead of the uniform local truncation error bound in Assumption 2.2.

## 4 Example: heat equation

Consider the heat equation on a $$C^2$$ bounded domain $$D\subset \mathbb {R}^{d}$$ with homogeneous Dirichlet boundary conditions

\begin{aligned} u(0) = u_0, \quad \partial _t u - \text {div} ({\mathcal {E}} \nabla u ) = b \, \text { on } [0,T] \times D, \end{aligned}
(4.1)

where $${\mathcal {E}}:[0,T]\times D\rightarrow \mathbb {R}^{d\times d}$$ is a sufficiently smooth elliptic diffusion tensor. Upon multiplying the PDE by a test function and using integration by parts, the left-hand side of the PDE yields a bilinear form a(u(t), v), which allows us to rewrite the problem above as the operator differential equation

\begin{aligned} u(0)=u_0\in H, \quad u'(t) + A u(t) = b \in V'\, \end{aligned}
(4.2)

with spaces $$H=L^2(D)$$, $$V=H^1_0(D)$$, and $$V'=H^{-1}(D)$$. The bounded, linear operator $$A\, : \, V \rightarrow V'$$ is induced by the bilinear form $$a(\cdot ,\cdot )$$ on $$V\times V$$ according to $$a(u,v) = \left\langle Au, v \right\rangle _{V'\times V}$$, where $$\left\langle \cdot ,\cdot \right\rangle _{V'\times V}$$ denotes the dual pairing. For the particular PDE considered above, the operator A is strongly positive with constant $$\mu >0$$ on $$V\times V$$.

In this section, we will show that the results that we proved for the variational setting in Sect. 3 are valid for parabolic PDEs and the implicit Euler method, by showing that the conditions (3.2) and (3.3) from Assumption 3.1 are satisfied. We shall consider the more general setting of parabolic PDEs with possibly time-dependent coefficients, because this analysis includes the setting of time-independent coefficients — and hence the heat equation stated above—as a special case.

Let $$L(V,V')$$ be the set of all linear mappings from V to $$V'$$. Consider a mapping $$a :[0,T] \times V \times V \rightarrow {\mathbb {R}}$$ that is bilinear in the second and third argument. This mapping induces a collection $$(A(t))_t\subset L(V,V')$$ according to

\begin{aligned} \left\langle A(t) u, v \right\rangle _{V'\times V} = a(t,u,v),\quad \forall u,v \in V. \end{aligned}

Now we pose the following standard assumptions on a and state their equivalent formulation in terms of A.

### Assumption 4.1

1. 1.

For fixed t, $$a(t,\cdot , \cdot )$$ is a bilinear form, and for fixed $$u,v\in V$$, $$a(\cdot , u,v)$$ is measurable. Equivalently, for every t, $$A(t)\in L(V,V')$$ is linear and $$t\mapsto A(t)$$ is measurable.

2.

There exists $$\beta >0$$ such that for every $$(t,u,v)$$, $$a(t,u,v) \le \beta \left| u \right| _V \left| v \right| _V$$. Equivalently, for every t we have $$\left\| A(t) \right\| _{L(V,V')} \le \beta$$.

3.

A Gårding inequality holds, i.e. there exist $$\mu >0$$, $$\kappa \ge 0$$ such that

\begin{aligned} a(t,u,u) \ge \mu \left| u\right| _V^2 - \kappa \left| u\right| _H^2, \quad \forall (t,u)\in [0,T]\times V. \end{aligned}
(4.3)

Equivalently, for every $$t\in [0,T]$$, $$A(t) + \kappa I \in L(V,V')$$ is strongly positive.

For the special case of the heat equation (4.1) where $${\mathcal {E}}$$ is the identity matrix, the first statement of Assumption 4.1 holds since $${\mathcal {E}}$$ is constant. By definition of the bilinear form a and the spaces H and V, the second statement holds with $$\beta =1$$, and the third statement holds with equality for $$\kappa =0$$ and $$\mu =1$$.
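To make these constants concrete, the following short computation (ours, under the common convention that $$V=H^1_0(D)$$ carries the gradient seminorm $$\left| u\right| _V = \Vert \nabla u \Vert _{L^2(D)}$$ as its norm) verifies the second and third statements for the Laplacian bilinear form:

\begin{aligned} a(u,v) = \int _D \nabla u \cdot \nabla v \,\mathrm {d}x \le \left| u\right| _V \left| v\right| _V, \qquad a(u,u) = \int _D \left| \nabla u\right| ^2 \,\mathrm {d}x = \left| u\right| _V^2, \end{aligned}

where the first estimate is the Cauchy–Schwarz inequality, giving $$\beta =1$$, and the second identity gives $$\mu =1$$ and $$\kappa =0$$ with equality in (4.3).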

Consider the implicit Euler scheme

\begin{aligned} \psi (h,t,v):=(I+h {\bar{A}}_{h,t})^{-1}(h{\bar{b}}_{h,t}+v), \end{aligned}
(4.4)

for $$0<h\le h^*$$, $$0\le t\le T-h$$ and $$v\in H$$. We specify an interval of suitable values of $$h^*$$ in Sect. 4.2. Above, $${\bar{A}}_{h,t}$$ and $${\bar{b}}_{h,t}$$ denote Steklov time averages of the linear operators $$(A(t))_{t}$$ and the right-hand side b respectively,

\begin{aligned} {\bar{A}}_{h,t}:=\frac{1}{h}\int _{t}^{t+h}A(s)\,\mathrm {d}s,\quad {\bar{b}}_{h,t}:=\frac{1}{h}\int _{t}^{t+h}b(s)\,\mathrm {d}s, \end{aligned}

where the integrals in the definitions of $${\bar{A}}_{h,t}$$ and $${\bar{b}}_{h,t}$$ are Bochner–Lebesgue integrals in $$L(V,V')$$ and $$V'$$ respectively. The existence of $$\psi (h,t,v) \in V$$ for $$(h{\bar{b}}_{h,t}+v) \in V'$$ is guaranteed by the Lax–Milgram theorem; see e.g. [7, Section 6.2]. For every suitable $$(h,t)$$, the operator $${\bar{A}}_{h,t}$$ inherits the properties of A stated in Assumption 4.1.
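In a practical implementation the Steklov averages would be approximated by quadrature. A minimal sketch (our own illustration; the composite midpoint rule and the name `steklov_average` are assumptions, not part of the paper) for a scalar- or array-valued integrand:

```python
import numpy as np

def steklov_average(f, t, h, n_quad=200):
    """Approximate the Steklov average (1/h) * int_t^{t+h} f(s) ds by the
    composite midpoint rule. f may return scalars or arrays, serving as a
    finite-dimensional stand-in for b(s) or a discretisation of A(s)."""
    # midpoints of n_quad equal subintervals of [t, t+h]
    s = t + (np.arange(n_quad) + 0.5) * (h / n_quad)
    # mean of midpoint values equals the midpoint-rule integral divided by h
    return np.mean(np.array([f(si) for si in s]), axis=0)
```

For time-independent data the average reproduces the integrand exactly, matching the simplification $${\bar{b}}_{h,t}=b$$ noted below for the heat equation.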

For the heat equation (4.1), $$t\mapsto A(t)$$ and $$t\mapsto b(t)$$ are constant. Therefore, $${\bar{A}}_{h,t}=A$$ and $${\bar{b}}_{h,t}=b$$, and (4.4) simplifies to $$\psi (h,t,v):=(I+h A)^{-1}(hb+v)$$.
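This simplified scheme admits a direct finite-dimensional illustration. The sketch below (our own; the spatial discretisation, grid parameters, and the name `implicit_euler_heat` are assumptions, not from the paper) replaces A by the standard second-order finite-difference Laplacian on a uniform interior grid of (0, 1) with homogeneous Dirichlet boundary conditions:

```python
import numpy as np

def implicit_euler_heat(v0, b, h, n_steps):
    """Apply n_steps implicit Euler steps psi(h, v) = (I + h*A)^{-1} (h*b + v),
    where A is the tridiagonal finite-difference Laplacian on the interior
    grid points of (0, 1) with homogeneous Dirichlet boundary conditions."""
    m = v0.size                       # number of interior grid points
    dx = 1.0 / (m + 1)
    # tridiagonal stiffness matrix approximating -d^2/dx^2
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / dx**2
    M = np.eye(m) + h * A             # I + h*A, fixed over all steps
    v = v0.copy()
    for _ in range(n_steps):
        v = np.linalg.solve(M, h * b + v)   # one step of psi
    return v
```

Each step costs one linear solve with the fixed matrix $$I+hA$$; for large grids one would factor this tridiagonal matrix once (e.g. with a banded or sparse solver) rather than calling a dense solver per step.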

### 4.1 Local truncation error condition

We verify the local truncation error condition (3.2) in Assumption 3.1, for $$\psi$$ as given in (4.4). Recall the definition (1.5) of $$(u(t_k))_{k}$$ and that $$(u_k)_{k}$$ is defined by $$u_0=\vartheta$$, $$u_{k+1} :=\psi (h_k, t_k,u_k)$$ for $$k\in [N-1]_0$$. Under the assumption that $$(b-u')' \in L^2(0,T;V')$$, the result [12, Satz 8.3.6] yields for any initial condition $$\vartheta \in H$$

\begin{aligned} \left| u_k - u(t_k) \right| _H^2 + \mu \sum _{j=1}^k h_j \left| u_j - u(t_j) \right| _V^2 \le \frac{h^2}{3 \mu } \left| (b-u')'\right| _{L^2(0,T;V')}^2, \end{aligned}

where $$\mu$$ is the constant from the positivity assumption (4.3) on A. Thus, (3.2) holds with $$q=0$$ and $$C_{\varphi ,\psi }(t,x)=(3\mu )^{-1/2}\vert (b-u')' \vert _{L^2(0,T;V')}$$ for all $$(t,x)$$.

One can obtain numerical methods of higher order q by assuming higher regularity of the solution. For example, [22, Theorems 4.2, 4.3, 4.4] assume $$u,u',u'' \in {\mathcal {W}}^2(0,T)$$, and show the existence of a numerical method $$\psi$$ that satisfies (3.2) with $$q=1$$. For a general result dealing with arbitrary regularity $$u^{(k+1)} \in {\mathcal {W}}^2(0,T)$$ and a numerical method of order $$q=k$$, see [23, Theorem 3.2].

### 4.2 Lipschitz condition on approximate flow map

Next, we verify the Lipschitz condition (3.3) for $$\psi$$ given in (4.4), and determine an interval of suitable values for the upper bound $$h^*$$ on the time step of the implicit Euler scheme. Fix $$0<h\le h^*$$, $$t\in [0,T-h]$$, and $$u_0,v_0\in V$$. Set $$w_1:=\psi (h,t,u_0)-\psi (h,t,v_0)\in V$$ and $$w_0:=u_0-v_0\in H$$. Then

\begin{aligned} \frac{1}{2h} (\left| w_1\right| _H^2 -\left| w_0\right| _H^2)& \le \left\langle \frac{w_1-w_0}{h} , w_1 \right\rangle _{H} \le - \left\langle {\bar{A}}_{h,t} w_1 , w_1 \right\rangle _{V'\times V} \\& \le - \mu \left| w_1\right| _V^2 + \kappa \left| w_1\right| _H^2. \end{aligned}

The first inequality follows from rearranging $$0\le \vert w_{1}+w_{0} \vert _{H}^{2}$$. The second inequality follows since (4.4) is equivalent to $$h^{-1}(\psi (h,t,u_0)-u_0)={\bar{b}}_{h,t}-{\bar{A}}_{h,t} \psi (h,t,u_0)$$. The third inequality holds because $${\bar{A}}_{h,t}$$ inherits the positivity property (4.3) from A. Using $$(2h)^{-1}(\vert w_1 \vert _{H}^{2}-\vert w_0 \vert _{H}^{2})\le -\mu \vert w_1 \vert _{V}^{2}+\kappa \vert w_1 \vert _{H}^{2}$$ and the definitions of $$w_1$$ and $$w_0$$, we obtain

\begin{aligned} \left| \psi (h,t,u_0)-\psi (h,t,v_0)\right| _H^2 \le (1-2h\kappa )^{-1}\left| u_0-v_0\right| _H^2. \end{aligned}

If $$\kappa \le 0$$, then $$(1-2h\kappa )^{-1}\le 1\le (1+L_\psi h)$$ for any $$L_\psi >0$$ and $$h>0$$. Therefore, suppose that $$\kappa >0$$. If the bound above on $$\left| \psi (h,t,u_0)-\psi (h,t,v_0)\right| _H^2$$ holds for all $$0<h\le h^*$$, then we must have $$h^*<(2\kappa )^{-1}$$. In fact, for $$h^*<(2\kappa )^{-1}$$, the definition $$L_\psi :=[(2\kappa )^{-1}-h^*]^{-1}$$ can be rearranged to $$h^*=\tfrac{L_\psi -2\kappa }{2\kappa L_\psi }$$. In this case, $$0<h\le h^*$$ is equivalent to

\begin{aligned} h^2(2\kappa L_\psi )\le h(L_\psi -2\kappa )\Leftrightarrow 1\le (1+L_\psi h)(1-2h\kappa )\Leftrightarrow (1-2h\kappa )^{-1} \le 1+L_\psi h. \end{aligned}

Hence, the implicit Euler scheme (4.4) satisfies condition (3.3) in Assumption 3.1.
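The algebraic relation between $$\kappa$$, $$h^*$$, and $$L_\psi$$ can also be checked numerically. The sketch below (illustrative only; the function names are ours) evaluates the chain of equivalences above on a grid of step sizes $$0<h\le h^*$$:

```python
import numpy as np

def lipschitz_constant(kappa, h_star):
    """L_psi := [(2*kappa)^{-1} - h_star]^{-1}; requires 0 < h_star < (2*kappa)^{-1}."""
    assert 0.0 < h_star < 1.0 / (2.0 * kappa)
    return 1.0 / (1.0 / (2.0 * kappa) - h_star)

def bound_holds(kappa, h_star, n_samples=1000):
    """Check (1 - 2*h*kappa)^{-1} <= 1 + L_psi*h for a grid of 0 < h <= h_star."""
    L = lipschitz_constant(kappa, h_star)
    hs = np.linspace(h_star / n_samples, h_star, n_samples)
    # small slack absorbs floating-point round-off at h = h_star, where
    # the inequality holds with equality
    return bool(np.all(1.0 / (1.0 - 2.0 * hs * kappa) <= 1.0 + L * hs + 1e-12))
```

At $$h=h^*$$ the inequality holds with equality, consistent with $$L_\psi$$ being the smallest admissible Lipschitz constant for that choice of $$h^*$$.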

## 5 Conclusion

In this paper, we proved strong error bounds for general Orlicz norms for randomised time integration methods applied to operator differential equations, using possibly non-uniform time grids. Our work builds on the ideas and approaches of [10, 20]. We show that the proof techniques of the key error bounds contained therein can be applied in more general settings, where the differential equation is formulated on a possibly infinite-dimensional Banach or Hilbert space, and the numerical time integration method is applied to a possibly non-uniform time grid. Our work has two additional novel aspects relative to [10, 20].

First, we use a different error decomposition to bound the one-step error. Our error decomposition enables us to replace the strong assumption of uniform local truncation error with a weaker assumption on the local truncation error. This is important, because it is known that the strong assumption of uniform local truncation error is invalid even when the linear operator A in the operator differential equation generates an analytic semigroup [37, Theorem 7.1]. For the implicit Euler method, and for a large class of examples that includes the standard heat equation, we showed that our weaker local truncation error assumption is reasonable.

Second, we consider more general Orlicz norms instead of $$L^R$$-norms. Previous results concerning higher-order error bounds, for example [20, Theorem 3.5], were less direct: they involved finding bounds on the $$L^R$$ error for each $$R\in \mathbb {N}$$ and using the series expansion of the exponential function. The use of Orlicz norms leads to shorter and conceptually simpler proofs of our main results, Theorem 2.8 and Theorem 3.7, by exploiting the fact that the random approximating sequence $$(U_k)_k$$ inherits the integrability properties of the collection $$(\xi _k)_k$$.