# Statistical analysis of differential equations: introducing probability measures on numerical solutions

## Abstract

In this paper, we present a formal quantification of uncertainty induced by numerical solutions of ordinary and partial differential equation models. Numerical solutions of differential equations contain inherent uncertainties due to the finite-dimensional approximation of an unknown and implicitly defined function. When statistically analysing models based on differential equations describing physical, or other naturally occurring, phenomena, it can be important to explicitly account for the uncertainty introduced by the numerical method. Doing so enables objective determination of this source of uncertainty, relative to other uncertainties, such as those caused by data contaminated with noise or model error induced by missing physics or inadequate descriptors. As ever larger scale mathematical models are being used in the sciences, often sacrificing complete resolution of the differential equation on the grids used, formally accounting for the uncertainty in the numerical method is becoming increasingly important. This paper provides the formal means to incorporate this uncertainty in a statistical model and its subsequent analysis. We show that a wide variety of existing solvers can be randomised, inducing a probability measure over the solutions of such differential equations. These measures exhibit contraction to a Dirac measure around the true unknown solution, where the rates of convergence are consistent with the underlying deterministic numerical method. Furthermore, we employ the method of modified equations to demonstrate enhanced rates of convergence to stochastic perturbations of the original deterministic problem. Ordinary differential equations and elliptic partial differential equations are used to illustrate the approach to quantifying uncertainty in the statistical analysis of both forward and inverse problems.

### Keywords

Numerical analysis · Probabilistic numerics · Inverse problems · Uncertainty quantification

### Mathematics Subject Classification

62F15 · 65N75 · 65L20

## 1 Introduction

### 1.1 Motivation

The numerical analysis literature has developed a large range of efficient algorithms for solving ordinary and partial differential equations, which are typically designed to solve a single problem as efficiently as possible (Hairer et al. 1993; Eriksson 1996). When classical numerical methods are placed within statistical analysis, however, we argue that significant difficulties can arise as a result of errors in the computed approximate solutions. While the distributions of interest commonly do converge asymptotically as the solver mesh becomes dense [e.g. in statistical inverse problems (Dashti and Stuart 2016)], we argue that at a finite resolution, the statistical analyses may be vastly overconfident as a result of these unmodelled errors.

The purpose of this paper is to address these issues by the construction and rigorous analysis of novel probabilistic integration methods for both ordinary and partial differential equations. The approach in both cases is similar: we identify the key discretisation assumptions and introduce a local random field, in particular a Gaussian field, to reflect our uncertainty in those assumptions. The probabilistic solver may then be sampled repeatedly to interrogate the uncertainty in the solution. For a wide variety of commonly used numerical methods, our construction is straightforward to apply and provably preserves the order of convergence of the original method.

Furthermore, we demonstrate the value of these probabilistic solvers in statistical inference settings. Analytic and numerical examples show that using a classical non-probabilistic solver with inadequate discretisation when performing inference can lead to inappropriate and misleading posterior concentration in a Bayesian setting. In contrast, the probabilistic solver reveals the structure of uncertainty in the solution, naturally limiting posterior concentration as appropriate.

Our contributions can be summarised as follows:

- Construct randomised solvers of ODEs and PDEs using natural modifications of popular, existing solvers.
- Prove the convergence of the randomised methods and study their behaviour by showing a close link between randomised ODE solvers and stochastic differential equations (SDEs).
- Demonstrate that these randomised solvers can be used to perform statistical analyses that appropriately consider solver uncertainty.

### 1.2 Review of existing work

The statistical analysis of models based on ordinary and partial differential equations is growing in importance and a number of recent papers in the statistics literature have sought to address certain aspects specific to such models, e.g. parameter estimation (Liang and Wu 2008; Xue et al. 2010; Xun et al. 2013; Brunel et al. 2014) and surrogate construction (Chakraborty et al. 2013). However, the statistical implications of the reliance on a numerical approximation to the actual solution of the differential equation have not been addressed in the statistics literature to date and this is the open problem comprehensively addressed in this paper. Earlier work on randomisation in the approximate integration of ordinary differential equations (ODEs) includes Coulibaly and Lécot (1999) and Stengle (1995). Our strategy fits within the emerging field known as Probabilistic Numerics (Hennig et al. 2015), a perspective on computational methods pioneered by Diaconis (1988) and subsequently Skilling (1992). This framework recasts solving differential equations as a statistical inference problem, yielding a probability measure over functions that satisfy the constraints imposed by the specific differential equation. This measure formally quantifies the uncertainty in candidate solution(s) of the differential equation, allowing its use in uncertainty quantification (Sullivan 2016) or Bayesian inverse problems (Dashti and Stuart 2016).

A recent Probabilistic Numerics methodology for ODEs (Chkrebtii et al. 2013) [explored in parallel in Hennig and Hauberg (2014)] has two important shortcomings. First, it is impractical, only supporting first-order accurate schemes with a rapidly growing computational cost caused by the growing difference stencil [although Schober et al. (2014) extend it to Runge–Kutta methods]. Secondly, these methods do not clearly articulate the relationship between their probabilistic structure and the problem being solved. They construct a Gaussian process whose mean coincides with an existing deterministic integrator. While they claim that the posterior variance is useful, by the conjugacy inherent in linear Gaussian models, it is actually just an *a priori* estimate of the rate of convergence of the integrator, independent of the actual forcing or initial condition of the problem being solved. These works also describe a procedure for randomising the construction of the mean process, which bears similarity to our approach, but it is not formally studied. In contrast, we formally link each draw from our measure to the analytic solution.

Our motivation for enhancing inference problems with models of discretisation error is similar to the more general concept of model error, as developed by Kennedy and O’Hagan (2001). Although more general types of model error, including uncertainty in the underlying physics, are important in many applications, our focus on errors arising from the discretisation of differential equations leads to more specialised methods. Future work may be able to translate insights from our study of the restricted problem to the more general case. Existing strategies for discretisation error include empirically fitted Gaussian models for PDE errors (Kaipio and Somersalo 2007) and randomly perturbed ODEs (Arnold et al. 2013); the latter partially coincides with our construction, but our motivation and analysis are distinct. Recent work (Capistrán et al. 2013) uses Bayes factors to analyse the impact of discretisation error on posterior approximation quality. Probabilistic models have also been used to study error propagation due to rounding error; see Hairer et al. (2008).

### 1.3 Organisation

The remainder of the paper has the following structure: Sect. 2 introduces and formally analyses the proposed probabilistic solvers for ODEs. Section 3 explores the characteristics of random solvers employed in the statistical analysis of both forward and inverse problems. Then, we turn to elliptic PDEs in Sect. 4, where several key steps of the construction of probabilistic solvers and their analysis have intuitive analogues in the ODE context. Finally, an illustrative example of an elliptic PDE inference problem is presented in Sect. 5.^{1}

## 2 Probability measures via probabilistic time integrators

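Consider the ordinary differential equation (ODE)

\[ \frac{du}{dt} = f(u), \quad u(0) = u_0, \tag{1} \]

where \(u(\cdot )\) takes values in \(\mathbb {R}^n\).^{2} We let \(\varPhi _t\) denote the flow map for Eq. (1), so that \(u(t)=\varPhi _t\bigl (u(0)\bigr )\). The conditions ensuring that this solution exists will be formalised in Assumption 2, below.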

Deterministic numerical methods for the integration of this equation on the time interval [0, *T*] will produce an approximation to its solution on a mesh of points \(\{t_k=kh\}_{k=0}^{K}\), with \(Kh=T\) (for simplicity we assume a fixed mesh). Let \(u_k=u(t_k)\) denote the exact solution of (1) on the mesh and \(U_k\approx u_k\) denote the approximation computed using finitely many evaluations of *f*. Typically, these methods output a single discrete solution \(\{U_k\}_{k=0}^K\), often augmented with some type of error indicator, but do not statistically quantify the uncertainty remaining in the path.

Let \(X_{a,b}\) denote the Banach space \(C([a,b];\mathbb {R}^n)\). The exact solution of (1) on the time interval [0, *T*] may be viewed as a Dirac measure \(\delta _u\) on \(X_{0,T}\) at the element *u* that solves the ODE. We will construct a probability measure \(\mu ^h\) on \(X_{0,T}\), that is straightforward to sample from both on and off the mesh, for which *h* quantifies the size of the discretisation step employed, and whose distribution reflects the uncertainty resulting from the solution of the ODE. Convergence of the numerical method is then related to the contraction of \(\mu ^h\) to \(\delta _u\).

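We build on deterministic one-step numerical integrators of the form \(U_{k+1}=\varPsi _h(U_k)\) with time step *h*, a class including all Runge–Kutta methods and Taylor methods for ODE numerical integration (Hairer et al. 1993). Our numerical methods will have the property that, on the mesh, they take the form

\[ U_{k+1} = \varPsi _h(U_k) + \xi _k(h), \]

where \(\xi _k(h)\) is a Gaussian random perturbation, made precise in Sect. 2.1.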

### 2.1 Probabilistic time integrators: general formulation

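Writing the ODE (1) in integral form on a single mesh interval gives

\[ u(t) = u_k + \int _{t_k}^{t} g(s)\,ds, \quad t \in [t_k,t_{k+1}], \tag{7} \]

where \(g(s):=f\bigl (u(s)\bigr )\) is an unknown function of time. In the algorithmic setting we have approximate knowledge about *g*(*s*) through an underlying numerical method. A variety of traditional numerical algorithms may be derived based on approximation of *g*(*s*) by various simple deterministic functions \(g^h(s)\). The simplest such numerical method arises from invoking the Euler approximation that

\[ g^h(s) = f(U_k), \quad s \in [t_k,t_{k+1}). \]

Taking \(t=t_{k+1}\), the numerical scheme that results from making this approximation to *g*(*s*) in (7) is \(U_{k+1}=U_{k}+hf(U_{k}).\) Now consider the more general one-step numerical method \(U_{k+1}=\varPsi _{h}(U_k).\) This may be derived by approximating *g*(*s*) in (7) by

\[ g^h(s) = \frac{d}{d\tau }\bigl (\varPsi _\tau (U_k)\bigr )\Big |_{\tau =s-t_k}, \quad s \in [t_k,t_{k+1}). \]

We note that all consistent (in the sense of numerical analysis) one-step methods will satisfy

\[ \frac{d}{d\tau }\bigl (\varPsi _\tau (u)\bigr )\Big |_{\tau =0} = f(u). \]

This deterministic construction does not acknowledge the uncertainty that arises from lack of knowledge about *g*(*s*) in the interval \(s \in [t_k,t_{k+1}).\) We propose to approximate *g* stochastically in order to represent this uncertainty, taking

\[ g^h(s) = \frac{d}{d\tau }\bigl (\varPsi _\tau (U_k)\bigr )\Big |_{\tau =s-t_k} + \chi _k(s-t_k), \quad s \in [t_k,t_{k+1}), \]

where the \(\{\chi _k\}\) form an i.i.d. sequence of Gaussian random functions defined on [0, *h*] with \(\chi _k \sim N(0,C^h)\).^{3}

We choose \(C^h\) to shrink to zero with *h* at a prescribed rate (see Assumption 1), and also to ensure that \(\chi _k \in X_{0,h}\) almost surely. The functions \(\{\chi _k\}\) represent our uncertainty about the function *g*. The corresponding numerical scheme arising from such an approximation is given by

\[ U_{k+1} = \varPsi _h(U_k) + \xi _k(h), \tag{10} \]

where \(\xi _k(t):=\int _0^t \chi _k(s)\,ds\).

For the Euler method, (10) has the same form as an Euler–Maruyama discretisation of a stochastic differential equation with additive noise of amplitude \(\sigma \), where \(\sigma \) depends on *h*. In particular, in the simple one-dimensional case, \(\sigma \) would be given by \(\sqrt{C^{h}/h}\). Section 2.4 develops a more sophisticated connection that extends to higher order methods and off the mesh.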

While we argue that the choice of modelling local uncertainty in the flow map as a Gaussian process is natural and analytically favourable, it is not unique. It is possible to construct examples where the Gaussian assumption is invalid; for example, when a highly inadequate time-step is used, a systemic bias may be introduced. However, in regimes where the underlying deterministic method performs well, the centred Gaussian assumption is a reasonable prior.

### 2.2 Strong convergence result

To prove the strong convergence of our probabilistic numerical solver, we first need two assumptions quantifying properties of the random noise and of the underlying deterministic integrator, respectively. In what follows we use \(\langle \cdot , \cdot \rangle \) and \(|\cdot |\) to denote the Euclidean inner product and norm on \(\mathbb {R}^n\). We denote the Frobenius norm on \(\mathbb {R}^{n \times n}\) by \(|\cdot |_\mathrm{F}\), and \(\mathbb {E}^h\) denotes expectation with respect to the i.i.d. sequence \(\{\chi _k\}\).

### Assumption 1

Let \(\xi _k(t)\!:=\int _0^t \chi _k(s)ds\) with \(\chi _k \sim N(0,C^h).\) Then there exists \(K>0, p \ge 1\) such that, for all \(t \in [0,h]\), \(\mathbb {E}^h|\xi _k(t) \xi _k(t)^T|_\mathrm{F}^2 \le Kt^{2p+1};\) in particular \(\mathbb {E}^h|\xi _k(t)|^2 \le Kt^{2p+1}.\) Furthermore, we assume the existence of matrix *Q*, independent of *h*, such that \(\mathbb {E}^h[\xi _k(h) \xi _k(h)^T]=Qh^{2p+1}.\)

Here, and in the sequel, *K* is a constant independent of *h*, but possibly changing from line to line. Note that the covariance kernel \(C^h\) is constrained, but not uniquely defined. We will assume the form of the constant matrix is \(Q=\sigma I\), and we discuss one possible strategy for choosing \(\sigma \) in Sect. 3.1. Section 2.4 uses a weak convergence analysis to argue that once *Q* is selected, the exact choice of \(C^h\) has little practical impact.
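The on-mesh scheme and the noise scaling of Assumption 1 can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes the on-mesh form \(U_{k+1}=\varPsi _h(U_k)+\xi _k(h)\) with Euler as the underlying integrator, and draws \(\xi _k(h) \sim N(0, Qh^{2p+1})\) with \(Q=\sigma I\). The function name `probabilistic_euler` and its arguments are hypothetical.

```python
import numpy as np

def probabilistic_euler(f, u0, h, n_steps, sigma, p=1, rng=None):
    """Draw one sample path of a randomised Euler solver on the mesh t_k = k*h.

    Each step applies the deterministic Euler map and then adds a Gaussian
    perturbation with covariance sigma * h**(2p + 1) * I (assumes Q = sigma * I).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = np.atleast_1d(np.asarray(u0, dtype=float))
    path = np.empty((n_steps + 1, u.size))
    path[0] = u
    noise_std = np.sqrt(sigma * h ** (2 * p + 1))
    for k in range(n_steps):
        u = u + h * f(u)                                 # deterministic Euler step
        u = u + noise_std * rng.standard_normal(u.size)  # perturbation xi_k(h)
        path[k + 1] = u
    return path

# Usage: an ensemble of draws for du/dt = -u, u(0) = 1, on [0, 1].
rng = np.random.default_rng(0)
draws = np.stack([
    probabilistic_euler(lambda u: -u, 1.0, 0.1, 10, sigma=0.1, rng=rng)
    for _ in range(200)
])
ensemble_mean = draws.mean(axis=0)[:, 0]
ensemble_std = draws.std(axis=0)[:, 0]
```

Only the value of the perturbation at the mesh points is sampled here, so the sketch does not depend on a particular choice of the kernel \(C^h\) beyond its scaling at \(t=h\).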

### Assumption 2

The function *f* and a sufficient number of its derivatives are bounded uniformly in \(\mathbb {R}^n\) in order to ensure that *f* is globally Lipschitz and that the numerical flow map \(\varPsi _h\) has uniform local truncation error of order \(q+1\):

\[ \sup _{u \in \mathbb {R}^n} |\varPsi _t(u) - \varPhi _t(u)| \le K t^{q+1}. \]

### Remark 2.1

We assume globally Lipschitz *f*, and bounded derivatives, in order to highlight the key probabilistic ideas, whilst simplifying the numerical analysis. Future work will address the non-trivial issue of extending the analysis to weaker assumptions. In this paper, we provide numerical results indicating that a weakening of the assumptions is indeed possible.

### Theorem 2.2

This theorem implies that every probabilistic solution is a good approximation of the exact solution in both a discrete and continuous sense. Choosing \(p \ge q\) is natural if we want to preserve the strong order of accuracy of the underlying deterministic integrator; we proceed with the choice \(p=q\), introducing the maximum amount of noise consistent with this constraint.
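The preservation of strong order can be checked empirically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses the linear test problem \(\dot{u}=-u\), \(u(0)=1\), a randomised Euler scheme with \(p=q=1\) and per-step noise variance \(\sigma h^{2p+1}\), and a Monte Carlo estimate of the largest root-mean-square error over the mesh. The function name and default values are hypothetical.

```python
import numpy as np

def rms_error_probabilistic_euler(h, T=1.0, sigma=0.1, p=1, n_samples=4000, seed=0):
    """Monte Carlo estimate of max_k sqrt(E|u_k - U_k|^2) for du/dt = -u, u(0) = 1."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / h))
    noise_std = np.sqrt(sigma * h ** (2 * p + 1))
    U = np.ones(n_samples)  # every sample path starts at u(0) = 1
    worst = 0.0
    for k in range(1, n_steps + 1):
        # Randomised Euler step applied to all sample paths at once.
        U = U + h * (-U) + noise_std * rng.standard_normal(n_samples)
        exact = np.exp(-k * h)
        worst = max(worst, float(np.sqrt(np.mean((U - exact) ** 2))))
    return worst

err_h = rms_error_probabilistic_euler(0.1)
err_half_h = rms_error_probabilistic_euler(0.05)
# Halving h should roughly halve the error if first-order accuracy is preserved.
observed_order = np.log2(err_h / err_half_h)
```

Repeating the experiment with a smaller value of *p* in the noise scaling would be one way to see the injected noise, rather than the deterministic truncation error, limit the observed order.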

### 2.3 Examples of probabilistic time integrators

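^{4}Another useful example is the classical *Runge–Kutta method*, which defines a one-step numerical integrator as follows:

\[ \varPsi _h(u) = u + \frac{h}{6}\bigl (k_1 + 2k_2 + 2k_3 + k_4\bigr ), \]

where \(k_1=f(u)\), \(k_2=f\bigl (u+\tfrac{1}{2}hk_1\bigr )\), \(k_3=f\bigl (u+\tfrac{1}{2}hk_2\bigr )\) and \(k_4=f(u+hk_3)\).

^{5}Provided the perturbation satisfies Assumption 1 with \(p \ge 4\), Theorem 2.2 shows that the error of the probabilistic integrator based on the classical Runge–Kutta method is, in the mean square sense, of the same order of accuracy as that of the deterministic classical Runge–Kutta integrator.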
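A randomised Runge–Kutta integrator of this kind can be sketched as follows. This is a hedged illustration: it assumes the standard classical fourth-order Runge–Kutta step, \(p=4\), and perturbations with covariance \(\sigma h^{2p+1} I\) added after each step. The function names and the harmonic oscillator test problem are hypothetical choices made for the example.

```python
import numpy as np

def rk4_step(f, u, h):
    """One step of the classical fourth-order Runge-Kutta method for du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * h * k1)
    k3 = f(u + 0.5 * h * k2)
    k4 = f(u + h * k3)
    return u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def probabilistic_rk4(f, u0, h, n_steps, sigma, rng, p=4):
    """Sample path of RK4 with a Gaussian perturbation added after every step."""
    u = np.atleast_1d(np.asarray(u0, dtype=float))
    path = [u.copy()]
    noise_std = np.sqrt(sigma * h ** (2 * p + 1))  # assumes Q = sigma * I
    for _ in range(n_steps):
        u = rk4_step(f, u, h) + noise_std * rng.standard_normal(u.size)
        path.append(u.copy())
    return np.array(path)

def oscillator(u):
    """Harmonic oscillator x'' = -x written as a first-order system (x, v)."""
    return np.array([u[1], -u[0]])

rng = np.random.default_rng(1)
sample = probabilistic_rk4(oscillator, [1.0, 0.0], h=0.1, n_steps=100, sigma=1.0, rng=rng)
```

Setting `sigma=0.0` recovers the deterministic integrator, which makes it easy to compare a draw against the unperturbed path.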

### 2.4 Backward error analysis

Backward error analysis is a useful tool in numerical analysis; the idea is to characterise the method by identifying a modified equation (dependent upon *h*) which is solved by the numerical method either exactly, or at least to a higher degree of accuracy than the numerical method solves the original equation. For our random ODE solvers, we will show that the modified equation is a stochastic differential equation (SDE) in which only the matrix *Q* from Assumption 1 enters; the details of the random processes used in our construction do not enter the modified equation. This universality property underpins the methodology we introduce as it shows that many different choices of random processes all lead to the same effective behaviour of the numerical method.

*W*, we introduce the operators \(\widehat{\mathcal {L}}^h\) and \(\widetilde{\mathcal {L}}^h\) so that, for all \(\phi \in C^{\infty }(\mathbb {R}^n,\mathbb {R})\),

*A* : *B* \(= \text {trace}(A^{T} B)\).

*f*. With this choice of *q* we obtain

### Assumption 3

The function *f* is in \(C^{\infty }\) and all its derivatives are uniformly bounded on \(\mathbb {R}^n\). Furthermore, *f* is such that the operators \(e^{h\mathcal {L}}\) and \(e^{h\mathcal {L}^h}\) satisfy, for all \(\psi \in C^{\infty }(\mathbb {R}^n,\mathbb {R})\) and some \(L>0\),

### Remark 2.3

If \(p=q\) in what follows (our recommended choice) then the weak order of the method coincides with the strong order; however, measured relative to the modified equation, the weak order is then one plus twice the strong order. In this case, the second part of Theorem 2.2 gives us the first weak order result in Theorem 2.4. Additionally, Assumption 3 is stronger than we need, but allows us to highlight probabilistic ideas whilst keeping overly technical aspects of the numerical analysis to a minimum. More sophisticated, but structurally similar, analysis would be required for weaker assumptions on *f*. Similar considerations apply to the assumptions on \(\phi \).

### Theorem 2.4

*u* and \(\tilde{u}\) solve (1) and (17), respectively.

### Example 2.5

These results allow us to constrain the behaviour of the randomised method using limited information about the covariance structure, \(C^h\). The randomised solution converges weakly, at a high rate, to a solution that only depends on *Q*. Hence, we conclude that the practical behaviour of the solution is only dependent upon *Q*, and otherwise, \(C^h\) may be any convenient kernel. With these results now available, the following section provides an empirical study of our probabilistic integrators.

## 3 Statistical inference and numerics

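We now study the probabilistic integrators empirically using the FitzHugh–Nagumo model, a two-state non-linear oscillator with states (*V*, *R*) and parameters (*a*, *b*, *c*), governed by the equations

\[ \frac{dV}{dt} = c\Bigl (V - \frac{V^3}{3} + R\Bigr ), \qquad \frac{dR}{dt} = -\frac{1}{c}\bigl (V - a + bR\bigr ). \]

This example does not satisfy the stringent Assumptions 2 and 3; the numerical results nevertheless indicate that, as noted in Remark 2.1, the theory extends to weaker assumptions on *f*, something we will address in future work.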

### 3.1 Calibrating forward uncertainty propagation

We draw samples of the *V* species trajectories from the measure associated with the probabilistic Euler solver with \(p=q=1\), for various values of the step-size and fixed \(\sigma =0.1\). The random draws exhibit non-Gaussian structure at large step-size and clearly contract towards the true solution.

Although the rate of contraction is governed by the underlying deterministic method, the scale parameter, \(\sigma \), completely controls the apparent uncertainty in the solver.^{6} This tuning problem exists in general, since \(\sigma \) is problem dependent and cannot obviously be computed analytically.

Therefore, we propose to calibrate \(\sigma \) to replicate the amount of error suggested by classical error indicators. In the following discussion, we often explicitly denote the dependence on *h* and \(\sigma \) with superscripts, hence the probabilistic solver is \(U^{h, \sigma }\) and the corresponding deterministic solver is \(U^{h, 0}\). Define the deterministic error as \(e(t) = u(t) - U^{h, 0}(t)\). Then we assume there is some computable error indicator \(E(t) \approx e(t)\), defining \(E_k = E(t_k)\). The simplest error indicators might compare differing step-sizes, \(E(t) = U^{h, 0}(t) - U^{2h, 0}(t)\), or differing order methods, as in a Runge–Kutta 4–5 scheme.
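The step-doubling indicator \(E(t) = U^{h, 0}(t) - U^{2h, 0}(t)\) is simple to compute. The sketch below assumes a scalar ODE, the deterministic Euler method, and evaluation at the mesh points shared by both step-sizes; the function names and the test problem \(\dot{u}=-u\) are hypothetical choices for illustration.

```python
import numpy as np

def euler_path(f, u0, h, n_steps):
    """Deterministic Euler path U^{h,0} for a scalar ODE du/dt = f(u)."""
    u = float(u0)
    path = [u]
    for _ in range(n_steps):
        u = u + h * f(u)
        path.append(u)
    return np.array(path)

def step_doubling_indicator(f, u0, h, n_steps):
    """E_j = U^{h,0}(t_j) - U^{2h,0}(t_j) at the shared mesh points t_j = 2*j*h."""
    if n_steps % 2 != 0:
        raise ValueError("n_steps must be even so that the two meshes share end points")
    fine = euler_path(f, u0, h, n_steps)
    coarse = euler_path(f, u0, 2 * h, n_steps // 2)
    return fine[::2] - coarse

# Usage: compare the indicator with the true Euler error for du/dt = -u.
h = 0.05
E = step_doubling_indicator(lambda u: -u, 1.0, h, 20)
t = 2 * h * np.arange(E.size)
true_error = np.exp(-t) - euler_path(lambda u: -u, 1.0, h, 20)[::2]
```

For a first-order method the error at step-size 2*h* is roughly twice that at *h*, which is why this difference tracks the error of the finer solution in the example.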

We proceed by constructing a probability distribution \(\pi (\sigma )\) that is maximised when the desired matching occurs. We estimate this scale matching by comparing: (i) a Gaussian approximation of our random solver at each step *k*, \( \tilde{\mu }_k^{h,\sigma } = \mathcal {N}(\mathbb {E} (U^{h, \sigma }_k), \mathbb {V}(U^{h, \sigma }_k )); \) and (ii) the natural Gaussian measure from the deterministic solver, \(U^{h, 0}_k\), and the available error indicator, \(E_k\), \( \nu _k^\sigma = \mathcal {N}(U^{h, 0}_k, (E_k)^2). \) We construct \(\pi (\sigma )\) by penalising the distance between these two normal distributions at every step: \( \pi (\sigma ) \propto \prod _k \exp \left( -d(\tilde{\mu }_k^{h,\sigma }, \nu _k^\sigma ) \right) \). We find that the Bhattacharyya distance (closely related to the Hellinger metric) works well (Kailath 1967), since it diverges quickly if either the mean or variance differs. The density can be easily estimated using Monte Carlo. If the ODE state is a vector, we take the product of the univariate Bhattacharyya distances. Note that this calibration depends on the initial conditions and any parameters of the ODE.
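The calibration density can be estimated directly from Monte Carlo draws of the solver. The following sketch assumes a scalar state, uses the closed-form Bhattacharyya distance between two univariate Gaussians, and evaluates the unnormalised \(\log \pi (\sigma )\) for one candidate \(\sigma \) from an array of sampled paths. The function names and the synthetic inputs in the usage lines are hypothetical.

```python
import numpy as np

def bhattacharyya_gaussian(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2), with v the variance."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

def log_pi_sigma(sample_paths, deterministic_path, error_indicator):
    """Unnormalised log pi(sigma) from draws of the randomised solver at one sigma.

    sample_paths:       array (n_draws, n_steps), draws of U^{h,sigma}_k
    deterministic_path: array (n_steps,), values U^{h,0}_k
    error_indicator:    array (n_steps,), non-zero values E_k
    """
    mean = sample_paths.mean(axis=0)
    var = sample_paths.var(axis=0, ddof=1)
    d = bhattacharyya_gaussian(mean, var, deterministic_path, error_indicator ** 2)
    return -float(np.sum(d))  # log of prod_k exp(-d_k)

# Usage with synthetic draws: a spread matching |E_k| scores higher than one that is too wide.
rng = np.random.default_rng(0)
det = np.linspace(1.0, 0.5, 5)
E = np.full(5, 0.1)
matched = det + 0.1 * rng.standard_normal((2000, 5))
too_wide = det + 1.0 * rng.standard_normal((2000, 5))
score_matched = log_pi_sigma(matched, det, E)
score_too_wide = log_pi_sigma(too_wide, det, E)
```

In practice the draws would come from running the randomised solver at each candidate \(\sigma \); mesh points where the indicator is exactly zero, such as the initial condition, need to be excluded because the logarithm is undefined there.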

Returning to the FitzHugh–Nagumo model, sampling from \(\pi (\sigma )\) yields strongly peaked, uni-modal posteriors, hence we proceed using \(\sigma ^*= \hbox {arg max}\pi (\sigma )\). We examine the quality of the scale matching by plotting the magnitudes of the random variation against the error indicator in Fig. 4, observing good agreement of the marginal variances. Note that our measure still reveals non-Gaussian structure and correlations in time not revealed by the deterministic analysis. As described, this procedure requires fixed inputs to the ODE, but it is straightforward to marginalise out a prior distribution over input parameters.

### 3.2 Bayesian posterior inference problems

Given the calibrated probabilistic ODE solvers described above, let us consider how to incorporate them into inference problems.

Assume we are interested in inferring parameters of the ODE given noisy observations of the state. Specifically, we wish to infer parameters \(\theta \in \mathbb {R}^d\) for the differential equation \(\dot{u} = f(u, \theta )\), with fixed initial conditions \(u(t=0) = u_0 \) (a straightforward modification may include inference on initial conditions). Assume we are provided with data \(d \in \mathbb {R}^m\), \(d_j = u(\tau _j) + \eta _j\) at some collection of times \(\tau _j\), corrupted by i.i.d. noise, \(\eta _j \sim \mathcal {N}(0, \varGamma )\). If we have prior \(\mathbb {Q}(\theta )\), the posterior we wish to explore is, \( \mathbb {P}(\theta \mid d) \propto \mathbb {Q}(\theta ) \mathcal {L}(d,u(\theta )), \) where density \(\mathcal {L}\) compactly summarises this likelihood model.

The standard computational strategy is to simply replace the unavailable trajectory *u* with a numerical approximation, inducing approximate posterior \( \mathbb {P}^{h, 0}(\theta \mid d) \propto \mathbb {Q}(\theta ) \mathcal {L}(d,U^{h, 0}(\theta )). \) Informally, this approximation will be accurate when the error in the numerical solver is small compared to \(\varGamma \) and often converges formally to \(\mathbb {P}(\theta \mid d)\) as \(h \rightarrow 0\) (Dashti and Stuart 2016). However, highly correlated errors at finite *h* can have substantial impact.

In this work, we are concerned about the undue optimism in the predicted variance, that is, when the posterior concentrates around an arbitrary parameter value even though the deterministic solver is inaccurate and is merely able to reproduce the data by coincidence. The conventional concern is that any error in the solver will be transferred into posterior bias. Practitioners commonly alleviate both concerns by tuning the solver to be nearly perfect; however, we note that this may be computationally prohibitive in many contemporary statistical applications.

Notice that as \(h \rightarrow 0\), both the measures \(\mathbb {P}^{h, 0}\) and \(\mathbb {P}^{h, \sigma }\) typically collapse to the analytic posterior, \(\mathbb {P}\), hence both methods are correct. We do not expect the bias of \(\mathbb {P}^{h, \sigma }\) to be improved, since all of the averaged trajectories are of the same quality as the deterministic solver in \(\mathbb {P}^{h, 0}\). We now construct an analytic inference problem demonstrating these behaviours.

### Example 3.1

*k* has effective variance

*k*. If a Gaussian prior \(\mathcal {N}(m_0,\zeta _0^2)\) is specified for \(u_0\), then the posterior is \(\mathcal {N}(m,\zeta ^2)\), where

*h* is fixed and \(k \rightarrow \infty \). For the standard Euler method, where \(\gamma _h=\gamma \), we see that \(\zeta ^2 \rightarrow 0\), whilst \(m \asymp \bigl ((1+h\lambda )^{-1}e^{h\lambda }\bigr )^{k} u_0^\dagger \). Thus the inference scheme becomes increasingly certain of the wrong answer: the variance tends to zero and the mean tends to infinity.
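The collapse described here can be reproduced numerically. The sketch below assumes a setting consistent with the surviving formulas: the scalar linear ODE \(\dot{u}=\lambda u\), the Euler forward map \(U_k=(1+h\lambda )^k u_0\), and a single observation of the state at time *kh* with Gaussian noise of variance \(\gamma ^2\). This setup, the inflation of the observation variance by propagated per-step solver noise, and all names are assumptions made for illustration.

```python
import numpy as np

def posterior_u0(k, h, lam, gamma2, m0, zeta0_2, u0_true, step_noise_var=0.0):
    """Conjugate Gaussian posterior N(m, zeta2) for u0 from one datum at t = k*h.

    Forward model: U_k = (1 + h*lam)**k * u0, plus accumulated solver noise.
    Datum: d = exp(lam*k*h) * u0_true (the noise realisation is set to zero
    so that the effect of the solver error is isolated).
    """
    a = (1.0 + h * lam) ** k  # Euler amplification over k steps
    j = np.arange(k)
    # Effective observation variance: gamma^2 plus per-step noise propagated to step k.
    gamma_h2 = gamma2 + step_noise_var * np.sum((1.0 + h * lam) ** (2 * j))
    d = np.exp(lam * k * h) * u0_true
    precision = a ** 2 / gamma_h2 + 1.0 / zeta0_2
    m = (a * d / gamma_h2 + m0 / zeta0_2) / precision
    return m, 1.0 / precision

common = dict(k=200, h=0.1, lam=1.0, gamma2=0.01, m0=0.0, zeta0_2=1.0, u0_true=1.0)
m_det, var_det = posterior_u0(**common)                                  # standard Euler
m_rand, var_rand = posterior_u0(**common, step_noise_var=0.1 * 0.1 ** 3)  # randomised Euler
```

With these illustrative values the deterministic posterior variance is vanishingly small while its mean sits well away from the true initial condition; the inflated variance keeps the posterior variance of the randomised version bounded away from zero, although the mean is not corrected.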

*h*, large *k* asymptotics are

We take an empirical Bayes approach to choosing \(\sigma \), that is, using a constant, fixed value \(\sigma ^*= \hbox {arg max}\pi (\sigma )\), chosen before the data are observed. Joint inference of the parameters and the noise scale suffers from well-known MCMC mixing issues in Bayesian hierarchical models. To handle the unknown parameter \(\theta \), we can marginalise it out using the prior distribution, or in simple problems, it may be reasonable to choose a fixed representative value.

We now return to the FitzHugh–Nagumo model; given fixed initial conditions, we attempt to recover parameters \(\theta = (a,b,c)\) from observations of both species at times \(\tau = 1,2,\ldots ,40\). The priors are log-normal, centred on the true value with unit variance, and with observational noise \(\varGamma = 0.001\). The data are generated from a high-quality solution, and we perform inference using Euler integrators with various step-sizes, \(h \in \{0.005, 0.01, 0.02, 0.05, 0.1\}\), spanning a range of accurate and inaccurate integrators.

## 4 Probabilistic solvers for partial differential equations

We now present a framework for probabilistic solutions to partial differential equations, working within the finite element setting. Our discussion closely resembles the ODE case, except that now we randomly perturb the finite element basis functions.

### 4.1 Probabilistic finite element method for variational problems

*U*:

*h* is introduced to measure the diameter of the finite elements. We will also assume that

In order to account for uncertainty introduced by the numerical method, we will assume that each basis function \(\phi _j\) can be split into the sum of a systematic part \(\phi ^{\mathsf {s}}_j\) and random part \(\phi ^{\mathsf {r}}_j\), where both \(\phi _j\) and \(\phi ^{\mathsf {s}}_j\) satisfy the nodal property (32), hence \(\phi ^{\mathsf {r}}_j(x_k) = 0\). Furthermore, we assume that each \(\phi ^{\mathsf {r}}_j\) shares the same compact support as the corresponding \(\phi ^{\mathsf {s}}_j\), preserving the sparsity structure of the underlying deterministic method.

### 4.2 Strong convergence result

As in the ODE case, we begin our convergence analysis with assumptions constraining the random perturbations and the underlying deterministic approximation. The bilinear form \(a(\cdot ,\cdot )\) is assumed to induce an inner product, and then norm via \(\Vert \cdot \Vert _a^2=a(\cdot ,\cdot );\) furthermore, we assume that this norm is equivalent to the norm on \(\mathcal {V}\). Throughout, \(\mathbb {E}^h\) denotes expectation with respect to the random basis functions.

### Assumption 4

The collection of random basis functions \(\{\phi ^{\mathsf {r}}_j\}_{j=1}^J\) are independent, zero-mean, Gaussian random fields, each of which satisfies \(\phi ^{\mathsf {r}}_j(x_k)=0\) and shares the same support as the corresponding systematic basis function \(\phi ^{\mathsf {s}}_j.\) For all *j*, the number of basis functions with index *k* which share the support of the basis functions with index *j* is bounded independently of *J*, the total number of basis functions. Furthermore, the basis functions are scaled so that \(\sum _{j=1}^J \mathbb {E}^h\Vert \phi ^{\mathsf {r}}_j\Vert _a^2 \le Ch^{2p}.\)

### Assumption 5

The true solution *u* of problem (4.1) is in \(L^{\infty }(D).\) Furthermore, the standard deterministic interpolant of the true solution, defined by \(v^{\mathsf {s}}:=\sum _{j=1}^J u(x_j)\phi ^{\mathsf {s}}_j,\) satisfies \(\Vert u-v^{\mathsf {s}}\Vert _a \le Ch^q.\)

### Theorem 4.1

As for ODEs, the solver accuracy is limited by either the amount of noise injected or the convergence rate of the underlying deterministic method, making \(p=q\) the natural choice.

### 4.3 Poisson solver in two dimensions

*H* to be the space \(L^2(D)\) with inner product \(\langle \cdot , \cdot \rangle \) and resulting norm \(| \cdot |^2=\langle \cdot , \cdot \rangle .\) The weak formulation of the problem has the form (4.1) with

*h* measures the width of the triangulation of the finite element mesh. Assuming that \(f \in H\) it follows that \(u \in H^2(D)\) and that

*H*. Such a result can be shown to hold in our setting, following the usual arguments for the Aubin–Nitsche trick (Johnson 2012); the argument is available in the supplementary materials.

## 5 PDE inference and numerics

We now perform numerical experiments using probabilistic solvers for elliptic PDEs. Specifically, we perform inference in a 1D elliptic PDE, \( \nabla \cdot (\kappa (x) \nabla u(x)) = 4x \) for \(x \in [0,1]\), given boundary conditions \(u(0) = 0, u(1) = 2\). We represent \(\log \kappa \) as piecewise constant over ten equal-sized intervals; the first, on \(x \in [0,0.1)\), is fixed to be one to avoid non-identifiability issues, and the other nine are given a prior \(\theta _i = \log \kappa _i \sim \mathcal {N}(0,1)\). Observations of the field *u* are provided at \(x=(0.1, 0.2, \ldots , 0.9)\), with i.i.d. Gaussian error, \(\mathcal {N}(0, 10^{-5})\); the simulated observations were generated using a fine grid and quadratic finite elements, then perturbed with error from this distribution.

Again we investigate the posterior produced at various grid sizes, using both deterministic and randomised solvers. The randomised basis functions are draws from a Brownian bridge conditioned to be zero at the nodal points, implemented in practice with a truncated Karhunen–Loève expansion. The covariance operator may be viewed as a fractional Laplacian, as discussed in Lindgren et al. (2011). The scaling \(\sigma \) is again determined by maximising the distribution described in Sect. 3.1, where the error indicator compares linear to quadratic basis functions, and we marginalise out the prior over the \(\kappa _i\) values.
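Sampling such a field with a truncated Karhunen–Loève expansion can be sketched as follows. This is an illustration under stated assumptions: it uses the standard sine-series expansion of a Brownian bridge on the unit interval, rescaled to each element so that the draw vanishes at every nodal point, with independent coefficients per element. The function name, the truncation level and the mesh are hypothetical, and the scaling by \(\sigma \) is omitted.

```python
import numpy as np

def brownian_bridge_kl(x, nodes, n_terms, rng):
    """Sample a field that is an independent Brownian bridge on each element.

    x:       evaluation points inside [nodes[0], nodes[-1]]
    nodes:   increasing array of nodal points; the sample vanishes at each node
    n_terms: truncation level of the Karhunen-Loeve (sine) expansion
    """
    x = np.asarray(x, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    n = np.arange(1, n_terms + 1)
    out = np.zeros_like(x)
    for left, right in zip(nodes[:-1], nodes[1:]):
        width = right - left
        inside = (x >= left) & (x <= right)
        s = (x[inside] - left) / width    # local coordinate in [0, 1]
        z = rng.standard_normal(n_terms)  # independent N(0, 1) coefficients
        # Modes sqrt(2) * sin(n*pi*s) / (n*pi) of the unit-interval Brownian bridge.
        modes = np.sqrt(2.0) * np.sin(np.pi * np.outer(s, n)) / (np.pi * n)
        out[inside] = np.sqrt(width) * (modes @ z)
    return out

# Usage: draws on a ten-element mesh, evaluated at the nodes and at one midpoint.
nodes = np.linspace(0.0, 1.0, 11)
x = np.concatenate([nodes, [0.05]])
rng = np.random.default_rng(0)
samples = np.array([brownian_bridge_kl(x, nodes, 50, rng) for _ in range(2000)])
```

The untruncated bridge on an element of width *w* has variance \(w\,s(1-s)\) at local coordinate *s*, so the sample variance at a midpoint should be slightly below \(w/4\) at any finite truncation level.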

## 6 Conclusions

We have presented a computational methodology, backed by rigorous analysis, which enables quantification of the uncertainty arising from the finite-dimensional approximation of solutions of differential equations. These methods play a natural role in statistical inference problems as they allow for the uncertainty from discretisation to be incorporated alongside other sources of uncertainty such as observational noise. We provide theoretical analyses of the probabilistic integrators which form the backbone of our methodology. Furthermore we demonstrate empirically that they induce more coherent inference in a number of illustrative examples. There are a variety of areas in the sciences and engineering which have the potential to draw on the methodology introduced including climatology, computational chemistry, and systems biology.

Our key strategy is to make assumptions about the *local* behaviour of solver error, which we have assumed to be Gaussian, and to draw samples from the *global* distribution of uncertainty over solutions that results. Section 2.4 describes a universality result, simplifying the task of choosing covariance kernels in practice, within the family of Gaussian processes. However, assumptions of Gaussian error, even locally, may not be appropriate in some cases, or may neglect important domain knowledge. Our framework can be extended in future work to consider alternate priors on the error, for example, multiplicative or non-negative errors.

Our study highlights difficult decisions practitioners face regarding how to expend computational resources. While standard techniques perform well when the solver is highly converged, our results show standard techniques can be disastrously wrong when the solver is not converged. As the measure of convergence is not a standard numerical analysis one, but a statistical one, we have argued that it can be surprisingly difficult to determine in advance which regime a particular problem resides in. Therefore, our practical recommendation is that the lower cost of the standard approach makes it preferable when it is certain that the numerical method is strongly converged with respect to the statistical measure of interest. Otherwise, the randomised method we propose provides a robust and consistent approach to address the error introduced into the statistical task by numerical solver error. In difficult problem domains, such as numerical weather prediction, the focus has typically been on reducing the numerical error in each solver run; techniques such as these may allow a different balance between numerical and statistical computing effort in the future.

The prevailing approach to model error described in Kennedy and O’Hagan (2001) is based on a non-intrusive methodology where the effect of model discrepancy is allowed for in observation space. Our intrusive randomisation of deterministic methods for differential equations can be viewed as a highly specialised discrepancy model, designed using our intimate knowledge of the structure and properties of numerical methods. In this vein, we intend to extend this work to other types of model error, where modifying the internal structure of the models can produce computationally and analytically tractable measures of uncertainty which perform better than non-intrusive methods. Our future work will continue to study the computational challenges and opportunities presented by these techniques.

## Footnotes

1. Supplementary materials and code are available online: http://www2.warwick.ac.uk/pints.
2. To simplify our discussion we assume that the ODE is autonomous, that is, *f*(*u*) is independent of time. Analogous theory can be developed for time-dependent forcing.
3. We use \(\chi_k \sim N(0,C^h)\) to denote a zero-mean Gaussian process defined on [0, *h*] with a covariance kernel \(\mathrm{cov}(\chi_k(t),\chi_k(s)) \triangleq C^h(t,s)\).
4. An additional example of a probabilistic integrator, based on an Ornstein–Uhlenbeck process, is available in the supplementary materials.
5. Implementing Eq. 10 is trivial, since it simply adds an appropriately scaled Gaussian random number after each classical Runge–Kutta step.
6. Recall that throughout we assume that, within the context of Assumption 1, \(Q=\sigma I\). More generally it is possible to calibrate an arbitrary positive semi-definite *Q*.

## Notes

### Acknowledgments

The authors gratefully acknowledge support from EPSRC Grant CRiSM EP/D002060/1, EPSRC Established Career Research Fellowship EP/J016934/2, EPSRC Programme Grant EQUIP EP/K034154/1, and Academy of Finland Research Fellowship 266940. Konstantinos Zygalakis was partially supported by a grant from the Simons Foundation. Part of this work was done during the author’s stay at the Newton Institute for the program “Stochastic Dynamical Systems in Biology: Numerical Methods and Applications.”

### References

- Arnold, A., Calvetti, D., Somersalo, E.: Linear multistep methods, particle filtering and sequential Monte Carlo. Inverse Probl. **29**(8), 085007 (2013)
- Brunel, N.J.B., Clairon, Q., d’Alché-Buc, F.: Parametric estimation of ordinary differential equations with orthogonality conditions. J. Am. Stat. Assoc. **109**(505), 173–185 (2014). doi:10.1080/01621459.2013.841583
- Capistrán, M., Christen, J.A., Donnet, S.: Bayesian analysis of ODE’s: solver optimal accuracy and Bayes factors (2013). arXiv:1311.2281
- Chakraborty, A., Mallick, B.K., McClarren, R.G., Kuranz, C.C., Bingham, D., Grosskopf, M.J., Rutter, E.M., Stripling, H.F., Drake, R.P.: Spline-based emulators for radiative shock experiments with measurement error. J. Am. Stat. Assoc. **108**(502), 411–428 (2013). doi:10.1080/01621459.2013.770688
- Chkrebtii, O.A., Campbell, D.A., Girolami, M.A., Calderhead, B.: Bayesian uncertainty quantification for differential equations (2013). arXiv:1306.2365
- Coulibaly, I., Lécot, C.: A quasi-randomized Runge–Kutta method. Math. Comput. Am. Math. Soc. **68**(226), 651–659 (1999)
- Dashti, M., Stuart, A.: The Bayesian approach to inverse problems. In: Ghanem, R., Higdon, D., Owhadi, H. (eds.) Handbook of Uncertainty Quantification. Springer, New York (2016)
- Diaconis, P.: Bayesian numerical analysis. Stat. Decision Theory Relat. Top. IV **1**, 163–175 (1988)
- Eriksson, K.: Computational Differential Equations, vol. 1. Cambridge University Press, Cambridge (1996). https://books.google.co.uk/books?id=gbK2cUxVhDQC
- Haario, H., Laine, M., Mira, A., Saksman, E.: DRAM: efficient adaptive MCMC. Stat. Comput. **16**(4), 339–354 (2006)
- Hairer, E., Nørsett, S., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems. Solving Ordinary Differential Equations, Springer, New York (1993). https://books.google.co.uk/books?id=F93u7VcSRyYC
- Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, vol. 31. Springer, New York (2006)
- Hairer, E., McLachlan, R.I., Razakarivony, A.: Achieving Brouwer’s law with implicit Runge–Kutta methods. BIT Numer. Math. **48**(2), 231–243 (2008)
- Hennig, P., Hauberg, S.: Probabilistic solutions to differential equations and their application to Riemannian statistics. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 33 (2014)
- Hennig, P., Osborne, M.A., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A (2015) (in press)
- Johnson, C.: Numerical Solution of Partial Differential Equations by the Finite Element Method. Dover Books on Mathematics Series, Dover Publications, Incorporated, New York (2012). https://books.google.co.uk/books?id=PYXjyoqy5qMC
- Kailath, T.: The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. Commun. Technol. **15**(1), 52–60 (1967). doi:10.1109/TCOM.1967.1089532
- Kaipio, J., Somersalo, E.: Statistical inverse problems: discretization, model reduction and inverse crimes. J. Comput. Appl. Math. **198**(2), 493–504 (2007)
- Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **63**(3), 425–464 (2001)
- Liang, H., Wu, H.: Parameter estimation for differential equation models using a framework of measurement error in regression models. J. Am. Stat. Assoc. **103**(484), 1570–1583 (2008). doi:10.1198/016214508000000797
- Lindgren, F., Rue, H., Lindström, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **73**(4), 423–498 (2011)
- Medina-Aguayo, F.J., Lee, A., Roberts, G.O.: Stability of Noisy Metropolis-Hastings (2015). arXiv:1503.07066
- Ramsay, J.O., Hooker, G., Campbell, D., Cao, J.: Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **69**(5), 741–796 (2007)
- Schober, M., Duvenaud, D.K., Hennig, P.: Probabilistic ODE solvers with Runge–Kutta means. In: Advances in Neural Information Processing Systems, pp. 739–747 (2014)
- Skilling, J.: Bayesian solution of ordinary differential equations. In: Maximum Entropy and Bayesian Methods, pp. 23–37. Springer, New York (1992)
- Stengle, G.: Error analysis of a randomized numerical method. Numer. Math. **70**(1), 119–128 (1995)
- Sullivan, T.: Uncertainty Quantification. Springer, New York (2016)
- Xue, H., Miao, H., Wu, H.: Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Ann. Stat. **38**(4), 2351–2387 (2010)
- Xun, X., Cao, J., Mallick, B., Maity, A., Carroll, R.J.: Parameter estimation of partial differential equation models. J. Am. Stat. Assoc. **108**(503), 1009–1020 (2013)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.