Improved bridge constructs for stochastic differential equations

We consider the task of generating discrete-time realisations of a nonlinear multivariate diffusion process satisfying an Itô stochastic differential equation conditional on an observation taken at a fixed future time-point. Such realisations are typically termed diffusion bridges. Since, in general, no closed form expression exists for the transition densities of the process of interest, a widely adopted solution works with the Euler–Maruyama approximation, by replacing the intractable transition densities with Gaussian approximations. However, the density of the conditioned discrete-time process remains intractable, necessitating the use of computationally intensive methods such as Markov chain Monte Carlo. Designing an efficient proposal mechanism which can be applied to a noisy and partially observed system that exhibits nonlinear dynamics is a challenging problem, and is the focus of this paper. By partitioning the process into two parts, one that accounts for nonlinear dynamics in a deterministic way, and another as a residual stochastic process, we develop a class of novel constructs that bridge the residual process via a linear approximation. In addition, we adapt a recently proposed construct to a partial and noisy observation regime. We compare the performance of each new construct with a number of existing approaches, using three applications.


Introduction
Diffusion processes satisfying stochastic differential equations (SDEs) provide a flexible class of models for describing many continuous-time physical processes.Some application areas and indicative references include finance, e.g.Kalogeropoulos et al. (2010), Stramer et al. (2010), reaction networks, e.g.Fuchs (2013), Golightly et al. (2015) and population dynamics, e.g.Heydari et al. (2014).Fitting such models to data observed at discrete-times can be problematic since the transition densities of the diffusion process are likely to be intractable.A review of inferential methods for diffusions can be found in Fuchs (2013).A widely adopted solution is to approximate the unavailable transition densities either analytically (Aït-Sahalia, 2002, 2008) or numerically (Pedersen, 1995;Elerian et al., 2001;Eraker, 2001;Roberts and Stramer, 2001).Within the Bayesian paradigm, the numerical approach can be seen as a data augmentation problem.The simplest implementation augments low-frequency data by introducing intermediate time-points between observation times.An Euler-Maruyama scheme is then applied by approximating the transition densities over the induced discretisation as Gaussian.Computationally intensive algorithms such as Markov chain Monte Carlo (MCMC) are then used to integrate over the uncertainty associated with the missing data.The key challenges of designing such an MCMC scheme include overcoming dependence between the parameters and missing data (first highlighted as a problem by Roberts and Stramer (2001)) and overcoming dependence between successive values of the missing data.Dealing with the latter requires repeatedly generating realisations known as diffusion bridges from an approximation of the conditioned process.Methods built upon exact simulation, that avoid use of the Euler-Maruyama approximation and the associated discretisation error, have been proposed by Beskos et al. (2006) (see also Beskos et al. (2009)).However, these exact methods are limited to diffusions which can be transformed to have unit diffusion coefficient, known as reducible diffusions.
Designing bridge constructs for irreducible, multivariate diffusions is a challenging problem and has received much attention in recent literature.The simplest approach (see e.g.Pedersen (1995)) is based on the forward dynamics of the diffusion process and generates a bridge by sampling iteratively from the Euler-Maruyama approximation of the unconditioned SDE.This myopic approach induces a discontinuity at the observation time (as the discretisation gets finer) and is well known to lead to low Metropolis-Hastings acceptance rates.The modified diffusion bridge (MDB) construct of Durham and Gallant (2002) (see also extensions to the partial and noisy observation case in Golightly and Wilkinson (2008)) pushes the bridge process towards the observation in a linear way and provides the optimal sampling method when the drift and diffusion coefficients of the SDE are constant (Stramer and Yan, 2006).However, this construct is less effective when the process exhibits nonlinear dynamics.Several approaches have been proposed to overcome this problem.For example, Lindström (2012) (see also Fearnhead (2008) for a similar approach) combines the Pedersen and MDB approaches, with a tuning parameter governing the precise dynamics of the resulting sampler.Del Moral and Murray (2014) (see also Lin et al. (2010)) use a sequential Monte Carlo scheme to generate realisations according to the forward dynamics, pushing the resulting trajectories towards the observation using a sequence of reweighting steps.Schauer et al. (2016) combine the ideas of Delyon and Hu (2006) and Clark (1990) to obtain a bridge based on the addition of a guiding term to the drift of the process under consideration.The guiding term is derived using a tractable approximation of the target process.

Contributions and organisation of the paper
Our contribution is the development of a novel class of bridge constructs that are computationally and statistically efficient, simple to implement, and can be applied in scenarios where only partial and noisy measurements of the system are available.Essentially, the process is partitioned into two parts, one that accounts for nonlinear dynamics in a deterministic way, and another as a residual stochastic process.A bridge construct is obtained for the target process by applying the MDB sampler of Durham and Gallant (2002) to the end-point conditioned residual process.We consider two implementations of this approach.Firstly, we use the bridge introduced by Whitaker et al. (2015) that constructs the residual process by subtracting the solution of an ordinary differential equation (ODE) system based on the drift, from the target process.Secondly, we recognise that the intractable SDE governing the residual process can be approximated by a tractable process.We therefore extend the first approach by additionally subtracting the expectation of the approximate residual process and bridging the remainder with the MDB sampler.In addition, we adapt the guided proposal proposed by Schauer et al. (2016) to a partial and noisy observation regime.
We evaluate the performance of each bridge construct (as well as the constructs proposed by Durham and Gallant (2002) and Lindström (2012)) using three examples: a simple birth-death model, a Lotka-Volterra system and a model of aphid growth.
The remainder of this article is organised as follows.Section 2 provides a brief introduction to the problem of sampling conditioned SDEs and examines two previously proposed approaches.In Section 3 we describe a novel class of bridge constructs and adapt an existing approach to a more general observation regime.Applications are considered in Section 4 and a discussion is provided in Section 5.

Sampling conditioned SDEs
Consider a continuous-time d-dimensional Itô process {X t , t ≥ 0} governed by the SDE paramaterised by θ = (θ 1 , . . ., θ p ) of the form Here, α is a d-vector of drift functions, the diffusion matrix β is a d × d positive definite matrix with a square root representation √ β such that √ β √ β = β and W t is a d-vector of (uncorrelated) standard Brownian motion processes.We assume that α and β are sufficiently regular so that the SDE has a weak non-explosive solution (Øksendal, 2003).
For tractability, we make the same assumption as Golightly and Wilkinson (2008), Golightly and Wilkinson (2011), Picchini (2014) and Lu et al. (2015) among others, that the process is observed at t = T according to Here, Y T is a d o -vector, F is a constant d × d o matrix and T is a random d o -vector for some d o ≤ d.This flexible setup allows for only observing a subset of components.For simplicity we also assume that the process is known exactly at t = 0.This is the case when a diffusion process is observed completely and without error.In the case of partial and/or noisy observations, typically the initial position is an unknown parameter in an MCMC scheme and a new bridge is created at each iteration conditional on the current parameter values, so in terms of the bridge, the initial position is effectively known.The complication of multiple partial and/or noisy observations is discussed in Section 5.
Our aim is to generate discrete-time realisations of X t conditional on x 0 and y T .To this end, we partition [0, T ] as giving m intervals of equal length ∆τ = T /m.Since, in general, the form of the SDE in (1) will not permit an analytic solution, we work with the Euler-Maruyama approximation which gives the change in the process over a small interval of length ∆τ as a Gaussian random vector.Specifically, we have that where ∆W τ k ∼ N (0, ∆τ I d ) and I d is the d × d identity matrix.The continuous-time conditioned process is then approximated by the discrete-time skeleton bridge, with the latent values x (0,T ] = (x τ 1 , . . ., x τm = x T ) having the (posterior) density where is the transition density under the Euler-Maruyama approximation, π(y T |x T , Σ) = N (y T ; F x T , Σ) and N (•; m, V ) denotes the multivariate Gaussian density with mean vector m and variance matrix V .In the special case where x T is known (so that y T = x T and F = I d ), the latent values x (0,T ) = (x τ 1 , . . ., x τ m−1 ) have the density For nonlinear forms of the drift and diffusion coefficients, the products in (3) and (4) will be intractable and samples can be generated via computationally intensive algorithms such as Markov chain Monte Carlo or importance sampling.We focus on the former but note that in either case, the efficiency of the algorithm will depend on the proposal mechanism used to generate the bridge.
A common approach to constructing an efficient proposal is to factorise the target in (3) as (5) The density in (4) can be factorised in a similar manner.This suggests seeking proposal densities of the form q(x τ k+1 |x τ k , y T , θ, Σ) which aim to approximate the intractable constituent densities in (5).In what follows, we consider some existing approaches for generating bridges via approximation of π(x τ k+1 |x τ k , y T , θ, Σ) before outlining our contribution.For each bridge, the proposal densities take the form and our focus is on the choice of µ(•) and Ψ(•).For simplicity and where possible, we drop the parameters θ and Σ from the notation as they remain fixed throughout.

Myopic simulation
Ignoring the information in the observation y T and simply applying the Euler-Maruyama approximation over each interval of length ∆τ leads to a proposal density of the form given by ( 6) with µ EM (x τ k ) = α(x τ k ) and Ψ EM (x τ k ) = β(x τ k ).Sampling iteratively according to (6) for k = 0, 1, . . ., m − 1 gives a proposed bridge which we denote by x * (0,T ] .The Metropolis-Hastings (MH) acceptance probability for a move from .
This strategy is likely to work well provided that the observation y T is not particularly informative, that is, when the measurement error dominates the intrinsic stochasticity of the process.However, as Σ is reduced, the MH acceptance rate decreases.A related approach can be found in Pedersen (1995), where it is assumed that x T is known.In this case, a move from x (0,T ) to x * (0,T ) is accepted with probability which tends to 0 as m → ∞ (or equivalently, ∆τ → 0).

Modified diffusion bridge
For known x T , Durham and Gallant (2002) derive a linear Gaussian approximation of π(x τ k+1 |x τ k , x T ), leading to a sampler known as the modified diffusion bridge (MDB).Extensions to the partial and noisy observation regime are considered in Golightly and Wilkinson (2008).In brief, the joint distribution of X τ k+1 and Y T (conditional on x τ k ) is approximated by In the case of no measurement error and observation of all components (so that x T is known), ( 7) and (8) become

Connection with continuous-time conditioned processes
Consider the case of no measurement error and full observation of all components.The SDE satisfied by the conditioned process {X t , t ∈ [0, T ]}, takes the form where the drift is α See for example chap.IV.39 of Rogers and Williams (2000) for a derivation.Note that p(x T |x t ) denotes the (intractable) transition density of the unconditioned process defined in (1).Approximating α(X t ) and β(X t ) in ( 1) by the constants α(x T ) and β(x T ) yields a process for which p(x T |x t ) is tractable.The corresponding conditioned process satisfies Use of (11) as a proposal process has been justified by Delyon and Hu (2006) (see also Stramer and Yan (2006), Marchand (2011) and Papaspiliopoulos et al. (2013)), who show that the distribution of the target process (conditional on x T ) is absolutely continuous with respect to the distribution of the solution to (11).As discussed by Papaspiliopoulos et al. (2013), it is impossible to simulate exact (discrete-time) realisations of (11) unless β(•) is constant.They also note that performing a local linearisation of (11) according to Shoji and Ozaki (1998) (see also Shoji (2011)) gives a tractable process with transition density that is, the transition density of the modified diffusion bridge discussed in the previous section.Plainly, taking the Euler-Maruyama approximation of (11) yields the MDB construct, albeit without the time dependent multiplier of β(x τ k ) in the variance.As observed by Durham and Gallant (2002) and discussed in Papaspiliopoulos and Roberts (2012) and Papaspiliopoulos et al. (2013), the inclusion of the time dependent multiplier can lead to improved empirical performance.
Unfortunately, the MDB is only efficient when the drift of ( 1) is approximately constant.When this is not the case, so that realisations of the SDE started from the same point exhibit strong and similar non-linearity over the inter-observation time, the modified diffusion bridge is likely to be unsatisfactory.

Lindström bridge
A bridge construct that combines the myopic sampler with the MDB is proposed in Lindström (2012), for the special case of known x T .Extending the sampler to the observation scenario in (2) is straightforward.Whereas the MDB approximates the variance of where C(∆ k+1 ) 2 is the squared bias of X T |x τ k+1 using a single Euler-Maruyama time-step and C is an unknown matrix.By assuming that the squared bias is a fraction γ of the variance over an interval of length ∆τ , a heuristic choice of C is given by with γ > 0. This particular choice of C Heur ensures that Var(Y T |x τ k ) is a positive definite matrix.
The joint distribution of X τ k+1 and Y T (conditional on x τ k ) is then approximated by In the case of no measurement error and observation of all components, ( 12) and ( 13) become where The Lindström bridge can therefore be seen as a convex combination of the MDB and myopic samplers, with γ = 0 giving the MDB and γ = ∞ giving the myopic approach.In practice, Lindström (2012) suggests that γ ∈ [0.01, 1], given that these values have proved successful in simulation experiments.Note also that for a fixed γ, if T − τ k+1 ∆τ then w γ k 0 and the myopic sampler dominates.However, as τ k+1 approaches T , w γ k approaches 1 and the LB is dominated by the MDB.
Whilst the LB attempts to account for nonlinear dynamics by combining the MDB with the myopic approach, having to specify a model-dependent tuning parameter is unsatisfactory, since different choices of γ will lead to different properties of the proposed bridges.Moreover, the link between the regularised sampler and the continuous-time conditioned process is unclear.

Improved bridge constructs
In this section we describe a novel class of bridge constructs that require no tuning parameters, are simple to implement (even when only a subset of components are observed with Gaussian noise) and can account for nonlinear dynamics driven by the drift.In addition, we discuss the recently proposed bridging strategy of Schauer et al. (2016) and describe an implementation method in the case of partial observation with additive Gaussian measurement error.

Bridges based on residual processes
Suppose that X t is partitioned as X t = ζ t + R t where {ζ t , t ≥ 0} is a deterministic process and {R t , t ≥ 0} is a residual stochastic process, satisfying We then aim to choose ζ t (and therefore f (•)) to adequately account for nonlinear dynamics (so that the drift in ( 14) is approximately constant), and construct the MDB of Section 2.2 for the residual stochastic process rather than the target process itself.Suitable choices of ζ t and f (•) can be found in Sections 3.1.1and 3.1.2.It should be clear from the discussion in Section 2.2 that for known x T , the MDB approximates the density of R τ k+1 |r τ k , r T by In this case, the connection between ( 15) and the intractable continuous-time conditioned residual process can be established by following the arguments of Section 2.2.1.By approximating the drift and diffusion matrix in ( 14) by the constants α(x T ) − f (ζ T ) and β(x T ) gives a process with a tractable transition density.The corresponding conditioned process then satisfies The density in ( 15) is then obtained by a local linearisation of ( 16).
It remains for us to choose ζ t to balance the accuracy and computational efficiency of the resulting construct.We explore two possible choices in the remainder of this section.

Subtracting the drift
In the simplest approach to account for dynamics based on the drift, we take ζ t = η t and f (•) = α(•) where so that The MDB can be constructed for the residual process by approximating the joint distribution of R τ k+1 and Y T − F η T (conditional on r τ k ), where Y T − F η T can be seen as a partial and noisy observation of R T since As in Section 2.2, we obtain the (approximate) joint distribution where α η k = α(η τ k ) and α k , β k and ∆ k are as defined in Section 2.2.Note that the mean in (19) uses the tangent α η k at (τ k , η τ k ) to approximate dη t /dt over time intervals of length ∆τ and ∆ k .Since η τ k+1 will be available either exactly from the solution of (17) or from the output of a (stiff) ODE solver, we propose to approximate dη t /dt via the chord between (τ k , η τ k ) and (τ k+1 , η τ k+1 ), that is, by Replacing α η k in ( 19) with δ η k , conditioning on y T − F η T and using the partition Note that in the case of known x T , Ψ *

Further subtraction using the linear noise approximation
Whilst the solution of the SDE governing the residual stochastic process in ( 18) is unavailable in closed form, a tractable approximation can be obtained.Therefore, in situations where η t fails to adequately capture the target process dynamics, we propose to further subtract an approximation of the conditional expectation ρ t = E(R t |r 0 , y T ), which we denote by ρt = E( Rt |r 0 , y T ).Here, { Rt , t ∈ [0, T ]} is obtained through the linear noise approximation (LNA) of ( 18).The LNA can be derived in a number of more or less formal ways (see e.g.Kurtz (1970), van Kampen ( 2001) and Fearnhead et al. (2014)).Here, we give a brief exposition of the LNA and refer the reader to Fearnhead et al. (2014) and the references therein for a complete derivation.By Taylor expanding α(X t ) and β(X t ) about η t (the solution of ( 17)), truncating the expansion of α at the first two terms and taking only the first term of the expansion of β, we obtain where H(η t ) is the Jacobian matrix with (i, j)th element (H(η t )) i,j = ∂α i (η t )/∂η j,t .It should be clear from the truncations used in the Taylor expansions of the drift and diffusion coefficients that the key assumption underpinning the LNA is that the stochastic term β(X t ) is "small".Now, for a fixed initial condition R0 = r0 , it is straightforward to show that Rt | R0 = r0 ∼ N P t r0 , P t ψ t P t (21) where P t and ψ t satisfy the ODE system The joint distribution of Rt and Conditioning further on y T − F η T and noting that r0 = r 0 = 0 gives ρt = E( Rt |r 0 , y T ) Having obtained an explicit, closed-form (subject to the solution of ( 17), ( 22) and ( 23)) approximation of the expected conditioned residual process, we adopt the partition X t = η t + ρt + R − t where {R − t , t ∈ [0, T ]} is the residual stochastic process resulting from the additional decomposition of X t .Although the SDE satisfied by R − t will be intractable, the joint distribution of R − τ k+1 and Y T − F (η T + ρT ) can be approximated (conditional on r − τ k ) by where again we use the chord Note that in the case of known

Guided proposals
For known x T , van der Meulen and Schauer (2015) (see also Schauer et al. (2016)) derive a bridge construct which they term a guided proposal (GP).They take the SDE satisfied by the conditioned process {X t , t ∈ [0, T ]} in ( 9) and ( 10 The guided proposal can be extended to the Gaussian additive noise regime in (2) by noting that in this case, the drift in (10) becomes Given a tractable approximation of p(y T |x t ), the Euler-Maruyama approximation of ( 9) can be applied over the discretisation of [0, T ] to give a proposal density of the form (6) with We will approximate p(y T |x t ) using the LNA.Using the partition Xt = η t + Rt and combining the transition density of Rt in (21) with the observation regime defined in (2) gives p(y where P T |t and ψ T |t are found by integrating the ODE system in ( 22) and ( 23) from t to T with P t|t = I d and ψ t|t = 0. Hence the drift (27) becomes Note that a computationally efficient implementation of this approach is obtained by using the identities P T |t = P T P −1 t and ψ T |t = P t (ψ T − ψ t )P t .Hence, the LNA ODEs in ( 17), ( 22) and ( 23) need only be integrated once over the interval [0, T ].Unfortunately, we find that this approach does not work well in practice, unless the total measurement error tr(Σ) is large relative to the infinitesimal variance β(•).Note that the variance of Y T |x t under the LNA is a function of the deterministic process η t .If η t and x t diverge as t is increased, the guiding term in (28) will result in an over or under dispersed proposal mechanism (relative to the target conditioned process) at times close to T .The problem is exacerbated in the case of no measurement error, where the discrepancy between x t and η t can result in a singularity in the guiding term in (28) at time T .This naive approach (henceforth referred to as GP-N) can be alleviated by integrating the ODE system given by ( 17), ( 22) and ( 23) for each interval [τ k , T ], k = 0, 1, . . ., m − 1, with η τ k = x τ k .In this case, the drift ( 27) is given by α In the special case that x T is known, we have that . The limiting form of the acceptance rate in this case can be found in Schauer et al. (2016), who also remark that a key requirement for absolute continuity of the target and proposal process is that σ(T ) = β(x T ).For the LNA, we have σ(t) = β(η t ).Again, we note that the naive implementation of the guided proposal (GP-N) will not meet this condition in general (when x T is known).Ensuring that σ(t) → β(x T ) as t → T by integrating ( 17), ( 22) and ( 23) for each τ k is likely to be time consuming, unless the LNA ODE system is tractable.In the case of exact observations, a computationally less demanding approach is obtained in van der Meulen and Schauer (2015) by taking the transition density of (26) with B(t) = 0 and σ(t) = β(x T ) to construct the guided proposal.Setting b(t) = α(η t ) leads to a proposal density for the simplified guided proposal (GP-S) of the form (6) with Ψ * GP-S (x τ k ) = β(x τ k ) and Further (example-dependent) methods for constructing guided proposals in the case of known x T can be found in van der Meulen and Schauer (2015).

Use of the MDB variance
Using the Euler-Maruyama approximation of (9) gives the variance of X τ k+1 |x τ k , y T in the guided proposal process as Ψ GP (x τ k )∆τ = β(x τ k )∆τ .In Section 4 we investigate the effect of using the variance (8) of the modified diffusion bridge construct by taking Ψ GP (x τ k ) = Ψ MDB (x τ k ).Although in this case, deriving the limiting form of the acceptance rate under the resulting proposal is problematic, we observe a worthwhile increase in empirical performance.In the case of known x T , use of the MDB variance in place of β(x τ k )∆τ comes at almost no additional computational cost.We denote this construct GP-MDB.

Computational considerations
For the observation regime in (2), all bridge constructs (with the exception of the myopic approach) require the inversion of a The Lindström bridge and modified diffusion bridge have roughly the same computational cost.The bridges based on residual processes incur an additional computational cost of having to solve a system of either d (when subtracting η t ) or order d 2 (when further subtracting ρ t ) coupled ODEs.However, we note that for known x 0 , the ODE system need only be solved once, irrespective of the number of skeleton bridges required.This is also true of the naive and simplified guided proposals.However, we note that in the case of known x T , the guided proposal requires solving order d 2 ODEs over each interval [τ k , T ], k = 0, 1, . . ., m − 1 for each simulated skeleton bridge, in order to maintain reasonable statistical efficiency (as measured by, for example, estimated acceptance rate of a Metropolis-Hastings independence sampler).

Applications
We now compare the accuracy and efficiency of the bridging methods discussed in the previous sections, by using them to make proposals inside a Metropolis-Hastings independence sampler.We consider three examples: a simple birth-death model in which the ODEs governing the LNA are tractable, a Lotka-Volterra system in which the use of numerical solvers are required, and a model of aphid growth inspired by real data taken from Matis et al. (2008).Generating discretetime realisations from the SDE model of aphid growth is particularly challenging due to nonlinear dynamics, and an observation regime in which only one component is observed and is subject to additive Gaussian noise.
In what follows, all results are based on 100K iterations of a Metropolis-Hastings independence sampler targeting either (3) or (4), depending on the observation regime.We measure the statistical efficiency of each bridge via their empirical acceptance probability.R code for the implementation of the M-H scheme can be found at https://github.com/gawhitaker/bridges-apps.The bridge constructs used in each example, together with their relative computational cost can be found in Table 1.Note that in contrast to Lindström (2012), we found that γ ∈ [0.001, 0.3] was required in order to find a near-optimal γ.Where LB is used, we only present results for the value of γ that maximised empirical performance.
Since the ODE system governing the LNA is tractable for this example, there is little difference in CPU cost between the bridges (see Table 1).Therefore, we use statistical efficiency (as measured by empirical Metropolis-Hastings acceptance probablity) as a proxy for overall efficiency of each bridge, with higher probabilities preferred.
Figure 1 shows empirical acceptance probabilities against the number of sub-intervals m for each bridge and each x T .Figures 2 and 3 compare 95% credible regions of the proposal under various bridging strategies with the true conditioned process (obtained from the output of the Metropolis-Hastings independence sampler).It is clear from the figures that as T is increased, the MDB fails to adequately account for the nonlinear behaviour of the conditioned process.Indeed, in terms of empirical acceptance rate, MDB is outperformed by all other bridges for T = 2.As m is increased so that the discretisation gets finer, the acceptance rates under all bridges (with the exception of GP-N) stay roughly constant.For GP-N, the acceptance rates decrease with m when x T is either the 5% or 95% quantile of X T |X 0 = 50.In this case, the variance associated with the approximate transition density either overestimates (when x T is the 5% quantile) or underestimates (when x T is the 95% quantile) the true variance at the end-point.For example, when x T is the 95% quantile, this results (see Figure 3) in a 'tapering in' of the proposal relative to the true conditioned process.GP-S, GP and LB give similar performance, although we note that GP-S and LB perform particularly poorly when x T is the 5% quantile.Moreover, LB requires the specification of a tuning parameter γ and we found that the acceptance rate was fairly sensitive to the choice of γ.In all scenarios, RB, RB − and GP-MDB comprehensively outperform all other bridge constructs.When x T is the median of X T |X 0 = 50, we see that RB and RB − (red and blue lines in Figure 1) give near identical performance, with η t adequately accounting for the observed nonlinear dynamics.In terms of statistical efficiency, GP-MDB outperforms both RB and RB − in all scenarios, although the relative difference is small.

Lotka-Volterra
In this example we consider a Lotka-Volterra model of predator-prey dynamics.We denote the system state at time t by X t = (X 1,t , X 2,t ) , ordered as prey, predators.The mass-action SDE representation of system dynamics takes the form x T = x T,(5) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 x T = x T,(50) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 x T = x T,(95) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability m 20 50 100 500 1000 Figure Birth-death model.Empirical acceptance probability against m with T = 1 (1 st row) and T = 2 (2 nd row).The results are based on 100K iterations of a Metropolis-Hastings independence sampler.Black: MDB.Brown: LB.Red: RB.Blue: RB − .Grey: GP-N.Green: GP-S.Purple: GP.Pink: GP-MDB.(96.82,71.93) (133.35,70.75) (182.64,77.36) (242.08,97.23)x T,(95) (112.13,81.58) (162.28,84.63) (228.82,97.12) (308.58,128.76)Table 2: Lotka-Volterra model.Quantiles of X T |X 0 = (71, 79) found by repeatedly simulating from the Euler-Maruyama approximation of (30) with θ = (0.5, 0.0025, 0.3) .
The components of θ = (θ 1 , θ 2 , θ 3 ) can be interpreted as prey reproduction rate, prey death and predator reproduction rate, and predator death.Note that the ODE system (( 17), ( 22) and ( 23)) governing the linear noise approximation of ( 30) is intractable and we therefore use the R package lsoda to numerically solve the system when necessary.
We fixed the discretisation by taking m = 50, but note no appreciable difference in results for finer discretisations (e.g.m = 1000).As in the previous example, GP-N and GP-S perform relatively poorly, therefore in what follows we omit these bridges from the results.Note that we include MDB for reference.Figure 4 shows empirical acceptance probabilities against T for each bridge and each x T .Figure 5 compares 95% credible regions of the proposal under various bridging strategies with the true conditioned process (obtained from the output of the Metropolis-Hastings independence sampler).Unsurprisingly, as T is increased, MDB fails to adequately account for the nonlinear behaviour of the conditioned process.LB offers a modest improvement (except when x T = x T,(5) ) but is generally outperformed by the other bridge constructs.We found that as T was increased, LB required larger values of γ, reflecting the need for more weight to be placed on the myopic component of the construct.As for the previous example, unless x T is the median of X T |x 0 , RB is comprehensively outperformed by RB − (see Figure 5 for the effect of increasing T on RB and RB − ).However, we see that the acceptance probabilities are decreasing in T for both constructs.As noted by Fearnhead et al. (2014), the LNA can become poor as T increases, with the implication here being that the approximation of the expected residual (as used in RB − ) degrades with T .
We note that the estimated acceptance probabilities are roughly constant for GP and (to a lesser extent) GP-MDB, and in terms of statistical efficiency for a fixed number of iterations, GP-MDB should be preferred over all other algorithms considered in this article.However, the difference in estimated acceptance probabilities between GP-MDB and RB − is fairly small, even when T = 4 (e.g.0.857 vs 0.577 when x T = x T,(5) and 0.834 vs 0.606 when x T = x T,(50) ).We also note that a Metropolis-Hastings scheme that uses RB or RB − is some 30 times faster than a scheme with GP or GP-MDB, since the latter require solving the LNA ODE system for each sub-interval [τ k , T ] to maintain reasonable statistical efficiency for a given m.Therefore, we further compare RB, RB − , GP and GP-MDB by computing the minimum effective sample size (ESS) at time T /2 (where the minimum is over each component of X T /2 ) divided by CPU cost (in seconds).We denote this measure of overall efficiency by ESS/s.When x T = x T,(5) and T = 1, ESS/s scales roughly as 1 : 3 : 56 : 83 for GP : GP-MDB : RB : RB − .When T = 4, ESS/s scales roughly as 1 : 3 : 1 : 17.Hence, for this example, RB − is to be preferred in terms of overall efficiency, although the relative difference between RB − and GP-MDB appears to decrease as T is increased, consistent with the behaviour of the empirical acceptance rates observed in Figure 4.

Aphid growth
Matis et al. ( 2008) describe a stochastic model for aphid dynamics in terms of population size (N t ) and cumulative population size (C t ).The diffusion approximation of their model is given by where the components of θ = (θ 1 , θ 2 ) characterise the birth and death rate respectively.Matis et al. (2008) also provide a dataset consisting of cotton aphid counts recorded at times t = 0, 1.14, 2.29, 3.57 and 4.57 weeks, and collected for 27 different treatment block combinations.The analysis of these data via a stochastic differential mixed-effects model driven by ( 31) is the focus of Whitaker et al. (2015).
x T = x T,(5) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability Time 1 2 3 4 x T = x T,(50) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability Time 1 2 3 4 x T = x T,(95) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q q q q q q q q q q Acceptance Probability Time   = (347.55, 398.94) found by repeatedly simulating from the Euler-Maruyama approximation of (31) with θ = (1.45,0.0009) , and corrupting N 3.57 with additive N (0, σ 2 ) noise.
Driven by the real data of Matis et al. (2008) and to illustrate the proposed methodology in a challenging partial observation scenario, we assume that X T cannot be measured exactly.Rather, we observe Y T = F X T + T , T |Σ ∼ N (0, Σ), where Σ = σ 2 and F = (1, 0) so that only noisy observation of N T is possible, and C T is not observed at all.We consider a single treatment-block combination and consider the dynamics of the process over an observation time interval [2.29, 3.57], over which nonlinear dynamics are typically observed.We fix θ and x 2.29 at their marginal posterior means found by Whitaker et al. (2015), that is, at θ = (1.45,0.0009) and x 2.29 = (347.55, 398.94) .We generate various end-point conditioned scenarios by taking y 3.57 to be either the 5%, 50% or 95% quantile of Y 3.57 |X 2.29 = (347.55,398.94) , σ.To investigate the effect of measurement error, we further take σ ∈ {5, 10, 50}.The resulting quantiles are shown in Table 3.As with the previous example, the ODE system governing the linear noise approximation of ( 31) is intractable and we again use the lsoda package to numerically solve the system when necessary.
Figure 6 shows empirical acceptance probabilities against σ for EM, RB, RB − , GP and GP-MDB.Figure 7 compares 95% credible regions for a selection of bridges with the true conditioned process (obtained from the output of the independence sampler).All results are based on m = 50 (but note that no discernible difference in output was obtained for finer discretisations).As illustrated by both figures, the myopic sampler (EM) performs poorly (in terms of statistical efficiency, as measured by empirical acceptance probability) when the measurement error variance is relatively small (σ = 5).For σ = 50, the performance of EM is comparable with the other bridge constructs.In fact, as σ increases, the bridge constructs coincide with the Euler-Maruyama approximation of the target process.The gain in statistical performance of RB − over RB is clear.Likewise, GP-MDB outperforms GP, although the difference is very small for σ = 50 and again we note that as σ increases, the variance under GP-MDB, Ψ MDB (x τ k ), approaches the Euler-Maruyama variance, as used in GP.
The relative computational cost of each scheme can be found in Table 1.EM is particularly cheap to implement, given the simple form of the construct and the M-H acceptance probability.However, this approach cannot be recommended in this example for σ < 10, due to its dire statistical efficiency.The computational cost of RB, RB − , GP and GP-M is roughly the same, since for the guided proposals, we found that a naive implementation that only solves the LNA ODEs once, gave no appreciable difference in empirical acceptance probability as obtained when repeatedly solving the ODE system for each sub-interval [τ k , T ] (as is required in the case of no measurement error).Consequently, in this example, GP-MDB outperforms RB − in terms of overall efficiency.

Discussion
We have presented a novel class of bridge constructs that are both computationally and statistically efficient, and can be readily applied in situations where only noisy and partial observation of the y 3.57 = y 3.57,(5) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q Acceptance Probability σ 5 10 50 y 3.57 = y 3.57,(50) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q Acceptance Probability σ 5 10 50 y 3.57 = y 3.57,(95) 0.0 0.2 0.4 0.6 0.8 1.0 q q q q q q q q q q q q q q q Acceptance Probability σ 5 10 50  process is possible.Our approach is straightforward to implement and is based on a partition of the process into a deterministic part that accounts for forward dynamics, and a residual stochastic process.The intractable end-point conditioned residual SDE is approximated using the modified diffusion bridge of Durham and Gallant (2002).Using three examples, we have investigated the empirical performance of two variants of the residual bridge.The first constructs the residual SDE by subtraction of a deterministic process based on the drift governing the target process (denoted RB).The second variant further subtracts the linear noise approximation (LNA) of the expected conditioned residual process (denoted RB − ).Our examples included a scenario in which the LNA system is tractable, and another where the system must be solved numerically.An example that considers partial and noisy observation of the process at a future time was also presented.

Choice of residual bridge
We find that for all examples considered, the residual bridge that further subtracts the LNA mean results in improved statistical efficiency (over the simple implementation based on the drift subtraction only) at the expense of having to solve a larger ODE system consisting of order d 2 equations (as opposed to just d when using the simpler variant).For a known initial time-point x 0 , the ODE system need only be solved once, irrespective of the number of skeleton bridges required.Taking the Lotka-Volterra diffusion (described in Section 4.2) as an example, overall efficiency (as measured by minimum effective sample size per second, ESS/s, at time T /2) of RB − is 1.5 times that of RB when T = 1 and x T is either the 5% or 95% quantile of X T |x 0 .This factor increases to 17 when T = 4.However, for unknown x 0 , as would typically be the case when performing parameter inference, the ODE solution will be required for each skeleton bridge, and the difference in computational cost between the two approaches is likely to be important, especially as the dimension of the state space increases.For the Lotka-Volterra example, the computational cost for solving the ODE system for each bridge scales as 1 : 2.8 for RB : RB − .Therefore, the relative difference in ESS/s would reduce to a factor of roughly 0.5 when T = 1 (so that RB would be preferred) and 6 when T = 4.We therefore anticipate that in problems where x 0 is unknown, the simple residual bridge is to be preferred, unless the ODE system governing the LNA is tractable, or the dimension d of X t is relatively small, say d < 5.

Residual bridge or guided proposal?
We have compared the performance of our approach to several existing bridge constructs (adapting where necessary to the case of noisy and partial observation).These include the modified diffusion bridge (Durham and Gallant, 2002), Lindström bridge (Lindström, 2012) and guided proposal (Schauer et al., 2016).Our implementation of the latter uses the LNA to guide the proposal.We find that a further modification that replaces the Euler-Maruyama variance with the MDB variance gives a particularly effective bridge, outperforming all others considered here, in terms of statistical efficiency.We find that for fixed x 0 and noisy observation of x T , an efficient implementation of the guided proposal is possible, where the ODE system governing the LNA need only be solved once.In this case, the guided proposal outperforms both implementations of the residual bridge in terms of overall efficiency.However, we found that in the case of no measurement error (so that x T is known exactly), the guided proposal required that the ODEs governing the LNA be re-integrated for each intermediate time-point and for each skeleton bridge required.Unless the ODE system can be solved analytically, we find that when combining statistical and computational efficiency, the guided proposal is outperformed by both implementations of the residual bridge.

Extensions
Our work can be extended in a number of ways.For example, it may be possible to improve the statistical performance of the residual bridges by replacing the Euler-Maruyama approximation of the variance of Y T |X 0 with that obtained under the LNA.This approach could also be combined with the Lindström sampler to avoid specification of a tuning parameter.Deriving the limiting (as ∆τ → 0) forms of the Metropolis-Hastings acceptance rates associated with the residual bridges would be problematic due to the time dependent terms entering the variance of the constructs.Nevertheless, this merits further research.Interest also lies in the comparison of the bridge constructs for SDEs that exhibit multimodal behaviour, although we anticipate that further modification of the constructs will be required to efficiently deal with such a scenario.

Table 1 :
Example and bridge specific relative CPU cost for 100K iterations of a Metropolis-Hastings independence sampler.Due to well known poor performance in the case of known x T , EM is not implemented for the first two examples.Likewise, due to poor performance, we omit results based on GP-N and GP-S in the second example, and results based on MDB and LB in the final example.
d o × d o matrix at each intermediate time τ k , k = 1, 2 . . ., m − 1 and for each skeleton bridge required.For known x T , the proposal densities associated with each construct simplify.In this case, only the LNA-based residual bridge and guided proposal require the inversion of a d × d matrix at each intermediate time.