1 Introduction

Sequential data assimilation interlaces dynamic processes with intermittent partial state observations in order to provide reliable state estimates and their uncertainties. A wide array of numerical methods has been proposed to tackle this problem computationally. Popular methods include sequential Monte Carlo, variational inference, and various ensemble Kalman filter formulations [5, 8]. These methods can encounter difficulties whenever the predictive distribution is incompatible with the incoming data; in other words, whenever the distance between the prior, as provided by the underlying stochastic process, and the data-informed posterior distribution is large. It has long been realized that this challenge can be partially circumvented by altering the underlying stochastic process through appropriate control terms or modified proposal densities [5, 23, 16, 9]. Recently, the connection between devising such control terms and Schrödinger bridge problems [4] has been made explicit [16]. However, Schrödinger bridge problems are notoriously difficult to solve numerically. The key contribution of this paper is to provide a computationally tractable (sub-optimal) solution via a novel extension of established homotopy approaches [7, 14]. As with related homotopy approaches for purely Bayesian inference, the solution of certain partial differential equations (PDEs) is required in order to find the desired control terms [14, 25]. In line with standard ensemble Kalman filter (EnKF) methodologies, we approximate these PDEs via a constant gain approximation [21]. There are also alternative approaches to sequential data assimilation and inference which utilize ideas from optimal transportation; see, for example, [15, 6, 20, 3].

The paper is structured as follows. The mathematical formulation of the data assimilation problem, as considered in this paper, is laid out in Sect. 2. The standard optimal control and Schrödinger bridge approach to data assimilation is briefly summarized in Sect. 3, and the novel control formulation based on a homotopy formulation is introduced in Sect. 4. A practical implementation based on the EnKF methodology is proposed in Sect. 5. A series of increasingly complex data assimilation problems is considered in Sect. 6 in order to demonstrate the feasibility of the proposed methodologies. Conclusions are drawn in Sect. 7. Detailed mathematical derivations can be found in Appendices 1 and 2.

2 Problem Formulation and Background

We consider drift diffusion processes given by a stochastic differential equation (SDE)

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t = f(X_t)\mathrm{d}t + \sqrt{2\sigma}\mathrm{d}W_t, \end{aligned} $$
(1)

where \(X_t : \Omega \to \mathbb {R}^{d_x}\), \(f:\mathbb {R}^{d_x} \to \mathbb {R}^{d_x}\), \(\sigma \in \mathbb {R}_{\geq 0}\), and \(W_t:\Omega \to \mathbb {R}^{d_x}\) denotes \(d_x\)-dimensional standard Brownian motion [12, 13].

Assuming the law of \(X_t\) is absolutely continuous w.r.t. Lebesgue measure with density \(\pi _t\), this leads to the Fokker–Planck equation [13]

$$\displaystyle \begin{aligned} {} \partial_t \pi_t = -\nabla \cdot \left( \pi_t \left( f - \sigma \nabla \log \pi_t \right) \right) . \end{aligned} $$
(2)

The SDE (1) can be replaced by the mean field ODE

$$\displaystyle \begin{aligned} {} \frac{\mathrm{d}}{\mathrm{d}t} \tilde X_t = f(\tilde X_t) - \sigma \nabla \log \tilde \pi_t \end{aligned} $$
(3)

where \(\tilde \pi _t\) denotes the law of \(\tilde X_t\). Provided \(\tilde \pi _0 = \pi _0\), it holds that \(\tilde \pi _t = \pi _t\) for all \(t> 0\). Note that the evolution of the random variable \(\tilde X_t\) is entirely deterministic subject to random initial conditions \(\tilde X_0 \sim \pi _0\).
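As an illustration (our own minimal sketch, not part of the original exposition), the SDE (1) and the mean field ODE (3) can be compared for a pure diffusion (\(f=0\)) with Gaussian initial condition \(\pi_0 = \mathcal{N}(0,1)\), where \(\nabla \log \tilde \pi_t\) is replaced by its Gaussian ensemble estimate; both evolutions should reproduce \(\mathrm{Var}(X_t) = 1 + 2\sigma t\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, T, M, dt = 1.0, 1.0, 5000, 0.005
steps = int(T / dt)

# SDE (1) with f = 0: dX = sqrt(2 sigma) dW  (Euler-Maruyama)
X = rng.standard_normal(M)          # X_0 ~ N(0, 1)
# Mean field ODE (3): dX/dt = -sigma * grad log pi_t, with the Gaussian
# score -(x - mu)/var estimated from the ensemble itself
Xd = X.copy()
for _ in range(steps):
    X += np.sqrt(2 * sigma * dt) * rng.standard_normal(M)
    mu, var = Xd.mean(), Xd.var()
    Xd += dt * sigma * (Xd - mu) / var   # -sigma * grad log pi = sigma (x-mu)/var
# Both evolutions should reproduce Var(X_t) = 1 + 2 sigma t, i.e. ~3 at t = T
print(X.var(), Xd.var())
```

For Gaussian marginals the ensemble score estimate is exact (up to sampling error), so the two sample variances agree.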

At time \(t = T>0\), we have observations of the system according to

$$\displaystyle \begin{aligned} {} y_T = h(X_T) + \nu , \end{aligned} $$
(4)

from which we wish to infer the unknown state \(X_T\). Here \(h:\mathbb {R}^{d_x} \to \mathbb {R}^{d_y}\) denotes the forward map and \(\nu \sim \mathcal {N}(0,R)\) is \(d_y\)-dimensional Gaussian noise with covariance matrix \(R \in \mathbb {R}^{d_y \times d_y}\).

Let \(L:\mathbb {R}^{d_x} \to \mathbb {R}\) denote the corresponding negative log-likelihood function. Since \(\nu \) is Gaussian, it is given by

$$\displaystyle \begin{aligned} {} L(x) = \frac{1}{2}(h(x) - y_T)^\top R^{-1} (h(x) - y_T) \end{aligned} $$
(5)

up to an irrelevant constant. The observations are combined with the predictive density \(\pi _T\) at time \(t=T\) according to Bayes’ theorem,

$$\displaystyle \begin{aligned} {} \pi^{\mathrm{a}}_T = \frac{e^{-L} \pi_T}{\int e^{-L(x)} \pi_T(x) \mathrm{d}x} . \end{aligned} $$
(6)

The process of transforming the random variable \(X_T \sim \pi _T\) into a random variable \(X_T^a \sim \pi _T^{\mathrm {a}}\) is called data assimilation in the context of dynamical systems and stochastic processes [10, 17, 8].
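For intuition, the update (6) can be realised numerically by importance weighting: draw samples from the prior and reweight them by \(e^{-L}\). The following self-contained sketch uses scalar linear-Gaussian choices of our own, so the weighted estimates can be checked against the exact Kalman formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative choices (ours): prior pi_T = N(0, 2), h(x) = x, noise variance R
Sigma_T, R, y_T = 2.0, 0.1, 1.0
M = 200000
X = rng.normal(0.0, np.sqrt(Sigma_T), M)

# Bayes' theorem (6) via self-normalised importance weights w ~ exp(-L(x))
L = 0.5 * (X - y_T) ** 2 / R
w = np.exp(-L)
w /= w.sum()
post_mean = np.sum(w * X)
post_var = np.sum(w * (X - post_mean) ** 2)

# Exact linear-Gaussian (Kalman) reference values
K = Sigma_T / (Sigma_T + R)
print(post_mean, K * y_T)            # weighted mean vs Kalman mean
print(post_var, Sigma_T * (1 - K))   # weighted variance vs Kalman variance
```

In the linear-Gaussian case the weighted estimates converge to the Kalman mean and variance, which makes (6) easy to sanity-check; for strongly informative data the weights degenerate, which is precisely the difficulty discussed next.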

Since performing data assimilation can be difficult if the Kullback–Leibler divergence

$$\displaystyle \begin{aligned} \mathrm{KL}(\pi_T|\pi_T^{\mathrm{a}}) = \int_{\mathbb{R}^{d_x}} \pi_T(x) (\log \pi_T(x)-\log \pi_T^{\mathrm{a}}(x)) \mathrm{d}x, \end{aligned} $$
(7)

also called the relative entropy [13], between the prior \(\pi _T\) and the posterior \(\pi _T^{\mathrm{a}}\) is large and/or if the involved distributions are strongly non-Gaussian [19, 1], we propose to construct a new SDE with state process \(X_t^{\mathrm {h}}\) such that \(X_0^{\mathrm {h}} \sim \pi _0\) and \(X_T^{\mathrm {h}} \sim \pi _T^{\mathrm{a}}\). In other words, we are looking for a stochastic process (bridge) with initial density \(\pi _0\) and final density \(\pi ^{\mathrm {a}}_T\). The problem of finding the optimal process (in the sense of minimal Kullback–Leibler divergence) is known as the Schrödinger bridge problem [4].

3 Schrödinger Bridge Approach

The Bayesian adjustment (6) at final time \(t=T\) leads in fact to an adjustment over the whole solution space of the underlying diffusion process described by (1). Let us denote the marginals of the so-called smoothing distribution by \(\pi _t^{\mathrm {a}}\), \(t \in [0,T]\) [18, 5]. It is well established that these marginal distributions can be generated from a controlled SDE

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{a}} = f(X^{\mathrm{a}}_t)\mathrm{d}t + g_t^{\mathrm{a}}(X^{\mathrm{a}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t \end{aligned} $$
(8)

for appropriate control \(g_t^{\mathrm {a}}:\mathbb {R}^{d_x} \to \mathbb {R}^{d_x}\) such that \(X_0^{\mathrm {a}}\sim \pi _0^{\mathrm {a}}\) implies \(X_t^{\mathrm {a}}\sim \pi _t^{\mathrm {a}}\) for all \(t>0\). It is also well known that finding a suitable \(g_t^{\mathrm {a}}\) can be formulated as an optimal control problem which in turn is closely related to the backward Kolmogorov equation [13, 16]. Formulations related to (8) have also been used in the context of sequential Monte Carlo methods [5].

As proposed in [16], an alternative perspective on sequential data assimilation is provided by Schrödinger bridges. Given two marginal distributions \(q_0\) and \(q_T\) and a stochastic process \(X_t\) (referred to as the reference process), a Schrödinger bridge is another stochastic process \(\hat {X}_t\) such that \(\hat {X}_0 \sim q_0\), \(\hat {X}_T \sim q_T\), and the Kullback–Leibler divergence between the processes \(\{\hat {X}_t\}_{t\in [0,T]}\) and \(\{X_t\}_{t\in [0,T]}\) is minimal. Specialised to our problem and considering a single data assimilation cycle, this means that the marginals are the initial and posterior densities, i.e. \(q_0 = \pi _0\) and \(q_T = \pi ^{\mathrm {a}}_T\), and the reference process is the solution to (1). The solution to the associated Schrödinger bridge problem is again of the form (8) with modified control term denoted by \(g_t^{\mathrm {SB}}(x)\).

A Schrödinger bridge is thus the optimal coupling as measured by the Kullback–Leibler divergence to the underlying reference process. Unfortunately, Schrödinger bridges lead to boundary value problems in the space of probability measures, and the required control term \(g_t^{\mathrm {SB}}\) seems rather difficult to compute in practice. In addition to the computational complexity of solving nonlinear Schrödinger bridge problems, the target distribution \(\pi _T^{\mathrm{a}}\) is implicitly defined in the setting of data assimilation. The next section offers a solution to both of these issues. We point to [23] for a discussion of alternative approaches which introduce appropriate control terms into data assimilation procedures.

4 Homotopy Induced Dynamic Coupling

Since Schrödinger bridges are computationally challenging, we ask whether a less optimal but cheaper approach might also be feasible. Indeed, in the context of data assimilation a non-optimal coupling can be found via a homotopy between the initial and target distribution as follows. Let

$$\displaystyle \begin{aligned} \pi^{\mathrm{h}}_t(x) = Z^{-1}_t e^{-\frac{t}{T}L(x)} \pi_t(x) \end{aligned} $$
(9)

denote the homotopy in question, with \(Z_t = \int e^{-\frac {t}{T}L(x)} \pi _t(x) \mathrm {d}x\) the time dependent normalization constant. It clearly holds that \(\pi ^{\mathrm {h}}_0 = \pi _0\) and \(\pi ^{\mathrm {h}}_T = \pi _T^{\mathrm {a}}\). Note that the scaling \(t \mapsto e^{-\frac {t}{T}L}\) was chosen for its simplicity and follows previous work on Bayesian inference problems [7, 14]. Finding better homotopies or systematic ways of constructing one could be an interesting direction for future research.
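To make (9) concrete, the homotopy can be tabulated on a grid in a toy setting of our own choosing: a pure diffusion prior \(\pi_t = \mathcal{N}(0, 1+2\sigma t)\) and a scalar linear observation (all parameter values below are illustrative, not taken from the text). The mean of \(\pi_t^{\mathrm{h}}\) then interpolates between the prior mean and the posterior mean:

```python
import numpy as np

# Illustrative scalar setting (ours): pi_t = N(0, 1 + 2*sigma*t), h(x) = x
sigma, T, R, y_T = 1.0, 1.0, 0.1, 1.0
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def pi_h(t):
    """Homotopy density (9) on the grid, normalised numerically (Z_t)."""
    prior = np.exp(-x**2 / (2.0 * (1.0 + 2.0 * sigma * t)))  # ~ N(0, 1+2*sigma*t)
    lik = np.exp(-(t / T) * 0.5 * (x - y_T)**2 / R)          # e^{-(t/T) L(x)}
    p = prior * lik
    return p / (p.sum() * dx)

for t in (0.0, 0.5, 1.0):
    print(t, (x * pi_h(t)).sum() * dx)   # mean drifts from 0 towards the posterior
```

At \(t=0\) the mean is that of the prior, and at \(t=T\) it matches the Gaussian posterior mean \(\Sigma_T y_T/(\Sigma_T + R)\) with \(\Sigma_T = 1+2\sigma T\), confirming the endpoint properties \(\pi_0^{\mathrm{h}} = \pi_0\) and \(\pi_T^{\mathrm{h}} = \pi_T^{\mathrm{a}}\) in this example.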

We can then reason backwards from the Fokker–Planck equation of \(\pi ^{\mathrm{h}}_t\) to conclude that if it is the density of a random variable \(X^{\mathrm{h}}_t\), then that random variable must satisfy the modified SDE:

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{h}} = f(X^{\mathrm{h}}_t)\mathrm{d}t - \frac{\sigma t}{T} \nabla L(X^{\mathrm{h}}_t)\mathrm{d}t + g_t(X^{\mathrm{h}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t , \end{aligned} $$
(10)

where \(g_t\) is a solution to the PDE

$$\displaystyle \begin{aligned} {} \nabla \cdot (\pi^{\mathrm{h}}_t g_t) = \frac{1}{T} \pi^h_t \left( L + t \nabla L \cdot \left( f - \frac{\sigma t}{T} \nabla L - \sigma \nabla \log \pi^{\mathrm{h}}_t \right) \right) + \pi^{\mathrm{h}}_t \frac{\dot{Z}_t}{Z_t} . \end{aligned} $$
(11)

The derivations of (11) can be found in Appendix 1.

Note that (10) constitutes a mean field model since \(g_t\) depends on the distribution \(\pi _t^{\mathrm {h}}\) of \(X_t^{\mathrm {h}}\). We also wish to point out that

$$\displaystyle \begin{aligned} \hat g_t^{\mathrm{SB}}(x) := - \frac{\sigma t}{T} \nabla L(x) + g_t(x) \end{aligned} $$
(12)

provides a solution to the associated coupling problem. The control term \(\hat g_t^{\mathrm {SB}}\) is however non-optimal in the sense of the Schrödinger bridge problem since it does not minimise the Kullback–Leibler divergence.

Since (11) is linear in \(g_t\), we can decompose (11) into a set of simpler equations \(\nabla \cdot \left ( \pi ^{\mathrm {h}}_t g^i_t \right ) = \pi ^{\mathrm {h}}_t (k^i - \mathbb {E} k^i)\) such that the \(k^i\) add up to the right-hand side of (11). In order to maintain \(\int \nabla \cdot \left ( \pi ^{\mathrm {h}}_t g^i_t \right ) \mathrm{d}x = 0\) for the individual \(g^i_t\), we make use of the fact that the terms in (11) are of the form \(\pi ^{\mathrm {h}}_t \left ( k - \mathbb {E} k \right )\). Separating the terms, we obtain the following equations, the sum of whose solutions solves (11):

$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^1 \right) &= \pi^{\mathrm{h}}_t \left( \frac{L}{T} - \mathbb{E}\frac{L}{T} \right) \end{aligned} $$
(13a)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^2 \right) &= \pi^{\mathrm{h}}_t \left( \frac{t}{T} \nabla L \cdot f - \mathbb{E} \frac{t}{T} \nabla L \cdot f \right) \end{aligned} $$
(13b)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^3 \right) &= - \pi^{\mathrm{h}}_t \left( \frac{\sigma t}{T} \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t - \mathbb{E} \frac{\sigma t}{T} \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t \right) \end{aligned} $$
(13c)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^4 \right) &= - \pi^{\mathrm{h}}_t \left( \frac{\sigma t^2}{T^2} \nabla L \cdot \nabla L - \mathbb{E} \frac{\sigma t^2}{T^2} \nabla L \cdot \nabla L \right) . \end{aligned} $$
(13d)

Note that

$$\displaystyle \begin{aligned} \pi^{\mathrm{h}}_t \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t = \nabla \cdot \left( \pi^{\mathrm{h}}_t \nabla L \right) - \pi^{\mathrm{h}}_t \Delta L , \end{aligned} $$
(14)

which can be used to avoid the computation of \(\nabla \log \pi _t^{\mathrm {h}}\) (with \(\Delta = \nabla \cdot \nabla \) the Laplacian operator) in (13c). Thus the controlled SDE (10) can be replaced by

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{h}} = f(X^{\mathrm{h}}_t)\mathrm{d}t - \frac{2\sigma t}{T} \nabla L(X^{\mathrm{h}}_t)\mathrm{d}t + \hat g_t(X^{\mathrm{h}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t , \end{aligned} $$
(15)

where \(\hat g_t\) is a solution to

$$\displaystyle \begin{aligned} {} \nabla \cdot (\pi^{\mathrm{h}}_t \hat g_t) = \frac{1}{T} \pi^{\mathrm{h}}_t \left( L + t \nabla L \cdot \left( f - \frac{\sigma t}{T} \nabla L \right) + \sigma t \Delta L \right) + \pi^{\mathrm{h}}_t \frac{\dot{Z}_t}{Z_t} . \end{aligned} $$
(16)

Furthermore, if \(\Delta L\) is a constant (as a function of x) or small in comparison to the other contributions in (16), then (16) simplifies further. In particular, this is the case if the forward map is linear, that is, \(h(x) = Hx\).
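Indeed, for \(h(x) = Hx\) a direct calculation from (5) gives

$$\displaystyle \begin{aligned} \nabla L(x) = H^\top R^{-1} (Hx - y_T), \qquad \Delta L(x) = \nabla \cdot \nabla L(x) = \operatorname{tr}\left( H^\top R^{-1} H \right), \end{aligned} $$

which is independent of \(x\).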

Building upon the mean field ODE (3), one obtains the equivalent controlled mean field ODE system

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t}\tilde X_t^{\mathrm{h}} = f(\tilde X^{\mathrm{h}}_t) - \frac{\sigma t}{T} \nabla L(\tilde X^{\mathrm{h}}_t) + g_t(\tilde X^{\mathrm{h}}_t) - \sigma\nabla \log \tilde \pi^{\mathrm{h}}_t(\tilde X_t^{\mathrm{h}}) , \end{aligned} $$
(17)

with \(g_t\) defined as before. This mean field formulation again requires knowledge (or approximation) of \(\nabla \log \tilde \pi _t^{\mathrm {h}}\). A Gaussian approximation might be sufficient in certain circumstances giving rise to

$$\displaystyle \begin{aligned} \nabla \log \tilde \pi_t^{\mathrm{h}}(x) \approx -(\tilde \Sigma_t^{ \mathrm{h}})^{-1}(x-\tilde \mu_t^{\mathrm{h}}), \end{aligned} $$
(18)

where \(\tilde \mu _t^{\mathrm {h}}\) denotes the mean of \(\tilde X_t^{\mathrm {h}}\) and \(\tilde \Sigma _t^{\mathrm {h}}\) its covariance matrix.
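The Gaussian approximation (18) is straightforward to estimate from an ensemble; the following sketch (function name and regularisation jitter are our own choices) evaluates the approximate score at each ensemble member:

```python
import numpy as np

def gaussian_score(X):
    """Estimate grad log pi(x) ~= -Sigma^{-1} (x - mu) from an ensemble.

    X has shape (M, d); returns the approximate score (18) evaluated at
    each ensemble member. Assumes M > d so the sample covariance is
    invertible; a small jitter is added for numerical stability.
    """
    mu = X.mean(axis=0)
    Sigma = np.cov(X.T) + 1e-8 * np.eye(X.shape[1])
    return -np.linalg.solve(Sigma, (X - mu).T).T

# Sanity check: for any ensemble the estimated score averages to zero,
# since the sample mean of X - mu vanishes by construction
rng = np.random.default_rng(2)
X = rng.multivariate_normal([1.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)
print(gaussian_score(X).mean(axis=0))
```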

5 Numerical Implementation

No analytic solution to (11) is known and we thus have to resort to approximations. We note that a similar PDE arises in the computation of the gain in the feedback particle filter [24] and one could use the diffusion map based approximation [22] for the problem at hand. That method also transforms the PDE into a Poisson equation, which is then translated into an equivalent integral equation, the semi-group form of the Poisson equation. As the name suggests, the integral equation makes use of the generator of a semi-group, which can be approximated by diffusion maps. Here we instead propose to follow the constant gain approximation first introduced in the EnKF methodology [21].

5.1 Ensemble Kalman Mean Field Approximation

Let us assume that \(\Delta L \approx \mathrm{const}\) in (16). Then we only need to deal with the modified negative log-likelihood function

$$\displaystyle \begin{aligned} \tilde L(x) &= \frac{1}{T} \left( L(x) + t\nabla L(x) \cdot \left(f(x)-\frac{\sigma t}{T} \nabla L(x)\right)\right) \end{aligned} $$
(19a)
$$\displaystyle \begin{aligned} &\approx \frac{1}{T} \left( L(x) + \frac{t}{\Delta t}\left\{ L(x) - L\left( x-\Delta t f(x) + \Delta t\frac{t\sigma}{T} \nabla L(x)\right)\right\} \right) \end{aligned} $$
(19b)

with \(\Delta t\) being the time-step also used later for time-stepping the evolution equations (10) or (15), respectively. Since L is given by (5), we define the modified forward map

$$\displaystyle \begin{aligned} \tilde h(x) = h\left(x - \Delta t f(x) + \Delta t \frac{\sigma t}{T} \nabla L(x)\right) \end{aligned} $$
(20)

and thus

$$\displaystyle \begin{aligned} {} \tilde L(x) \approx \frac{t+\Delta t}{2\Delta t T} (h(x)-y_T)^\top R^{-1}(h(x)-y_T) -\, \frac{t}{2\Delta t T} (\tilde h(x)-y_T)^\top R^{-1}(\tilde h(x) - y_T). \end{aligned} $$
(21)

Following the standard EnKF methodology for quadratic loss functions, this suggests approximating the drift function \(\hat g_t\) in (15) as follows:

$$\displaystyle \begin{aligned} \hat g_t^{\mathrm{KF}}(x) &= -\frac{t + \Delta t}{\Delta t T}\Sigma_t^{xh}R^{-1}\left( \frac{1}{2} \left( h(x)+ \pi_t^{\mathrm{h}}[h]\right) - y_T\right) \end{aligned} $$
(22a)
$$\displaystyle \begin{aligned} & \qquad \qquad + \frac{t}{\Delta t T}\Sigma_t^{x\tilde h}R^{-1} \left(\frac{1}{2} \left(\tilde h(x) +\pi_t^{\mathrm{h}}[\tilde h] \right) -y_T\right). \end{aligned} $$
(22b)

Here we have introduced the notation \(\pi _t^{\mathrm {h}}[l]\) to denote the expectation value \(\mathbb {E} l\) of a function \(l(x)\) under the PDF \(\pi _t^{\mathrm {h}}\). Furthermore, \(\Sigma _t^{xh}\) denotes the cross-covariance matrix between \(x\) and \(h(x)\) under the PDF \(\pi _t^{\mathrm {h}}\), and similarly for \(\Sigma _t^{x\tilde h}\). The derivation of (22) can be found in Appendix 2.
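For concreteness, (22) can be evaluated over an ensemble as follows. This is a sketch: the function name and array layout are our own, expectations are replaced by ensemble averages, and \(R\) is assumed symmetric so that \(R^{-1}\) can be applied from either side:

```python
import numpy as np

def g_hat_kf(X, h, h_tilde, y_T, R_inv, t, dt, T):
    """Constant gain approximation (22) evaluated at the ensemble members.

    X: (M, d) ensemble; h, h_tilde: callables mapping (M, d) -> (M, dy);
    y_T: (dy,) observation; R_inv: (dy, dy) symmetric inverse noise covariance.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    H1, H2 = h(X), h_tilde(X)
    S_xh = Xc.T @ (H1 - H1.mean(axis=0)) / (M - 1)    # Sigma_t^{xh}
    S_xht = Xc.T @ (H2 - H2.mean(axis=0)) / (M - 1)   # Sigma_t^{x h-tilde}
    inn1 = 0.5 * (H1 + H1.mean(axis=0)) - y_T         # (h(x) + E[h])/2 - y_T
    inn2 = 0.5 * (H2 + H2.mean(axis=0)) - y_T
    # Row-wise application; relies on R_inv being symmetric
    return (-(t + dt) / (dt * T) * inn1 @ R_inv @ S_xh.T
            + t / (dt * T) * inn2 @ R_inv @ S_xht.T)
```

Note that at \(t=0\) the second term vanishes and only the standard innovation term involving \(h\) remains, consistent with (22).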

5.2 Particle Approximation and Time-Stepping

The controlled mean field equations (15) can be implemented numerically by the standard Monte Carlo Ansatz, that is, M particles \(X_t^{(i)}\) are propagated according to

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{(i)} = f(X^{(i)}_t)\mathrm{d}t - \frac{2\sigma t}{T} \nabla L(X^{(i)}_t)\mathrm{d}t + \hat g_t^{\mathrm{KF}}(X^{(i)}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t^{(i)} \end{aligned} $$
(23)

for \(i=1,\ldots ,M\). The required expectation values in \(\hat g_t^{\mathrm {KF}}\) are evaluated with respect to the empirical measure

$$\displaystyle \begin{aligned} \hat \pi_t^{\mathrm{h}}(x) = \frac{1}{M} \sum_{i=1}^M \delta (x-X_t^{(i)}). \end{aligned} $$
(24)

The interacting particle system can be time-stepped using an appropriate adaptation of (61) from Appendix 2. The computation of gradients can be avoided by applying the statistical linearisation (60).
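The particle loop can be sketched with a plain Euler–Maruyama discretisation (our own simplification; in practice an appropriate adaptation of (61) with the statistical linearisation (60) would be used, and the control `g_hat` would be the constant gain term of Sect. 5.1):

```python
import numpy as np

def assimilate(X0, f, grad_L, g_hat, sigma, T, dt, rng):
    """Euler-Maruyama time-stepping of the interacting particle system (23).

    X0: (M, d) initial ensemble; f and grad_L map (M, d) -> (M, d);
    g_hat(X, t) returns the (M, d) control evaluated at the particles.
    """
    X = X0.copy()
    t = 0.0
    while t < T - 1e-12:
        drift = f(X) - (2.0 * sigma * t / T) * grad_L(X) + g_hat(X, t)
        X = X + dt * drift + np.sqrt(2.0 * sigma * dt) * rng.standard_normal(X.shape)
        t += dt
    return X
```

With zero drift, zero likelihood gradient, and zero control this reduces to pure diffusion, which provides a simple consistency check of the time-stepping.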

6 Examples

We now discuss a sequence of increasingly complex examples. The purpose is both to illuminate certain aspects of the proposed control terms and to indicate the computational advantages of the proposed methodology. All examples will be based on linear forward maps \(h(x) = Hx\) and, therefore, \(\Delta L\) is constant and can be ignored.

6.1 Pure Diffusion Processes

We set the drift f to zero in (1) and assume Gaussian initial conditions. Then the control term (22) gives rise to the mean-field SDE

$$\displaystyle \begin{aligned} &\mathrm{d}X_t^{\mathrm{h}} = \sqrt{2\sigma}\mathrm{d}W_t - \frac{2\sigma t}{T} H^\top R^{-1} (H X_t^{\mathrm{h}}-y_T) \mathrm{d}t \end{aligned} $$
(25a)
$$\displaystyle \begin{aligned} & \qquad \qquad \,{-}\,\Sigma_t^{\mathrm{h}}H^\top \left\{ \frac{1}{T} R^{{-}1} \,{-}\, \frac{2\sigma t^2}{T^2} R^{{-}1} H H^\top R^{{-}1} \right\} \left(\frac{1}{2} \left(HX_t^{\mathrm{h}} \,{+}\, H\mu_t^{\mathrm{h}} \right)\,{-}\,y_T \right)\mathrm{d}t \end{aligned} $$
(25b)

in the limit \(\Delta t\to 0\). We note that \(X_t^{\mathrm {h}} \sim \pi _t^{\mathrm {h}}\) will remain Gaussian for all times and we denote the mean by \(\mu _t^{\mathrm {h}}\) and the covariance matrix by \(\Sigma _t^{\mathrm {h}}\). Hence, it holds that \(\Sigma _t^{xx} = \Sigma _t^{\mathrm {h}}\) and \(\pi _t^{\mathrm {h}}[x] = \mu _t^{\mathrm {h}}\).

Note that the additional drift term in (25a) pulls \(X_t^{\mathrm {h}}\) towards the observation \(y_T\) regardless of the value of \(\Sigma _t^{\mathrm {h}}\). It should also be noted that the drift term in (25b) can be either attractive or repulsive with regard to the observation \(y_T\), depending on the eigenvalues of

$$\displaystyle \begin{aligned} \Omega_t = \frac{1}{T} R^{-1} - \frac{2\sigma t^2}{T^2} R^{-1} H H^\top R^{-1}. \end{aligned} $$
(26)

The strength of this drift term is moderated by the covariance matrix \(\Sigma _t^{\mathrm {h}}\).

We consider a one-dimensional problem with \(R = 0.01\), \(\sigma = 1\), \(H = 1\), \(y_T = 1\) and \(T=1\). The initial conditions are Gaussian with mean \(\mu _0 = 0\) and variance \(\Sigma _0 = 1\). It follows that \(\pi _1\) is Gaussian with mean \(\mu _1 = 0\) and variance \(\Sigma _1 = 2\) and the resulting Gaussian posterior \(\pi _1^a\) has mean and variance given by

$$\displaystyle \begin{aligned} {} \mu_1^a = K y_T \approx 0.9524, \qquad \Sigma_1^a = 2 - 2K \approx 0.0952 \end{aligned} $$
(27)

with Kalman gain \(K = 2/(2+0.01) \approx 0.9524\).

In Fig. 1 one can find the time evolution of the mean and the variance under the mean field equations (25). The early impact of the data driven control term on the dynamics is perhaps surprising and quite opposite to the standard sequential approach to data assimilation, where one first propagates to final time and only then adjusts according to the available data. It is also worth noticing that the corresponding \(\Omega _t\) changes sign at \(t_c = \sqrt {2}/20\), implying that the drift term in (25) has a destabilizing effect on the dynamics for \(t>t_c\).
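The critical time can be verified numerically from (26) with the scalar parameters given above (a small sketch of ours):

```python
import numpy as np

# Omega_t from (26) in the scalar setting of Sect. 6.1:
# R = 0.01, sigma = 1, H = 1, T = 1
R, sigma, T = 0.01, 1.0, 1.0
t = np.linspace(0.0, 1.0, 100001)
Omega = (1.0 / T) / R - (2.0 * sigma * t**2 / T**2) / R**2
t_c = t[np.argmax(Omega < 0)]   # first time at which Omega_t turns negative
print(t_c, np.sqrt(2) / 20)     # both close to 0.0707
```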

Fig. 1

Time evolution of the mean \(\mu _t^{\mathrm{h}}\) and the variance \(\Sigma _t^{\mathrm{h}}\) under the mean field equations (25). Their values at final time agree with the posterior values provided by (27)

6.2 Purely Deterministic Processes

We now set \(\sigma =0\) in (1). We obtain from (22) the mean field ODE system

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= f(X_t^{\mathrm{h}}) - \frac{t+\Delta t}{\Delta t T} \Sigma_t^{\mathrm{h}} H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(28a)
$$\displaystyle \begin{aligned} & \qquad +\,\frac{t}{\Delta t T} \Sigma_t^{x\tilde h} R^{-1} \left( \frac{1}{2} \left( \tilde h(X_t^{\mathrm{h}}) + \pi_t^{\mathrm{h}}[\tilde h]\right) - y_T \right) \end{aligned} $$
(28b)

with

$$\displaystyle \begin{aligned} \tilde h(x) = Hx - \Delta t H f(x). \end{aligned} $$
(29)

These equations can be expanded giving rise to

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= f(X_t^{\mathrm{h}}) - \frac{1}{T} \left\{ \Sigma_t^{\mathrm{h}} + t \Sigma_t^{xf}\right\} H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(30a)
$$\displaystyle \begin{aligned} & \qquad -\,\frac{t}{2T} \Sigma_t^{\mathrm{h}} H^\top R^{-1} H \left( f(X_t^{\mathrm{h}}) + \pi_t^{\mathrm{h}}[f]\right) \end{aligned} $$
(30b)

upon ignoring terms of order \(\mathcal {O}(\Delta t)\). Unless the drift function f is linear, these mean field equations provide only an approximation to the controlled mean field equations (15).

6.3 Linear Gaussian Case

It is instructive to investigate the linear case

$$\displaystyle \begin{aligned} {} f(x) = Fx + b \end{aligned} $$
(31)

in more detail. Here everything remains Gaussian provided \(X_0^{\mathrm{h}}\) is Gaussian distributed, that is, \(\pi _0(x) = \mathcal {N}(x; \mu _0, \Sigma _0)\). Under these conditions the densities \(\pi _t\) and \(\pi ^{\mathrm {h}}_t\) will also be Gaussian, and we write \(\pi ^{\mathrm {h}}_t(x) = \mathcal {N}(x; \mu ^{\mathrm {h}}_t, \Sigma ^{\mathrm {h}}_t)\). The associated mean field equations follow from Appendix 2 and are given by

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= FX_t^{\mathrm{h}} + b + \sigma (\Sigma_t^{\mathrm{h}})^{-1} (X_t^{\mathrm{h}}-\mu_t^{\mathrm{h}}) - \frac{2\sigma t}{T} H^\top R^{-1} (H X_t^{\mathrm{h}}-y_T) \end{aligned} $$
(32a)
$$\displaystyle \begin{aligned} & \qquad -\, C_t H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(32b)
$$\displaystyle \begin{aligned} & \qquad -\,\frac{t}{T} \Sigma_t^h H^\top R^{-1} H \left( \frac{1}{2} F\left( X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}} \right) + b\right). \end{aligned} $$
(32c)

with

$$\displaystyle \begin{aligned} C_t = \frac{1}{T} \Sigma_t^{\mathrm{h}} + \frac{t}{T} \Sigma_t^{\mathrm{h}} F^\top - \frac{2\sigma t^2}{T^2} \Sigma_t^{\mathrm{h}} H^\top R^{-1} H . \end{aligned} $$
(33)

A qualitative discussion can be performed in the scalar case, that is \(d_x = 1\), \(H = 1\), \(\sigma = 1\), \(b=0\), \(T=1\) and \(F = \lambda \). One finds that the control terms involving F stabilize the dynamics whenever \(\lambda >0\). This observation is in line with the fact that the data is crucial only if the dynamics in \(X_t\) is unstable, that is, \(\lambda > 0\).

We consider a two dimensional diffusion process with state variable \(x = (x_1,x_2)^\top \) and linear drift term (31) given by

$$\displaystyle \begin{aligned} F = \left( \begin{array}{cc} -2 & 1 \\ 1 & -2 \end{array} \right), \end{aligned} $$
(34)

\(b = 0\), and diffusion constant \(\sigma = 0.1I\). The forward operator is \(H= \begin {pmatrix} 1 & 0 \end {pmatrix} \) and the variance of the noise is \(R=0.01\). The initial distribution is \(\pi _0 = \mathcal {N}( (1, 3), 0.02I )\). The observed value at time \(T=1\) is set to \(y_T = 2.5\).

The posterior mean takes values \(\mu _1^{\mathrm {a}} \approx 2.25\) and \(\mu _2^{\mathrm {a}} \approx 1.50\), while the posterior covariance matrix becomes

$$\displaystyle \begin{aligned} \Sigma^{\mathrm{a}} \approx \left( \begin{array}{cc} 0.0086 & 0.0039\\ 0.0039 & 0.0503 \end{array} \right). \end{aligned} $$
(35)

Numerical results can be found in Fig. 2. The impact of the control term on the linear diffusion process can clearly be seen and is most prominent on the observed \(x_1\) component of the process. The final values of the controlled process agree well with their posterior counterparts.

Fig. 2

Left panel: Time evolution of the mean in \(x_1\) and \(x_2\), the two associated variances and the covariance between \(x_1\) and \(x_2\) under the linear diffusion process. Right panel: Time evolution of the same quantities under the controlled diffusion process

6.4 Nonlinear Diffusion Example

We consider a two-dimensional problem and denote the state variable by \(x = (x_1,x_2)^\top \). The drift term is given by

$$\displaystyle \begin{aligned} f(x) = -\nabla V(x), \qquad V(x) = \frac{\lambda_1}{2} \left(x_2 - 2 + \beta x_1^2\right)^2 + \frac{\lambda_2}{2}\left(\frac{x_1^4}{2}-x_1^2\right) \end{aligned} $$
(36)

with parameters \(\lambda _1 = 2000\), \(\lambda _2 = 5\), and \(\beta = 1/5\). The diffusion constant is set to \(\sigma = 1\). The choice of the potential \(V(x)\) has two effects: (1) there is a relative high barrier for particles to pass from positive to negative \(x_1\)-values and vice versa; (2) the dynamics stay close to the parabola \(x_2 = 2- \beta x_1^2\).
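The potential and the resulting drift can be coded directly; the sketch below (our own, with the gradient written out analytically) confirms that the drift vanishes at the two wells on the parabola:

```python
import numpy as np

lam1, lam2, beta = 2000.0, 5.0, 0.2

def V(x1, x2):
    """Potential (36)."""
    return 0.5 * lam1 * (x2 - 2.0 + beta * x1**2)**2 + \
           0.5 * lam2 * (0.5 * x1**4 - x1**2)

def f(x1, x2):
    """Drift f = -grad V, with the gradient computed by hand."""
    r = x2 - 2.0 + beta * x1**2
    return np.array([-(lam1 * r * 2.0 * beta * x1 + lam2 * (x1**3 - x1)),
                     -lam1 * r])

# The two wells sit on the parabola x2 = 2 - beta * x1^2 at x1 = +/- 1,
# where the drift vanishes
print(f(1.0, 2.0 - beta), f(-1.0, 2.0 - beta))
```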

The initial distribution is obtained by sampling \(x_1\) from a Gaussian with mean \(1.5\) and variance \(0.0625\). The \(x_2\) component is obtained from the relation

$$\displaystyle \begin{aligned} x_2 = 2 - \beta x_1^2. \end{aligned} $$
(37)

We observe the first component \(x_1\) of the state vector at time \(T=1\) with measurement error variance \(R = 0.01\). The observed value is set to \(y_T = -1.5\). Due to the tiny observation error the posterior is centred sharply about the observed value. Furthermore, recall that the dynamics is essentially slaved to the parabola \(x_2 = 2 - \beta x_1^2\) which makes the inference problem strongly nonlinear.

All particle simulations are run with an ensemble size of \(M=1000\). Essentially identical results are obtained for \(M=100\). Smaller ensemble sizes lead to numerical instabilities.

In Fig. 3, one can find the particle distribution at time \(t=1\), which constitutes the prior distribution for the associated Bayesian inference problem. It is obvious that a particle filter would fail to recover the posterior distribution, which is sharply centered about the observed value. We found that increasing the ensemble size to \(M = 10{,}000\) allows a particle filter to recover the posterior distribution; but the effective ensemble size still drops dramatically. The approximation provided by the EnKF is also displayed. The EnKF fails to recover the posterior due to its inherent linear regression ansatz, which is inappropriate for this strongly nonlinear inference problem even in the limit of infinite ensemble size \(M\to \infty \).

Fig. 3

Initial (blue) and final particle positions (red) under the given evolution process together with the posterior approximation provided by the EnKF (yellow). The observed value is also displayed

In Fig. 4, the results from the controlled mean field formulation are displayed. It can be concluded that the posterior distribution is well approximated despite the constant gain approximation made in order to formulate the control term \(\hat g_t^{\mathrm {KF}}\) in (22).

Fig. 4

Left panel: Initial and final particle positions under the controlled evolution process. Right panel: Particle positions at intermediate times \(t_k \in [0,1]\)

6.5 Lorenz-63 Example

All examples so far have considered a single data assimilation cycle only. We now perform a proper sequential data assimilation experiment for the standard Lorenz-63 model [11]

$$\displaystyle \begin{aligned} {} \frac{\mathrm{d}}{\mathrm{d}t} X_t = f(X_t) , \end{aligned} $$
(38)

where \(X_t:\Omega \to \mathbb {R}^3\) and

$$\displaystyle \begin{aligned} f(x,y,z) = \left(\begin{array}{c} a(y-x) \\ x(b-z)-y \\ xy-cz \end{array} \right) \end{aligned} $$
(39)

with parameters \(a=10\), \(b=28\) and \(c=8/3\).

In order to obtain a reference solution for \(t\ge 0\), the ODE (38) is solved numerically with step-size \(\Delta t = 0.005\) and initial condition

(40)

Scalar-valued observations are generated every \(\Delta t_{\mathrm {obs}}>0\) units of time using the forward model

$$\displaystyle \begin{aligned} y_{n\Delta t_{\mathrm{obs}}} = H X_{n\Delta t_{\mathrm{obs}}} + \nu_n, \qquad n = 1,\ldots,N, \end{aligned} $$
(41)

with measurement errors \(\nu _n \sim \mathcal {N}(0,1)\) and forward map \(H = (1 \,0 \,0) \in \mathbb {R}^{1\times 3}\). We use \(\Delta t_{\mathrm {obs}} \in \{0.05,0.1,0.12\}\) in our experiments and perform \(N = 20{,}000\) assimilation cycles.

The initial ensemble \(\{X_0^{(i)}\}_{i=1}^M\) is drawn from a Gaussian distribution with mean given by the initial condition (40) and covariance matrix \(0.01 I\). We employ multiplicative ensemble inflation which amounts to replacing the Lorenz-63 dynamics by

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{(i)} = f(X_t^{(i)}) + \sigma_k (X_t^{(i)} - \hat \mu_t ), \qquad i = 1,\ldots,M, \end{aligned} $$
(42)

with inflation factors

$$\displaystyle \begin{aligned} \sigma_k = 0.025 k, \qquad k = 0,\ldots,9. \end{aligned} $$
(43)

Here \(\hat \mu _t\) denotes the empirical mean of the ensemble \(\{X_t^{(i)}\}_{i=1}^M\). These equations are combined with the augmented evolution equations (30) and solved numerically with step-size \(\Delta t = 0.005\) and ensemble sizes \(M \in \{5,10,15\}\).
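A sketch of the inflated ensemble dynamics (42) with the Lorenz-63 drift (39), vectorised over the ensemble (function names and the explicit Euler step are our own choices):

```python
import numpy as np

a, b, c = 10.0, 28.0, 8.0 / 3.0

def f(X):
    """Lorenz-63 drift (39) applied row-wise to an ensemble X of shape (M, 3)."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([a * (y - x), x * (b - z) - y, x * y - c * z])

def inflated_step(X, sigma_k, dt):
    """One explicit Euler step of the inflated dynamics (42)."""
    mu = X.mean(axis=0)
    return X + dt * (f(X) + sigma_k * (X - mu))

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))
X = inflated_step(X, sigma_k=0.025 * 2, dt=0.005)
print(X.shape)   # ensemble shape is preserved: (10, 3)
```

Note that the inflation term \(\sigma_k (X_t^{(i)} - \hat\mu_t)\) spreads the particles about their empirical mean without shifting that mean.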

We report the resulting root mean square errors (RMSE)

(44)

which are computed for each ensemble size M, observation interval \(\Delta t_{\mathrm {obs}}\) and inflation factor \(\sigma _k\). The results are displayed in Table 1 where the smallest RMSE over the range of inflation factors \(\{\sigma _k\}_{k=0}^9\) is stated for each M and \(\Delta t_{\mathrm {obs}}\). We also state the corresponding RMSEs from a standard ensemble square root filter implementation [2, 8]. We find that the proposed homotopy approach outperforms the ensemble square root filter in terms of RMSE in all settings considered. The improvements increase for increasing observation intervals \(\Delta t_{\mathrm {obs}}\). The homotopy approach also appears less sensitive to the ensemble size M.

Table 1 RMSE for both a standard ensemble square root filter (ESRF) implementation and our homotopy approach in terms of ensemble sizes \(M \in \{5,10,15\}\) and observation intervals \(\Delta t_{\mathrm {obs}} \in \{0.05,0.1,0.12\}\). The homotopy based data assimilation method leads to significantly reduced RMSEs in all settings

We close this example by pointing out that less of an improvement could be expected for a fully observed Lorenz-63 system. The proposed homotopy approach seems particularly effective in guiding the unobserved solution components to regions of high posterior probability. See also the example from Sect. 6.4.

7 Conclusions

Devising alternative proposal densities has a long history in the context of sequential data assimilation and filtering. Here we have explored a computationally tractable approach which combines the concept of Schrödinger bridges with a rather straightforward homotopy approach. A further key ingredient is the approximate solution of the arising PDEs in terms of a constant gain approximation, which is also widely used within the EnKF community. Numerical examples indicate that the approach is viable and can overcome limitations of both standard sequential Monte Carlo as well as standard EnKF methods. This has been demonstrated for single assimilation steps as well as long-time data assimilation using the chaotic Lorenz-63 model with only the first component observed infrequently. It remains to be seen how the proposed methods behave for high dimensional stochastic processes.