1 Introduction

Sequential data assimilation interlaces dynamic processes with intermittent partial state observations in order to provide reliable state estimates and their uncertainties. A wide array of numerical methods has been proposed to tackle this problem computationally. Popular methods include sequential Monte Carlo, variational inference, and various ensemble Kalman filter formulations [5, 8]. These methods can encounter difficulties whenever the predictive distribution is incompatible with the incoming data; in other words, whenever the distance between the prior, as provided by the underlying stochastic process, and the data-informed posterior distribution is large. It has long been realized that this challenge can be partially circumvented by altering the underlying stochastic process through appropriate control terms or modified proposal densities [5, 23, 16, 9]. Recently, the connection between devising such control terms and Schrödinger bridge problems [4] has been made explicit [16]. However, Schrödinger bridge problems are notoriously difficult to solve numerically. The key contribution of this paper is to provide a computationally tractable (sub-optimal) solution via a novel extension of established homotopy approaches [7, 14]. As with related homotopy approaches for purely Bayesian inference, the solution of certain partial differential equations (PDEs) is required in order to find the desired control terms [14, 25]. In line with standard ensemble Kalman filter (EnKF) methodologies, we approximate these PDEs via a constant gain approximation [21]. There are also alternative approaches to sequential data assimilation and inference which utilize ideas from optimal transportation; see, for example, [15, 6, 20, 3].

The paper is structured as follows. The mathematical formulation of the data assimilation problem, as considered in this paper, is laid out in Sect. 2. The standard optimal control and Schrödinger bridge approach to data assimilation is briefly summarized in Sect. 3, and the novel control formulation based on a homotopy formulation is introduced in Sect. 4. A practical implementation based on the EnKF methodology is proposed in Sect. 5. A series of increasingly complex data assimilation problems is considered in Sect. 6 in order to demonstrate the feasibility of the proposed methodologies. Conclusions are drawn in Sect. 7. Detailed mathematical derivations can be found in Appendices 1 and 2.

2 Problem Formulation and Background

We consider drift diffusion processes given by a stochastic differential equation (SDE)

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t = f(X_t)\mathrm{d}t + \sqrt{2\sigma}\mathrm{d}W_t, \end{aligned} $$
(1)

where \(X_t : \Omega \to \mathbb {R}^{d_x}\), \(f:\mathbb {R}^{d_x} \to \mathbb {R}^{d_x}\), \(\sigma \in \mathbb {R}_{\geq 0}\), and \(W_t:\Omega \to \mathbb {R}^{d_x}\) denotes \(d_x\)-dimensional standard Brownian motion [12, 13].

Assuming the law of \(X_t\) is absolutely continuous w.r.t. Lebesgue measure with density \(\pi _t\), this leads to the Fokker–Planck equation [13]

$$\displaystyle \begin{aligned} {} \partial_t \pi_t = -\nabla \cdot \left( \pi_t \left( f - \sigma \nabla \log \pi_t \right) \right) . \end{aligned} $$
(2)

The SDE (1) can be replaced by the mean field ODE

$$\displaystyle \begin{aligned} {} \frac{\mathrm{d}}{\mathrm{d}t} \tilde X_t = f(\tilde X_t) - \sigma \nabla \log \tilde \pi_t \end{aligned} $$
(3)

where \(\tilde \pi _t\) denotes the law of \(\tilde X_t\). Provided \(\tilde \pi _0 = \pi _0\), it holds that \(\tilde \pi _t = \pi _t\) for all \(t> 0\). Note that the evolution of the random variable \(\tilde X_t\) is entirely deterministic subject to random initial conditions \(\tilde X_0 \sim \pi _0\).
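As an illustration (our own minimal sketch, not part of the original exposition), the SDE (1) and the mean field ODE (3) can be compared for a pure diffusion (\(f=0\)) with Gaussian initial condition \(\pi_0 = \mathcal{N}(0,1)\), where \(\nabla \log \tilde \pi_t\) is replaced by its Gaussian ensemble estimate; both evolutions should reproduce \(\mathrm{Var}(X_t) = 1 + 2\sigma t\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, T, M, dt = 1.0, 1.0, 5000, 0.005
steps = int(T / dt)

# SDE (1) with f = 0: dX = sqrt(2 sigma) dW  (Euler-Maruyama)
X = rng.standard_normal(M)          # X_0 ~ N(0, 1)
# Mean field ODE (3): dX/dt = -sigma * grad log pi_t, with the Gaussian
# score -(x - mu)/var estimated from the ensemble itself
Xd = X.copy()
for _ in range(steps):
    X += np.sqrt(2 * sigma * dt) * rng.standard_normal(M)
    mu, var = Xd.mean(), Xd.var()
    Xd += dt * sigma * (Xd - mu) / var   # -sigma * grad log pi = sigma (x-mu)/var
# Both evolutions should reproduce Var(X_t) = 1 + 2 sigma t, i.e. ~3 at t = T
print(X.var(), Xd.var())
```

For Gaussian marginals the ensemble score estimate is exact (up to sampling error), so the two sample variances agree.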

At time \(t = T>0\), we have observations of the system according to

$$\displaystyle \begin{aligned} {} y_T = h(X_T) + \nu , \end{aligned} $$
(4)

from which we wish to infer the unknown state \(X_T\). Here \(h:\mathbb {R}^{d_x} \to \mathbb {R}^{d_y}\) denotes the forward map and \(\nu \sim \mathcal {N}(0,R)\) is \(d_y\)-dimensional Gaussian noise with covariance matrix \(R \in \mathbb {R}^{d_y \times d_y}\).

Let \(L:\mathbb {R}^{d_x} \to \mathbb {R}\) denote the corresponding negative log-likelihood function. Since \(\nu \) is Gaussian, it is given by

$$\displaystyle \begin{aligned} {} L(x) = \frac{1}{2}(h(x) - y_T)^\top R^{-1} (h(x) - y_T) \end{aligned} $$
(5)

up to an irrelevant constant. The observations are combined with the predictive density \(\pi _T\) at time \(t=T\) according to Bayes’ theorem,

$$\displaystyle \begin{aligned} {} \pi^{\mathrm{a}}_T = \frac{e^{-L} \pi_T}{\int e^{-L(x)} \pi_T(x) \mathrm{d}x} . \end{aligned} $$
(6)

The process of transforming the random variable \(X_T \sim \pi _T\) into a random variable \(X_T^a \sim \pi _T^{\mathrm {a}}\) is called data assimilation in the context of dynamical systems and stochastic processes [10, 17, 8].
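For intuition, the update (6) can be realised numerically by importance weighting: draw samples from the prior and reweight them by \(e^{-L}\). The following self-contained sketch uses scalar linear-Gaussian choices of our own, so the weighted estimates can be checked against the exact Kalman formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative choices (ours): prior pi_T = N(0, 2), h(x) = x, noise variance R
Sigma_T, R, y_T = 2.0, 0.1, 1.0
M = 200000
X = rng.normal(0.0, np.sqrt(Sigma_T), M)

# Bayes' theorem (6) via self-normalised importance weights w ~ exp(-L(x))
L = 0.5 * (X - y_T) ** 2 / R
w = np.exp(-L)
w /= w.sum()
post_mean = np.sum(w * X)
post_var = np.sum(w * (X - post_mean) ** 2)

# Exact linear-Gaussian (Kalman) reference values
K = Sigma_T / (Sigma_T + R)
print(post_mean, K * y_T)            # weighted mean vs Kalman mean
print(post_var, Sigma_T * (1 - K))   # weighted variance vs Kalman variance
```

In the linear-Gaussian case the weighted estimates converge to the Kalman mean and variance, which makes (6) easy to sanity-check; for strongly informative data the weights degenerate, which is precisely the difficulty discussed next.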

Since performing data assimilation can be difficult if the Kullback–Leibler divergence

$$\displaystyle \begin{aligned} \mathrm{KL}(\pi_T|\pi_T^{\mathrm{a}}) = \int_{\mathbb{R}^{d_x}} \pi_T(x) (\log \pi_T(x)-\log \pi_T^{\mathrm{a}}(x)) \mathrm{d}x, \end{aligned} $$
(7)

also called the relative entropy [13], between the prior \(\pi _T\) and the posterior \(\pi _T^{\mathrm{a}}\) is large and/or if the involved distributions are strongly non-Gaussian [19, 1], we propose to construct a new SDE with state process \(X_t^{\mathrm {h}}\) such that \(X_0^{\mathrm {h}} \sim \pi _0\) and \(X_T^{\mathrm {h}} \sim \pi _T^{\mathrm{a}}\). In other words, we are looking for a stochastic process (bridge) with initial density \(\pi _0\) and final density \(\pi ^{\mathrm {a}}_T\). The problem of finding the optimal process (in the sense of minimal Kullback–Leibler divergence) is known as the Schrödinger bridge problem [4].

3 Schrödinger Bridge Approach

The Bayesian adjustment (6) at final time \(t=T\) leads in fact to an adjustment over the whole solution space of the underlying diffusion process described by (1). Let us denote the marginals of the so-called smoothing distribution by \(\pi _t^{\mathrm {a}}\), \(t \in [0,T]\) [18, 5]. It is well established that these marginal distributions can be generated from a controlled SDE

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{a}} = f(X^{\mathrm{a}}_t)\mathrm{d}t + g_t^{\mathrm{a}}(X^{\mathrm{a}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t \end{aligned} $$
(8)

for appropriate control \(g_t^{\mathrm {a}}:\mathbb {R}^{d_x} \to \mathbb {R}^{d_x}\) such that \(X_0^{\mathrm {a}}\sim \pi _0^{\mathrm {a}}\) implies \(X_t^{\mathrm {a}}\sim \pi _t^{\mathrm {a}}\) for all \(t>0\). It is also well known that finding a suitable \(g_t^{\mathrm {a}}\) can be formulated as an optimal control problem which in turn is closely related to the backward Kolmogorov equation [13, 16]. Formulations related to (8) have also been used in the context of sequential Monte Carlo methods [5].

As proposed in [16], an alternative perspective on sequential data assimilation is provided by Schrödinger bridges. Given two marginal distributions \(q_0\) and \(q_T\) and a stochastic process \(X_t\) (referred to as the reference process), a Schrödinger bridge is another stochastic process \(\hat {X}_t\) such that \(\hat {X}_0 \sim q_0\), \(\hat {X}_T \sim q_T\), and the Kullback–Leibler divergence between the processes \(\{\hat {X}_t\}_{t\in [0,T]}\) and \(\{X_t\}_{t\in [0,T]}\) is minimal. Specialised to our problem and considering a single data assimilation cycle, this means that the marginals are the initial and posterior densities, i.e. \(q_0 = \pi _0\) and \(q_T = \pi ^{\mathrm {a}}_T\), and the reference process is the solution to (1). The solution to the associated Schrödinger bridge problem is again of the form (8) with modified control term denoted by \(g_t^{\mathrm {SB}}(x)\).

A Schrödinger bridge is thus the optimal coupling as measured by the Kullback–Leibler divergence to the underlying reference process. Unfortunately, Schrödinger bridges lead to boundary value problems in the space of probability measures, and the required control term \(g_t^{\mathrm {SB}}\) seems rather difficult to compute in practice. In addition to the computational complexity of solving nonlinear Schrödinger bridge problems, the target distribution \(\pi _T^{\mathrm{a}}\) is implicitly defined in the setting of data assimilation. The next section offers a solution to both of these issues. We point to [23] for a discussion of alternative approaches which introduce appropriate control terms into data assimilation procedures.

4 Homotopy Induced Dynamic Coupling

Since Schrödinger bridges are computationally challenging, we ask whether a less optimal but cheaper approach might also be feasible. Indeed, in the context of data assimilation a non-optimal coupling can be found via a homotopy between the initial and target distribution as follows. Let

$$\displaystyle \begin{aligned} \pi^{\mathrm{h}}_t(x) = Z^{-1}_t e^{-\frac{t}{T}L(x)} \pi_t(x) \end{aligned} $$
(9)

denote the homotopy in question, with \(Z_t = \int e^{-\frac {t}{T}L(x)} \pi _t(x) \mathrm {d}x\) the time dependent normalization constant. It clearly holds that \(\pi ^{\mathrm {h}}_0 = \pi _0\) and \(\pi ^{\mathrm {h}}_T = \pi _T^{\mathrm {a}}\). Note that the scaling \(t \mapsto e^{-\frac {t}{T}L}\) was chosen for its simplicity and follows previous work on Bayesian inference problems [7, 14]. Finding better homotopies or systematic ways of constructing one could be an interesting direction for future research.
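To make (9) concrete, the homotopy can be tabulated on a grid in a toy setting of our own choosing: a pure diffusion prior \(\pi_t = \mathcal{N}(0, 1+2\sigma t)\) and a scalar linear observation (all parameter values below are illustrative, not taken from the text). The mean of \(\pi_t^{\mathrm{h}}\) then interpolates between the prior mean and the posterior mean:

```python
import numpy as np

# Illustrative scalar setting (ours): pi_t = N(0, 1 + 2*sigma*t), h(x) = x
sigma, T, R, y_T = 1.0, 1.0, 0.1, 1.0
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def pi_h(t):
    """Homotopy density (9) on the grid, normalised numerically (Z_t)."""
    prior = np.exp(-x**2 / (2.0 * (1.0 + 2.0 * sigma * t)))  # ~ N(0, 1+2*sigma*t)
    lik = np.exp(-(t / T) * 0.5 * (x - y_T)**2 / R)          # e^{-(t/T) L(x)}
    p = prior * lik
    return p / (p.sum() * dx)

for t in (0.0, 0.5, 1.0):
    print(t, (x * pi_h(t)).sum() * dx)   # mean drifts from 0 towards the posterior
```

At \(t=0\) the mean is that of the prior, and at \(t=T\) it matches the Gaussian posterior mean \(\Sigma_T y_T/(\Sigma_T + R)\) with \(\Sigma_T = 1+2\sigma T\), confirming the endpoint properties \(\pi_0^{\mathrm{h}} = \pi_0\) and \(\pi_T^{\mathrm{h}} = \pi_T^{\mathrm{a}}\) in this example.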

We can then reason backwards from the Fokker–Planck equation of \(\pi ^{\mathrm{h}}_t\) to conclude that if it is the density of a random variable \(X^{\mathrm{h}}_t\), then that random variable must satisfy the modified SDE:

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{h}} = f(X^{\mathrm{h}}_t)\mathrm{d}t - \frac{\sigma t}{T} \nabla L(X^{\mathrm{h}}_t)\mathrm{d}t + g_t(X^{\mathrm{h}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t , \end{aligned} $$
(10)

where \(g_t\) is a solution to the PDE

$$\displaystyle \begin{aligned} {} \nabla \cdot (\pi^{\mathrm{h}}_t g_t) = \frac{1}{T} \pi^h_t \left( L + t \nabla L \cdot \left( f - \frac{\sigma t}{T} \nabla L - \sigma \nabla \log \pi^{\mathrm{h}}_t \right) \right) + \pi^{\mathrm{h}}_t \frac{\dot{Z}_t}{Z_t} . \end{aligned} $$
(11)

The derivations of (11) can be found in Appendix 1.

Note that (10) constitutes a mean field model since \(g_t\) depends on the distribution \(\pi _t^{\mathrm {h}}\) of \(X_t^{\mathrm {h}}\). We also wish to point out that

$$\displaystyle \begin{aligned} \hat g_t^{\mathrm{SB}}(x) := - \frac{\sigma t}{T} \nabla L(x) + g_t(x) \end{aligned} $$
(12)

provides a solution to the associated coupling problem. The control term \(\hat g_t^{\mathrm {SB}}\) is however non-optimal in the sense of the Schrödinger bridge problem since it does not minimise the Kullback–Leibler divergence.

Since (11) is linear in \(g_t\), we can decompose (11) into a set of simpler equations \(\nabla \cdot \left ( \pi ^{\mathrm {h}}_t g^i_t \right ) = \pi ^{\mathrm {h}}_t (k^i - \mathbb {E} k^i)\) such that the \(k^i\) add up to the right-hand side of (11). In order to maintain \(\int \nabla \cdot \left ( \pi ^{\mathrm {h}}_t g^i_t \right ) \mathrm{d}x = 0\) for the individual \(g^i_t\), we make use of the fact that the terms in (11) are of the form \(\pi ^{\mathrm {h}}_t \left ( k - \mathbb {E} k \right )\). Separating the terms, we obtain the following equations, the sum of whose solutions solves (11):

$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^1 \right) &= \pi^{\mathrm{h}}_t \left( \frac{L}{T} - \mathbb{E}\frac{L}{T} \right) \end{aligned} $$
(13a)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^2 \right) &= \pi^{\mathrm{h}}_t \left( \frac{t}{T} \nabla L \cdot f - \mathbb{E} \frac{t}{T} \nabla L \cdot f \right) \end{aligned} $$
(13b)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^3 \right) &= - \pi^{\mathrm{h}}_t \left( \frac{\sigma t}{T} \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t - \mathbb{E} \frac{\sigma t}{T} \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t \right) \end{aligned} $$
(13c)
$$\displaystyle \begin{aligned} {} \nabla \cdot \left( \pi^{\mathrm{h}}_t g_t^4 \right) &= - \pi^{\mathrm{h}}_t \left( \frac{\sigma t^2}{T^2} \nabla L \cdot \nabla L - \mathbb{E} \frac{\sigma t^2}{T^2} \nabla L \cdot \nabla L \right) . \end{aligned} $$
(13d)

Note that

$$\displaystyle \begin{aligned} \pi^{\mathrm{h}}_t \nabla L \cdot \nabla \log \pi^{\mathrm{h}}_t = \nabla \cdot \left( \pi^{\mathrm{h}}_t \nabla L \right) - \pi^{\mathrm{h}}_t \Delta L , \end{aligned} $$
(14)

which can be used to avoid the computation of \(\nabla \log \pi _t^{\mathrm {h}}\) (with \(\Delta = \nabla \cdot \nabla \) the Laplacian operator) in (13c). Thus the controlled SDE (10) can be replaced by

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{\mathrm{h}} = f(X^{\mathrm{h}}_t)\mathrm{d}t - \frac{2\sigma t}{T} \nabla L(X^{\mathrm{h}}_t)\mathrm{d}t + \hat g_t(X^{\mathrm{h}}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t , \end{aligned} $$
(15)

where \(\hat g_t\) is a solution to

$$\displaystyle \begin{aligned} {} \nabla \cdot (\pi^{\mathrm{h}}_t \hat g_t) = \frac{1}{T} \pi^{\mathrm{h}}_t \left( L + t \nabla L \cdot \left( f - \frac{\sigma t}{T} \nabla L \right) + \sigma t \Delta L \right) + \pi^{\mathrm{h}}_t \frac{\dot{Z}_t}{Z_t} . \end{aligned} $$
(16)

Furthermore, if \(\Delta L\) is a constant (as a function of x) or small in comparison to the other contributions in (16), then (16) simplifies further. In particular, this is the case if the forward map is linear, that is, \(h(x) = Hx\).
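Indeed, for \(h(x) = Hx\) a direct calculation from (5) gives

$$\displaystyle \begin{aligned} \nabla L(x) = H^\top R^{-1} (Hx - y_T), \qquad \Delta L(x) = \nabla \cdot \nabla L(x) = \operatorname{tr}\left( H^\top R^{-1} H \right), \end{aligned} $$

which is independent of \(x\).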

Building upon the mean field ODE (3), one obtains the equivalent controlled mean field ODE system

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t}\tilde X_t^{\mathrm{h}} = f(\tilde X^{\mathrm{h}}_t) - \frac{\sigma t}{T} \nabla L(\tilde X^{\mathrm{h}}_t) + g_t(\tilde X^{\mathrm{h}}_t) - \sigma\nabla \log \tilde \pi^{\mathrm{h}}_t(\tilde X_t^{\mathrm{h}}) , \end{aligned} $$
(17)

with \(g_t\) defined as before. This mean field formulation again requires knowledge (or approximation) of \(\nabla \log \tilde \pi _t^{\mathrm {h}}\). A Gaussian approximation might be sufficient in certain circumstances giving rise to

$$\displaystyle \begin{aligned} \nabla \log \tilde \pi_t^{\mathrm{h}}(x) \approx -(\tilde \Sigma_t^{ \mathrm{h}})^{-1}(x-\tilde \mu_t^{\mathrm{h}}), \end{aligned} $$
(18)

where \(\tilde \mu _t^{\mathrm {h}}\) denotes the mean of \(\tilde X_t^{\mathrm {h}}\) and \(\tilde \Sigma _t^{\mathrm {h}}\) its covariance matrix.
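The Gaussian approximation (18) is straightforward to estimate from an ensemble; the following sketch (function name and regularisation jitter are our own choices) evaluates the approximate score at each ensemble member:

```python
import numpy as np

def gaussian_score(X):
    """Estimate grad log pi(x) ~= -Sigma^{-1} (x - mu) from an ensemble.

    X has shape (M, d); returns the approximate score (18) evaluated at
    each ensemble member. Assumes M > d so the sample covariance is
    invertible; a small jitter is added for numerical stability.
    """
    mu = X.mean(axis=0)
    Sigma = np.cov(X.T) + 1e-8 * np.eye(X.shape[1])
    return -np.linalg.solve(Sigma, (X - mu).T).T

# Sanity check: for any ensemble the estimated score averages to zero,
# since the sample mean of X - mu vanishes by construction
rng = np.random.default_rng(2)
X = rng.multivariate_normal([1.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)
print(gaussian_score(X).mean(axis=0))
```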

5 Numerical Implementation

No analytic solution to (11) is known and we thus have to resort to approximations. We note that a similar PDE arises in the computation of the gain in the feedback particle filter [24] and one could use the diffusion map based approximation [22] for the problem at hand. That method also transforms the PDE into a Poisson equation, which is then translated into an equivalent integral equation, the semi-group form of the Poisson equation. As the name suggests, the integral equation makes use of the generator of a semi-group, which can be approximated by diffusion maps. Here we instead propose to follow the constant gain approximation first introduced in the EnKF methodology [21].

5.1 Ensemble Kalman Mean Field Approximation

Let us assume that \(\Delta L \approx \mathrm{const}\) in (16). Then we only need to deal with the modified negative log-likelihood function

$$\displaystyle \begin{aligned} \tilde L(x) &= \frac{1}{T} \left( L(x) + t\nabla L(x) \cdot \left(f(x)-\frac{\sigma t}{T} \nabla L(x)\right)\right) \end{aligned} $$
(19a)
$$\displaystyle \begin{aligned} &\approx \frac{1}{T} \left( L(x) + \frac{t}{\Delta t}\left\{ L(x) - L\left( x-\Delta t f(x) + \Delta t\frac{t\sigma}{T} \nabla L(x)\right)\right\} \right) \end{aligned} $$
(19b)

with \(\Delta t\) being the time-step also used later for time-stepping the evolution equations (10) or (15), respectively. Since L is given by (5), we define the modified forward map

$$\displaystyle \begin{aligned} \tilde h(x) = h\left(x - \Delta t f(x) + \Delta t \frac{\sigma t}{T} \nabla L(x)\right) \end{aligned} $$
(20)

and thus

$$\displaystyle \begin{aligned} {} \tilde L(x) \approx \frac{t+\Delta t}{2\Delta t T} (h(x)-y_T)^\top R^{-1}(h(x)-y_T) -\, \frac{t}{2\Delta t T} (\tilde h(x)-y_T)^\top R^{-1}(\tilde h(x) - y_T). \end{aligned} $$
(21)

Following the standard EnKF methodology for quadratic loss functions, this suggests approximating the drift function \(\hat g_t\) in (15) as follows:

$$\displaystyle \begin{aligned} \hat g_t^{\mathrm{KF}}(x) &= -\frac{t + \Delta t}{\Delta t T}\Sigma_t^{xh}R^{-1}\left( \frac{1}{2} \left( h(x)+ \pi_t^{\mathrm{h}}[h]\right) - y_T\right) \end{aligned} $$
(22a)
$$\displaystyle \begin{aligned} & \qquad \qquad + \frac{t}{\Delta t T}\Sigma_t^{x\tilde h}R^{-1} \left(\frac{1}{2} \left(\tilde h(x) +\pi_t^{\mathrm{h}}[\tilde h] \right) -y_T\right). \end{aligned} $$
(22b)

Here we have introduced the notation \(\pi _t^{\mathrm {h}}[l]\) to denote the expectation value \(\mathbb {E} l\) of a function \(l(x)\) under the PDF \(\pi _t^{\mathrm {h}}\). Furthermore, \(\Sigma _t^{xh}\) denotes the cross-covariance matrix between \(x\) and \(h(x)\) under the PDF \(\pi _t^{\mathrm {h}}\), and similarly for \(\Sigma _t^{x\tilde h}\). The derivation of (22) can be found in Appendix 2.
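For concreteness, (22) can be evaluated over an ensemble as follows. This is a sketch: the function name and array layout are our own, expectations are replaced by ensemble averages, and \(R\) is assumed symmetric so that \(R^{-1}\) can be applied from either side:

```python
import numpy as np

def g_hat_kf(X, h, h_tilde, y_T, R_inv, t, dt, T):
    """Constant gain approximation (22) evaluated at the ensemble members.

    X: (M, d) ensemble; h, h_tilde: callables mapping (M, d) -> (M, dy);
    y_T: (dy,) observation; R_inv: (dy, dy) symmetric inverse noise covariance.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    H1, H2 = h(X), h_tilde(X)
    S_xh = Xc.T @ (H1 - H1.mean(axis=0)) / (M - 1)    # Sigma_t^{xh}
    S_xht = Xc.T @ (H2 - H2.mean(axis=0)) / (M - 1)   # Sigma_t^{x h-tilde}
    inn1 = 0.5 * (H1 + H1.mean(axis=0)) - y_T         # (h(x) + E[h])/2 - y_T
    inn2 = 0.5 * (H2 + H2.mean(axis=0)) - y_T
    # Row-wise application; relies on R_inv being symmetric
    return (-(t + dt) / (dt * T) * inn1 @ R_inv @ S_xh.T
            + t / (dt * T) * inn2 @ R_inv @ S_xht.T)
```

Note that at \(t=0\) the second term vanishes and only the standard innovation term involving \(h\) remains, consistent with (22).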

5.2 Particle Approximation and Time-Stepping

The controlled mean field equations (15) can be implemented numerically by the standard Monte Carlo Ansatz, that is, M particles \(X_t^{(i)}\) are propagated according to

$$\displaystyle \begin{aligned} {} \mathrm{d}X_t^{(i)} = f(X^{(i)}_t)\mathrm{d}t - \frac{2\sigma t}{T} \nabla L(X^{(i)}_t)\mathrm{d}t + \hat g_t^{\mathrm{KF}}(X^{(i)}_t) \mathrm{d}t + \sqrt{2\sigma} \mathrm{d}W_t^{(i)} \end{aligned} $$
(23)

for \(i=1,\ldots ,M\). The required expectation values in \(\hat g_t^{\mathrm {KF}}\) are evaluated with respect to the empirical measure

$$\displaystyle \begin{aligned} \hat \pi_t^{\mathrm{h}}(x) = \frac{1}{M} \sum_{i=1}^M \delta (x-X_t^{(i)}). \end{aligned} $$
(24)

The interacting particle system can be time-stepped using an appropriate adaptation of (61) from Appendix 2. The computation of gradients can be avoided by applying the statistical linearisation (60).
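The particle loop can be sketched with a plain Euler–Maruyama discretisation (our own simplification; in practice an appropriate adaptation of (61) with the statistical linearisation (60) would be used, and the control `g_hat` would be the constant gain term of Sect. 5.1):

```python
import numpy as np

def assimilate(X0, f, grad_L, g_hat, sigma, T, dt, rng):
    """Euler-Maruyama time-stepping of the interacting particle system (23).

    X0: (M, d) initial ensemble; f and grad_L map (M, d) -> (M, d);
    g_hat(X, t) returns the (M, d) control evaluated at the particles.
    """
    X = X0.copy()
    t = 0.0
    while t < T - 1e-12:
        drift = f(X) - (2.0 * sigma * t / T) * grad_L(X) + g_hat(X, t)
        X = X + dt * drift + np.sqrt(2.0 * sigma * dt) * rng.standard_normal(X.shape)
        t += dt
    return X
```

With zero drift, zero likelihood gradient, and zero control this reduces to pure diffusion, which provides a simple consistency check of the time-stepping.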

6 Examples

We now discuss a sequence of increasingly complex examples. The purpose is both to illuminate certain aspects of the proposed control terms and to indicate the computational advantages of the proposed methodology. All examples will be based on linear forward maps \(h(x) = Hx\) and, therefore, \(\Delta L\) is constant and can be ignored.

6.1 Pure Diffusion Processes

We set the drift f to zero in (1) and assume Gaussian initial conditions. Then the control term (22) gives rise to the mean-field SDE

$$\displaystyle \begin{aligned} &\mathrm{d}X_t^{\mathrm{h}} = \sqrt{2\sigma}\mathrm{d}W_t - \frac{2\sigma t}{T} H^\top R^{-1} (H X_t^{\mathrm{h}}-y_T) \mathrm{d}t \end{aligned} $$
(25a)
$$\displaystyle \begin{aligned} & \qquad \qquad \,{-}\,\Sigma_t^{\mathrm{h}}H^\top \left\{ \frac{1}{T} R^{{-}1} \,{-}\, \frac{2\sigma t^2}{T^2} R^{{-}1} H H^\top R^{{-}1} \right\} \left(\frac{1}{2} \left(HX_t^{\mathrm{h}} \,{+}\, H\mu_t^{\mathrm{h}} \right)\,{-}\,y_T \right)\mathrm{d}t \end{aligned} $$
(25b)

in the limit \(\Delta t\to 0\). We note that \(X_t^{\mathrm {h}} \sim \pi _t^{\mathrm {h}}\) will remain Gaussian for all times and we denote the mean by \(\mu _t^{\mathrm {h}}\) and the covariance matrix by \(\Sigma _t^{\mathrm {h}}\). Hence, it holds that \(\Sigma _t^{xx} = \Sigma _t^{\mathrm {h}}\) and \(\pi _t^{\mathrm {h}}[x] = \mu _t^{\mathrm {h}}\).

Note that the additional drift term in (25a) pulls \(X_t^{\mathrm {h}}\) towards the observation \(y_T\) regardless of the value of \(\Sigma _t^{\mathrm {h}}\). It should also be noted that the drift term in (25b) can be either attractive or repulsive with regard to the observation \(y_T\), depending on the eigenvalues of

$$\displaystyle \begin{aligned} \Omega_t = \frac{1}{T} R^{-1} - \frac{2\sigma t^2}{T^2} R^{-1} H H^\top R^{-1}. \end{aligned} $$
(26)

The strength of this drift term is moderated by the covariance matrix \(\Sigma _t^{\mathrm {h}}\).

We consider a one-dimensional problem with \(R = 0.01\), \(\sigma = 1\), \(H = 1\), \(y_T = 1\) and \(T=1\). The initial conditions are Gaussian with mean \(\mu _0 = 0\) and variance \(\Sigma _0 = 1\). It follows that \(\pi _1\) is Gaussian with mean \(\mu _1 = 0\) and variance \(\Sigma _1 = 2\) and the resulting Gaussian posterior \(\pi _1^a\) has mean and variance given by

$$\displaystyle \begin{aligned} {} \mu_1^a = K y_T \approx 0.9524, \qquad \Sigma_1^a = 2 - 2K \approx 0.0952 \end{aligned} $$
(27)

with Kalman gain \(K = 2/(2+0.01) \approx 0.9524\).

In Fig. 1 one can find the time evolution of the mean and the variance under the mean field equations (25). The early impact of the data driven control term on the dynamics is perhaps surprising and quite opposite to the standard sequential approach to data assimilation, where one first propagates to final time and only then adjusts according to the available data. It is also worth noticing that the corresponding \(\Omega _t\) changes sign at \(t_c = \sqrt {2}/20\), implying that the drift term in (25) has a destabilizing effect on the dynamics for \(t>t_c\).
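The critical time can be verified numerically from (26) with the scalar parameters given above (a small sketch of ours):

```python
import numpy as np

# Omega_t from (26) in the scalar setting of Sect. 6.1:
# R = 0.01, sigma = 1, H = 1, T = 1
R, sigma, T = 0.01, 1.0, 1.0
t = np.linspace(0.0, 1.0, 100001)
Omega = (1.0 / T) / R - (2.0 * sigma * t**2 / T**2) / R**2
t_c = t[np.argmax(Omega < 0)]   # first time at which Omega_t turns negative
print(t_c, np.sqrt(2) / 20)     # both close to 0.0707
```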

Fig. 1

Time evolution of the mean \(\mu _t^{\mathrm{h}}\) and the variance \(\Sigma _t^{\mathrm{h}}\) under the mean field equations (25). Their values at final time agree with the posterior values provided by (27)

6.2 Purely Deterministic Processes

We now set \(\sigma =0\) in (1). We obtain from (22) the mean field ODE system

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= f(X_t^{\mathrm{h}}) - \frac{t+\Delta t}{\Delta t T} \Sigma_t^{\mathrm{h}} H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(28a)
$$\displaystyle \begin{aligned} & \qquad +\,\frac{t}{\Delta t T} \Sigma_t^{x\tilde h} R^{-1} \left( \frac{1}{2} \left( \tilde h(X_t^{\mathrm{h}}) + \pi_t^{\mathrm{h}}[\tilde h]\right) - y_T \right) \end{aligned} $$
(28b)

with

$$\displaystyle \begin{aligned} \tilde h(x) = Hx - \Delta t H f(x). \end{aligned} $$
(29)

These equations can be expanded giving rise to

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= f(X_t^{\mathrm{h}}) - \frac{1}{T} \left\{ \Sigma_t^{\mathrm{h}} + t \Sigma_t^{xf}\right\} H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(30a)
$$\displaystyle \begin{aligned} & \qquad -\,\frac{t}{2T} \Sigma_t^{\mathrm{h}} H^\top R^{-1} H \left( f(X_t^{\mathrm{h}}) + \pi_t^{\mathrm{h}}[f]\right) \end{aligned} $$
(30b)

upon ignoring terms of order \(\mathcal {O}(\Delta t)\). Unless the drift function f is linear, these mean field equations provide only an approximation to the controlled mean field equations (15).

6.3 Linear Gaussian Case

It is instructive to investigate the linear case

$$\displaystyle \begin{aligned} {} f(x) = Fx + b \end{aligned} $$
(31)

in more detail. Here everything remains Gaussian provided \(X_0^{\mathrm{h}}\) is Gaussian distributed, that is, \(\pi _0(x) = \mathcal {N}(x; \mu _0, \Sigma _0)\). Under these conditions the densities \(\pi _t\) and \(\pi ^{\mathrm {h}}_t\) will also be Gaussian, and we write \(\pi ^{\mathrm {h}}_t(x) = \mathcal {N}(x; \mu ^{\mathrm {h}}_t, \Sigma ^{\mathrm {h}}_t)\). The associated mean field equations follow from Appendix 2 and are given by

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{\mathrm{h}} &= FX_t^{\mathrm{h}} + b + \sigma (\Sigma_t^{\mathrm{h}})^{-1} (X_t^{\mathrm{h}}-\mu_t^{\mathrm{h}}) - \frac{2\sigma t}{T} H^\top R^{-1} (H X_t^{\mathrm{h}}-y_T) \end{aligned} $$
(32a)
$$\displaystyle \begin{aligned} & \qquad -\, C_t H^\top R^{-1} \left( \frac{1}{2} H \left(X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}}\right) - y_T \right) \end{aligned} $$
(32b)
$$\displaystyle \begin{aligned} & \qquad -\,\frac{t}{T} \Sigma_t^h H^\top R^{-1} H \left( \frac{1}{2} F\left( X_t^{\mathrm{h}} + \mu_t^{\mathrm{h}} \right) + b\right). \end{aligned} $$
(32c)

with

$$\displaystyle \begin{aligned} C_t = \frac{1}{T} \Sigma_t^{\mathrm{h}} + \frac{t}{T} \Sigma_t^{\mathrm{h}} F^\top - \frac{2\sigma t^2}{T^2} \Sigma_t^{\mathrm{h}} H^\top R^{-1} H . \end{aligned} $$
(33)

A qualitative discussion can be performed in the scalar case, that is \(d_x = 1\), \(H = 1\), \(\sigma = 1\), \(b=0\), \(T=1\) and \(F = \lambda \). One finds that the control terms involving F stabilize the dynamics whenever \(\lambda >0\). This observation is in line with the fact that the data is crucial only if the dynamics in \(X_t\) is unstable, that is, \(\lambda > 0\).

We consider a two dimensional diffusion process with state variable \(x = (x_1,x_2)^\top \) and linear drift term (31) given by

$$\displaystyle \begin{aligned} F = \left( \begin{array}{cc} -2 & 1 \\ 1 & -2 \end{array} \right), \end{aligned} $$
(34)

\(b = 0\), and diffusion constant \(\sigma = 0.1I\). The forward operator is \(H= \begin {pmatrix} 1 & 0 \end {pmatrix} \) and the variance of the noise is \(R=0.01\). The initial distribution is \(\pi _0 = \mathcal {N}( (1, 3), 0.02I )\). The observed value at time \(T=1\) is set to \(y_T = 2.5\).

The posterior mean takes values \(\mu _1^{\mathrm {a}} \approx 2.25\) and \(\mu _2^{\mathrm {a}} \approx 1.50\), while the posterior covariance matrix becomes

$$\displaystyle \begin{aligned} \Sigma^{\mathrm{a}} \approx \left( \begin{array}{cc} 0.0086 & 0.0039\\ 0.0039 & 0.0503 \end{array} \right). \end{aligned} $$
(35)

Numerical results can be found in Fig. 2. The impact of the control term on the linear diffusion process can clearly be seen and is most prominent on the observed \(x_1\) component of the process. The final values of the controlled process agree well with their posterior counterparts.

Fig. 2

Left panel: Time evolution of the mean in \(x_1\) and \(x_2\), the two associated variances and the covariance between \(x_1\) and \(x_2\) under the linear diffusion process. Right panel: Time evolution of the same quantities under the controlled diffusion process

6.4 Nonlinear Diffusion Example

We consider a two-dimensional problem and denote the state variable by \(x = (x_1,x_2)^\top \). The drift term is given by

$$\displaystyle \begin{aligned} f(x) = -\nabla V(x), \qquad V(x) = \frac{\lambda_1}{2} \left(x_2 - 2 + \beta x_1^2\right)^2 + \frac{\lambda_2}{2}\left(\frac{x_1^4}{2}-x_1^2\right) \end{aligned} $$
(36)

with parameters \(\lambda _1 = 2000\), \(\lambda _2 = 5\), and \(\beta = 1/5\). The diffusion constant is set to \(\sigma = 1\). The choice of the potential \(V(x)\) has two effects: (1) there is a relative high barrier for particles to pass from positive to negative \(x_1\)-values and vice versa; (2) the dynamics stay close to the parabola \(x_2 = 2- \beta x_1^2\).
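The potential and the resulting drift can be coded directly; the sketch below (our own, with the gradient written out analytically) confirms that the drift vanishes at the two wells on the parabola:

```python
import numpy as np

lam1, lam2, beta = 2000.0, 5.0, 0.2

def V(x1, x2):
    """Potential (36)."""
    return 0.5 * lam1 * (x2 - 2.0 + beta * x1**2)**2 + \
           0.5 * lam2 * (0.5 * x1**4 - x1**2)

def f(x1, x2):
    """Drift f = -grad V, with the gradient computed by hand."""
    r = x2 - 2.0 + beta * x1**2
    return np.array([-(lam1 * r * 2.0 * beta * x1 + lam2 * (x1**3 - x1)),
                     -lam1 * r])

# The two wells sit on the parabola x2 = 2 - beta * x1^2 at x1 = +/- 1,
# where the drift vanishes
print(f(1.0, 2.0 - beta), f(-1.0, 2.0 - beta))
```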

The initial distribution is obtained by sampling \(x_1\) from a Gaussian with mean \(1.5\) and variance \(0.0625\). The \(x_2\) component is obtained from the relation

$$\displaystyle \begin{aligned} x_2 = 2 - \beta x_1^2. \end{aligned} $$
(37)

We observe the first component \(x_1\) of the state vector at time \(T=1\) with measurement error variance \(R = 0.01\). The observed value is set to \(y_T = -1.5\). Due to the tiny observation error the posterior is centred sharply about the observed value. Furthermore, recall that the dynamics is essentially slaved to the parabola \(x_2 = 2 - \beta x_1^2\) which makes the inference problem strongly nonlinear.

All particle simulations are run with an ensemble size of \(M=1000\). Essentially identical results are obtained for \(M=100\). Smaller ensemble sizes lead to numerical instabilities.

In Fig. 3, one can find the particle distribution at time \(t=1\), which constitutes the prior distribution for the associated Bayesian inference problem. It is obvious that a particle filter would fail to recover the posterior distribution, which is sharply centered about the observed value. We found that increasing the ensemble size to \(M = 10{,}000\) allows a particle filter to recover the posterior distribution; but the effective ensemble size still drops dramatically. The approximation provided by the EnKF is also displayed. The EnKF fails to recover the posterior due to its inherent linear regression ansatz, which is inappropriate for this strongly nonlinear inference problem even in the limit of infinite ensemble size \(M\to \infty \).

Fig. 3

Initial (blue) and final particle positions (red) under the given evolution process together with the posterior approximation provided by the EnKF (yellow). The observed value is also displayed

In Fig. 4, the results from the controlled mean field formulation are displayed. It can be concluded that the posterior distribution is well approximated despite the constant gain approximation made in order to formulate the control term \(\hat g_t^{\mathrm {KF}}\) in (22).

Fig. 4

Left panel: Initial and final particle positions under the controlled evolution process. Right panel: Particle positions at intermediate times \(t_k \in [0,1]\)

6.5 Lorenz-63 Example

All examples so far have considered a single data assimilation cycle only. We now perform a proper sequential data assimilation experiment for the standard Lorenz-63 model [11]

$$\displaystyle \begin{aligned} {} \frac{\mathrm{d}}{\mathrm{d}t} X_t = f(X_t) , \end{aligned} $$
(38)

where \(X_t:\Omega \to \mathbb {R}^3\) and

$$\displaystyle \begin{aligned} f(x,y,z) = \left(\begin{array}{c} a(y-x) \\ x(b-z)-y \\ xy-cz \end{array} \right) \end{aligned} $$
(39)

with parameters \(a=10\), \(b=28\) and \(c=8/3\).

In order to obtain a reference solution for \(t\ge 0\), the ODE (38) is solved numerically with step-size \(\Delta t = 0.005\) and initial condition

(40)

Scalar-valued observations are generated every \(\Delta t_{\mathrm {obs}}>0\) units of time using the forward model

$$\displaystyle \begin{aligned} y_{n\Delta t_{\mathrm{obs}}} = H X_{n\Delta t_{\mathrm{obs}}} + \nu_n, \qquad n = 1,\ldots,N, \end{aligned} $$
(41)

with measurement errors \(\nu _n \sim \mathcal {N}(0,1)\) and forward map \(H = (1 \,0 \,0) \in \mathbb {R}^{1\times 3}\). We use \(\Delta t_{\mathrm {obs}} \in \{0.05,0.1,0.12\}\) in our experiments and perform \(N = 20{,}000\) assimilation cycles.

The initial ensemble \(\{X_0^{(i)}\}_{i=1}^M\) is drawn from a Gaussian distribution with mean given by the initial condition (40) and covariance matrix \(0.01 I\). We employ multiplicative ensemble inflation which amounts to replacing the Lorenz-63 dynamics by

$$\displaystyle \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^{(i)} = f(X_t^{(i)}) + \sigma_k (X_t^{(i)} - \hat \mu_t ), \qquad i = 1,\ldots,M, \end{aligned} $$
(42)

with inflation factors

$$\displaystyle \begin{aligned} \sigma_k = 0.025 k, \qquad k = 0,\ldots,9. \end{aligned} $$
(43)

Here \(\hat \mu _t\) denotes the empirical mean of the ensemble \(\{X_t^{(i)}\}_{i=1}^M\). These equations are combined with the augmented evolution equations (30) and solved numerically with step-size \(\Delta t = 0.005\) and ensemble sizes \(M \in \{5,10,15\}\).
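A sketch of the inflated ensemble dynamics (42) with the Lorenz-63 drift (39), vectorised over the ensemble (function names and the explicit Euler step are our own choices):

```python
import numpy as np

a, b, c = 10.0, 28.0, 8.0 / 3.0

def f(X):
    """Lorenz-63 drift (39) applied row-wise to an ensemble X of shape (M, 3)."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([a * (y - x), x * (b - z) - y, x * y - c * z])

def inflated_step(X, sigma_k, dt):
    """One explicit Euler step of the inflated dynamics (42)."""
    mu = X.mean(axis=0)
    return X + dt * (f(X) + sigma_k * (X - mu))

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))
X = inflated_step(X, sigma_k=0.025 * 2, dt=0.005)
print(X.shape)   # ensemble shape is preserved: (10, 3)
```

Note that the inflation term \(\sigma_k (X_t^{(i)} - \hat\mu_t)\) spreads the particles about their empirical mean without shifting that mean.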

We report the resulting root mean square errors (RMSE)

(44)

which are computed for each ensemble size M, observation interval \(\Delta t_{\mathrm {obs}}\) and inflation factor \(\sigma _k\). The results are displayed in Table 1 where the smallest RMSE over the range of inflation factors \(\{\sigma _k\}_{k=0}^9\) is stated for each M and \(\Delta t_{\mathrm {obs}}\). We also state the corresponding RMSEs from a standard ensemble square root filter implementation [2, 8]. We find that the proposed homotopy approach outperforms the ensemble square root filter in terms of RMSE in all settings considered. The improvements increase for increasing observation intervals \(\Delta t_{\mathrm {obs}}\). The homotopy approach also appears less sensitive to the ensemble size M.

Table 1 RMSE for both a standard ensemble square root filter (ESRF) implementation and our homotopy approach in terms of ensemble sizes \(M \in \{5,10,15\}\) and observation intervals \(\Delta t_{\mathrm {obs}} \in \{0.05,0.1,0.12\}\). The homotopy based data assimilation method leads to significantly reduced RMSEs in all settings

We close this example by pointing out that less of an improvement could be expected for a fully observed Lorenz-63 system. The proposed homotopy approach seems particularly effective in guiding the unobserved solution components to regions of high posterior probability. See also the example from Sect. 6.4.

7 Conclusions

Devising alternative proposal densities has a long history in the context of sequential data assimilation and filtering. Here we have explored a computationally tractable approach which combines the concept of Schrödinger bridges with a rather straightforward homotopy approach. A further key ingredient is the approximate solution of the arising PDEs in terms of a constant gain approximation, which is also widely used within the EnKF community. Numerical examples indicate that the approach is viable and can overcome limitations of both standard sequential Monte Carlo as well as standard EnKF methods. This has been demonstrated for single assimilation steps as well as long-time data assimilation using the chaotic Lorenz-63 model with only the first component observed infrequently. It remains to be seen how the proposed methods behave for high dimensional stochastic processes.