This chapter aims to demonstrate the impact of some of the critical approximations we apply with different assimilation methods. Using a simple scalar model, we can simulate many realizations and eliminate sampling errors. We will examine the more advanced particle-flow method and compare its performance to iterative smoothers and to the linear updates from the ensemble Kalman filter (EnKF) or ensemble smoother (ES). By testing the methods on models with varying degrees of nonlinearity, we develop an overall understanding of how different data-assimilation methods perform in different situations.

1 Scalar Model and Inverse Problem

Let’s define two nonlinear scalar models

$$\begin{aligned} y = g(x,q) = x + \beta x^3 + q , \end{aligned}$$
(18.1)

and

$$\begin{aligned} y = g(x,q) = 1 + \sin (x) + q , \end{aligned}$$
(18.2)

which, given an initial state x and a model error q, define a prediction y.

In Eq. (18.1) \(\beta \) is a parameter that determines the non-linearity of the model. In the current example, we have used \(\beta =0.3\). Evensen  (2018) used the same model but without the model error, while Evensen (2019) included model errors but used \(\beta =0.2\). This model introduces a nonlinearity while retaining a monotonic model response given the inputs x and q. In Eq. (18.2), we introduce a model with stronger nonlinearity, resulting in multimodal posteriors, since one model output can result from different model inputs.
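For concreteness, here is a minimal Python sketch of the two forward models; the function names are ours, and we use \(\beta =0.3\) as in the examples below.

```python
import numpy as np

def g_cubic(x, q, beta=0.3):
    """Weakly nonlinear, monotonic model in Eq. (18.1)."""
    return x + beta * x**3 + q

def g_sine(x, q):
    """Strongly nonlinear, non-monotonic model in Eq. (18.2); different
    inputs can map to the same output, allowing multimodal posteriors."""
    return 1.0 + np.sin(x) + q
```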

The goal is to demonstrate the impact of some of the approximations we introduced in Chap. 2 when we try to sample the Bayes’ posterior for x and q given a measurement of y. We will use techniques that solve the data-assimilation problem with different approximations. We start with a particle-flow approach that samples the Bayes’ posterior exactly when we use an adjoint-based model sensitivity. Then, we introduce the approximate ensemble-based model sensitivity in Approx. 7 to examine its impact. The EnRML approach with adjoint model sensitivity minimizes the cost functions Eq. (7.1) exactly. Still, this approach only approximately samples the Bayes’ posterior due to Approx. 6. We further examine the impact of the linearization in Approx. 5 introduced by the ES scheme. Finally, we examine the convergence of the ESMDA method.

2 Discussion of Data-Assimilation Examples

We run three cases with different levels of nonlinearity. In the first case, we use the model in Eq. (18.1). We sample the prior ensemble \(x^\mathrm {f}_j\) from a normal distribution \(\mathcal {N}(x^\mathrm {f}=0.0,C_{xx}=1.0)\), and we sample the perturbed observations, \(d_j\) of y, from \(\mathcal {N}(0.0,1.0)\). Thus, except for the model nonlinearity, this is a rather trivial case.

In the second case, we also use the model in Eq. (18.1), but now we sample the prior from \(\mathcal {N}(1.0,1.0)\) and the perturbed measurements from \(\mathcal {N}(-1.0,1.0)\). Thus, we introduce stronger nonlinearity and non-symmetry to the problem.

In the third case, we use the multimodal model in Eq. (18.2) with the prior ensemble sampled from \(\mathcal {N}(1.0,1.0)\) and the measurement ensemble from \(\mathcal {N}(0.0,1.0)\). The model error q is a random variable sampled from \(\mathcal {N}(0,C_{qq}=0.25)\) in all three cases.

In these examples, we use a sufficiently large number of samples, i.e., \(10^7\), to generate accurate estimates of the probability density functions. Furthermore, the large ensemble allows us to work directly with the pdfs and examine the converged solutions of the methods and the impact of the various approximations. We use a smaller ensemble of \(10^5\) realizations for the particle flow computations due to the increased computational cost of the kernel matrix multiplications.
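To make the setup concrete, the sketch below samples the prior parameters, model errors, and perturbed observations for the second case; the variable names and the reduced ensemble size are illustrative choices, and g_cubic refers to the model sketch above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 100_000                    # reduced from the 10**7 used in the chapter

# Case 2: prior N(1,1), perturbed observations N(-1,1), model error N(0,0.25)
x_prior = rng.normal(1.0, 1.0, size=N)             # prior parameter samples
q_prior = rng.normal(0.0, np.sqrt(0.25), size=N)   # model errors, C_qq = 0.25
d_pert  = rng.normal(-1.0, 1.0, size=N)            # perturbed observations of y

y_prior = g_cubic(x_prior, q_prior)                # prior predicted measurements
```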

Stordal et al. (2021) discussed the particle flow for this type of data-assimilation problem. Given a cost function as in Eq. (3.9) and its gradient in Eq. (3.10), we can write an iteration using the gradient from Eq. (9.49) as

$$\begin{aligned} {\mathbf {z}}^{i+1}_j = {\mathbf {z}}^i_j - \gamma \frac{1}{N} \sum _{l=1}^N \Big [ {\mathbf {K}}\big ({\mathbf {z}}^i_l,{\mathbf {z}}^i_j\big ) \nabla _{{\mathbf {z}}^i_l} \mathcal {J}\big ({\mathbf {z}}^i_l\big ) - \nabla _{{\mathbf {z}}^i_l} {\mathbf {K}}\big ({\mathbf {z}}^i_l,{\mathbf {z}}^i_j\big ) \Big ] . \end{aligned}$$
(18.3)

Here we use the particle-flow filter from Eq. (9.49) and the fact that we can write any posterior pdf as \(f({\mathbf {z}}|{\mathbf {d}}) \propto \exp \left( - \mathcal {J}\big ({\mathbf {z}}\big )\right) \), such that the gradient of the log-posterior equals the negative gradient of the cost function \(\mathcal {J}\big ({\mathbf {z}}\big )\). As explained in Sect. 9.3.2, we can choose any symmetric smooth kernel when the number of particles is large, and in this example, we use a Gaussian kernel

$$\begin{aligned} {\mathbf {K}}\big ({\mathbf {z}}_l,{\mathbf {z}}_j\big ) = \exp \Big ( -\tfrac{1}{2} \big ({\mathbf {z}}_l - {\mathbf {z}}_j\big )^\mathrm {T} {\mathbf {C}}^{-1} \big ({\mathbf {z}}_l - {\mathbf {z}}_j\big ) \Big ) . \end{aligned}$$
(18.4)

Using a “narrow” kernel leads to an ineffective repulsion term and to a gradient term where only realization j contributes to its own update. In the examples below, we used a diagonal \({\mathbf {C}}\) with the value \(2/\pi \) on the diagonal.
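The following is a minimal sketch of the iteration in Eq. (18.3) for the joint state \({\mathbf {z}}=(x,q)\), assuming the cost function of the model in Eq. (18.1) with prior mean \(x^\mathrm {f}\), a single measurement d with error variance \(C_{dd}\), and the Gaussian kernel above with \(2/\pi \) on the diagonal of \({\mathbf {C}}\). The step length, iteration count, and default values are illustrative choices, not those of the original experiments.

```python
import numpy as np

def grad_J_cubic(z, x_f=1.0, C_xx=1.0, C_qq=0.25, d=-1.0, C_dd=1.0, beta=0.3):
    """Adjoint gradient of J(z) = -log posterior for y = x + beta*x**3 + q,
    prior x ~ N(x_f, C_xx), q ~ N(0, C_qq), and measurement d ~ N(., C_dd)."""
    x, q = z[:, 0], z[:, 1]
    r = (x + beta * x**3 + q - d) / C_dd              # data-mismatch term
    return np.column_stack(((x - x_f) / C_xx + r * (1.0 + 3.0 * beta * x**2),
                            q / C_qq + r))

def particle_flow(z, grad_J, gamma=0.05, c_kern=2.0 / np.pi, n_iter=500):
    """Particle-flow iteration of Eq. (18.3) with the Gaussian kernel
    K(z_l, z_j) = exp(-|z_l - z_j|^2 / (2 c_kern)); z has shape (N, 2)."""
    N = z.shape[0]
    for _ in range(n_iter):
        diff = z[:, None, :] - z[None, :, :]                  # z_l - z_j
        K = np.exp(-0.5 * np.sum(diff**2, axis=-1) / c_kern)  # kernel matrix (N, N)
        drift = K.T @ grad_J(z)                      # sum_l K(z_l, z_j) grad J(z_l)
        gradK = -np.einsum('lj,ljk->jk', K, diff) / c_kern  # sum_l grad_{z_l} K(z_l, z_j)
        z = z - (gamma / N) * (drift - gradK)
    return z
```

Starting from, e.g., z0 = np.column_stack((x_prior[:1000], q_prior[:1000])), the particles flow toward samples of the posterior; the \(N\times N\) kernel matrix multiplications are what drive the increased computational cost mentioned above.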

In Figs. 18.1, 18.2 and 18.3 we present the results from the three cases with increasing degrees of nonlinearity. From top to bottom, we show the results using particle flow, an iterative ensemble smoother (IES) based on EnRML, ES, and finally, ESMDA.

In the upper plots we see that the particle-flow method with adjoint model sensitivity (full line) converges to the correct posterior independent of model nonlinearity. Moreover, the technique even recovers the bimodality in the rightmost plot. This result is in agreement with the theory. When introducing the ensemble-averaged model sensitivity from Approx. 7 in the particle flow methods, we still obtain good results in the weakly nonlinear cases with only a slight distortion of the sampled distribution compared to the true posterior. However, in the bimodal case, none of the methods show any skill in recovering the correct distribution. Thus, the linear regression Approx. 7 requires the model to be weakly nonlinear in the sense that the model is a monotonic function of the estimated inputs.

Although the particle-flow algorithm shows excellent potential for sampling the posterior pdf, extending it to higher-dimensional problems and practical applications poses certain challenges. For example, in a high-dimensional model, we must ensure that the repulsion term remains “active” by choosing an appropriate kernel and a sufficient number of samples.

In the second row of plots in Figs. 18.1, 18.2 and 18.3, we present the results from the EnRML sampling, where we apply Approx. 6 and minimize an ensemble of cost functions, Eq. (7.1), using the IES. The solid line shows the sampled pdf when using the adjoint model sensitivity, while the dashed lines illustrate the results when we introduce the ensemble-averaged model sensitivity from Approx. 7. In the nearly linear case in Fig. 18.1, the EnRML sampling gives nearly the same answer with the adjoint and the linear-regression representations of the model sensitivity. However, when the degree of nonlinearity increases, the two estimates diverge from each other and from the true posterior. In the bimodal case, the EnRML sampling with adjoint model sensitivity captures both modes, but only approximately. This example represents all methods that exactly minimize the cost functions in Eq. (7.1), including the En4DVar.
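To illustrate the structure of this sampling, here is a simplified per-realization Gauss-Newton sketch for the model in Eq. (18.1) with the adjoint (exact) sensitivity. It uses full Gauss-Newton steps without the step control of an actual IES implementation, and all names and default values are ours.

```python
import numpy as np

def rml_gauss_newton(x_f, q_f, d, C_xx=1.0, C_qq=0.25, C_dd=1.0,
                     beta=0.3, n_iter=20):
    """Minimize one cost function per realization (as in Eq. (7.1)) with
    Gauss-Newton and the adjoint sensitivity of the model in Eq. (18.1)."""
    Cz_inv = np.diag([1.0 / C_xx, 1.0 / C_qq])
    z_f = np.column_stack((x_f, q_f))          # prior samples of (x, q)
    z = z_f.copy()
    for _ in range(n_iter):
        for j in range(z.shape[0]):
            x, q = z[j]
            g = x + beta * x**3 + q                        # model prediction
            G = np.array([1.0 + 3.0 * beta * x**2, 1.0])   # adjoint sensitivity
            grad = Cz_inv @ (z[j] - z_f[j]) + G * (g - d[j]) / C_dd
            H = Cz_inv + np.outer(G, G) / C_dd             # Gauss-Newton Hessian
            z[j] -= np.linalg.solve(H, grad)
    return z
```

Replacing G with a common ensemble-averaged linear regression of the predicted measurements on \((x,q)\) corresponds to the dashed-line variant discussed above.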

Fig. 18.1

Sampling Bayes’ posterior for the scalar inverse problem in Eq. (18.1) with the prior and observation centered at zero. From top to bottom we show results using particle flow (PFF), an iterative ensemble smoother (IES), ES, and ESMDA. Each plot shows the results using an adjoint model sensitivity (full lines) and an ensemble-based model sensitivity (dashed lines). Additionally, for ES and ESMDA, we plot the results obtained when omitting the correction in the computation of \({\mathbf {C}}_{yy}\)

Fig. 18.2

As in Fig. 18.1 but with the prior centered at \(x=1\) and the measurement centered at \(-1\)

Fig. 18.3

As in Fig. 18.1 but for the highly nonlinear model defined by Eq. (18.2)

The third row of plots in Figs. 18.1, 18.2 and 18.3 shows the results when we minimize the cost functions in Eq. (7.1) using the ES method, which includes the additional linearization of Approx. 5. Clearly, both with an ensemble-averaged linear regression and with an adjoint model sensitivity, the linearization causes the resulting pdf to deviate even further from the true posterior pdf. Surprisingly, we obtain a nearly perfect fit using ES with the adjoint model sensitivity in the case with the multimodal model. In these plots, we also show the result with an ensemble-averaged sensitivity where we computed \({\mathbf {C}}_{yy}\) directly without using the correct regression in Eq. (7.11). This error causes a significant shift in the estimated pdf and a worsening of the result.
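For reference, the following is a minimal sketch of the basic ES update with perturbed observations and ensemble covariances; it corresponds to the variant that computes \({\mathbf {C}}_{yy}\) directly from the predicted ensemble, i.e., without the regression correction of Eq. (7.11), and the function name and interface are ours.

```python
import numpy as np

def es_update(z_f, y_f, d, C_dd=1.0):
    """One ES update for the scalar problem.
    z_f: (N, 2) prior samples of (x, q); y_f: (N,) predicted measurements;
    d: (N,) perturbed observations with error variance C_dd."""
    N = z_f.shape[0]
    zp = z_f - z_f.mean(axis=0)
    yp = y_f - y_f.mean()
    C_zy = zp.T @ yp / (N - 1)        # cross covariance between (x, q) and y
    C_yy = yp @ yp / (N - 1)          # predicted-measurement variance
    K = C_zy / (C_yy + C_dd)          # Kalman-type gain
    return z_f + np.outer(d - y_f, K)
```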

Finally, the bottom plots of Figs. 18.1, 18.2 and 18.3 show the results when sampling the posterior distribution using ESMDA. Recall that ESMDA is just a sequence of ES updates using measurements with inflated errors. In the linear case, ESMDA samples the Bayes’ posterior pdf exactly. In the nonlinear case, ESMDA is subject to the linearity Approx. 5, but the impact of this approximation can be made negligible by using many small update steps. The many small updates are nearly linear, and the effect of using the regression formula in Eq. (7.11) is almost insignificant. ESMDA with an ensemble gradient does amazingly well for this particular problem, while we obtain a much worse result when using the adjoint model sensitivity. In the highly nonlinear case in Fig. 18.3, we observe that ESMDA with 32 steps and an adjoint model sensitivity does not work. This problem results from the vastly inflated measurement perturbations used in each update step. The large perturbations create some perturbed measurements with non-physical values that exceed the possible function outputs. For the model in Eq. (18.2), the outputs are restricted to the interval \(y \in [0,2]\), neglecting the stochastic model errors q. Evensen (2018) pointed out this issue with ESMDA. Using fewer update steps or even a square-root formulation of ESMDA (Emerick, 2018) might resolve this problem.
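As a sketch of the scheme, ESMDA can be written as a loop over ES updates with inflated measurement-error variances, here reusing the es_update sketch above and assuming uniform coefficients \(\alpha _i = N_\mathrm {mda}\) so that \(\sum _i 1/\alpha _i = 1\); the measurement value, seed, and defaults are illustrative.

```python
import numpy as np

def esmda(x_f, q_f, d_obs=-1.0, C_dd=1.0, n_mda=32, beta=0.3, seed=2):
    """ESMDA for the model in Eq. (18.1): n_mda ES updates, each using the
    measurement-error variance inflated by alpha and freshly perturbed
    observations drawn with the inflated variance."""
    rng = np.random.default_rng(seed)
    z = np.column_stack((x_f, q_f))
    N = z.shape[0]
    alpha = float(n_mda)                  # uniform coefficients, sum(1/alpha) = 1
    for _ in range(n_mda):
        y = z[:, 0] + beta * z[:, 0]**3 + z[:, 1]             # rerun the model
        d = d_obs + np.sqrt(alpha * C_dd) * rng.standard_normal(N)
        z = es_update(z, y, d, C_dd=alpha * C_dd)             # inflated-error ES step
    return z
```

The np.sqrt(alpha * C_dd) factor is what produces the large perturbations discussed above: with many steps, individual perturbed measurements can fall far outside the range of possible model outputs.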

In summary, the particle-flow algorithm converges to the Bayesian posterior when using the adjoint model sensitivity. However, the computational cost is significantly larger than for the other methods. Moreover, we must choose the kernel wisely and consider the required number of particles for high-dimensional systems. Furthermore, introducing an ensemble gradient results in a solution that deviates from the correct Bayesian solution. Thus, when we don’t have an adjoint, we may still obtain an improved estimate using particle flow. Finally, the EnRML solution introduces another approximation, and the result is not as good as the particle-flow solution. The code used for these examples is available from https://github.com/geirev/EnKF_scalar.git.

Fig. 18.4

Illustration of the linear regression update used to solve the parameter estimation problem. The upper frame illustrates the ensemble prediction of five realizations of a scalar parameter, which misses the measurement in blue. The lower frame shows how the updated parameter realizations lead to predictions in better agreement with the measurement

3 Summary

As we have seen from the examples discussed in this chapter, the approximate sampling of the Bayes’ posterior worsens with increasing nonlinearity. Particularly for the multimodal problem, the ensemble smoothers with an ensemble model sensitivity fail completely. In Fig. 18.4 we illustrate the basis for these methods. In this example, the prediction increases monotonically with the value of the input parameter. This situation corresponds to the two examples in Figs. 18.1 and 18.2. Thus, we have a positive correlation between the input parameter and the predicted measurement. When we underpredict the observation, the positive correlation indicates that we can increase the input-parameter value to obtain a better fit. The ES uses precisely this approach. The iterative EnRML smoothers take this one step further to better handle a certain degree of nonlinearity in the dynamics. However, the stronger the nonlinearity, the poorer the regression representation of the model sensitivity becomes. And, for non-monotonic functions, the correlation between input parameters and predicted measurements approaches zero, explaining the poor performance of the regression updates in Fig. 18.3.