The previous chapters illustrate how we can start from Bayes’ theorem and apply a sequence of assumptions or approximations to derive the most popular data-assimilation methods in use today. This chapter summarizes the different techniques and compares them in the context of the approximations used to derive them. We provide a graphical overview that makes it easy to relate the different methods and that lists the applied approximations.

1 Discussion of Methods

The graphical presentation in Fig. 11.1 summarizes all methods, assuming that the underlying dynamical model is nonlinear.

Fig. 11.1 Unified derivation of DA methods. We have summarized the data-assimilation methods and their applied approximations when solving the update over one assimilation window. Other approximations may apply for sequential-in-time data assimilation, e.g., using a stationary background covariance as in 4DVar.

In Chap. 2, we saw that we could split an extended timeline into separate assimilation windows as long as the dynamical model is a Markov process and the measurement errors are uncorrelated between assimilation windows. Thus, we can treat the assimilation problem one window at a time, and the recursive version of Bayes’ formula in Eq. (2.23) applies for each assimilation window. We also saw that the parameter-estimation problem is analogous to the assimilation problem for one assimilation window.
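
In standard notation, which may differ from that of Eq. (2.23), the recursion over windows reads, schematically,

$$
f(\mathbf{x}_k \mid \mathbf{d}_1,\ldots,\mathbf{d}_k) \;\propto\; f(\mathbf{d}_k \mid \mathbf{x}_k)\, f(\mathbf{x}_k \mid \mathbf{d}_1,\ldots,\mathbf{d}_{k-1}),
$$

where $\mathbf{x}_k$ denotes the state over window $k$ and $\mathbf{d}_k$ the measurements in that window; the posterior from one window, propagated by the model, becomes the prior for the next.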

As discussed in Sect. 2.3.3, ensemble integrations are a common and probably the only practical means of propagating the state error covariances, or, more generally, the state’s pdf, over an assimilation window or from one assimilation window to the next.
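
As a minimal sketch of this idea, the following Python/NumPy snippet propagates an ensemble through a hypothetical one-dimensional nonlinear model (the function advance below is purely illustrative) and uses the resulting sample to approximate the forecast pdf’s mean and variance:

```python
# Minimal sketch (hypothetical 1-D model "advance") of how an ensemble of
# realizations approximates the propagation of the state pdf over a window.
import numpy as np

rng = np.random.default_rng(0)

def advance(x, dt=0.1, nsteps=10):
    """Hypothetical nonlinear model: logistic-type growth integrated over the window."""
    for _ in range(nsteps):
        x = x + dt * x * (1.0 - x)
    return x

N = 100                                            # ensemble size
prior = rng.normal(loc=0.3, scale=0.05, size=N)    # sample of the prior pdf
forecast = advance(prior)                          # propagate each realization
forecast += rng.normal(scale=0.01, size=N)         # additive model error (optional)

# The ensemble mean and variance approximate the forecast pdf's first two moments.
print(forecast.mean(), forecast.var(ddof=1))
```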

Starting from the recursive Bayes’ formulation, we can choose between two routes. We can solve the Bayesian problem by using a particle representation of the pdf and the particle or particle-flow filters of Chap. 9 to compute the recursive update steps. These methods tend to be expensive, and we should only use them when the system is strongly nonlinear. The alternative is to apply the Gaussian-priors assumption in Approx. 4, which allows for deriving most data-assimilation methods currently in use.
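
To illustrate the first of these two routes, the following sketch shows a bootstrap particle-filter analysis step, assuming a scalar state observed directly with Gaussian measurement error; the numbers are purely illustrative:

```python
# Minimal sketch of a bootstrap particle-filter analysis step for one window,
# assuming a scalar state, a direct observation, and Gaussian measurement error.
import numpy as np

rng = np.random.default_rng(1)

particles = rng.normal(0.3, 0.05, size=1000)   # forecast particles (equal weights)
d, sigma_obs = 0.35, 0.02                      # observation and its error std. dev.

# Weight each particle by the Gaussian likelihood of the observation.
w = np.exp(-0.5 * ((d - particles) / sigma_obs) ** 2)
w /= w.sum()

# Resample to obtain an equally weighted posterior ensemble.
idx = rng.choice(len(particles), size=len(particles), p=w)
posterior = particles[idx]
print(posterior.mean(), posterior.std(ddof=1))
```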

The Gaussian-priors approximation effectively allows us to search for the MAP estimate by minimizing the cost function in Eq. (3.9). Examples include the 4DVar schemes and the representer method, as discussed in Chaps. 4 and 5. Note that these methods only compute the MAP estimate and do not sample the posterior Bayes’ distribution. Nor do they have a direct means of computing and propagating the error statistics to the next assimilation window. Therefore, these methods typically require the additional approximation of a stationary-in-time background-error-covariance matrix for the prior state estimate in each assimilation window. However, given the correct priors for an assimilation window, if these iterative and adjoint-based methods converge to the cost function’s global minimum, they find the MAP solution.
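
Schematically, and with notation that may differ from Eq. (3.9), the strong-constraint cost function minimized over one window has the form

$$
\mathcal{J}(\mathbf{x}_0) \;=\; \tfrac{1}{2}\,(\mathbf{x}_0-\mathbf{x}^{\mathrm b})^{\mathsf T}\mathbf{B}^{-1}(\mathbf{x}_0-\mathbf{x}^{\mathrm b})
\;+\; \tfrac{1}{2}\sum_{i}\big(\mathbf{d}_i-\mathbf{h}_i(\mathbf{m}_i(\mathbf{x}_0))\big)^{\mathsf T}\mathbf{R}_i^{-1}\big(\mathbf{d}_i-\mathbf{h}_i(\mathbf{m}_i(\mathbf{x}_0))\big),
$$

where $\mathbf{x}^{\mathrm b}$ and $\mathbf{B}$ are the background state and covariance, $\mathbf{m}_i$ integrates the model from the start of the window to measurement time $i$, $\mathbf{h}_i$ is the measurement operator, and $\mathbf{R}_i$ is the measurement-error covariance. Under the Gaussian-priors approximation, the minimizer is the MAP estimate.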

We also saw that, compared to the 4DVar solution, the extended Kalman filter (EKF) applies an additional approximation by linearizing the model and measurement operators to find an explicit update equation. The update only approximates the MAP estimate, but the EKF provides the means for updating and evolving the error statistics in time. The Kalman filter (KF) solves the data-assimilation problem defined by Eq. (3.8) in the case of a linear model with Gaussian priors. In the weakly nonlinear case with Gaussian priors, the EKF provides an approximate solution due to the linearization. However, both the KF and the EKF require the storage and propagation of the state error-covariance matrix, which becomes overwhelming for realistically sized geoscience data-assimilation problems, not to mention the severity of the linearization in the error-covariance propagation; see Eq. (2.28).
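
A minimal sketch of a single EKF analysis step, assuming a two-dimensional state observed through a hypothetical nonlinear operator h whose Jacobian we linearize at the forecast state, could look as follows:

```python
# Minimal sketch of a (single-window) extended Kalman filter analysis step,
# assuming a scalar observation of a 2-D state through a nonlinear operator h.
import numpy as np

def h(x):                      # hypothetical nonlinear measurement operator
    return np.array([x[0] ** 2 + x[1]])

def H_jac(x):                  # its Jacobian, linearized at the forecast state
    return np.array([[2.0 * x[0], 1.0]])

xf = np.array([1.0, 0.5])      # forecast (prior) state
Pf = np.diag([0.1, 0.2])       # forecast error covariance
d = np.array([1.7])            # observation
R = np.array([[0.05]])         # measurement-error covariance

H = H_jac(xf)
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
xa = xf + K @ (d - h(xf))                        # analysis (approximate MAP)
Pa = (np.eye(2) - K @ H) @ Pf                    # updated error covariance
print(xa, Pa)
```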

Another route is to follow the Gaussian-priors assumption with the Randomized Maximum Likelihood (RML) sampling approach described in Chap. 7, which provides an approximate sampling of the posterior pdf. RML sampling requires minimizing an ensemble of cost functions with different prior state vectors and perturbed measurements. RML sampling turns out to be exact in the Gauss-linear case. At the same time, with significant nonlinearities, we cannot expect it to work satisfactorily, and we might need to use the particle or particle-flow filters.
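
As a sketch of the procedure, the following snippet minimizes an ensemble of perturbed cost functions for a scalar state with a hypothetical cubic measurement operator; both the operator and the numbers are illustrative assumptions:

```python
# Minimal sketch of RML sampling for a scalar state and a nonlinear observation
# g(x) = x**3, assuming a Gaussian prior N(x_b, B) and measurement error N(0, R).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x_b, B = 1.0, 0.2      # prior mean and variance
d, R = 1.5, 0.1        # observation and measurement-error variance
g = lambda x: x ** 3   # hypothetical nonlinear measurement operator

samples = []
for _ in range(50):                            # one cost function per realization
    x_j = x_b + np.sqrt(B) * rng.normal()      # perturbed prior state
    d_j = d + np.sqrt(R) * rng.normal()        # perturbed measurement
    J = lambda x: 0.5 * (x - x_j) ** 2 / B + 0.5 * (d_j - g(x)) ** 2 / R
    samples.append(minimize_scalar(J).x)       # minimizer of this realization's cost

print(np.mean(samples), np.var(samples, ddof=1))
```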

The RML sampling allows us to derive several ensemble-based assimilation methods. One alternative is to use the 4DVar schemes to minimize each cost function, providing an RML ensemble of solutions. En4DVar uses adjoints and solves each cost-function realization exactly, at least as long as there are no local minima. Hence, En4DVar solves the RML sampling problem within the approximations made on the background matrix. This approach is close to the procedure used by some operational En4DVar systems. However, the method still largely ignores updating and evolving the error statistics.

Using a sufficiently large ensemble makes it possible to compute posterior error statistics and to propagate error statistics from one time window to the next. We can then use the forecast ensemble to compute the prediction error covariance and use it in place of the stationary background matrix. Thus, since we cannot calculate the full exact covariances, Approx. 8, where we compute the covariances from the ensemble, allows us to design methods with consistent error statistics evolving in time. En4DVar, using either the strong-constraint algorithms or the weak-constraint representer method, and other adjoint-based Gauss–Newton methods would all work in this configuration. As the standard En4DVar uses a stationary background matrix, a fully ensemble-based En4DVar should outperform the traditional version as long as we use a sufficiently large ensemble.
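
A minimal sketch of Approx. 8 is simply the sample covariance of the forecast ensemble, which replaces the stationary background matrix (the ensemble below is random and purely illustrative):

```python
# Minimal sketch of Approx. 8: replace the stationary background matrix by the
# sample covariance of the forecast ensemble (here 5 state variables, 100 members).
import numpy as np

rng = np.random.default_rng(3)
E = rng.normal(size=(5, 100))              # forecast ensemble, one member per column
A = E - E.mean(axis=1, keepdims=True)      # ensemble anomalies
C_f = A @ A.T / (E.shape[1] - 1)           # low-rank estimate of the forecast covariance
print(C_f.shape)
```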

If an adjoint model is available, we do not need to introduce additional approximations. However, in many cases, the adjoint model does not exist, and with commercial software, we often do not have access to the model code, so we cannot implement the adjoint model. In this case, there is an alternative that uses the averaged model sensitivity from Approx. 7. We replace the individual adjoints for each realization with an ensemble-averaged model sensitivity. This best linear fit of the model sensitivity is the same for all realizations. Thus, for nonlinear systems, we introduce an approximation by changing each realization’s gradient slightly. Using an averaged model sensitivity leads to modern and efficient methods like EnRML and ESMDA, discussed in Chap. 8. For instance, the petroleum industry uses these methods operationally to history-match reservoir models.
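
The following sketch illustrates Approx. 7 by computing a single best-fit linear map from state anomalies to predicted-measurement anomalies, shared by all realizations; the observation operator g and the dimensions are illustrative assumptions:

```python
# Minimal sketch of the ensemble-averaged sensitivity (Approx. 7): a single
# best-fit linear map from state anomalies to predicted-measurement anomalies,
# shared by all realizations, assuming ensembles X (states) and Y = g(X).
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 200))                      # state ensemble (3 variables, 200 members)
g = lambda x: np.array([x[0] * x[1], x[2] ** 2])   # hypothetical nonlinear observation operator
Y = np.stack([g(X[:, j]) for j in range(X.shape[1])], axis=1)

dX = X - X.mean(axis=1, keepdims=True)
dY = Y - Y.mean(axis=1, keepdims=True)

# Least-squares fit G such that dY ~ G @ dX, replacing the per-realization Jacobians.
G = dY @ np.linalg.pinv(dX)
print(G.shape)    # (2, 3): averaged sensitivity of 2 measurements to 3 state variables
```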

EnRML and ESMDA are iterative methods and require multiple integrations of the ensemble over the assimilation window. A computationally more efficient approach is to use the ensemble smoother (ES) for the assimilation window. The ES introduces a linearization of the model in the expression for the gradient, which allows for deriving a closed-form solution that we can compute without iterating. This equation is only valid for minor updates or modest nonlinearity. This property is precisely the basis for ESMDA. By calculating many minor updates using the ES equation instead of a single large one, ESMDA reduces the impact of the linearization.
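
As a minimal sketch of the ESMDA idea, the following snippet applies the scalar ES update Na times with the measurement-error variance inflated by alpha = Na, so that the inverse inflation coefficients sum to one; the measurement operator and numbers are illustrative assumptions:

```python
# Minimal sketch of ESMDA: apply the ensemble-smoother update Na times with the
# measurement-error covariance inflated by alpha = Na (here a scalar observation
# of a scalar state through the hypothetical operator g(x) = x**3).
import numpy as np

rng = np.random.default_rng(5)
N, Na = 200, 4
g = lambda x: x ** 3
d, R = 1.5, 0.1

X = 1.0 + np.sqrt(0.2) * rng.normal(size=N)           # prior ensemble
for _ in range(Na):
    alpha = float(Na)                                 # the 1/alpha_i must sum to one
    Y = g(X)                                          # predicted measurements
    dX, dY = X - X.mean(), Y - Y.mean()
    C_xy = (dX * dY).sum() / (N - 1)                  # state-measurement covariance
    C_yy = (dY * dY).sum() / (N - 1)                  # measurement covariance
    D = d + np.sqrt(alpha * R) * rng.normal(size=N)   # perturbed measurements
    X = X + C_xy / (C_yy + alpha * R) * (D - Y)       # ES update with inflated R
print(X.mean(), X.var(ddof=1))
```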

In the “trivial” case with linear models and measurement operators, the algorithms in Fig. 11.1 simplify significantly. The use of Gaussian priors ensures that the distribution at all future times is also Gaussian. Furthermore, the RML sampling samples the posterior pdf exactly and does not introduce any approximation. There will, of course, be sampling errors, since we use a low-rank ensemble approximation. The averaged model sensitivity is exact in this case, and there is no need for any linearizations or iterations.

2 So Which Method to Use?

It is impossible, in a book like this one, to provide specific advice on which method to use for a particular data-assimilation problem. There are just too many different problems out there. Perhaps more importantly, even the experts can disagree on the best method, based on their favorite techniques and experience. However, we can offer some general advice. At the same time, the user has to keep in mind that fine-tuning a data-assimilation method remains an art, as is true of most valuable things in life.

The choice of method depends on the data-assimilation application and its purpose. We will come back to this question in the final Chap. 23. But first, it is essential to evaluate the nature of the system, as this will determine the efficacy of the various data-assimilation methods. Thus, in Part II of this book, we present several example applications that demonstrate how different assimilation methods work with various dynamical models and assimilation problems. The examples illustrate smoother versus sequential estimation, state versus parameter estimation, weak-constraint versus strong-constraint solutions, and linear versus nonlinear or even highly nonlinear problems.