This final chapter provides a general summary and discussion of the data-assimilation problem that will help the reader to choose a suitable assimilation method given the problem at hand, the application’s purpose, and the available time and resources. As the reader will notice, the classification below follows to a large extent the graphical representation of all the methods in Fig. 11.1 from bottom to top.

1 Classification of the Nonlinearity

Whether the dynamical model is linear or nonlinear, and, if nonlinear, the degree of nonlinearity, is the primary factor to consider when selecting a suitable data-assimilation method. Thus, we start by discussing how different levels of nonlinearity impact the choice of the data-assimilation method. In each case, the minimal requirement is the mean or mode of the posterior pdf. However, an uncertainty estimate is crucial for assessing scientific significance and is essential in any real-life application.

1.1 Linear to Weakly-Nonlinear Systems with Gaussian Priors

We first consider the case where the system is linear Gaussian, or the forward model is only weakly nonlinear. In this case, we only have to solve for the posterior mean and covariance. The Kalman filter, and possibly the extended Kalman filter from Chap. 6, finds the optimal solution for these problems. However, they have a drawback: the state and its error-covariance matrix must be propagated from one update step to the next, which is impractical or impossible for high-dimensional systems. Furthermore, even with weak nonlinearity present, the linearization of the error-covariance equation might lead to instabilities and unphysical solutions. Evensen (1994) developed the EnKF to handle the nonlinearity in the error-covariance propagation and to resolve the dimensionality issue. Thus, even for linear models, the EnKF can be a computationally attractive alternative, e.g., as in the linear advection example in Chap. 14.

For high-dimensional systems, one would use an EnKF with a large ensemble of \(\mathcal {O}(100)\) realizations. Of course, using a limited ensemble size introduces sampling errors, and we should select the ensemble size to reduce the sampling errors to an acceptable level. If a large ensemble is not affordable, we can use an EnKF with a smaller ensemble combined with the localization and inflation methods from Chap. 10. In the EnKF, the forward model equations can be nonlinear and do not need to be linearized, so each realization propagates without linearization errors.
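
To make the ensemble update concrete, the following sketch implements a stochastic (perturbed-observations) EnKF analysis step in Python/NumPy. It assumes a linear observation operator stored as a matrix; all variable names are illustrative, and the code is a minimal sketch rather than a production analysis scheme.

```python
import numpy as np


def enkf_analysis(E, H, d, R, rng=None):
    """Stochastic (perturbed-observations) EnKF analysis step.

    E : (n, N) ensemble of model states (one realization per column)
    H : (m, n) linear observation operator
    d : (m,)   observation vector
    R : (m, m) observation-error covariance
    """
    rng = np.random.default_rng() if rng is None else rng
    n, N = E.shape
    A = E - E.mean(axis=1, keepdims=True)          # ensemble anomalies
    HA = H @ A                                     # anomalies in observation space
    PHt = A @ HA.T / (N - 1)                       # ensemble estimate of P^f H^T
    S = HA @ HA.T / (N - 1) + R                    # innovation covariance H P^f H^T + R
    # One perturbed observation vector per realization
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    # Update each realization with its own perturbed observation
    return E + PHt @ np.linalg.solve(S, D - H @ E)
```

Localization and inflation from Chap. 10 would enter such a sketch by tapering the ensemble covariances (PHt and S) and by scaling the anomalies A, respectively.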

The Kalman smoother is optimal for a smoother problem with observations distributed over an assimilation window. We have not discussed this method as we rarely use it in geosciences. A drawback is the forward and backward propagation of the error-covariance matrix over the assimilation window, which makes the method of little use for high-dimensional problems. For these problems, we can apply an EnKS or an ES, either with a large ensemble size or, when the ensemble size is small, in combination with the localization and inflation methods from Chap. 10. If we have an adjoint model available, we can also use the representer method from Chap. 5 to solve the weak-constraint problem efficiently for the mode. We can even use an ensemble of representer solutions in an RML setting (see Chap. 7) to represent the posterior uncertainty.

1.2 Weakly Nonlinear Systems with Gaussian Priors

We now consider model systems with weak to modest nonlinearity and Gaussian priors. Modest nonlinearity means that the predicted measurements are monotonic functions of the state vector. This constraint eliminates cases with multiple modes in the pdf. We can distinguish situations where the adjoint of the forward model is available and situations where it is not. A general rule is that if we have access to the adjoint of the forward model, it is such a powerful tool that we should use it. In this case, the method of choice is RML sampling with adjoints, which includes EnRML, ensembles of variational methods, i.e., En3DVar and En4DVar, and the representer method. Note that the requirement of a posterior uncertainty estimate rules out a single 3DVar or, for a smoother, a single SC-4DVar or WC-4DVar. For this reason, all operational weather-prediction centers run some combination of En4DVar, EnKF, or a single 3DVar or 4DVar augmented with an EnKF.

It is of interest to consider the complexity of implementing the different methods. The development and coding of an adjoint model can be an overwhelming task. If the adjoint of the forward model is unavailable, the recommended choices are EnRML and ESMDA, either with a large ensemble or with localization and inflation. Note that neither of these methods, with or without an adjoint, will provide the exact solution, but if the nonlinearity is modest, the error made is often negligible compared to other approximations in the system. For sequential data assimilation aimed at prediction, EnRML and ESMDA may be unnecessarily computationally demanding. EnKF updates at the end of the assimilation window are straightforward and highly efficient to compute. Moreover, using existing, well-tested codes and libraries for the EnKF analysis scheme, an EnKF application can be up and running in a few days. For a sequential prediction problem, it is not clear that an ensemble of 4DVar systems will perform better than a standard EnKF implementation configured for a similar computational cost.
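
To illustrate the ESMDA recipe mentioned above, the sketch below repeats an ensemble-smoother update several times with inflated observation-error covariances, where the inflation coefficients must satisfy \(\sum_i 1/\alpha_i = 1\). The function and variable names are illustrative, the predicted-measurement operator g is a generic user-supplied function, and the code is a minimal sketch rather than a reference implementation.

```python
import numpy as np


def esmda(E, g, d, R, alphas=(4.0, 4.0, 4.0, 4.0), rng=None):
    """ES-MDA: repeat the smoother update with inflated observation errors.

    E      : (n, N) prior ensemble of states and/or parameters
    g      : callable mapping an (n, N) ensemble to (m, N) predicted measurements
    d      : (m,) observations
    R      : (m, m) observation-error covariance
    alphas : inflation coefficients; they must satisfy sum(1/alpha) = 1
    """
    rng = np.random.default_rng() if rng is None else rng
    assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-8
    N = E.shape[1]
    for alpha in alphas:
        Y = g(E)                                         # predicted measurements
        A = E - E.mean(axis=1, keepdims=True)
        Ya = Y - Y.mean(axis=1, keepdims=True)
        Cxy = A @ Ya.T / (N - 1)                         # state-measurement covariance
        Cyy = Ya @ Ya.T / (N - 1) + alpha * R            # inflated innovation covariance
        D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), alpha * R, size=N).T
        E = E + Cxy @ np.linalg.solve(Cyy, D - Y)        # one MDA update step
    return E
```

A single smoother (ES) update corresponds to running this loop once with \(\alpha = 1\); the repeated, inflated updates are what make ESMDA more robust for nonlinear predicted measurements at the price of additional forward-model evaluations.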

Another consideration in the choice of a method for a weakly nonlinear system is the timing of the update. We have seen that some methods update the estimate at the beginning of the window, e.g., SC-4DVar, while others update it at the end of the window, e.g., EnKF. SC-4DVar assumes zero model errors, and we obtain the final solution by integrating the model over the assimilation window. The weak-constraint 4DVar and the ensemble smoothers update the model state over the whole window simultaneously.

Finally, we mention that SC-4DVar and WC-4DVar, including the representer method, are only efficient with appropriate preconditioning and reasonable estimates of the background covariance matrix. Unfortunately, no efficient preconditioning is available for WC-4DVar, neither for the observation-space variant (the representer method) nor for the state- or forcing formulations. The lack of an efficient preconditioner for these weak-constraint methods is related to the vast problem size, which makes parallelization essential. However, the efficient preconditioners developed for the strong-constraint problem interfere with this parallelization. Randomized preconditioners might be able to break this deadlock (see, e.g., Bousserez et al., 2020; Daužickaitė et al., 2020, 2021a, b).

1.3 Strongly Nonlinear Systems

When the system is strongly nonlinear with multiple modes in the pdf, we must use fully nonlinear data-assimilation methods. We recommend using either particle filters or particle-flow filters. Note that we can reduce the effective nonlinearity in well-observed nonlinear systems. We saw an example in Chap. 15 where sufficiently frequent measurements kept the ensemble tracking the observed attractor, avoiding bimodality. Particle filters require equivalent-weights schemes or localization to avoid degeneracy, and there are applications of local particle filters to high-dimensional atmospheric problems. Particle-flow filters do not need localization by construction, but the research community has not yet fully explored this approach. The only variant tested in high-dimensional systems uses kernels, and one has to find good kernels for each specific problem. However, it seems that solutions might be less sensitive to the exact kernel choice than previously thought.
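
For reference, the two essential operations in any particle filter are the likelihood weighting and the resampling. The sketch below shows a plain bootstrap-filter analysis with a diagonal Gaussian observation-error covariance; the names are illustrative, and this is not one of the equivalent-weights, localized, or particle-flow schemes discussed above.

```python
import numpy as np


def bootstrap_pf_analysis(particles, d, h, r_var, rng=None):
    """One bootstrap particle-filter analysis: likelihood weighting plus resampling.

    particles : (n, N) forecast particles
    d         : (m,) observation vector
    h         : callable mapping (n, N) particles to (m, N) predicted measurements
    r_var     : (m,) observation-error variances (a diagonal R is assumed)
    """
    rng = np.random.default_rng() if rng is None else rng
    innov = d[:, None] - h(particles)                        # (m, N) innovations
    logw = -0.5 * np.sum(innov**2 / r_var[:, None], axis=0)  # Gaussian log-likelihoods
    w = np.exp(logw - logw.max())                            # subtract max to avoid underflow
    w /= w.sum()
    n_eff = 1.0 / np.sum(w**2)                               # effective ensemble size
    idx = rng.choice(particles.shape[1], size=particles.shape[1], p=w)
    return particles[:, idx], n_eff                          # equally weighted after resampling
```

In high-dimensional systems, this plain scheme degenerates rapidly, with the effective ensemble size collapsing towards one, which is precisely the problem the equivalent-weights, localized, and particle-flow variants address.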

2 Purpose of the Data Assimilation

In choosing a data-assimilation method, it is essential to consider the linearity and Gaussianity of the system and its priors. Still, the choice also depends on the purpose of the data assimilation. The classification below lists several possible goals for using an assimilation system.

2.1 Hindcasts and Re-analyses

To analyze a system’s behavior, we can assimilate all available data over a certain period into a numerical model to obtain a consistent evolution of the state. In this case, it is more important to choose a data-assimilation scheme that is computationally efficient and can incorporate a long time series of (heterogeneous) measurements than a scheme that provides accurate posterior distributions of the variables. We would still split the hindcast period into several assimilation windows and use, e.g., an EnKS or a 4DVar.

2.2 Prediction Systems

Let’s imagine that we want to design a sequential data-assimilation system for weather prediction. What is the preferred approach? The most common and original purpose of assimilating data is to obtain the best model state at the end of an assimilation window, and the best model parameters, to forecast a natural system’s behavior accurately. We need the solution at the end of the assimilation window to compute a new forecast. The EnKF and WC-4DVar readily provide the necessary initial conditions if the system is weakly nonlinear and has Gaussian priors. The EnKF and an ensemble of WC-4DVar systems have the added advantage of providing consistent error statistics at the initial time of the next assimilation window. These methods thereby support recursive data assimilation and allow us to initialize a prediction with quantified uncertainty.

Interestingly, the most widely used data-assimilation method for weather prediction is SC-4DVar, which optimizes the solution at the beginning of an assimilation window. After that, one integrates the solution to the end of the assimilation window with the deterministic nonlinear model, where the actual weather prediction starts. The reasons for this procedure’s success are twofold. First, the model state at the beginning of the assimilation window will be accurate because we have used past and future observations to update it. Second, a data-assimilation update will always push the model slightly out of its preferred balanced state, resulting in an adjustment of the model state during the first part of the forward integration and leading to less accurate forecasts. An SC-4DVar ensures that this adjustment happens before the actual prediction starts.

There is a perhaps slightly overlooked problem when using SC-4DVar for predictions. While SC-4DVar finds the posterior pdf’s mode at the start of the assimilation window, there is no guarantee that this solution, after propagation to the end of the assimilation window where we initialize the actual forecasts, is still the mode (see, e.g., Van Leeuwen et al., 2015).

The pdf evolution over the assimilation window also affects smoother methods that estimate the joint pdf’s mode over the whole assimilation window. For nonlinear systems, the joint pdf’s mode will not coincide with the marginal pdf’s mode at the end of the assimilation window (Van Leeuwen et al., 2015). And the marginal pdf’s mode at the end of the assimilation window is, at least in theory, the best starting point for a prediction. This is one reason why data-assimilation methods that update the state at the end of the assimilation window might be preferable.

2.3 Uncertainty Quantification and Risk Assessment

In many applications, assessing the uncertainties in the state or parameter estimates is critical. Particular examples are subsurface uncertainty quantification and geotechnical risk assessment (e.g., Mohsan et al., 2021). EnRML, ESMDA, and EnKF are commonly used methods for these applications.

But is estimating the mode enough in these applications, or do we need to sample the posterior distribution? While the methods mentioned above may be easy to implement and computationally efficient, their estimates of the posterior distribution may be flawed when the system behaves nonlinearly. Therefore, it is essential to evaluate the nonlinearity and Gaussianity of the system, since for highly nonlinear problems the linearity and Gaussianity assumptions directly affect the quantified uncertainties and risks. The choice of method also depends on whether the problem is a parameter-estimation problem that we can solve over a single assimilation window or a data-assimilation problem where data become available sequentially and where we must evolve the error statistics.

2.4 Model Improvement and Parameter Estimation

One can distinguish parameter estimation from estimating missing physics and parameterizations. Specifically, we can use data assimilation to identify missing physics, inaccurate forcing, or model errors. In some cases, the main aim of data assimilation is not to describe, forecast, or better understand a system, but to improve the model itself. Examples include estimating missing terms in the model equations (Lang et al., 2016), estimating model factors (Vossepoel & Behringer, 2000), and adjusting the geometry of the model domain (Glegola et al., 2012). In such cases, it is essential to apply a method that explicitly accounts for the model errors. We can include model errors by adding \({\mathbf {q}}\) to the state vector in any of the ensemble or variational methods. The estimated model errors will contain both random and structural parts. The structural components point to missing physics at each time step, hence at the level of the model equations, and allow for direct model improvement, in contrast to comparing model forecasts with observations, where finding the source of the model deficiencies is almost impossible.
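
As a minimal illustration of this state augmentation, assuming we carry an ensemble of additive model-error terms alongside the physical state (all names below are illustrative), the joint ensemble can be formed and split as follows:

```python
import numpy as np


def augment_with_model_error(X, Q):
    """Stack model-error realizations under the physical state so that one
    ensemble analysis updates state and model error jointly.

    X : (n, N) ensemble of physical states
    Q : (k, N) ensemble of model-error terms (e.g., additive errors per time step)
    """
    return np.vstack([X, Q])                 # (n + k, N) augmented ensemble


def split_augmented(Z, n):
    """Recover the state and model-error parts after the analysis."""
    return Z[:n, :], Z[n:, :]
```

The same trick applies to uncertain parameters or forcing terms; after the analysis, the updated model-error part can be inspected for systematic structure that points to missing physics.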

2.5 Scenario Forecasts and Optimal Controls

In some applications, we use data assimilation to forecast a given system under a given control. It is then possible to evaluate different scenarios or to optimize the control strategy. Examples of using data assimilation for forecasts, scenarios, or optimal control include the control of a producing hydrocarbon field, as discussed by Jansen et al. (2009) and in Chap. 21, scenario planning for the evolution of a pandemic, as presented in Chap. 22, and regulating economic processes or forecasting and controlling traffic (van Hinsbergen et al., 2012; Wang & Papageorgiou, 2005; Xie et al., 2018). In many cases, the EnKF is a practical and efficient method. For estimating parameters and control variables, techniques such as EnRML and ESMDA are popular.

3 How to Reduce Computational Costs

Section 23.2 assumes that we can meet the necessary computational requirements, but in practice this can be a severe constraint. Present-day computer architectures are parallel, and we must exploit this structure fully. Ensemble methods are naturally parallel in the forward propagation of the model state, as the sketch below illustrates. The update in ensemble space is harder to parallelize because communication between ensemble realizations is essential. Iterative methods need the solution of one iteration before the next iteration can start, making these methods sequential by construction. However, research is ongoing on parallelization in the time domain. In this case, one splits the assimilation window into smaller time segments and runs parallel iterations for each, with communication only needed after each iteration.
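
As an illustration of the ensemble forecast step’s natural parallelism, the sketch below advances each realization independently in its own process. Here model_step is a hypothetical, user-supplied function that integrates one realization over the window; the ensemble only needs to be gathered again for the analysis step.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def forecast_ensemble(model_step, ensemble, t0, t1):
    """Advance every realization independently over [t0, t1].

    model_step : callable(state, t0, t1) -> state, integrating one realization
    ensemble   : list of state vectors, one per realization
    """
    advance = partial(model_step, t0=t0, t1=t1)
    with ProcessPoolExecutor() as pool:
        # No communication between realizations is needed until the analysis
        return list(pool.map(advance, ensemble))
```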

As mentioned in Sect. 23.1, an adjoint of the forward model or the observation operator will increase accuracy and efficiency. Unfortunately, generating the adjoint of a complex forward model is a highly challenging process, often taking many years to complete. Automatic adjoint compilers are available; they take as input the forward model code and a complete specification of the meaning of each symbol in that code, and they generate as output the tangent-linear and adjoint model codes. While these compilers are quite sophisticated and helpful for many forward models, their output codes are (as yet) not as efficient as a human can make them. A way to improve the efficiency is to generate the adjoint code simultaneously with the forward code. Indeed, adjoint compilers have generated some remarkably efficient codes this way, but it requires coding the forward model from scratch.

A frequent statement is that ensemble methods are too expensive, while variational methods have lower computational cost and are therefore preferable. This statement is, however, a misrepresentation of reality. First of all, variational methods need an ensemble component for uncertainty quantification. But the most crucial argument is that each iteration of a 4DVar contains one forward tangent-linear and one adjoint integration over the assimilation window. This computation corresponds roughly to integrating two ensemble realizations. Hence, 50 iterations correspond to an ensemble size of approximately 100.

Furthermore, the linearizations in the iterative methods lead to additional terms in the model equations. A tangent-linear or adjoint integration is therefore two to four times more expensive than the nonlinear model integration used in an ensemble method. However, there are ways of compensating for this additional cost in variational methods. We can reduce the resolution of the models used in the iterations or simplify the forward model’s nonlinear parameterizations. Both approaches introduce additional approximations. In practice, variational and ensemble methods thus seem to have similar costs for similar accuracy.
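
With illustrative symbols not used elsewhere in this book, the comparison can be summarized as follows. Let \(c\) be the cost of one nonlinear model integration over the window, \(\alpha \approx 2\)–\(4\) the relative cost of a tangent-linear or adjoint integration, \(N_{\text{it}}\) the number of 4DVar iterations, and \(N_e\) the ensemble size. Then, roughly,
\[
  \mathrm{cost}_{\text{4DVar}} \approx N_{\text{it}}\,\bigl(c_{\text{TL}} + c_{\text{adj}}\bigr) \approx 2\alpha\,N_{\text{it}}\,c,
  \qquad
  \mathrm{cost}_{\text{EnKF/EnKS}} \approx N_e\,c,
\]
so with \(N_{\text{it}} = 50\), and with the factor \(\alpha\) offset by reduced resolution or simplified physics in the inner iterations, the variational cost is comparable to that of an ensemble of \(N_e \approx 100\) realizations.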

4 What Will the Future Hold?

Of course, it is almost impossible to predict the future, even given all the data we have, because human evolution is perhaps more chaotic than nature. Nonetheless, we can pinpoint some trends that might have some momentum, hence predictive power.

First, there is an increasing push for fully nonlinear data-assimilation methods. For instance, the ever-increasing resolution of weather-prediction models now resolves turbulent features in the atmospheric boundary layer, and we do not have enough observations of these scales to avoid the development of strongly non-Gaussian pdfs. Indeed, a growing number of scientists within applied and even pure mathematics are among those pushing these boundaries.

Another trend is the incorporation of machine learning in data assimilation. Examples include using machine learning to make models more efficient, to accelerate data-assimilation methods, and to replace parts of the data-assimilation process. It is hard to predict where this is heading. An important lesson seems to be that, e.g., neural networks from image processing are not directly applicable to data assimilation, and we need dedicated architectures to make real progress. Physically realistic predictions seem achievable by building strong model constraints into the machine-learning cost function, bringing machine learning closer to variational methods, where model constraints incorporate prior knowledge.

A general weakness of machine learning is its inability to estimate uncertainty. That will need to change if machine learning is to become an alternative to data assimilation for real-world applications in the geosciences. Indeed, many ideas on uncertainty quantification from data assimilation are starting to find their way into the machine-learning literature, either actively brought in or reinvented. Unfortunately, not all application areas are fully aware of each other’s methods, and some practitioners reinvent techniques that have been mainstream in different research fields for decades.

Whatever the future holds, it will most likely result from the collaboration of many scientists from many different disciplines. If we keep the data-assimilation field as collaborative as it is now, the future is bright.