Data assimilation combines prior information from numerical model simulations with observed data to obtain the best possible description of a dynamical system and its uncertainty. Often the purpose is to compute the best possible estimate of the model state. Alternatively, we use data assimilation to estimate the model parameters or to infer the best characterization of the model forcing or controls. In some cases, we would like to find the best description of a combination of uncertain state variables, parameters, and model controls, or of all of them together.

Data assimilation provides the best tool for optimally combining all available information whenever one has access to a numerical model and observations of a dynamical system. The notion of data assimilation originates in numerical weather prediction and operational oceanography, while its mathematical formulation stems from Bayesian inference, control theory, and variational calculus. Data-assimilation methods have evolved over the past three decades from simplistic, ad-hoc approaches to advanced techniques for sampling the Bayesian posterior. Furthermore, it is common to use similar data-assimilation methods for both state and parameter estimation. Data-assimilation practices have also spread from operational systems in the ocean- and weather-prediction communities to a wide range of research fields, particularly within the geosciences and in medical, economic, transportation, chemical, biological, physical, and general statistical research. There are currently many different data-assimilation methods to choose from, and there are several routes to deriving them.

This book’s significant contribution is the unified derivation of data-assimilation methods from a common fundamental and optimal starting point, namely Bayes’ theorem. Bayes’ theorem is indeed the optimal starting point, as Bui-Thanh (2021) pointed out. By reviewing earlier research, they show how Bayes’ formula has a firm foundation within optimization. For example, Bayes’ formula arises from the joint minimization of the Kullback-Leibler (KL) divergence between a posterior and prior distribution and the mean-square errors of the data represented by the likelihood. As stated by Bui-Thanh (2021), “the first-order optimality condition of this optimization problem is precisely Bayes’ formula, and its unique updated distribution is the Bayes’ posterior.” But perhaps a more compelling argument is that Bayes’ theorem is the natural learning framework, as it elegantly shows how to update prior information when new information becomes available. One of the strengths of Bayes’ theorem is that it does not try to solve the ill-defined problem of “inverting observations” but instead updates prior knowledge. In that sense, it is one of the very foundations of science.
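
To fix ideas, here is Bayes’ theorem in generic notation (the symbols are illustrative; the book’s precise notation follows in Chap. 2). It updates a prior pdf $f(\mathbf{x})$ for the unknowns $\mathbf{x}$ with the likelihood $f(\mathbf{d} \mid \mathbf{x})$ of the data $\mathbf{d}$:

\[
f(\mathbf{x} \mid \mathbf{d}) \;=\; \frac{f(\mathbf{d} \mid \mathbf{x})\, f(\mathbf{x})}{f(\mathbf{d})} \;\propto\; f(\mathbf{d} \mid \mathbf{x})\, f(\mathbf{x}).
\]

The optimization result referred to above can be stated schematically as follows: the posterior is the unique minimizer, over pdfs $q$, of the functional $D_{\mathrm{KL}}\bigl(q \,\|\, f(\mathbf{x})\bigr) - \mathbb{E}_{q}\bigl[\ln f(\mathbf{d} \mid \mathbf{x})\bigr]$, where the second term reduces to an expected mean-square data misfit for Gaussian observation errors.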

Unique to this book is the “top-down” derivation of the assimilation methods. We start from Bayes’ theorem and gradually introduce the assumptions and approximations needed to arrive at today’s popular data-assimilation methods. This strategy is the opposite of most textbooks and reviews on data assimilation, which typically take a bottom-up approach to derive a particular assimilation method. Examples of the bottom-up approach include the derivation of the Kalman filter from linear estimation or control theory, the derivation of the ensemble Kalman filter (EnKF) as a low-rank approximation of the standard Kalman filter, and the derivation of 4DVar from variational principles. The bottom-up approach derives the assimilation methods from different mathematical principles, making it difficult to compare them. Thus, it may be unclear which assumptions a data-assimilation formulation is based on, and sometimes even which problem it aspires to solve. Our top-down approach allows us to categorize data-assimilation methods based on the approximations used. This approach enables the user to choose the most suitable method for a particular problem or application. Have you ever wondered about the difference between the ensemble 4DVar and the “ensemble randomized maximum likelihood” (EnRML) methods? Do you know the differences between the ensemble smoother (ES) and the ensemble Kalman smoother (EnKS)? Would you like to understand how a particle-flow filter compares to a particle filter? In this book, we will provide clear answers to several such questions.

We will show how we can consistently derive the formulations and solution methods for recursive model-state and parameter estimation from Bayes’ theorem while discussing the required assumptions and approximations. We build up to a focus on assimilation methods that attempt to sample the Bayesian posterior pdf. Thus, we search for ensemble formulations that characterize uncertainty, such as ensemble 4DVar, ensemble Kalman filters, ensemble RML, particle filters, and particle-flow filters.

This book consists of two parts: Part I covers the theory, and Part II illustrates several data-assimilation methods applied to a range of simple models.

In Part I, we start from Bayes’ formula and introduce the approximations and assumptions needed to derive the various assimilation methods. In Chap. 2, we discuss the general mathematical formulation and notation that we will use throughout the book. We introduce the concept of an assimilation window and discuss problem formulations commonly solved in data assimilation. Chapter 2 is fundamental to understanding the subsequent chapters.

In Chaps. 3–5, we derive and discuss methods that solve for the maximum a posteriori (MAP) solution. These are iterative methods, typically derived from a Gauss-Newton formulation introduced in Chap. 3. After that, Chap. 4 discusses the strong-constraint 4DVar approach, while Chap. 5 introduces solution methods for the weak-constraint or so-called “generalized-inverse” formulation, leading to the representer method and weak-constraint 4DVar.
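
As a preview, and in generic notation that anticipates the detailed formulation in Chaps. 2–4, the strong-constraint 4DVar estimate minimizes a cost function of the form

\[
\mathcal{J}(\mathbf{x}_0) \;=\; \tfrac{1}{2}\,\bigl(\mathbf{x}_0 - \mathbf{x}^{\mathrm{b}}\bigr)^{\mathrm{T}} \mathbf{B}^{-1} \bigl(\mathbf{x}_0 - \mathbf{x}^{\mathrm{b}}\bigr)
\;+\; \tfrac{1}{2} \sum_{k} \bigl(\mathbf{d}_k - \mathbf{h}_k(\mathbf{x}_k)\bigr)^{\mathrm{T}} \mathbf{R}_k^{-1} \bigl(\mathbf{d}_k - \mathbf{h}_k(\mathbf{x}_k)\bigr),
\]

where $\mathbf{x}_k$ denotes the state obtained by integrating the model, treated as perfect, from the initial condition $\mathbf{x}_0$; $\mathbf{x}^{\mathrm{b}}$ and $\mathbf{B}$ are the background state and its error covariance; and $\mathbf{h}_k$ and $\mathbf{R}_k$ are the measurement operator and observation-error covariance at time $t_k$. The Gauss-Newton iterations of Chap. 3 minimize such cost functions through successive linearizations, while the weak-constraint formulation of Chap. 5 additionally penalizes model errors instead of imposing the model as a perfect constraint.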

Chapter 6 discusses simple methods like 3DVar and Kalman filters. These methods apply significant approximations related to either linearizations or an approximate evolution of error statistics.
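
For reference, and again in generic notation, the Kalman-filter analysis step updates a forecast $\mathbf{x}^{\mathrm{f}}$ with error covariance $\mathbf{P}^{\mathrm{f}}$, given data $\mathbf{d}$ with a linear measurement operator $\mathbf{H}$ and observation-error covariance $\mathbf{R}$:

\[
\mathbf{K} = \mathbf{P}^{\mathrm{f}} \mathbf{H}^{\mathrm{T}} \bigl(\mathbf{H} \mathbf{P}^{\mathrm{f}} \mathbf{H}^{\mathrm{T}} + \mathbf{R}\bigr)^{-1}, \qquad
\mathbf{x}^{\mathrm{a}} = \mathbf{x}^{\mathrm{f}} + \mathbf{K}\bigl(\mathbf{d} - \mathbf{H} \mathbf{x}^{\mathrm{f}}\bigr), \qquad
\mathbf{P}^{\mathrm{a}} = \bigl(\mathbf{I} - \mathbf{K} \mathbf{H}\bigr) \mathbf{P}^{\mathrm{f}}.
\]

This update is exact for linear-Gaussian systems; the approximations enter when the extended Kalman filter linearizes a nonlinear model and measurement operator, or when 3DVar keeps the error covariance static rather than evolving it.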

Then, in Chap. 7, we introduce the concept of randomized-maximum-likelihood (RML) sampling that approximately samples the posterior pdf from Bayes’ formula. In RML, we minimize an ensemble of cost functions, and we derive several popular data-assimilation methods from the RML formulation. Furthermore, we can use the assimilation methods described in the previous chapters to minimize the RML ensemble of cost functions. Thus, although these methods, by design, solve for the MAP solution, we can also use them to sample the Bayesian posterior approximately. This chapter illustrates how so-called “hybrid methods” follow naturally from Bayes’ theorem.
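
Schematically, and with the same generic notation as above, RML draws prior samples $\mathbf{x}_j^{\mathrm{f}} \sim \mathcal{N}(\mathbf{x}^{\mathrm{b}}, \mathbf{B})$ and perturbed data $\mathbf{d}_j = \mathbf{d} + \boldsymbol{\epsilon}_j$ with $\boldsymbol{\epsilon}_j \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$, and then minimizes one cost function per sample,

\[
\mathcal{J}_j(\mathbf{x}) \;=\; \tfrac{1}{2}\,\bigl(\mathbf{x} - \mathbf{x}_j^{\mathrm{f}}\bigr)^{\mathrm{T}} \mathbf{B}^{-1} \bigl(\mathbf{x} - \mathbf{x}_j^{\mathrm{f}}\bigr)
\;+\; \tfrac{1}{2}\,\bigl(\mathbf{d}_j - \mathbf{h}(\mathbf{x})\bigr)^{\mathrm{T}} \mathbf{R}^{-1} \bigl(\mathbf{d}_j - \mathbf{h}(\mathbf{x})\bigr).
\]

The ensemble of minimizers samples the posterior exactly in the linear-Gaussian case and approximately when $\mathbf{h}$ is nonlinear.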

Chapter 8 takes the RML sampling one step further by using the ensemble statistics to represent the background-error-covariance matrix and propagate error statistics forward in time. Here we derive popular and highly efficient methods such as ensemble RML and ensemble Kalman filters and smoothers.
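
The key approximation, again sketched in generic notation, is to replace the background-error covariance by the sample covariance of an ensemble $\{\mathbf{x}_j\}_{j=1}^{N}$ that the nonlinear model propagates forward in time,

\[
\overline{\mathbf{x}} = \frac{1}{N} \sum_{j=1}^{N} \mathbf{x}_j, \qquad
\mathbf{B} \approx \frac{1}{N-1} \sum_{j=1}^{N} \bigl(\mathbf{x}_j - \overline{\mathbf{x}}\bigr) \bigl(\mathbf{x}_j - \overline{\mathbf{x}}\bigr)^{\mathrm{T}},
\]

which avoids storing or evolving a full covariance matrix and requires neither tangent-linear nor adjoint models.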

The final methods, discussed in Chap. 9, include particle filters and particle-flow filters, which aspire to solve the fully nonlinear Bayesian problem through exact sampling of the posterior pdf. Using proposal densities, we demonstrate that we can naturally combine all the data-assimilation methods derived in earlier chapters with particle-filter techniques, showing that we can derive the hybrid approaches directly from Bayes’ theorem.
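
In a proposal-density particle filter, sketched here in generic notation, each particle $\mathbf{x}_j$ is drawn from a proposal density $q$ that may depend on the data, and the importance weights correct for this choice,

\[
w_j \;\propto\; w_j^{\mathrm{prev}}\, \frac{f(\mathbf{d} \mid \mathbf{x}_j)\, f(\mathbf{x}_j \mid \mathbf{x}_j^{\mathrm{prev}})}{q(\mathbf{x}_j \mid \mathbf{x}_j^{\mathrm{prev}}, \mathbf{d})},
\]

where $\mathbf{x}_j^{\mathrm{prev}}$ is the particle at the previous assimilation time. Choosing $q$ based on one of the methods from earlier chapters is what turns the combination into a hybrid scheme while remaining consistent with Bayes’ theorem.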

To complete the theoretical discussion in Part I, we discuss localization and inflation methods in Chap. 10. After that, Chap. 11 gives an overall assessment of all the assimilation methods discussed and the approximations used to derive them.

In Part II, we discuss the performance of different assimilation strategies in simple examples and demonstrate some real applications of data assimilation to illustrate the methods’ potential. We start with a Kalman filter and extended Kalman filter (EKF) application with the Roessler model in Chap. 12, demonstrating the impact of linearizations in the EKF.

In Chap. 13, we discuss the properties of the linear ensemble Kalman filter update and assess and demonstrate the “sub-space inversion” method, which allows for efficient EnKF updates with large data sets and correlated measurement errors. We follow up this discussion in Chap. 14, where we illustrate sequential data assimilation with a linear advection equation, and in Chap. 15, where we use different ensemble methods with the chaotic Lorenz’63 model.

In Chap. 16, we switch to applying strong-constraint 4DVar to the Lorenz’63 model. We show how a formulation with multiple data-assimilation windows makes it possible to use 4DVar with a highly nonlinear model.

We then demonstrate the use of the representer method for solving the weak-constraint 4DVar problem in Chap. 17, where we also explain the difference between the weak- and strong-constraint assumptions.

Also of interest is the nonlinear scalar example in Chap. 18, where we examine the convergence of some advanced data-assimilation methods, including iterative ensemble smoothers. We find that only a particle-flow filter can sample the correct posterior pdf in the highly nonlinear case.

In Chap. 19, we use a particle filter to estimate the state and parameters in a nonlinear seismic-cycle model, followed by a particle-flow implementation with a quasi-geostrophic ocean model in Chap. 20.

Finally, we present data-assimilation applications for history matching an oil-reservoir model in Chap. 21, including control-variable estimation, and in Chap. 22, we consider joint parameter and control-variable estimation for predicting the COVID-19 epidemic.

This book is complementary to the following previously published books on data assimilation. Jazwinski (1970) is a masterpiece on linear and nonlinear filtering and is still relevant today. Daley (1991) focuses on atmospheric data assimilation. Bennett (1992, 2002) explains the representer method for oceanic and atmospheric data assimilation. Kalnay (2002) discusses data assimilation in meteorology. Tarantola (2005) provides a fundamental treatment, especially of variational methods, emphasizing solid-Earth problems. Fichtner (2011) and Nolet (2008) are advanced and introductory texts on seismic tomography, focusing on variational methods. Lewis et al. (2006) treat the data-assimilation problem from the least-squares perspective. Evensen (2009b) gives an extensive introduction to ensemble data assimilation. Bain and Crisan (2009) provide a mathematical foundation for stochastic filtering. Oliver et al. (2008) discuss history matching in petroleum applications. Majda and Harlim (2012) discuss ensemble filtering techniques for turbulent flows, emphasizing low-order modeling of the filtering problem. Law et al. (2015) provide a mathematical description of the probabilistic approach. Reich and Cotter (2015) consider a general dynamic-systems approach with low-dimensional examples. Van Leeuwen et al. (2015) introduce nonlinear data assimilation, focusing on particle filtering. Asch et al. (2017) present statistical, variational, and hybrid data-assimilation methods and their applications. Fletcher (2017) introduces variational and ensemble data-assimilation methods and numerical methods used in meteorology.

Additionally, there are several review papers on data assimilation that discuss both the methods and their applications. Evensen (2009a) reviews ensemble Kalman filters and smoothers and introduces the combined parameter and state estimation problem.

Carrassi et al. (2018) offer a mathematical description that serves as a first guide for readers relatively new to data assimilation. The paper provides a comprehensive overview of data-assimilation methods, starting from Bayesian theory, with examples that include data assimilation for chaotic systems and non-Gaussian problems.

Van Leeuwen et al. (2019) provide an overview of particle filters for high-dimensional geoscience applications. The paper places the particle filter in the context of other data-assimilation methods and provides a mathematical description from a probabilistic perspective. It includes pseudo-code for several particle-filter algorithms. Notably, the paper discusses proposal-density particle filters, transportation particle filters, and localization in particle filters. It also presents several hybrid methods that include particle filters.

Stuart (2010) gives a broad mathematical overview and presents a common mathematical framework for the Bayesian approach, starting from a continuous infinite-dimensional description. He classifies methods into three categories: maximum a posteriori probability estimators, filters, and sampling methods. He goes on to discuss a wide range of inverse problems in fluid mechanics, weather prediction, oceanography (Navier-Stokes), and subsurface geophysics (Darcy).

Vetra-Carvalho et al. (2018) provide a valuable overview of all the ensemble-Kalman-filter variants in use at the time, derived from a unifying framework and including pseudo-code for efficient implementation.

Several review papers address the use of data assimilation in different application areas, e.g., weather prediction (Bannister, 2017; Houtekamer & Zhang, 2016), history matching of petroleum-reservoir models (Aanonsen et al., 2009; Oliver & Chen, 2011), and hydrology (Liu et al., 2012; McLaughlin, 1995). This book provides easy access to these review papers for further reading on specific topics not covered here.

We are all too well aware that this book will have its shortcomings beyond typos and other mistakes. It represents our view of the field, which might be controversial in places. We had to leave out many essential subjects to keep the book focused. For example, we do not cover the extensive literature on preconditioning in variational methods, and we do not discuss representation errors in any depth (a topic that is currently not treated well in any book).

Finally, we have not discussed the many “hybrid” methods between variational approaches and ensemble Kalman filters, or between ensemble Kalman filters and particle filters. Fortunately, the material in this book should be sufficient for understanding the pros and cons of these hybrid methods. Perhaps more important is our choice of references, which is inevitably shaped by our knowledge and familiarity, and we apologize beforehand for the many omissions. We hope our friendships will not be affected and encourage colleagues to point us to gross errors.