This chapter introduces another approximation, in which we represent all state error covariances using a finite ensemble of model states. This approximation allows us to search for the solution in the ensemble subspace, leading to very efficient ensemble data-assimilation methods. The best known is the ensemble Kalman filter, but there are also newer, more advanced schemes like the ensemble randomized-maximum-likelihood method. In this chapter, we introduce the ensemble approximation and derive these ensemble subspace methods. We also illustrate how a single algorithm, an ensemble-subspace RML formulation, can compute the update in several traditional ensemble methods.

## 1 Ensemble Approximation

To ease the computational burden of the methods discussed in the previous chapter, let’s introduce a new approximation.

### Approximation 8 (Ensemble approximation)

It is possible to approximately represent a covariance matrix by a low-rank ensemble of states with fewer realizations than the state dimension. $$\square$$

Ensemble data-assimilation methods (Evensen, 1994) use a finite ensemble of state vectors to approximate the prior error covariance matrix $${{\mathbf {C}}_{\textit{zz}}}$$. It is easy to show that we restrict the data-assimilation estimate to the ensemble space when representing $${{\mathbf {C}}_{\textit{zz}}}$$ by an ensemble of state vectors. This approach significantly simplifies the computational problem. In the following, we will introduce the ensemble covariance matrices into the Kalman-filter update and Gauss–Newton methods described in Chaps. 3 and 6.

## 2 Definition of Ensemble Matrices

We start by defining the prior ensemble of N model realizations, $${\mathbf {z}}_j \in \Re ^n$$, stored in the ensemble matrix $${\mathbf {Z}}\in \Re ^{n\times N}$$,

\begin{aligned} {\mathbf {Z}}= \bigl ({\mathbf {z}}_1, {\mathbf {z}}_2, \ldots , {\mathbf {z}}_N \bigr ). \end{aligned}
(8.1)

Furthermore, we define the projection $$\boldsymbol{\Pi }\in \Re ^{N\times N}$$ as

\begin{aligned} \boldsymbol{\Pi }= \frac{1}{\sqrt{N-1}} \Bigl ( {\mathbf {I}}_N - \frac{1}{N} \mathbf {1}\mathbf {1}^\mathrm {T}\Bigr ), \end{aligned}
(8.2)

where $$\mathbf {1}\in \Re ^{N}$$ is a vector with all elements equal to one and $${\mathbf {I}}_N$$ is the N-dimensional identity matrix. If we multiply an ensemble matrix with the orthogonal projection $$\boldsymbol{\Pi }$$, this subtracts the mean from the ensemble and scales the result with $$1/\sqrt{N-1}$$.

We can then define the zero-mean and scaled ensemble-anomaly matrix as

\begin{aligned} {\mathbf {A}}= {\mathbf {Z}}\boldsymbol{\Pi }. \end{aligned}
(8.3)

Thus, the ensemble covariance is

\begin{aligned} \overline{{\mathbf {C}}}_{zz} = {\mathbf {A}}{\mathbf {A}}^\mathrm {T}, \end{aligned}
(8.4)

where the “overbar” denotes that we have an ensemble-covariance matrix.
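As a concrete numerical check, the following sketch (with toy dimensions and our own variable names, using NumPy) builds the projection from Eq. (8.2) and verifies that $${\mathbf {A}}{\mathbf {A}}^\mathrm {T}$$ reproduces the standard unbiased sample covariance:

```python
import numpy as np

# Toy-dimension sketch: the projection Pi of Eq. (8.2), the anomalies
# A = Z Pi of Eq. (8.3), and the ensemble covariance A A^T of Eq. (8.4).
rng = np.random.default_rng(0)
n, N = 5, 10                      # state dimension and ensemble size
Z = rng.standard_normal((n, N))   # prior ensemble matrix

Pi = (np.eye(N) - np.ones((N, N)) / N) / np.sqrt(N - 1)
A = Z @ Pi                        # centered and scaled anomalies
C_zz = A @ A.T                    # ensemble covariance

assert np.allclose(A.sum(axis=1), 0.0)   # each row of A has zero mean
assert np.allclose(C_zz, np.cov(Z))      # equals the unbiased sample covariance
```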

Correspondingly, we can define an ensemble of perturbed measurements, $${\mathbf {D}}\in \Re ^{m\times N}$$, when given the real measurement vector, $${\mathbf {d}}\in \Re ^m$$, as

\begin{aligned} {\mathbf {D}}= {\mathbf {d}}\mathbf {1}^\mathrm {T}+ \sqrt{N-1} {\mathbf {E}}, \end{aligned}
(8.5)

where $${\mathbf {E}}\in \Re ^{m\times N}$$ is the centered measurement-perturbation matrix whose columns are sampled from $$\mathcal {N}(0,{\mathbf {C}}_\textit{dd})$$ and divided by $$\sqrt{N-1}$$. Thus, we define the ensemble covariance matrix for the measurement perturbations as

\begin{aligned} \overline{{\mathbf {C}}}_{dd} = {\mathbf {E}}{\mathbf {E}}^\mathrm {T}. \end{aligned}
(8.6)

The ensemble algorithms derived below work both with a full-rank $${\mathbf {C}}_\textit{dd}$$ or the ensemble version represented by the perturbations in $${\mathbf {E}}$$.
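A minimal sketch of Eqs. (8.5) and (8.6) follows, with an invented $${\mathbf {C}}_\textit{dd}$$ and measurement vector for illustration; with many realizations, the scaled perturbations satisfy $${\mathbf {E}}{\mathbf {E}}^\mathrm {T}\approx {\mathbf {C}}_\textit{dd}$$:

```python
import numpy as np

# Sketch of Eq. (8.5): D = d 1^T + sqrt(N-1) E, where the columns of E are
# centered samples from N(0, C_dd) divided by sqrt(N-1), so E E^T -> C_dd.
rng = np.random.default_rng(1)
m, N = 3, 20000                        # few measurements, many realizations
C_dd = np.diag([0.5, 1.0, 2.0])        # assumed measurement-error covariance
d = np.array([1.0, -2.0, 0.5])         # the "real" measurement vector

P = rng.multivariate_normal(np.zeros(m), C_dd, size=N).T
P -= P.mean(axis=1, keepdims=True)     # center the perturbations
E = P / np.sqrt(N - 1)
D = d[:, None] + np.sqrt(N - 1) * E    # each column is a perturbed measurement

assert np.allclose(E @ E.T, C_dd, atol=0.1)   # Eq. (8.6), up to sampling error
assert np.allclose(D.mean(axis=1), d)         # perturbations centered on d
```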

Finally, we define the ensemble of model-predicted measurements

\begin{aligned} \boldsymbol{\Upsilon }= {\mathbf {g}}({\mathbf {Z}}), \end{aligned}
(8.7)

with anomalies

\begin{aligned} {\mathbf {Y}}= \boldsymbol{\Upsilon }\boldsymbol{\Pi }, \end{aligned}
(8.8)

where we have multiplied the model prediction by the projection $$\boldsymbol{\Pi }$$ to subtract the ensemble mean and divide the resulting anomalies by $$\sqrt{N-1}$$.

## 3 Cost Function in the Ensemble Subspace

We now introduce the ensemble representation from Eq.  (8.4) into the approximate EnKF sampling in Eq. (7.3) or the RML sampling in Eqs. (7.8) or (7.9). It is then easy to show that the updated samples are confined to the space spanned by the prior ensemble since the leftmost matrix in the gradient is the ensemble anomaly matrix.

Thus, we will search for the solution in the ensemble subspace spanned by the prior ensemble by assuming that an updated ensemble realization, $${\mathbf {z}}^\mathrm {a}_j$$, is equal to the prior realization, $${\mathbf {z}}^\mathrm {f}_j$$, plus a linear combination of the ensemble anomalies,

\begin{aligned} {\mathbf {z}}^\mathrm {a}_j = {\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j. \end{aligned}
(8.9)

In matrix form, we can rewrite Eq. (8.9) as

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}^\mathrm {f}+ {\mathbf {A}}{\mathbf {W}}, \end{aligned}
(8.10)

where column j of $${\mathbf {W}}\in \Re ^{N\times N}$$ is just $${\mathbf {w}}_j$$ from Eq. (8.9).

Following Hunt et al. (2007) we write the cost function (7.1) in terms of $${\mathbf {w}}_j$$ as

Cost function in ensemble subspace

\begin{aligned} \mathcal {J}({\mathbf {w}}_j) = \frac{1}{2} {\mathbf {w}}_j^\mathrm {T}{\mathbf {w}}_j + \frac{1}{2} \bigl ({\mathbf {g}}\bigl ({\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j\bigr )-{\mathbf {d}}_j\bigr )^\mathrm {T}\overline{{\mathbf {C}}}_\textit{dd}^{\,-1} \bigl ({\mathbf {g}}\bigl ({\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j\bigr )-{\mathbf {d}}_j\bigr ), \end{aligned}
(8.11)

where we have used that

\begin{aligned} \bigl ({\mathbf {A}}{\mathbf {w}}_j\bigr )^\mathrm {T}\bigl ({\mathbf {A}}{\mathbf {A}}^\mathrm {T}\bigr )^\dagger \bigl ({\mathbf {A}}{\mathbf {w}}_j\bigr ) = {\mathbf {w}}_j^\mathrm {T}\bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr )^\mathrm {T}\bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr ) {\mathbf {w}}_j = \widetilde{{\mathbf {w}}}_j^\mathrm {T}\widetilde{{\mathbf {w}}}_j , \end{aligned}
(8.12)

in which we defined $$\widetilde{{\mathbf {w}}}_j = \bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr ) {\mathbf {w}}_j$$. The superscript $$\dagger$$ denotes the pseudo inverse. The expression $${\mathbf {A}}^\dagger {\mathbf {A}}$$ is the orthogonal projection onto the range of $${\mathbf {A}}^\mathrm {T}$$, and it satisfies the projection property $$\bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr )\, \bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr ) = {\mathbf {A}}^\dagger {\mathbf {A}}$$. Thus, $$\widetilde{{\mathbf {w}}}_j$$ is just the projection of $${\mathbf {w}}_j$$ onto the ensemble-perturbation space, and from

\begin{aligned} {\mathbf {A}}{\mathbf {w}}_j = {\mathbf {A}}\bigl ({\mathbf {A}}^\dagger {\mathbf {A}}\bigr ) {\mathbf {w}}_j = {\mathbf {A}}\widetilde{{\mathbf {w}}}_j , \end{aligned}
(8.13)

we see that it does not matter whether we solve for $${\mathbf {w}}_j$$ or $$\widetilde{{\mathbf {w}}}_j$$.
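Numerically, the projection property and Eq. (8.13) are easy to verify (a sketch with toy dimensions):

```python
import numpy as np

# Sketch of Eq. (8.13): with the pseudo inverse A^+, the matrix A^+ A is the
# orthogonal projection onto the row space of A, so A w and A (A^+ A) w agree.
rng = np.random.default_rng(2)
n, N = 8, 5
A = rng.standard_normal((n, N))
A -= A.mean(axis=1, keepdims=True)      # anomalies: zero mean along each row
w = rng.standard_normal(N)

P = np.linalg.pinv(A) @ A               # A^+ A
w_tilde = P @ w                         # projected coefficients

assert np.allclose(P @ P, P)            # projection property
assert np.allclose(A @ w_tilde, A @ w)  # Eq. (8.13): same update either way
```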

Note that we have used an ensemble representation for the measurement-error-covariance matrix. We could have retained the complete $${\mathbf {C}}_\textit{dd}$$, but in many cases, this matrix is too large for practical computations, and it is therefore commonly approximated by a diagonal matrix. Thus, one typically neglects all measurement error correlations, which can have dire consequences. Evensen  (2021) proposed using the ensemble representation in Eq. (8.6), but with an increased ensemble size to mitigate additional sampling errors. We will discuss this issue further in Chap. 13.

Minimizing the cost functions in Eq. (8.11) implies solving for the minima of the original cost functions in Eq. (7.1), but restricted to the ensemble subspace and with $$\overline{{\mathbf {C}}}_{zz}$$ in place of $${{\mathbf {C}}_{\textit{zz}}}$$, as explained by Bocquet et al. (2015). The ensemble of cost functions in Eq. (8.11) does not refer to the high-dimensional state-covariance matrix, $${{\mathbf {C}}_{\textit{zz}}}$$. In the original formulation, we searched for the solution in the state space. We now have a simpler problem where we search for the ensemble-subspace solution. Thus, we solve for the N vectors $${\mathbf {w}}_j\in \Re ^N$$, one for each realization.

## 4 Ensemble Subspace RML

We will formulate a Gauss–Newton method for minimizing the cost function in Eq. (8.11) and the following algorithm comes from  Evensen et al. (2019). The Jacobian (gradient) of the cost function $$\nabla _{\mathbf {w}}\mathcal {J}({\mathbf {w}}_j) \in \Re ^{N\times 1}$$ is

\begin{aligned} \begin{aligned} \nabla _{\mathbf {w}}\mathcal {J}({\mathbf {w}}_j)&= {\mathbf {w}}_j + \bigl ({{\mathbf {G}}_{j}}{\mathbf {A}}\bigr )^\mathrm {T}\overline{{\mathbf {C}}}_\textit{dd}^{\,-1} \bigl ({\mathbf {g}}\bigl ({\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j\bigr )-{\mathbf {d}}_j\bigr ), \end{aligned} \end{aligned}
(8.14)

and an approximate Hessian (gradient of the Jacobian) $$\nabla _{\mathbf {w}}\nabla _{\mathbf {w}}\mathcal {J}({\mathbf {w}}_j) \in \Re ^{N\times N}$$ becomes

\begin{aligned} \begin{aligned} \nabla _{\mathbf {w}}\nabla _{\mathbf {w}}\mathcal {J}\bigl ({\mathbf {w}}_j\bigr ) \approx {\mathbf {I}}+ \bigl ({{\mathbf {G}}_{j}}{\mathbf {A}}\bigr )^\mathrm {T}\overline{{\mathbf {C}}}_\textit{dd}^{\,-1} \bigl ({{\mathbf {G}}_{j}}{\mathbf {A}}\bigr ). \end{aligned} \end{aligned}
(8.15)

We have defined the tangent-linear model

\begin{aligned} {{\mathbf {G}}_{j}}\triangleq \frac{\partial {\mathbf {g}}({\mathbf {z}})}{\partial {\mathbf {z}}} \bigg \vert _{{\mathbf {z}}= {\mathbf {z}}_j}, \end{aligned}
(8.16)

and in the Hessian we have neglected the second-order derivatives.

The iterative Gauss–Newton scheme for minimizing the cost function in Eq. (8.11), analogous to (7.5), is then

\begin{aligned} {\mathbf {w}}_j^{i+1} = {\mathbf {w}}_j^{i}- \gamma \Bigl ( {\mathbf {I}}+ \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr )^\mathrm {T}\overline{{\mathbf {C}}}_\textit{dd}^{\,-1} \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr ) \Bigr )^{-1} \Bigl ( {\mathbf {w}}_j^{i}+ \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr )^\mathrm {T}\overline{{\mathbf {C}}}_\textit{dd}^{\,-1} \bigl ({\mathbf {g}}\bigl ({\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j^{i}\bigr )-{\mathbf {d}}_j\bigr ) \Bigr ), \end{aligned}
(8.17)

where we introduce $$\gamma \in (0,1]$$ as a step-length parameter, and we have the tangent-linear operator evaluated for realization j at the current iteration i as

\begin{aligned} {{\mathbf {G}}_{j}^{i}}\triangleq \frac{\partial {\mathbf {g}}({\mathbf {z}})}{\partial {\mathbf {z}}} \bigg \vert _{{\mathbf {z}}= {\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j^{i}}. \end{aligned}
(8.18)

Now using the corollaries from Eqs. (6.9) and (6.10), we can write the Gauss–Newton iteration in Eq. (8.17) as

\begin{aligned} {\mathbf {w}}_j^{i+1} = {\mathbf {w}}_j^{i}- \gamma \Bigl ( {\mathbf {w}}_j^{i}- \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr )^\mathrm {T}\Bigl ( \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr ) \bigl ({{\mathbf {G}}_{j}^{i}}{\mathbf {A}}\bigr )^\mathrm {T}+ \overline{{\mathbf {C}}}_\textit{dd}\Bigr )^{-1} \Bigl ( {{\mathbf {G}}_{j}^{i}}{\mathbf {A}}{\mathbf {w}}_j^{i}+ {\mathbf {d}}_j - {\mathbf {g}}\bigl ({\mathbf {z}}_j^\mathrm {f}+ {\mathbf {A}}{\mathbf {w}}_j^{i}\bigr ) \Bigr ) \Bigr ). \end{aligned}
(8.19)

By introducing the ensemble representation for the covariances in the linear regression we obtain

\begin{aligned} {{\mathbf {G}}_{j}^{i}}\approx {\overline{{\mathbf {G}}}^{i}}\triangleq \overline{{\mathbf {C}}}_{yz}^{i}{\overline{{\mathbf {C}}}_{zz}^{i}}^{\dagger } = {\mathbf {Y}}^{i}{{\mathbf {A}}^{i}}^\dagger , \end{aligned}
(8.20)

where $${\mathbf {Y}}^{i}$$ is the anomaly matrix defined in Eq. (8.8), evaluated at iteration i, i.e.,

\begin{aligned} {\mathbf {Y}}^{i}= {\mathbf {g}}\bigl ({\mathbf {Z}}^{i}\bigr ) \boldsymbol{\Pi }. \end{aligned}
(8.21)

The tricky term in Eq. (8.19), which corresponds to the one mentioned in relation to Eq. (7.9), is the product $${{\mathbf {G}}_{j}^{i}}{\mathbf {A}}$$. Evensen et al. (2019) showed that we can write

\begin{aligned} {\mathbf {S}}^{i}={\overline{{\mathbf {G}}}^{i}}{\mathbf {A}}&= {\mathbf {Y}}^{i}{{\mathbf {A}}^{i}}^\dagger {\mathbf {A}} \end{aligned}
(8.22)
\begin{aligned}&= {\mathbf {Y}}^{i}{{\mathbf {A}}^{i}}^\dagger {\mathbf {A}}^{i}{\boldsymbol{\Omega }^{i}}^{-1} \end{aligned}
(8.23)
\begin{aligned}&= {\mathbf {Y}}^{i}{\boldsymbol{\Omega }^{i}}^{-1} \quad \text {if } n \ge N-1 \text { or if }{\mathbf {g}}\text { is linear.} \end{aligned}
(8.24)

In this expression, we have defined the square matrix

\begin{aligned} \boldsymbol{\Omega }^{i}= {\mathbf {I}}+ {\mathbf {W}}^{i}\boldsymbol{\Pi }, \end{aligned}
(8.25)

that relates the ensemble anomalies at iteration i to the initial anomalies through $${\mathbf {A}}= {\mathbf {A}}^{i}{\boldsymbol{\Omega }^{i}}^{-1}$$.

Note that we cannot use Eq. (8.24) when $$n < N-1$$, i.e., when the state dimension is less than the ensemble size minus one. We then need to retain the projection $${{\mathbf {A}}^{i}}^\dagger {\mathbf {A}}^{i}$$ and use Eq. (8.23) rather than Eq. (8.24). Evensen et al. (2019) derived the proofs of this result, and we refer to this paper for the details. This result also complements Eq. (7.11), which was derived by Evensen (2019).

We can now write the iteration of Eq. (8.19) in matrix form as

\begin{aligned} {\mathbf {W}}^{i+1} = {\mathbf {W}}^{i}- \gamma \Bigl ( {\mathbf {W}}^{i}- {{\mathbf {S}}^{i}}^\mathrm {T}\bigl ( {\mathbf {S}}^{i}{{\mathbf {S}}^{i}}^\mathrm {T}+ \overline{{\mathbf {C}}}_\textit{dd}\bigr )^{-1} \widetilde{{\mathbf {D}}}^{i}\Bigr ), \end{aligned}
(8.26)

where we have defined the “innovation” term

\begin{aligned} \widetilde{{\mathbf {D}}}^{i}= {\mathbf {S}}^{i}{\mathbf {W}}^{i}+ {\mathbf {D}}- {\mathbf {g}}\bigl ({\mathbf {Z}}^{i}\bigr ). \end{aligned}
(8.27)

The update for iteration i is

\begin{aligned} \begin{aligned} {\mathbf {Z}}^{i}&= {\mathbf {Z}}+ {\mathbf {A}}{\mathbf {W}}^{i}\\&= {\mathbf {Z}}\Bigl ({\mathbf {I}}+ \boldsymbol{\Pi }{\mathbf {W}}^{i}\Bigr ) \\&= {\mathbf {Z}}\Bigl ({\mathbf {I}}+ {\mathbf {W}}^{i}/\sqrt{N-1}\Bigr ) , \end{aligned} \end{aligned}
(8.28)

where we have used $$\boldsymbol{\Pi }{\mathbf {W}}^i = {\mathbf {W}}^i/\sqrt{N-1}$$, which follows from Eq. (8.26) using $$\boldsymbol{\Pi }{\mathbf {S}}^\mathrm {T}={\mathbf {S}}^\mathrm {T}/\sqrt{N-1}$$. Thus, we can compute the final update at a cost of $$n N^2$$ operations. The updated ensemble is a linear combination of the prior ensemble members, so the prior ensemble space contains the updated ensemble of solutions. Algorithm 5 details the implementation of the ensemble subspace RML algorithm. The algorithm takes as inputs the prior ensemble and the perturbed measurements, and runs an ensemble of model simulations to evaluate $${\mathbf {g}}\bigl ({\mathbf {Z}}^{i}\bigr )$$. Thus, the algorithm is generic, and we can use it for any model or problem configuration. In Sect. 8.9, we discuss a practical and efficient implementation of the inversion in which we replace the full measurement error covariance matrix with the ensemble representation, $${\mathbf {C}}_\textit{dd}\approx \overline{{\mathbf {C}}}_\textit{dd}= {\mathbf {E}}{\mathbf {E}}^\mathrm {T}$$. We have used this subspace EnRML method in the petroleum example in Chap. 21.
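The following sketch exercises the matrix iteration of Eqs. (8.26)–(8.28) for a linear toy model $${\mathbf {g}}({\mathbf {z}}) = {\mathbf {H}}{\mathbf {z}}$$, with invented dimensions and operator. For a linear model with step length $$\gamma = 1$$, the scheme is stationary after the first iteration and reproduces the EnKF update:

```python
import numpy as np

# Subspace RML iteration, Eqs. (8.26)-(8.28), for a linear model g(z) = H z.
rng = np.random.default_rng(3)
n, m, N = 6, 3, 30
H = rng.standard_normal((m, n))            # assumed linear measurement operator
C_dd = np.eye(m)
Z0 = rng.standard_normal((n, N))           # prior ensemble
d = rng.standard_normal(m)

Pi = (np.eye(N) - np.ones((N, N)) / N) / np.sqrt(N - 1)
A = Z0 @ Pi
D = d[:, None] + np.sqrt(N - 1) * (rng.standard_normal((m, N)) @ Pi)

gamma, W = 1.0, np.zeros((N, N))
for _ in range(3):
    Z = Z0 + A @ W                         # Eq. (8.28)
    Y = (H @ Z) @ Pi                       # predicted-measurement anomalies
    Omega = np.eye(N) + W @ Pi             # Eq. (8.25)
    S = np.linalg.solve(Omega.T, Y.T).T    # S = Y Omega^{-1}, Eq. (8.24)
    Dt = S @ W + D - H @ Z                 # innovation, Eq. (8.27)
    W = W - gamma * (W - S.T @ np.linalg.solve(S @ S.T + C_dd, Dt))

# With g linear, the converged update equals a single EnKF step:
Y0 = (H @ Z0) @ Pi
W_enkf = Y0.T @ np.linalg.solve(Y0 @ Y0.T + C_dd, D - H @ Z0)
assert np.allclose(Z0 + A @ W, Z0 + A @ W_enkf)
```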

## 5 Ensemble Kalman Filter (EnKF) Update

We can derive the EnKF update as a minimizer of the ensemble of cost functions in Eq. (7.1). For this, we compute the solution from Eq. (7.12), but using the ensemble of realizations in Eq. (8.1) to represent the error covariance matrix. Thus, we can write

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}+ {\mathbf {A}}{\mathbf {Y}}^\mathrm {T}\bigl ({\mathbf {Y}}{\mathbf {Y}}^\mathrm {T}+ \overline{{\mathbf {C}}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {D}}- {\mathbf {g}}({\mathbf {Z}})\bigr ). \end{aligned}
(8.29)

Equation (8.29) represents the EnKF update equation of Evensen (1994) with the perturbed observations proposed by  Burgers et al. (1998).

It is straightforward to show that

\begin{aligned} {\mathbf {Y}}{\mathbf {A}}^\dagger {\mathbf {A}}= {\mathbf {Y}}\qquad \text {for } n > N-1 , \end{aligned}
(8.30)

by using the following result from  Sakov et al. (2012),

\begin{aligned} {\mathbf {A}}^\dagger {\mathbf {A}}= {\mathbf {I}}_N-\frac{1}{N} \mathbf {1}\mathbf {1}^\mathrm {T}\qquad \text {for }n > N-1. \end{aligned}
(8.31)

However, only in the low-rank case, when $$n > N-1$$, is $${\mathbf {A}}^\dagger {\mathbf {A}}$$ a projection that removes the ensemble mean as defined in Eq. (8.12). But, since in Eq. (8.30), the mean of $${\mathbf {Y}}$$ is already zero by the definition in Eq. (8.8), the additional multiplication with $${\mathbf {A}}^\dagger {\mathbf {A}}$$ has no effect.
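A quick numerical confirmation of Eq. (8.31) and of the remark above (toy dimensions, generic random anomalies):

```python
import numpy as np

# Check of Eq. (8.31): for zero-mean anomalies of full rank N - 1 with
# n > N - 1, the projection A^+ A equals I_N - (1/N) 1 1^T.
rng = np.random.default_rng(4)
n, N = 12, 6                             # low-rank case, n > N - 1
A = rng.standard_normal((n, N))
A -= A.mean(axis=1, keepdims=True)

P = np.linalg.pinv(A) @ A
assert np.allclose(P, np.eye(N) - np.ones((N, N)) / N)

# A zero-mean Y (as defined in Eq. (8.8)) is unchanged by this projection:
Y = rng.standard_normal((4, N))
Y -= Y.mean(axis=1, keepdims=True)
assert np.allclose(Y @ P, Y)
```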

Evensen (2003) reformulated the EnKF update Eq. (8.29) in terms of the ensemble, as

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}{\mathbf {X}}, \end{aligned}
(8.32)

with

\begin{aligned} {\mathbf {X}}= {\mathbf {I}}+ \boldsymbol{\Pi }{\mathbf {Y}}^\mathrm {T}\bigl ({\mathbf {Y}}{\mathbf {Y}}^\mathrm {T}+ \overline{{\mathbf {C}}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {D}}- {\mathbf {g}}({\mathbf {Z}})\bigr ), \end{aligned}
(8.33)

but he did not realize the limitation in Eq. (8.30). Thus, for Eq. (8.32) to be generally valid, we need to redefine $${\mathbf {Y}}$$ as follows:

\begin{aligned} {\mathbf {Y}}= {\left\{ \begin{array}{ll} {\mathbf {Y}}&{} \text {for }n\ge N-1 \\ {\mathbf {Y}}{\mathbf {A}}^\dagger {\mathbf {A}}&{} \text {for }n<N-1 . \end{array}\right. } \end{aligned}
(8.34)

Interestingly, it is possible to compute the EnKF solution from the first ensemble-subspace RML iteration. The prior value of $${\mathbf {W}}$$ is $${\mathbf {W}}^{(0)}=0$$, and if we set the step-length $$\gamma =1.0$$, then the first Gauss–Newton iteration of (8.26) becomes just the EnKF update equation. Hence, the EnKF solution is also confined to the prior ensemble subspace. Moreover, if we implement Algorithm 5, we can use it to compute both the EnRML and the EnKF solutions. We have presented a simplified EnKF computation in Algorithm 6.
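For a linear measurement operator, the EnKF update of Eq. (8.29) takes only a few lines (toy dimensions and an invented H); the assertion checks the equivalence with the Kalman-gain form built from the ensemble covariance:

```python
import numpy as np

# EnKF update sketch, Eq. (8.29), for a linear g(z) = H z.
rng = np.random.default_rng(5)
n, m, N = 6, 3, 40
H = rng.standard_normal((m, n))
C_dd = np.eye(m)
Z = rng.standard_normal((n, N))
d = rng.standard_normal(m)

Pi = (np.eye(N) - np.ones((N, N)) / N) / np.sqrt(N - 1)
A = Z @ Pi
Y = (H @ Z) @ Pi
D = d[:, None] + np.sqrt(N - 1) * (rng.standard_normal((m, N)) @ Pi)

Za = Z + A @ Y.T @ np.linalg.solve(Y @ Y.T + C_dd, D - H @ Z)

# Identical to the update with the (low-rank) ensemble Kalman gain:
K = A @ A.T @ H.T @ np.linalg.inv(H @ A @ A.T @ H.T + C_dd)
assert np.allclose(Za, Z + K @ (D - H @ Z))
```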

Let us define $${\mathbf {X}}$$ where each column stores an ensemble realization for all of the K time steps of the window. This notation allows us to write $${\mathbf {X}}_k$$ to represent the rows in $${\mathbf {X}}$$ holding the solution corresponding to time step k. But more importantly, we can have measurements distributed over the assimilation window, and we can compute the predicted measurements just by measuring $${\mathbf {X}}$$, i.e., $$\boldsymbol{\Upsilon }= {\mathbf {h}}({\mathbf {X}})$$.

Note that by using this formulation of the EnKF, we have complete flexibility in defining what the state vector is (see Algorithm 7). It allows us to update the model state at the end of the assimilation window, as is common in sequential data assimilation using EnKF (see also the filter solution in Fig. 2.2), and it allows us to use measurements distributed over the assimilation interval in the update calculation. This formulation also allows us to compute the solution at the initial time of the assimilation window and then integrate the posterior ensemble over the assimilation window to obtain the prior for the next window, as in Fig. 2.4. Or we can update the model state $${\mathbf {X}}$$ over the whole assimilation window, which corresponds to the ensemble smoother (ES) solution in Fig. 2.1, as introduced in Van Leeuwen and Evensen (1996) and first applied to real oceanographic applications in Van Leeuwen (1999, 2001). Finally, the formulation is flexible enough to augment the updates from previous windows to the state vector and recursively update the model state in the current and some previous windows. This last approach corresponds to the (lagged) ensemble Kalman smoother (EnKS), as introduced in Evensen and Van Leeuwen (2000). We illustrate all these alternatives in Algorithm 7, where the main difference between them is the definition of the state vector we are updating.

A final remark is that the original Kalman filter usually writes the update equations using the Kalman gain matrix $${\mathbf {K}}$$. However, it only makes sense to compute $${\mathbf {K}}$$ when one has a full-rank error covariance matrix $${{\mathbf {C}}_{\textit{zz}}}$$. In the ensemble methods, the use of the low-rank representation $$\overline{{\mathbf {C}}}_{zz}$$ implies that $${\mathbf {K}}$$ is also of low rank, as we compute an ensemble representation, $$\overline{{\mathbf {K}}}$$, from an outer product of low-rank ensemble matrices. In other words, one should never compute the Kalman gain matrix when using ensemble methods, except perhaps when the number of measurements is less than the ensemble size.

## 6 Ensemble DA with Multiple Updating (ESMDA)

For some non-linear problems, the Gauss–Newton method may not converge if the normalizing Hessian is of low rank. We can then use an alternative formulation named ensemble smoother with multiple data assimilation (ESMDA) proposed by  Emerick and Reynolds (2013). ESMDA approximately samples from $$f\bigl ({\mathbf {z}}|{\mathbf {d}}\bigr )$$ by gradually introducing measurements using the so-called tapering of the likelihood function (Neal, 1996).

When requiring that

\begin{aligned} \sum _{i=1}^{N_{\text {\tiny {mda}}}} \frac{1}{\alpha ^{i}} = 1, \end{aligned}
(8.35)

we can write the following

\begin{aligned} \begin{aligned} f\bigl ({\mathbf {z}}|{\mathbf {d}}\bigr )&\propto {f\bigl ({\mathbf {d}}|{\mathbf {g}}({\mathbf {z}})\bigr )} \,{f\bigl ({\mathbf {z}}\bigr )} \\&= {f\bigl ({\mathbf {d}}|{\mathbf {g}}({\mathbf {z}})\bigr )}^{\left( \sum _{i=1}^{N_{\text {\tiny {mda}}}} \frac{1}{\alpha ^{i}} \right) } \,{f\bigl ({\mathbf {z}}\bigr )} \\&= {f\bigl ({\mathbf {d}}|{\mathbf {g}}({\mathbf {z}})\bigr )^\frac{1}{\alpha ^{N_{\text {\tiny {mda}}}}} \cdots \, f\bigl ({\mathbf {d}}\,|\,{\mathbf {g}}({\mathbf {z}})\bigr )^\frac{1}{\alpha ^{2}} \, f\bigl ({\mathbf {d}}\,|\,{\mathbf {g}}({\mathbf {z}})\bigr )^\frac{1}{\alpha ^{1}} } \, {f\bigl ({\mathbf {z}}\bigr ) }. \end{aligned} \end{aligned}
(8.36)

We can then compute $$N_{\text {\tiny {mda}}}$$ recursive EnKF steps that gradually introduce the observations using inflated observation errors. This gradual introduction of the update reduces the impact of the linearization in the ES scheme, see Approx. 5. The method converges precisely to the ES solution in the linear case when the ensemble size goes to infinity.

In ESMDA, we can use Algorithm 6 to compute the solution. We follow each step of the algorithm as we do for the EnKF solution, but we repeat the procedure $$N_{\text {\tiny {mda}}}$$ times. For each recursive call to the algorithm, we resample the perturbed measurements from $$\mathcal {N}({\mathbf {d}}, \alpha ^{i}\overline{{\mathbf {C}}}_\textit{dd})$$. Thus, the effective measurement error variance in each step is increased by a factor $$\alpha ^{i}$$.
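A sketch of the ESMDA recursion for a linear toy model follows (the choices of $$\alpha ^{i}$$ and all dimensions are ours), checking the condition in Eq. (8.35) and running the inflated-error EnKF steps:

```python
import numpy as np

# ESMDA sketch: N_mda EnKF-like steps with observation errors inflated by
# alpha_i, where the inverse coefficients sum to one, Eq. (8.35).
rng = np.random.default_rng(6)
n, m, N = 4, 2, 100
H = rng.standard_normal((m, n))            # assumed linear measurement operator
C_dd = np.eye(m)
d = rng.standard_normal(m)

alphas = [4.0, 4.0, 4.0, 4.0]              # uniform choice; sum(1/alpha) = 1
assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-12

Z = rng.standard_normal((n, N))
Pi = (np.eye(N) - np.ones((N, N)) / N) / np.sqrt(N - 1)
for a in alphas:
    A, Y = Z @ Pi, (H @ Z) @ Pi
    # resample perturbed measurements from N(d, alpha * C_dd) in every step
    D = d[:, None] + rng.multivariate_normal(np.zeros(m), a * C_dd, size=N).T
    Z = Z + A @ Y.T @ np.linalg.solve(Y @ Y.T + a * C_dd, D - H @ Z)

assert np.isfinite(Z).all()
```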

ESMDA has gained popularity due to its ease of implementation and its successful use in various applications. Although it is unclear what the method converges to in the nonlinear case, it appears to provide an acceptable solution in many cases. Note that ESMDA with one step corresponds to the EnKF estimate for the start of the assimilation window.

When we consider the convergence of ESMDA, we mean the number of steps needed before a further decrease in step length does not change the final solution. The required number of steps depends on the nonlinearity of the model. In the COVID example in Chap. 22, we found that 16–32 steps were necessary. We note that in ESMDA, as the number of steps increases, the measurement perturbations also increase. Thus, one can imagine cases where the perturbed measurements take unphysical values, causing the algorithm to break down. Emerick (2018) resolved this particular issue by using a square-root formulation for the update calculation.

To summarize, both EnRML and ESMDA solve the recursive smoother problem over a sequence of assimilation windows. It is unclear which method will work best for a particular situation, and both have their advantages and disadvantages. EnKF-type approaches are more efficient to compute as they linearize the measurement prediction and avoid iterations. However, as we will see in the following section, we can also use the 4DVar methods in an ensemble setting and compute the ensemble update without applying the linear regression of Approx. 7.

## 7 Ensemble 4DVar with Consistent Error Statistics

In Chaps. 4 and 5, we learned that the 4DVar method solves for the maximum a posteriori solution if it converges to the global minimum of the cost function in Eq. (3.9). Thus, we can use 4DVar to minimize the ensemble of cost functions in Eq. (7.1). These cost functions are all independent of each other. If the operator $${\mathbf {g}}({\mathbf {z}})$$ is weakly nonlinear, we can minimize each cost function independently using 4DVar to obtain an ensemble of solutions representing the minima defined by the cost functions in Eq. (7.1). This approach lets us approximately sample the marginal probability in Bayes’ theorem in Eq. (2.43). This way of sampling realizations from the posterior pdf, referred to as the ensemble-of-data-assimilations (En4DVar) approach (Isaksen et al., 2010), is an example of RML sampling and is the method currently used at ECMWF for operational data assimilation.

En4DVar samples an ensemble of realizations from the posterior pdf, so it can also represent the posterior error covariance matrix. Using SC-4DVar, we sample the ensemble of initial conditions for the time window and obtain the ensemble solution over the time window by a final ensemble integration. At the end of the time window, the ensemble prediction provides the initial conditions for the next assimilation window, and the ensemble spread represents its updated background error-covariance matrix.

The WC-4DVar solution of En4DVar gives an updated ensemble for the whole assimilation window. While SC-En4DVar initializes a prediction from the estimate at the beginning of the assimilation window, when using WC-En4DVar, we should initialize the forecast from the ensemble estimate at the end of the assimilation window. Like SC-En4DVar, the posterior ensemble of WC-En4DVar solutions also represents the posterior error covariances and the background error covariance for the next assimilation window.

Hence, we can use both SC-En4DVar and WC-En4DVar to use the ensemble statistics in recursive model updating as we do when using EnKF. The advantage of En4DVar is that we minimize precisely the cost functions in Eq. (7.1) without using the ensemble-averaged model sensitivity as in EnKF. In practice, the ensemble size is small for high-dimensional 4DVar applications, typically much less than 100 members. In this case, the ensemble covariance matrix is a poor estimate of the prior covariance matrix in 4DVar, which degrades the assimilation results. A partial solution to this problem is to use the ensemble to update only the variances and a few length scales in a climatological prior covariance matrix.

Note that the issue of correctly representing the error-covariance matrix also exists in the EnKF. But, since the computational cost of EnKF is much less than for an ensemble 4DVar, we can partly resolve this problem by using a larger ensemble size with EnKF. Additionally, we often use the localization and inflation schemes, as discussed in Chap. 10.

## 8 Square-Root EnKF

The so-called square-root filters belong to a popular class of ensemble Kalman filters. These are ensemble filters that do not attempt to sample the Bayesian posterior pdf. Instead, the square-root methods assume a Gaussian posterior distribution with a covariance matrix equal to the Kalman-filter analysis covariance matrix in Eq. (6.33). The square-root filters’ popularity comes from their avoidance of perturbed observations, which reduces sampling errors. There are different routes to deriving the square-root update equation, but one most commonly starts from a factorization of Eq. (6.33) using ensemble covariances.

Of course, we usually do not know the analysis error covariance, and if we knew it, we would not be able to factorize it for many real-sized applications. On the other hand, when using ensemble methods, we can replace the covariances in Eq. (6.33) with their ensemble representations, i.e.,

\begin{aligned} \overline{{\mathbf {C}}}_{zz}^\mathrm {a}&= {\mathbf {A}}{\mathbf {A}}^\mathrm {T}- {\mathbf {A}}{\mathbf {Y}}^\mathrm {T}\bigl ({\mathbf {Y}}{\mathbf {Y}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1}{\mathbf {Y}}{\mathbf {A}}^\mathrm {T}\end{aligned}
(8.37)
\begin{aligned}&= {\mathbf {A}}\bigl ({\mathbf {I}}_N - {\mathbf {Y}}^\mathrm {T}{\mathbf {C}}^{-1}{\mathbf {Y}}\bigr ){\mathbf {A}}^\mathrm {T}\end{aligned}
(8.38)
\begin{aligned}&= {\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }{\mathbf {Q}}^\mathrm {T}{\mathbf {A}}^\mathrm {T}\end{aligned}
(8.39)
\begin{aligned}&= \bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}\bigr )\bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}\bigr )^\mathrm {T}\end{aligned}
(8.40)
\begin{aligned}&= \bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}{\mathbf {Q}}^\mathrm {T}\bigr )\bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}{\mathbf {Q}}^\mathrm {T}\bigr )^\mathrm {T}\end{aligned}
(8.41)
\begin{aligned}&= \bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}{\mathbf {Q}}^\mathrm {T}\boldsymbol{\Theta }^\mathrm {T}\bigr )\bigl ({\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}{\mathbf {Q}}^\mathrm {T}\boldsymbol{\Theta }^\mathrm {T}\bigr )^\mathrm {T}. \end{aligned}
(8.42)

In Eq. (8.37), we have used the definitions in Eqs. (8.4), (8.21), and (8.34) to rewrite the Kalman filter error covariance update of Eq. (6.33) using the ensemble matrices. In Eq. (8.38), we define the matrix $${\mathbf {C}}= {\mathbf {Y}}{\mathbf {Y}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}$$. After that, in Eq. (8.39), we use the eigendecomposition

\begin{aligned} {\mathbf {I}}_N - {\mathbf {Y}}^\mathrm {T}{\mathbf {C}}^{-1}{\mathbf {Y}}= {\mathbf {Q}}\boldsymbol{\Lambda }{\mathbf {Q}}^\mathrm {T}, \end{aligned}
(8.43)

where we factorize a matrix whose dimension equals the number of ensemble members, N. To make the method even more efficient, in Sect. 8.9, we will present an algorithm that computes the inversion of $${\mathbf {C}}$$ in the ensemble subspace of dimension N.

One alternative for updating the forecast ensemble anomalies is to use (8.40) and write

\begin{aligned} {\mathbf {A}}^\mathrm {a}= {\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}, \end{aligned}
(8.44)

which we usually refer to as the one-sided square-root update. Evensen (2009b) and Leeuwenburgh (2005) showed that this asymmetrical scheme leads to a solution that does not conserve the mean, and it creates outliers that hold most of the variance. However, by using the symmetric square root in Eq. (8.41), we obtain an update equation

\begin{aligned} {\mathbf {A}}^\mathrm {a}= {\mathbf {A}}{\mathbf {Q}}\boldsymbol{\Lambda }^\frac{1}{2}{\mathbf {Q}}^\mathrm {T}, \end{aligned}
(8.45)

that ensures zero mean for the anomalies. The updated ensemble becomes a “symmetric” and “scaled” contraction along the different eigenvectors in $${\mathbf {Q}}$$. If we desire a more randomized update, we can include a random orthogonal matrix $$\boldsymbol{\Theta }$$, which randomizes the anomaly updates among the different directions in the eigenvector space.
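The symmetric transform can be checked numerically (a sketch with random toy matrices standing in for $${\mathbf {A}}$$, $${\mathbf {Y}}$$, and $${\mathbf {C}}_\textit{dd}$$); it keeps the anomaly mean at zero because $$\mathbf {1}$$ is an eigenvector of the transform matrix:

```python
import numpy as np

# Symmetric square-root update sketch, Eq. (8.45): A^a = A Q Lambda^{1/2} Q^T,
# with Q and Lambda from the eigendecomposition of T = I - Y^T C^{-1} Y.
rng = np.random.default_rng(7)
n, m, N = 5, 3, 8
A = rng.standard_normal((n, N)); A -= A.mean(axis=1, keepdims=True)
Y = rng.standard_normal((m, N)); Y -= Y.mean(axis=1, keepdims=True)
C_dd = np.eye(m)

T = np.eye(N) - Y.T @ np.linalg.solve(Y @ Y.T + C_dd, Y)
lam, Q = np.linalg.eigh(T)                 # T is symmetric positive definite
Aa = A @ Q @ np.diag(np.sqrt(lam)) @ Q.T   # symmetric square-root update

assert np.allclose(Aa @ Aa.T, A @ T @ A.T)   # reproduces the analysis covariance
assert np.allclose(Aa.sum(axis=1), 0.0)      # anomalies keep zero mean
```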

Several publications show the superiority of the square-root schemes for low-dimensional models with small ensemble sizes of $$\mathcal {O}(10)$$. In these examples, the square-root solution becomes nearly identical to the traditional EnKF solution when the ensemble size is $$\mathcal {O}(100)$$. One can question how well the ensemble represents the error statistics for these small ensemble sizes. With such small ensembles, we effectively switch from a random-sampling interpretation to an error-subspace formulation.

For those who want to explore the square-root filters further, we refer to Evensen (2009b, Chap. 13) and the extensive literature on the ensemble transform Kalman filter (ETKF) and its implementation with localization, the LETKF (Hunt et al., 2007). For a review comparing different ensemble square-root filters with unified notation, see Vetra-Carvalho et al. (2018).

## 9 Ensemble Subspace Inversion

In Algorithm 5, we have represented the measurement error-covariance matrix by an ensemble of measurement perturbations, $${\mathbf {E}}$$. The inversion is, in this case, computed using an ensemble subspace scheme as was proposed by Evensen (2004), further discussed in Evensen (2009b) and recently in Evensen et al. (2019) and Evensen  (2021). This scheme projects the measurement error perturbations onto the ensemble subspace. It computes the pseudo inverse of the following factorization

\begin{aligned} \big ({\mathbf {S}}{\mathbf {S}}^\mathrm {T}&+ {\mathbf {E}}{\mathbf {E}}^\mathrm {T}\big ) \end{aligned}
(8.46)
\begin{aligned}&\approx {\mathbf {S}}{\mathbf {S}}^\mathrm {T}+ ({\mathbf {S}}{\mathbf {S}}^+) {\mathbf {E}}{\mathbf {E}}^\mathrm {T}({\mathbf {S}}{\mathbf {S}}^+)^\mathrm {T} \end{aligned}
(8.47)
\begin{aligned}&= {\mathbf {U}}\boldsymbol{\Sigma }\big ( {\mathbf {I}}_N + \boldsymbol{\Sigma }^+ {\mathbf {U}}^\mathrm {T}{\mathbf {E}}{\mathbf {E}}^\mathrm {T}{\mathbf {U}}(\boldsymbol{\Sigma }^+)^\mathrm {T}\big ) \boldsymbol{\Sigma }^\mathrm {T}{\mathbf {U}}^\mathrm {T} \end{aligned}
(8.48)
\begin{aligned}&= {\mathbf {U}}\boldsymbol{\Sigma }\big ( {\mathbf {I}}_N + {\mathbf {Q}}\boldsymbol{\Lambda }{\mathbf {Q}}^\mathrm {T}\big ) \boldsymbol{\Sigma }^\mathrm {T}{\mathbf {U}}^\mathrm {T} \end{aligned}
(8.49)
\begin{aligned}&= {\mathbf {U}}\boldsymbol{\Sigma }{\mathbf {Q}}\big ( {\mathbf {I}}_N + \boldsymbol{\Lambda }\big ){\mathbf {Q}}^\mathrm {T}\boldsymbol{\Sigma }^\mathrm {T}{\mathbf {U}}^\mathrm {T}, \end{aligned}
(8.50)

where we define the singular-value decomposition

\begin{aligned} {\mathbf {S}}={\mathbf {U}}\boldsymbol{\Sigma }{\mathbf {V}}^\mathrm {T}, \end{aligned}
(8.51)

and the identity matrix $${\mathbf {I}}_N \in \Re ^{N\times N}$$. The eigenvalue decomposition in Eq. (8.49) is of the matrix product in (8.48). Note that this eigenvalue decomposition is most efficiently computed by a singular value decomposition of the product $$\boldsymbol{\Sigma }^+ {\mathbf {U}}^\mathrm {T}{\mathbf {E}}$$. The left singular vectors will then equal the eigenvectors in $${\mathbf {Q}}$$, and the squares of the singular values will equal the eigenvalues in $$\boldsymbol{\Lambda }$$. Thus, the inversion becomes

\begin{aligned} \begin{aligned} \big ({\mathbf {S}}{\mathbf {S}}^\mathrm {T}&+ {\mathbf {E}}{\mathbf {E}}^\mathrm {T}\big )^{-1} \\&\approx \big ( {\mathbf {U}}(\boldsymbol{\Sigma }^\dagger )^\mathrm {T}{\mathbf {Q}}\big ) \, \big ( {\mathbf {I}}_N + \boldsymbol{\Lambda }\big )^{-1} \, \big ({\mathbf {U}}(\boldsymbol{\Sigma }^\dagger )^\mathrm {T}{\mathbf {Q}}\big )^\mathrm {T}\\&= {\mathbf {U}}(\boldsymbol{\Sigma }^\dagger )^\mathrm {T}{\mathbf {Q}}\, \big ( {\mathbf {I}}_N + \boldsymbol{\Lambda }\big )^{-1} \, {\mathbf {Q}}^\mathrm {T}\boldsymbol{\Sigma }^\dagger {\mathbf {U}}^\mathrm {T}. \end{aligned} \end{aligned}
(8.52)

The main advantage of this algorithm is that it computes the inverse at a cost linear in the number of measurements, $$\mathcal {O}(mN^2)$$. Also, it is usually easier to simulate measurement perturbations with given statistics than to construct a complete error covariance matrix. The disadvantage is that using a finite ensemble to represent the measurement error covariance matrix introduces additional sampling errors. However, Evensen (2021) demonstrated that, by using a larger ensemble to represent $${\mathbf {E}}$$ in Eq. (8.46), one can reduce the associated sampling errors to a negligible magnitude with little additional computational cost. We will demonstrate the consistency of this subspace inversion scheme in the examples in Chap. 13.
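The factorization in Eqs. (8.46)–(8.52) is easy to verify numerically in the special case where $${\mathbf {S}}$$ has full row rank, so the projection $${\mathbf {S}}{\mathbf {S}}^+$$ is the identity and the subspace inversion is exact (toy dimensions; variable names are ours):

```python
import numpy as np

# Subspace inversion sketch, Eqs. (8.46)-(8.52). With S of full row rank the
# projection step (8.47) is exact, and the result matches the direct inverse.
rng = np.random.default_rng(8)
m, N = 4, 10                               # m <= N - 1, so S can have rank m
S = rng.standard_normal((m, N)); S -= S.mean(axis=1, keepdims=True)
E = rng.standard_normal((m, N)); E -= E.mean(axis=1, keepdims=True)

U, sig, _ = np.linalg.svd(S, full_matrices=False)    # Eq. (8.51)
Sig_plus = np.diag(1.0 / sig)                        # full row rank assumed
# SVD of Sigma^+ U^T E yields Q and, via squared singular values, Lambda:
Q, sv, _ = np.linalg.svd(Sig_plus @ U.T @ E, full_matrices=False)
Lam = sv ** 2

X = U @ Sig_plus @ Q                       # U (Sigma^+)^T Q (Sigma^+ diagonal)
C_inv = X @ np.diag(1.0 / (1.0 + Lam)) @ X.T         # Eq. (8.52)
assert np.allclose(C_inv, np.linalg.inv(S @ S.T + E @ E.T))
```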

## 10 A Note on the EnKF Analysis Equation

Most operational ensemble-based schemes assume uncorrelated measurement errors and use a diagonal $${\mathbf {C}}_\textit{dd}= {\mathbf {I}}$$, see, e.g., the reviews on data assimilation in the geosciences (Carrassi et al., 2018), weather prediction (Houtekamer & Zhang, 2016), and petroleum applications (Aanonsen et al., 2009). This assumption is employed for two reasons: the measurement error covariances are often not well known, and it simplifies the update scheme in Eq. (13.5) considerably. With $${\mathbf {C}}_\textit{dd}= {\mathbf {I}}$$, Eq. (13.5) becomes

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}^\mathrm {f}+ {\mathbf {A}}{\mathbf {S}}^\mathrm {T}\big ({\mathbf {S}}{\mathbf {S}}^\mathrm {T}+ {\mathbf {I}}\big )^{-1} \big ({\mathbf {D}}- {\mathbf {H}}{\mathbf {Z}}\big ), \end{aligned}
(8.53)

which makes it possible to use an efficient algorithm proposed by Hunt et al. (2007) where, by using a Woodbury identity, the EnKF update becomes

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}^\mathrm {f}+ {\mathbf {A}}\big ({\mathbf {S}}^\mathrm {T}{\mathbf {S}}+ {\mathbf {I}}\big )^{-1} {\mathbf {S}}^\mathrm {T}\big ({\mathbf {D}}- {\mathbf {H}}{\mathbf {Z}}\big ). \end{aligned}
(8.54)

This modification reduces the size of the matrix to be inverted from $$m\times m$$ in Eq. (8.53) to $$N\times N$$ in Eq. (8.54). See also the discussion related to this particular implementation in Evensen et al. (2019, Sect. 3.2).
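The identity behind this rearrangement is readily checked (toy sizes):

```python
import numpy as np

# Push-through/Woodbury identity behind Eqs. (8.53)-(8.54):
# S^T (S S^T + I_m)^{-1} = (S^T S + I_N)^{-1} S^T,
# moving the inversion from the m x m data space to the N x N ensemble space.
rng = np.random.default_rng(9)
m, N = 50, 7
S = rng.standard_normal((m, N))

lhs = S.T @ np.linalg.inv(S @ S.T + np.eye(m))
rhs = np.linalg.inv(S.T @ S + np.eye(N)) @ S.T
assert np.allclose(lhs, rhs)
```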

Alternatively, it is possible to obtain an update equation like Eq. (8.53) if one has access to a factorization $${\mathbf {C}}_\textit{dd}= {\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}{\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}$$ with $${\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}$$ being a symmetric square root of a full-rank $${\mathbf {C}}_\textit{dd}$$. For example, write the eigenvalue decomposition

\begin{aligned} {\mathbf {C}}_\textit{dd}= {\mathbf {Q}}\boldsymbol{\Lambda }{\mathbf {Q}}^\mathrm {T}= {\mathbf {Q}}\boldsymbol{\Lambda }^{\scriptscriptstyle \frac{1}{2}}{\mathbf {Q}}^\mathrm {T}\, {\mathbf {Q}}\boldsymbol{\Lambda }^{\scriptscriptstyle \frac{1}{2}}{\mathbf {Q}}^\mathrm {T}= {\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}{\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}} \end{aligned}
(8.55)

and define the symmetrical square root

\begin{aligned} {\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}= {\mathbf {Q}}\boldsymbol{\Lambda }^{\scriptscriptstyle \frac{1}{2}}{\mathbf {Q}}^\mathrm {T}, \end{aligned}
(8.56)

and its inverse

\begin{aligned} {\mathbf {C}}_\textit{dd}^{-{\scriptscriptstyle \frac{1}{2}}} = {\mathbf {Q}}\boldsymbol{\Lambda }^{-{\scriptscriptstyle \frac{1}{2}}} {\mathbf {Q}}^\mathrm {T}. \end{aligned}
(8.57)

Now, by scaling the predicted measurement anomalies and the innovations according to

\begin{aligned} \widehat{{\mathbf {S}}}&= {\mathbf {C}}_\textit{dd}^{-{\scriptscriptstyle \frac{1}{2}}} {\mathbf {S}}, \end{aligned}
(8.58)
\begin{aligned} \widehat{{\mathbf {D}}}&= {\mathbf {C}}_\textit{dd}^{-{\scriptscriptstyle \frac{1}{2}}} \big ({\mathbf {D}}- {\mathbf {H}}{\mathbf {Z}}\big ) , \end{aligned}
(8.59)

and with some algebra, Eq. (13.5) becomes

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}^\mathrm {f}+ {\mathbf {A}}\widehat{{\mathbf {S}}}^\mathrm {T}\big (\widehat{{\mathbf {S}}} \widehat{{\mathbf {S}}}^\mathrm {T}+ {\mathbf {I}}\big )^{-1} \widehat{{\mathbf {D}}}, \end{aligned}
(8.60)

and using the Woodbury identity,

\begin{aligned} {\mathbf {Z}}^\mathrm {a}= {\mathbf {Z}}^\mathrm {f}+ {\mathbf {A}}\big (\widehat{{\mathbf {S}}}^\mathrm {T}\widehat{{\mathbf {S}}} + {\mathbf {I}}\big )^{-1} \widehat{{\mathbf {S}}}^\mathrm {T}\widehat{{\mathbf {D}}}. \end{aligned}
(8.61)
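A numerical check of the rescaling (toy matrices; the full-rank $${\mathbf {C}}_\textit{dd}$$ is constructed for illustration) confirms that Eqs. (8.56)–(8.60) reproduce the unscaled update:

```python
import numpy as np

# Rescaling with the symmetric square root, Eqs. (8.56)-(8.60):
# S^T (S S^T + C_dd)^{-1} v equals S_hat^T (S_hat S_hat^T + I)^{-1} v_hat.
rng = np.random.default_rng(10)
m, N = 5, 8
S = rng.standard_normal((m, N))
v = rng.standard_normal(m)                 # an innovation vector
B = rng.standard_normal((m, m))
C_dd = B @ B.T + np.eye(m)                 # a full-rank error covariance

lam, Q = np.linalg.eigh(C_dd)
C_half_inv = Q @ np.diag(lam ** -0.5) @ Q.T   # Eq. (8.57)

S_hat, v_hat = C_half_inv @ S, C_half_inv @ v
lhs = S.T @ np.linalg.solve(S @ S.T + C_dd, v)
rhs = S_hat.T @ np.linalg.solve(S_hat @ S_hat.T + np.eye(m), v_hat)
assert np.allclose(lhs, rhs)
```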

Typically, high numerical costs are associated with establishing $${\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}$$ and with the rescalings in Eqs. (8.58) and (8.59), which are both $$\mathcal {O}(m^2N)$$ operations. Additionally, $${\mathbf {C}}_\textit{dd}^{\scriptscriptstyle \frac{1}{2}}$$ needs to be of full rank, or a formulation based on pseudo inverses must be employed. Thus, the discussion in this section justifies using the ensemble subspace projection scheme in Eq. (8.52) for computing updates consistently at a cost of $$\mathcal {O}(mN^2)$$ while taking measurement-error correlations into account.