This chapter discusses solution methods for particular cases of the minimization problem defined by the cost function in Eq. (3.9). We start by looking for a closed-form solution that minimizes the cost function, and we then discuss how specific cases lead to several well-known methods. The first case assumes that all the measurements are located at the initial time of the assimilation window. Thus, there is no need for any model integrations during the minimization, and the problem reduces to the classical 3-dimensional variational (3DVar) formulation. The second case assumes that the model and the measurement operator are linear, allowing us to find an explicit solution of the gradient equation in Eq. (3.11). This case leads to the Kalman filter (KF) update equations, and if the measurements are additionally located at the initial time of the assimilation window, we obtain the standard form of the KF. Simplifying the KF further by ignoring the time evolution of the error statistics gives the optimal interpolation (OI) algorithm. In addition to these specific methods, we consider the weakly nonlinear case, where we can sometimes still use the Kalman filter equations with linearized model and measurement operators, leading to the extended Kalman filter (EKF).

   

1 Linear Update from Predicted Measurements

To explore possible linear solutions to the estimation problem, let’s start from a closed-form solution of the estimation problem in Eq. (3.11) in the trivial case when \({\mathbf {g}}({\mathbf {z}})={\mathbf {G}}{\mathbf {z}}\) is linear. We assume that the state vector is the model solution at the initial time of the assimilation window \({\mathbf {z}}={\mathbf {x}}_0\), and we have the measurements distributed over the assimilation window. In this case, Eq. (3.11) becomes

$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}^\mathrm {a}- {\mathbf {z}}^\mathrm {f}\bigr ) + {\mathbf {G}}^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1}\, \bigl ({\mathbf {G}}{\mathbf {z}}^\mathrm {a}- {\mathbf {d}}\bigr ) = 0, \end{aligned}$$
(6.1)

which has an explicit solution

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {G}}\Bigr )^{-1} {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {d}}- {\mathbf {G}}{\mathbf {z}}^\mathrm {f}\bigr ). \end{aligned}$$
(6.2)

Here \({\mathbf {y}}={\mathbf {G}}{\mathbf {z}}\) is the linear prediction of the measurements, which we can write as

$$\begin{aligned} {\mathbf {y}}= {\mathbf {G}}{\mathbf {z}}= {\mathbf {H}}\begin{pmatrix} {\mathbf {z}}\\ {\mathbf {x}}_1 \\ \vdots \\ {\mathbf {x}}_K \end{pmatrix} = {\mathbf {H}}\begin{pmatrix} {\mathbf {z}}\\ {\mathbf {M}}_1 {\mathbf {z}}\\ \vdots \\ {\mathbf {M}}_K \ldots {\mathbf {M}}_1 {\mathbf {z}}\end{pmatrix} = {\mathbf {H}}\begin{pmatrix} {\mathbf {I}}\\ {\mathbf {M}}_1 \\ \vdots \\ {\mathbf {M}}_K \ldots {\mathbf {M}}_1 \end{pmatrix} {\mathbf {z}}={\mathbf {H}}\boldsymbol{\mathcal {M}}{\mathbf {z}}, \end{aligned}$$
(6.3)

which defines \({\mathbf {G}}\), and where we introduce \(\boldsymbol{\mathcal {M}}\) for later use. Thus, if the model is linear, we can compute the update at the initial time of the assimilation window from measurements located throughout the assimilation window by solving Eq. (6.2). We include a time-step index on \({\mathbf {M}}\) so that the equation will also apply in the nonlinear case where \({\mathbf {M}}_k\) is the tangent-linear model at time \(t_k\).
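As an illustration of Eq. (6.3), the following sketch (not part of the original text) builds the stacked operator \(\boldsymbol{\mathcal {M}}\) and the compound operator \({\mathbf {G}}={\mathbf {H}}\boldsymbol{\mathcal {M}}\) for a small toy system. The model matrices, the measurement operator, and all dimensions are assumptions chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m_per_step, K = 4, 2, 3          # state size, measurements per time, window length (assumed)
Ms = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(K)]  # toy models M_1..M_K

# Stack the identity and the products M_k...M_1 as in Eq. (6.3): calM maps z -> (x_0, x_1, ..., x_K)
blocks, prod = [np.eye(n)], np.eye(n)
for Mk in Ms:
    prod = Mk @ prod
    blocks.append(prod.copy())
calM = np.vstack(blocks)                        # shape ((K+1)*n, n)

# A toy measurement operator observing two state components at every time in the window (assumed)
H_single = np.zeros((m_per_step, n)); H_single[0, 0] = H_single[1, 2] = 1.0
H = np.kron(np.eye(K + 1), H_single)            # block-diagonal over the window

G = H @ calM                                    # the compound operator G of Eq. (6.3)
z = rng.standard_normal(n)
y = G @ z                                       # predicted measurements over the whole window
print(G.shape, y.shape)
```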

To find an explicit solution of the estimation problem in Eq. (3.11) in the nonlinear case, we introduce the following approximation

Approximation 5 (Linearization)

Linearize \({\mathbf {g}}({\mathbf {z}})\) around the prior estimate \({\mathbf {z}}^\mathrm {f}\),

$$\begin{aligned} {\mathbf {g}}\bigl ({\mathbf {z}}\bigr ) \approx {\mathbf {g}}\bigl ({\mathbf {z}}^\mathrm {f}\bigr ) + {\mathbf {G}}\bigl ({\mathbf {z}}- {\mathbf {z}}^\mathrm {f}\bigr ), \end{aligned}$$
(6.4)

and approximate the gradient in Eq. (6.1) with the gradient evaluated at the prior estimate

$$\begin{aligned} \nabla _{\mathbf {z}}{\mathbf {g}}\bigl ({\mathbf {z}}\bigr ) \approx {\mathbf {G}}^\mathrm {T}, \end{aligned}$$
(6.5)

where we have defined

$$\begin{aligned} {\mathbf {G}}^\mathrm {T}= \nabla _{\mathbf {z}}{\mathbf {g}}\bigl ({\mathbf {z}}\bigr )\big |_{{\mathbf {z}}={\mathbf {z}}^\mathrm {f}} . \end{aligned}$$
(6.6)

Here, \({\mathbf {G}}\) is the tangent-linear operator of \({\mathbf {g}}({\mathbf {z}})\) evaluated at \({\mathbf {z}}^\mathrm {f}\), and \({\mathbf {G}}^\mathrm {T}\) is its adjoint. Note that Eq. (6.6) implies the following

$$\begin{aligned} {\mathbf {M}}_k^\mathrm {T}&= \nabla _{\mathbf {z}}{\mathbf {m}}\bigl ({\mathbf {z}}\bigr )\big |_{{\mathbf {z}}={\mathbf {z}}_k} \quad \text {and} \quad {\mathbf {H}}^\mathrm {T}= \nabla _{{\mathbf {m}}({\mathbf {z}})} {\mathbf {h}}\bigl ({\mathbf {m}}({\mathbf {z}})\bigr )\big |_{{\mathbf {z}}={\mathbf {z}}_k}. \end{aligned}$$
(6.7)
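To make the tangent-linear and adjoint relations in Eqs. (6.6) and (6.7) concrete, the sketch below (an illustration, not from the text) builds a finite-difference tangent-linear operator for an assumed toy nonlinear operator and checks the adjoint through the inner-product identity \(\langle {\mathbf {G}}{\mathbf {u}},{\mathbf {v}}\rangle = \langle {\mathbf {u}},{\mathbf {G}}^\mathrm {T}{\mathbf {v}}\rangle\); the operator and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3

def g(z):
    """Toy nonlinear operator from state space (n) to measurement space (m); assumed for illustration."""
    A = np.arange(1.0, m * n + 1).reshape(m, n) / (m * n)
    return np.tanh(A @ z)

zf = rng.standard_normal(n)

# Tangent-linear operator G at zf, built column by column with central differences (Eqs. 6.4-6.6)
eps = 1e-6
G = np.column_stack([(g(zf + eps * e) - g(zf - eps * e)) / (2 * eps) for e in np.eye(n)])

# Adjoint test: <G u, v> equals <u, G^T v> for any u, v
u, v = rng.standard_normal(n), rng.standard_normal(m)
print(np.allclose((G @ u) @ v, u @ (G.T @ v)))   # True up to rounding
```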

The linearization in Eq. (6.4) and approximation in Eq. (6.5) allow us to rewrite Eq. (3.11) in terms of \({\mathbf {g}}({\mathbf {z}}^\mathrm {f})\) and \({\mathbf {G}}\), with an explicit solution

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {G}}\Bigr )^{-1} {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ). \end{aligned}$$
(6.8)

We can write this equation in an alternative form by using the corollaries 

Woodbury corollaries

$$\begin{aligned} \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {G}}\Bigr )^{-1}&= {{\mathbf {C}}_{\textit{zz}}}- {{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\bigl ({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} {\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}, \end{aligned}$$
(6.9)
$$\begin{aligned} \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {G}}\Bigr )^{-1} {\mathbf {G}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1}&= {{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\bigl ({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1}, \end{aligned}$$
(6.10)

which we derive from the Woodbury identity. We then obtain

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ {{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\bigl ({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ), \end{aligned}$$
(6.11)

where we solve for the update in the measurement space. Due to Approx. 5, Eqs. (6.8) and (6.11) are only valid for small updates.

Using Eq. (6.11), we can compute an approximate update of the state vector \({\mathbf {z}}\) from measurements distributed over the assimilation window. Thus, the method solves a similar problem to the SC-4DVar discussed in Chap. 4 without using iterations.
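The following sketch (illustrative only, with randomly generated toy covariances and operators) verifies numerically that the state-space solution of Eq. (6.8) and the measurement-space form of Eq. (6.11) give the same update, as guaranteed by the Woodbury corollaries (6.9) and (6.10).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 4

# Symmetric positive-definite prior and measurement error covariances (assumed toy values)
A = rng.standard_normal((n, n)); Czz = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); Cdd = B @ B.T + m * np.eye(m)

G = rng.standard_normal((m, n))          # linearized operator from Eq. (6.4), assumed
zf = rng.standard_normal(n)
d = rng.standard_normal(m)
innov = d - G @ zf                       # using g(zf) = G zf in this linear toy example

# Eq. (6.8): solve in state space (n x n inverse)
za_state = zf + np.linalg.solve(np.linalg.inv(Czz) + G.T @ np.linalg.solve(Cdd, G),
                                G.T @ np.linalg.solve(Cdd, innov))

# Eq. (6.11): solve in measurement space (m x m inverse)
za_meas = zf + Czz @ G.T @ np.linalg.solve(G @ Czz @ G.T + Cdd, innov)

print(np.allclose(za_state, za_meas))    # True: the two forms are equivalent
```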

Interestingly, from Eq. (6.3), the products \({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}\) and \({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\) include a forward propagation of the background-error-covariance matrix \({{\mathbf {C}}_{\textit{zz}}}\) leading to a covariance matrix for the model state over the whole data assimilation window. When we measure the resulting covariances, we obtain the covariances between the predicted measurements and the state vector, i.e., \({\mathbf {C}}_\textit{yz}= {\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}\) and \({\mathbf {C}}_\textit{yy}= {\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\). With this, we can write Eq. (6.11) as

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ {\mathbf {C}}_\textit{zy}\bigl ({\mathbf {C}}_\textit{yy}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ). \end{aligned}$$
(6.12)

In a prediction system, we would like to initialize the prediction for the next assimilation window. A question is whether it is possible to update the solution at the end of the assimilation window. Below, we will see that this is possible with the Kalman filter, which provides an optimal update at the end of the assimilation window by sequentially assimilating the measurements while evolving the model solution and its error statistics forward in time.

Let’s revert to Eq. (6.11) and multiply the equation by \(\boldsymbol{\mathcal {M}}\) defined in Eq. (6.3) to obtain

$$\begin{aligned} \boldsymbol{\mathcal {M}}{\mathbf {z}}^\mathrm {a}= \boldsymbol{\mathcal {M}}{\mathbf {z}}^\mathrm {f}+ \boldsymbol{\mathcal {M}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}\bigl ({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ). \end{aligned}$$
(6.13)

We can write this equation as

$$\begin{aligned} \boldsymbol{\mathcal {M}}{\mathbf {z}}^\mathrm {a}= \boldsymbol{\mathcal {M}}{\mathbf {z}}^\mathrm {f}+ \boldsymbol{\mathcal {M}}{{\mathbf {C}}_{\textit{zz}}}\boldsymbol{\mathcal {M}}^\mathrm {T}{\mathbf {H}}^\mathrm {T}\bigl ({\mathbf {H}}\boldsymbol{\mathcal {M}}{{\mathbf {C}}_{\textit{zz}}}\boldsymbol{\mathcal {M}}^\mathrm {T}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ). \end{aligned}$$
(6.14)

It is clear that this formulation gives a smoother update of the model solution over the whole assimilation window, processing all the measurements in one go, as discussed in Sect. 2.4.1. If we are only interested in the solution at the time \(t_K\), we can compute the following

$$\begin{aligned} {\mathbf {x}}_K^\mathrm {a}&= {\mathbf {x}}_K^\mathrm {f}+ {\mathbf {C}}_{x_{K}y}\bigl ({\mathbf {C}}_\textit{yy}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {g}}({\mathbf {z}}^\mathrm {f})\bigr ), \end{aligned}$$
(6.15)
$$\begin{aligned} {\mathbf {C}}_{x_{K}y}&= {\mathbf {M}}_K \cdots {\mathbf {M}}_1 {{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}. \end{aligned}$$
(6.16)

As for the case where we updated the initial model state of the assimilation window, we now use the covariance matrix \({\mathbf {C}}_{x_{K}y}\) to update the final model state of the assimilation window. Using Eq. (6.14), we can update the model-state vector at any time within the assimilation window.

In Eq. (6.14), we must integrate the model to predict \({\mathbf {x}}_K^\mathrm {f}\) and to evolve the covariance matrix through \(\boldsymbol{\mathcal {M}}{{\mathbf {C}}_{\textit{zz}}}\boldsymbol{\mathcal {M}}^\mathrm {T}\), where the equation for \(\boldsymbol{\mathcal {M}}{{\mathbf {C}}_{\textit{zz}}}\boldsymbol{\mathcal {M}}^\mathrm {T}\) is similar to the error-covariance equation in Eq. (2.28) but without explicit model errors.

From the above, we learn that it is possible to update the model state at a particular time using measurements distributed in time by exploiting the time correlations. Sakov et al. (2010) discussed this “asynchronous” data assimilation in the ensemble Kalman filter. They showed how to assimilate batches of multiple measurements distributed in time to avoid stopping and restarting the model integration too frequently.

The above computations are impractical for large systems, but the approach becomes feasible with the ensemble data-assimilation methods. Finally, note that in this linear case without model errors, updating \({\mathbf {x}}_0\) and propagating this solution forward to time \(t_K\) gives the same result for \({\mathbf {x}}_K\) as first running the model to obtain a forecast of \({\mathbf {x}}_K\) and then computing its update.
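The toy example below (assumed model matrices, covariances, and a measurement at \(t_K\) only, chosen for illustration) demonstrates this last statement: updating \({\mathbf {x}}_0\) with Eq. (6.11) and propagating the result gives the same \({\mathbf {x}}_K\) as updating the forecast \({\mathbf {x}}_K^\mathrm {f}\) directly with the cross covariance \({\mathbf {C}}_{x_{K}y}\) of Eqs. (6.15) and (6.16).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, K = 4, 3, 2

Ms = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(K)]   # toy linear models M_1..M_K
MK1 = np.eye(n)
for Mk in Ms:
    MK1 = Mk @ MK1                                                        # M_K ... M_1

A = rng.standard_normal((n, n)); Czz = A @ A.T + n * np.eye(n)            # prior covariance (assumed)
B = rng.standard_normal((m, m)); Cdd = B @ B.T + m * np.eye(m)            # measurement errors (assumed)
H_K = rng.standard_normal((m, n))                                         # measure only at time t_K (assumed)
G = H_K @ MK1                                                             # Eq. (6.3) restricted to t_K

zf = rng.standard_normal(n)
d = rng.standard_normal(m)
innov = d - G @ zf

# Route 1: update z = x_0 with Eq. (6.11), then propagate to t_K
za = zf + Czz @ G.T @ np.linalg.solve(G @ Czz @ G.T + Cdd, innov)
xK_route1 = MK1 @ za

# Route 2: update the forecast x_K^f directly with C_{x_K y} = M_K...M_1 Czz G^T
xKf = MK1 @ zf
CxKy = MK1 @ Czz @ G.T
xK_route2 = xKf + CxKy @ np.linalg.solve(G @ Czz @ G.T + Cdd, innov)

print(np.allclose(xK_route1, xK_route2))    # True in the linear, model-error-free case
```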

2 3DVar

 

3DVar used to be a popular approach to minimize the cost function from Eq. (3.9), assuming that the prior and the measurements are both available at the initial time of the assimilation window. This “time-independence” implies that the possibly nonlinear function \({\mathbf {g}}({\mathbf {z}})\) represents only the measurement operator and not the model operator. In this case, the state vector \({\mathbf {z}}\) includes the model state at the initial time of the assimilation window and may contain model parameters. We then start from the cost function

3DVar cost function

$$\begin{aligned} \mathcal {J}({\mathbf {z}}) = \frac{1}{2} \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {f}\bigr )^{\mathrm {T}}\, {{\mathbf {C}}_{\textit{zz}}^{-1}}\, \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {f}\bigr ) + \frac{1}{2} \bigl ({\mathbf {h}}({\mathbf {z}})-{\mathbf {d}}\bigr )^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1}\,\bigl ({\mathbf {h}}({\mathbf {z}})-{\mathbf {d}}\bigr ). \end{aligned}$$
(6.17)

The 3DVar method refers specifically to a sequential data-assimilation approach where we use a constant-in-time background or prior error covariance \({{\mathbf {C}}_{\textit{zz}}}\) for each subsequent update step. Thus, the method does not propagate error statistics from one update time till the next, and there is no updating of the analysis error covariance. The 3DVar update scheme uses a Gauss–Newton iteration like those in Eqs. (3.14) or (3.17). Thus, 3DVar is a computationally efficient, although approximate, method.

The gradient of the cost function is still the one in Eq. (3.10) but with \({\mathbf {g}}({\mathbf {z}})\) replaced by \({\mathbf {h}}({\mathbf {z}})\), i.e.,

$$\begin{aligned} \nabla _{\mathbf {z}}\mathcal {J}({\mathbf {z}}) = {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}- {\mathbf {z}}^\mathrm {f}\bigr ) + {{\mathbf {H}}}^\mathrm {T}\, {\mathbf {C}}_\textit{dd}^{-1} \, \bigl ({\mathbf {h}}({\mathbf {z}}) - {\mathbf {d}}\bigr ) , \end{aligned}$$
(6.18)

where we have used

$$\begin{aligned} {\mathbf {H}}^\mathrm {T}= \nabla _{\mathbf {z}}{\mathbf {h}}\bigl ({\mathbf {z}}\bigr ) . \end{aligned}$$
(6.19)

Setting the gradient in Eq. (6.18) equal to zero has no explicit solution since the gradient includes the nonlinear measurement operator \({\mathbf {h}}({\mathbf {z}})\). Thus, to compute the 3DVar solution, we must use an iterative solver. Another reason for using an iterative approach is that we avoid forming the explicit model-state covariance matrix and inverting it, as in the 4DVar methods.

Typically, we initialize a Gauss–Newton iteration with the prior estimate

$$\begin{aligned} {\mathbf {z}}^0 = {\mathbf {z}}^\mathrm {f}, \end{aligned}$$
(6.20)

and iterate

$$\begin{aligned} {\mathbf {z}}^{i+1}= {\mathbf {z}}^{i}- \gamma ^{i}{\mathbf {B}}^{i}\nabla _{{\mathbf {z}}}\mathcal {J}\bigl ({\mathbf {z}}^{i}\bigr ), \end{aligned}$$
(6.21)

until convergence. In this expression, \({\mathbf {B}}^{i}\) is the inverse of the Hessian given by

$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}^{-1}}+ {{\mathbf {H}}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {H}}^{i}. \end{aligned}$$
(6.22)

If we use the gradient from Eq. (6.18), we can write the iteration in Eq. (6.21) as

$$\begin{aligned} {\mathbf {z}}^{i+1}&= {\mathbf {z}}^{i}- \gamma ^{i}\Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {{\mathbf {H}}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {H}}^{i}\Bigr )^{-1} \Bigl ( {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}^{i}- {\mathbf {z}}^\mathrm {f}\bigr ) + {{\mathbf {H}}^{i}}^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1} \, \bigl ({\mathbf {h}}({\mathbf {z}}^{i}) - {\mathbf {d}}\bigr ) \Bigr ) \\ &= {\mathbf {z}}^{i}- \gamma ^{i}\bigl ({\mathbf {z}}^{i}-{\mathbf {z}}^\mathrm {f}\bigr ) + \gamma ^{i}\, {{\mathbf {C}}_{\textit{zz}}}{{\mathbf {H}}^{i}}^\mathrm {T}\Bigl ({\mathbf {H}}^{i}{{\mathbf {C}}_{\textit{zz}}}{{\mathbf {H}}^{i}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\Bigr )^{-1} \Bigl ({\mathbf {d}}- {\mathbf {h}}({\mathbf {z}}^{i}) + {\mathbf {H}}^{i}\bigl ({\mathbf {z}}^{i}- {\mathbf {z}}^\mathrm {f}\bigr )\Bigr ), \end{aligned}$$
(6.23)
(6.24)

where we again used the corollaries from Eqs. (6.9) and (6.10).
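The following sketch implements the Gauss–Newton iteration of Eqs. (6.20)–(6.23) for an assumed toy nonlinear measurement operator; the operator, covariances, step length, and synthetic measurements are illustrative assumptions, and the Hessian is formed and inverted explicitly here, which one would avoid in a large system.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 3

A = rng.standard_normal((n, n)); Czz = A @ A.T + n * np.eye(n)   # prior covariance (assumed)
Cdd = 0.1 * np.eye(m)                                            # measurement errors (assumed)
zf = rng.standard_normal(n)

W = rng.standard_normal((m, n))
def h(z):                      # toy nonlinear measurement operator (assumption)
    return np.tanh(W @ z)
def H_of(z):                   # its tangent-linear operator, as in Eq. (6.19)
    return (1.0 - np.tanh(W @ z) ** 2)[:, None] * W

d = h(rng.standard_normal(n)) + 0.1 * rng.standard_normal(m)     # synthetic measurements (assumed)

z = zf.copy()                  # Eq. (6.20): start from the prior
gamma = 1.0                    # full Gauss-Newton step (assumed)
Czz_inv = np.linalg.inv(Czz)
for _ in range(20):            # Eq. (6.21): iterate until convergence
    H = H_of(z)
    grad = Czz_inv @ (z - zf) + H.T @ np.linalg.solve(Cdd, h(z) - d)   # Eq. (6.18)
    hess = Czz_inv + H.T @ np.linalg.solve(Cdd, H)                     # Eq. (6.22)
    step = np.linalg.solve(hess, grad)
    z = z - gamma * step                                               # Eq. (6.23)
    if np.linalg.norm(step) < 1e-10:
        break
print("3DVar analysis:", z)
```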

Instead of forming the Hessian and inverting it, it is common to introduce the control-variable transform as in strong-constraint 4DVar. It is then possible to apply iterative methods like the conjugate-gradient method to solve the linearized problem.

An advantage of 3DVar is that it minimizes the cost function with a nonlinear measurement operator. The method is highly efficient as it does not update or evolve error statistics in time. Still, the approximation of a constant-in-time background-error-covariance matrix can be a significant drawback of 3DVar. Furthermore, while 3DVar solves for the update with a nonlinear measurement functional, we can compute the update even more efficiently in the case of a linear measurement functional using the Kalman-filter update equations. With the linearity assumptions, this approach avoids iterations and, in the case of stationary error statistics, leads to the optimal interpolation method.

3 Kalman Filter

 

Generally, it is only possible to write a closed-form solution of Eq. (3.11) in the trivial case when \({\mathbf {g}}({\mathbf {z}})\) is linear. In the case with a linear measurement functional, and when the measurements are available at the initial time of the assimilation window, the minimization problem in Eq. (3.11) reduces to minimizing Eq. (6.17). With a linear measurement operator, \({\mathbf {h}}({\mathbf {z}})\) becomes \({\mathbf {H}}{\mathbf {z}}\), and we write the cost function in Eq. (6.17) as

$$\begin{aligned} \mathcal {J}({\mathbf {z}}) = \frac{1}{2} \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {f}\bigr )^{\mathrm {T}}\, {{\mathbf {C}}_{\textit{zz}}^{-1}}\, \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {f}\bigr ) + \frac{1}{2} \bigl ({\mathbf {H}}{\mathbf {z}}-{\mathbf {d}}\bigr )^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1}\,\bigl ({\mathbf {H}}{\mathbf {z}}-{\mathbf {d}}\bigr ). \end{aligned}$$
(6.25)

By setting the gradient of the cost function to zero,

$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}^\mathrm {a}- {\mathbf {z}}^\mathrm {f}\bigr ) + {{\mathbf {H}}}^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1} \, \bigl ({\mathbf {H}}{\mathbf {z}}^\mathrm {a}- {\mathbf {d}}\bigr ) =0, \end{aligned}$$
(6.26)

we find the Kalman filter update equation

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {\mathbf {H}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {\mathbf {H}}\Bigr )^{-1} {\mathbf {H}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {d}}- {\mathbf {H}}{\mathbf {z}}^\mathrm {f}\bigr ). \end{aligned}$$
(6.27)

We solve for the analysis \({\mathbf {z}}^\mathrm {a}\) in the state space because the matrix we invert is defined in the state space. By using the matrix identity from Eq. (6.10), we can rewrite Eq. (6.27) to obtain the standard form of the Kalman filter update equation

The Kalman filter state update

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ {{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}\bigl ({\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {H}}{\mathbf {z}}^\mathrm {f}\bigr ), \end{aligned}$$
(6.28)

which solves for the update in the measurement space, just like the formulation in Eq. (6.11). Indeed, the matrix we have to invert is now defined in the measurement space.

We can find the Hessian of the cost function in Eq. (6.25) by taking the second derivative, leading to

$$\begin{aligned} \nabla _z \nabla _z \mathcal {J}({\mathbf {z}}) = {\mathbf {C}}_{zz}^{-1} + {\mathbf {H}}^\mathrm {T}{\mathbf {C}}_{dd}^{-1} {\mathbf {H}}. \end{aligned}$$
(6.29)

Note that the Hessian is not dependent on the state \({\mathbf {z}}\) but only on the prior and the measurement covariances. We know that the posterior is Gaussian, with covariance matrix \({\mathbf {C}}_{zz}^\mathrm {a}\). Hence, we can write the cost function as

$$\begin{aligned} \mathcal {J}({\mathbf {z}}) = \frac{1}{2} \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {a}\bigr )^{\mathrm {T}}\, \left( {\mathbf {C}}_{zz}^\mathrm {a}\right) ^{-1}\, \bigl ({\mathbf {z}}-{\mathbf {z}}^\mathrm {a}\bigr ) + constant \end{aligned}$$
(6.30)

in which the constant depends on the measurement \({\mathbf {d}}\), but not on \({\mathbf {z}}\). Taking the second derivative of this version of the cost function gives

$$\begin{aligned} \nabla _z \nabla _z \mathcal {J}({\mathbf {z}}^\mathrm {a}) = \left( {\mathbf {C}}_{zz}^\mathrm {a}\right) ^{-1} . \end{aligned}$$
(6.31)

Since the two expressions for the Hessian must be the same, we find for the posterior covariance

$$\begin{aligned} \left( {\mathbf {C}}_{zz}^\mathrm {a}\right) ^{-1} = {\mathbf {C}}_{zz}^{-1} + {\mathbf {H}}^\mathrm {T}{\mathbf {C}}_{dd}^{-1} {\mathbf {H}}. \end{aligned}$$
(6.32)

With this expression, we have proven that in the linear case, the inverse of the Hessian is the posterior error-covariance matrix. Eq. (6.32) is the equation for computing the analysis error-covariance matrix in the state space, and we can rewrite it using the matrix identity (6.9) to find

The Kalman filter error-covariance update

$$\begin{aligned} {\mathbf {C}}_{zz}^\mathrm {a}= {{\mathbf {C}}_{\textit{zz}}}- {{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}\bigl ({\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} {\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}. \end{aligned}$$
(6.33)

In contrast to using a stationary background-error-covariance matrix as in 3DVar, the Kalman filter updates and evolves the error statistics in time using Eq. (2.28). The standard form of the Kalman filter is a recursion over measurement times, where we evolve the state vector and its covariance from one measurement time till the next by solving the following model and error-covariance equations, starting from the update \({\mathbf {z}}_k={\mathbf {z}}^\mathrm {a}\) and \({{{\mathbf {C}}_{\textit{zz}}}}_{,k} = {\mathbf {C}}_{zz}^\mathrm {a}\) at the time \(t_k\):

$$\begin{aligned} {\mathbf {z}}_{k+1}&= {\mathbf {M}}{\mathbf {z}}_{k} , \end{aligned}$$
(6.34)
$$\begin{aligned} {{{\mathbf {C}}_{\textit{zz}}}}_{,k+1}&= {\mathbf {M}}{{{\mathbf {C}}_{\textit{zz}}}}_{,k} {\mathbf {M}}^\mathrm {T}+ {\mathbf {C}}_\textit{qq} . \end{aligned}$$
(6.35)

We integrate these equations until the next measurement time, when we update the model with new measurements. We then set \({\mathbf {z}}^\mathrm {f}= {\mathbf {z}}_{k}\) and \({{\mathbf {C}}_{\textit{zz}}}= {{{\mathbf {C}}_{\textit{zz}}}}_{,k}\), and we compute the update from Eqs. (6.28) and (6.33) before continuing the integration of Eqs. (6.34) and (6.35).

Thus, we define the assimilation windows in the KF to cover the time intervals between two consecutive measurement times. We integrate the model solution and the error covariance matrix from the start of an assimilation window till the next. We then update the predicted model state vector and its error covariance matrix at the initial time of the next assimilation window. We repeat this recursion as we progress from one assimilation window to the next.
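A minimal sketch of this recursion for an assumed linear toy system follows, alternating the prediction Eqs. (6.34)–(6.35) with the update Eqs. (6.28) and (6.33); the model, operators, covariances, and synthetic measurements are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, n_windows = 4, 2, 3

M = np.eye(n) + 0.05 * rng.standard_normal((n, n))    # linear model (assumed)
Cqq = 0.01 * np.eye(n)                                 # model-error covariance (assumed)
H = rng.standard_normal((m, n))                        # linear measurement operator (assumed)
Cdd = 0.1 * np.eye(m)                                  # measurement-error covariance (assumed)

z = rng.standard_normal(n)                             # initial analysis z^a
Czz = np.eye(n)                                        # initial analysis covariance (assumed)

for _ in range(n_windows):
    # Prediction over the assimilation window, Eqs. (6.34)-(6.35)
    z = M @ z
    Czz = M @ Czz @ M.T + Cqq

    # Update with new measurements d at the end of the window, Eqs. (6.28) and (6.33)
    d = rng.standard_normal(m)                         # synthetic measurements (assumed)
    S = H @ Czz @ H.T + Cdd                            # m x m matrix to invert
    z = z + Czz @ H.T @ np.linalg.solve(S, d - H @ z)
    Czz = Czz - Czz @ H.T @ np.linalg.solve(S, H @ Czz)

print(z, np.diag(Czz))
```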

In the case of a linear measurement operator and Gaussian priors, Eqs. (6.28) and (6.33) provide the variance-minimizing solution of

$$\begin{aligned} f({\mathbf {z}}|{\mathbf {d}}) \propto \exp \biggl ( -\frac{1}{2}\mathcal {J}({\mathbf {z}}) \biggr ), \end{aligned}$$
(6.36)

with the cost function \(\mathcal {J}\) defined in Eq. (6.25). We note that the variance-minimizing solution equals the MAP solution for a purely Gaussian problem. Thus, Eqs. (6.28) and (6.33) exactly represent the posterior pdf for the Gauss-linear case described by the cost function in Eq. (3.9).

One of the main issues with the KF is the storage of the vast error-covariance matrix \({{\mathbf {C}}_{\textit{zz}}}\) and the computational cost of its evolution in time. For an example of using the KF and its properties, we refer to Chaps. 12 and 13.

We observe that we can also write the KF update in Eq. (6.28) using the notation

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}&= {\mathbf {z}}^\mathrm {f}+ {\mathbf {K}}\bigl ({\mathbf {d}}- {\mathbf {H}}{\mathbf {z}}^\mathrm {f}\bigr ), \end{aligned}$$
(6.37)
$$\begin{aligned} {\mathbf {C}}_{zz}^\mathrm {a}&= \bigl ({\mathbf {I}}- {\mathbf {K}}{\mathbf {H}}\bigr ) {{\mathbf {C}}_{\textit{zz}}}, \end{aligned}$$
(6.38)

where \({\mathbf {K}}\in \Re ^{n\times m}\) defines the Kalman gain matrix of the Kalman filter (Kalman, 1960),

$$\begin{aligned} {\mathbf {K}}= {{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}\bigl ({\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1}, \end{aligned}$$
(6.39)

and Eqs. (6.37)–(6.39) represent the update step of the Kalman filter.  
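As a short illustration with assumed toy matrices, the following sketch checks that the gain formulation of Eqs. (6.37)–(6.39) reproduces the updates in Eqs. (6.28) and (6.33).

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 3
A = rng.standard_normal((n, n)); Czz = A @ A.T + n * np.eye(n)   # prior covariance (assumed)
Cdd = 0.2 * np.eye(m)                                            # measurement errors (assumed)
H = rng.standard_normal((m, n))
zf, d = rng.standard_normal(n), rng.standard_normal(m)

K = Czz @ H.T @ np.linalg.inv(H @ Czz @ H.T + Cdd)               # Kalman gain, Eq. (6.39)
za_gain = zf + K @ (d - H @ zf)                                  # Eq. (6.37)
Ca_gain = (np.eye(n) - K @ H) @ Czz                              # Eq. (6.38)

za = zf + Czz @ H.T @ np.linalg.solve(H @ Czz @ H.T + Cdd, d - H @ zf)       # Eq. (6.28)
Ca = Czz - Czz @ H.T @ np.linalg.solve(H @ Czz @ H.T + Cdd, H @ Czz)         # Eq. (6.33)
print(np.allclose(za_gain, za), np.allclose(Ca_gain, Ca))
```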

This section showed that the solution of Eq. (3.11) is defined in the state space, i.e., we solve for \({\mathbf {z}}^\mathrm {a}\). However, we have transformed the update computation to the measurement space using the Woodbury identity, as seen from the matrix we must invert in the Kalman filter. Bennett (1992) showed that it is possible to write the exact solution in Eq. (6.28) as

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ {\mathbf {C}}_\textit{zy}{\mathbf {b}}, \end{aligned}$$
(6.40)

which is a linear combination of “representer functions” 

$$\begin{aligned} {\mathbf {C}}_\textit{zy}= {{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}, \end{aligned}$$
(6.41)

with coefficients \({\mathbf {b}}\) found by solving the linear system

$$\begin{aligned} \bigl ({\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr ) {\mathbf {b}}= {\mathbf {d}}- {\mathbf {H}}{\mathbf {z}}^\mathrm {f}. \end{aligned}$$
(6.42)

In this formulation, the update to \({\mathbf {z}}^\mathrm {f}\) resides in an m-dimensional space, hence the name measurement space. The name state space makes sense as the state vector belongs to it. On the other hand, measurement space refers to the fact that we first find the solution coefficients in the measurement space and then transform them to the state space via the \({\mathbf {C}}_{zy}\) matrix. We still compute the solution in the state space, but the update is a linear combination of the m representer functions in \({\mathbf {C}}_\textit{zy}\). Hence, as we calculate the update to \({\mathbf {z}}^\mathrm {f}\) in the space spanned by the m representer functions, we could also have used a name like “representer space.” The critical point is that we reduce the inverse calculation from an n-dimensional problem to an m-dimensional one.
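The sketch below, using assumed toy matrices, illustrates the representer formulation of Eqs. (6.40)–(6.42): we solve an m-dimensional system for the coefficients \({\mathbf {b}}\), build the update as a linear combination of the representer functions, and recover the Kalman filter analysis of Eq. (6.28).

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6, 3
A = rng.standard_normal((n, n)); Czz = A @ A.T + n * np.eye(n)   # prior covariance (assumed)
Cdd = 0.3 * np.eye(m)                                            # measurement errors (assumed)
H = rng.standard_normal((m, n))
zf, d = rng.standard_normal(n), rng.standard_normal(m)

Czy = Czz @ H.T                                                  # representer functions, Eq. (6.41)
b = np.linalg.solve(H @ Czy + Cdd, d - H @ zf)                   # m-dimensional system, Eq. (6.42)
za_rep = zf + Czy @ b                                            # Eq. (6.40)

za_kf = zf + Czy @ np.linalg.solve(H @ Czz @ H.T + Cdd, d - H @ zf)   # Eq. (6.28)
print(np.allclose(za_rep, za_kf))
```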

4 Optimal Interpolation

 

From the KF formulation, we obtain an even more straightforward sequential updating algorithm named optimal interpolation (OI) by applying the approximation of time-invariant error statistics. In OI, we only solve for the model prediction in Eq. (6.34) and update the model state according to Eq. (6.28), with \({{\mathbf {C}}_{\textit{zz}}}\) being a constant-in-time prior error-covariance matrix. In the linear case, optimal interpolation is equivalent to the 3DVar method. However, the iterative solution method used in 3DVar allows for a nonlinear measurement functional and, by avoiding an explicit matrix inversion, can be more efficient even in the linear case.
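A minimal sketch of an OI recursion for an assumed toy system follows; since \({{\mathbf {C}}_{\textit{zz}}}\) never changes, the gain can be computed once and reused at every update. All matrices and synthetic measurements are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, n_cycles = 4, 2, 3

M = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # linear model (assumed)
H = rng.standard_normal((m, n))                      # measurement operator (assumed)
Czz = np.eye(n)                                      # constant-in-time prior covariance (assumed)
Cdd = 0.1 * np.eye(m)                                # measurement errors (assumed)

z = rng.standard_normal(n)
K = Czz @ H.T @ np.linalg.inv(H @ Czz @ H.T + Cdd)   # gain computed once since Czz never changes
for _ in range(n_cycles):
    z = M @ z                                        # Eq. (6.34): propagate the state only
    d = rng.standard_normal(m)                       # synthetic measurements (assumed)
    z = z + K @ (d - H @ z)                          # Eq. (6.28) with the stationary Czz
print(z)
```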

5 Extended Kalman Filter

The extended Kalman filter (EKF) allows applying the KF with a nonlinear model and measurement operator. To derive the equations for the update step, we again start with the 3DVar cost function in Eq. (6.17) and its gradient in Eq. (6.18). To find an explicit but approximate solution \({\mathbf {z}}\) of the gradient in Eq. (6.18) set equal to zero, we use Approx. 5, which allows us to write

$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}^\mathrm {a}- {\mathbf {z}}^\mathrm {f}\bigr ) + {{\mathbf {H}}}^{\mathrm {T}}\, {\mathbf {C}}_\textit{dd}^{-1} \, \bigl ({\mathbf {h}}({\mathbf {z}}^\mathrm {f}) +{\mathbf {H}}({\mathbf {z}}^\mathrm {a}- {\mathbf {z}}^\mathrm {f}) - {\mathbf {d}}\bigr ) = 0. \end{aligned}$$
(6.43)

Like for the KF, we find an explicit solution of this equation as

$$\begin{aligned} {\mathbf {z}}^\mathrm {a}= {\mathbf {z}}^\mathrm {f}+ {{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}\bigl ({\mathbf {H}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {H}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}- {\mathbf {h}}({\mathbf {z}}^\mathrm {f})\bigr ). \end{aligned}$$
(6.44)

The update equation is identical to Eq. (6.28) used in the Kalman filter but now uses \({\mathbf {h}}({\mathbf {z}}^\mathrm {f})\) instead of \({\mathbf {H}}{\mathbf {z}}^\mathrm {f}\) when comparing the measurements to the model prediction. To derive an equation for the error-covariance evolution, we need to linearize the model equations. By introducing the linearization from Eq. (6.7) in Approx. 5, we can compute the time evolution of the model state and its error covariance from

$$\begin{aligned} {\mathbf {z}}_{k+1}&= {\mathbf {m}}\bigl ({\mathbf {z}}_k\bigr ) \end{aligned}$$
(6.45)
$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}}_{,k+1}&= {\mathbf {M}}_k {{\mathbf {C}}_{\textit{zz}}}_{,k} {\mathbf {M}}_k^\mathrm {T}+ {\mathbf {C}}_\textit{qq}, \end{aligned}$$
(6.46)

where Eq. (6.46) was derived in Sect. 2.3.2.
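The sketch below illustrates the EKF recursion of Eqs. (6.44)–(6.46) for an assumed toy nonlinear model and measurement operator; all operators, covariances, and synthetic measurements are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, n_windows = 3, 2, 3

def model(z):                                        # toy nonlinear model m(z) (assumption)
    return z + 0.1 * np.array([z[1] * z[2], z[0] * z[2], -2.0 * z[0] * z[1]])
def M_of(z):                                         # its tangent-linear M_k (Jacobian of model)
    return np.eye(n) + 0.1 * np.array([[0.0, z[2], z[1]],
                                       [z[2], 0.0, z[0]],
                                       [-2.0 * z[1], -2.0 * z[0], 0.0]])
def h(z):                                            # toy nonlinear measurement operator (assumption)
    return np.array([z[0] ** 2, z[1] + z[2]])
def H_of(z):                                         # its tangent-linear H
    return np.array([[2.0 * z[0], 0.0, 0.0],
                     [0.0, 1.0, 1.0]])

Cqq, Cdd = 0.01 * np.eye(n), 0.1 * np.eye(m)         # model and measurement errors (assumed)
z, Czz = np.array([1.0, 0.5, -0.5]), 0.5 * np.eye(n)

for _ in range(n_windows):
    # Prediction, Eqs. (6.45)-(6.46): nonlinear model for the state, linearized model for the covariance
    Mk = M_of(z)
    z = model(z)
    Czz = Mk @ Czz @ Mk.T + Cqq

    # Update, Eq. (6.44): the innovation uses h(z^f), the gain uses the tangent-linear H
    d = h(z) + 0.1 * rng.standard_normal(m)          # synthetic measurements (assumed)
    H = H_of(z)
    S = H @ Czz @ H.T + Cdd
    Czz_new = Czz - Czz @ H.T @ np.linalg.solve(S, H @ Czz)
    z = z + Czz @ H.T @ np.linalg.solve(S, d - h(z))
    Czz = Czz_new
print(z)
```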

Note that the EKF applies linearized versions of the model and measurement operators. The state vector \({\mathbf {z}}^\mathrm {f}\) contains the predicted model state at an update time originating from the nonlinear model Eq. (2.5). However, an approximate linearized equation (2.28) describes the time evolution of the state-error-covariance matrix. Evensen (1992) found that using a linearized error-covariance equation led to linear instabilities, which would cause the predicted error covariance to blow up for many nonlinear models with unstable dynamics, such as ocean and atmospheric models. We will discuss the EKF further in the example in Chap. 12.