In the following, we derive some methods for sampling the posterior conditional pdf in Eq. (3.8). We aim to estimate the full pdf, not only finding its maximum. We will, in this chapter, use an approach named randomized maximum likelihood (RML) sampling. Note that the name is not precise as the method attempts to sample the posterior pdf and not just the likelihood. However, we will continue using the name RML when we refer to the technique. RML provides a highly efficient approach for approximate sampling of the posterior pdf and lays the ground for developing many popular ensemble methods.
1 RML Sampling
To introduce randomized-maximum-likelihood sampling, let’s define an ensemble of cost functions where the prior vectors \({\mathbf {z}}_j^\mathrm {f}\) are samples from the Gaussian distribution in Eq. (3.5), and we introduce the perturbed measurements \({\mathbf {d}}_j={\mathbf {d}}+ \boldsymbol{\epsilon }_j\) where the perturbations \(\boldsymbol{\epsilon }_j\) are samples from (3.6),
$$
\mathcal{J}(\mathbf{z}_j) = \frac{1}{2}\bigl(\mathbf{z}_j - \mathbf{z}_j^\mathrm{f}\bigr)^\mathrm{T} \mathbf{C}_{\textit{zz}}^{-1}\bigl(\mathbf{z}_j - \mathbf{z}_j^\mathrm{f}\bigr) + \frac{1}{2}\bigl(\mathbf{g}(\mathbf{z}_j) - \mathbf{d}_j\bigr)^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1}\bigl(\mathbf{g}(\mathbf{z}_j) - \mathbf{d}_j\bigr), \qquad (7.1)
$$
as proposed by Kitanidis (1995) and Oliver et al. (1996). These cost functions are independent of each other and differ from the cost function (3.9) by the introduction of the random samples \({\mathbf {z}}_j^\mathrm {f}\sim \mathcal {N}({\mathbf {z}}^\mathrm {f}, {{\mathbf {C}}_{\textit{zz}}})\) and \({\mathbf {d}}_j \sim \mathcal {N}({\mathbf {d}}, {\mathbf {C}}_\textit{dd})\).
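To make the construction concrete, the following is a minimal Python sketch of the ensemble of cost functions in Eq. (7.1). The observation operator `g`, the dimensions, the covariance matrices, and the ensemble size are hypothetical choices made only for this illustration.

```python
# Minimal sketch of the ensemble of RML cost functions in Eq. (7.1).
# All names and values (g, nz, nd, N_ens, the covariances) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
nz, nd, N_ens = 3, 2, 100                     # state size, data size, ensemble size

z_prior = np.array([1.0, 0.0, -1.0])          # prior mean  z^f
C_zz = np.diag([1.0, 0.5, 2.0])               # prior covariance
C_dd = 0.1 * np.eye(nd)                       # measurement-error covariance
d = np.array([0.8, -0.3])                     # observed data

def g(z):
    """Hypothetical (weakly nonlinear) observation operator."""
    return np.array([z[0] + 0.1 * z[1] ** 2, z[2]])

# Draws z_j^f ~ N(z^f, C_zz) and perturbed data d_j = d + eps_j with eps_j ~ N(0, C_dd)
z_f = rng.multivariate_normal(z_prior, C_zz, size=N_ens)                # (N_ens, nz)
d_pert = d + rng.multivariate_normal(np.zeros(nd), C_dd, size=N_ens)    # (N_ens, nd)

C_zz_inv = np.linalg.inv(C_zz)
C_dd_inv = np.linalg.inv(C_dd)

def cost(z, j):
    """Cost function J_j(z) for ensemble member j, cf. Eq. (7.1)."""
    dz = z - z_f[j]
    dy = g(z) - d_pert[j]
    return 0.5 * dz @ C_zz_inv @ dz + 0.5 * dy @ C_dd_inv @ dy
```

Each member's cost function differs from the others only through the random draws `z_f[j]` and `d_pert[j]`, exactly as in Eq. (7.1).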
One might ask why we need to perturb the measurements when Bayes' theorem tells us that the measurements are already given in the data-assimilation problem. Indeed, Van Leeuwen (2020) contains a detailed discussion of why it is more consistent to perturb the predicted measurements \({\mathbf {g}}({\mathbf {z}}_j)\) with a draw from the measurement-error pdf. In practice, neither choice has an advantage over the other: the cost function in Eq. (7.1) only contains the difference between the predicted and actual measurements, and the Gaussian is symmetric in its arguments. In this chapter, we will use the conventional “perturbed measurements” formalism.
Approximation 6 (RML sampling)
In the weakly nonlinear case, we can approximately sample the posterior pdf in Eq. (3.8) by minimizing the ensemble of cost functions defined by Eq. (7.1). \(\square \)
In the Gauss-linear case, the minimizing solutions of these cost functions precisely sample the posterior conditional pdf in Eq. (3.8). Furthermore, with an infinite number of samples, the sample mean and covariance will converge to the KF solution given by Eqs. (6.28) and (6.33). When we introduce nonlinearity into the problem, the samples will deviate from the pdf in Eq. (3.8). But in many cases with only weak nonlinearity, this approximation is acceptable. The fun fact is that nobody knows precisely which distribution the method samples in the nonlinear case. Note also that we can minimize each of the cost functions independently of the others using the Gauss–Newton method described in Chap. 3.
Similarly to Eq. (3.11), we now have an ensemble of gradients that we set to zero to minimize the ensemble of cost functions in Eq. (7.1),
$$
\nabla_{\mathbf{z}_j} \mathcal{J}(\mathbf{z}_j) = \mathbf{C}_{\textit{zz}}^{-1}\bigl(\mathbf{z}_j - \mathbf{z}_j^\mathrm{f}\bigr) + \mathbf{G}_j^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1}\bigl(\mathbf{g}(\mathbf{z}_j) - \mathbf{d}_j\bigr) = 0. \qquad (7.2)
$$
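Continuing the sketch above, the ensemble of gradients in Eq. (7.2) can be evaluated as follows; the Jacobian `G_jac` of the hypothetical operator `g` is again an assumption of the example.

```python
# Sketch of the ensemble of gradients in Eq. (7.2), continuing the example above.
def G_jac(z):
    """Tangent-linear operator (Jacobian) of the hypothetical g."""
    return np.array([[1.0, 0.2 * z[1], 0.0],
                     [0.0, 0.0, 1.0]])

def grad_cost(z, j):
    """Gradient of J_j(z); RML seeks the z_j where grad_cost(z_j, j) = 0."""
    return C_zz_inv @ (z - z_f[j]) + G_jac(z).T @ C_dd_inv @ (g(z) - d_pert[j])
```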
2 Approximate EKF Sampling
The simplest way to solve Eq. (7.2) for an ensemble of realizations is to use the Kalman filter update Eq. (6.44) to solve for each sample, \(j=1, \dots , N_{ens}\),
$$
\mathbf{z}_j^\mathrm{a} = \mathbf{z}_j^\mathrm{f} + \mathbf{C}_{\textit{zz}} \mathbf{G}_j^\mathrm{T}\bigl(\mathbf{G}_j \mathbf{C}_{\textit{zz}} \mathbf{G}_j^\mathrm{T} + \mathbf{C}_{\textit{dd}}\bigr)^{-1}\bigl(\mathbf{d}_j - \mathbf{g}(\mathbf{z}_j^\mathrm{f})\bigr). \qquad (7.3)
$$

However, as we noted in the previous chapter, these equations are only valid in the linear case or for modest updates in the nonlinear case.
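Continuing the sketch from Sect. 7.1, a direct transcription of this ensemble of Kalman-filter updates in Eq. (7.3) could look as follows, with the tangent-linear operator evaluated at each prior sample.

```python
# Sketch of the ensemble of Kalman-filter updates in Eq. (7.3), continuing the
# example above; valid for linear g or for modest updates in the nonlinear case.
Z_a = np.empty_like(z_f)
for j in range(N_ens):
    Gj = G_jac(z_f[j])                               # TLM at the prior sample
    S = Gj @ C_zz @ Gj.T + C_dd                      # innovation covariance
    K = C_zz @ Gj.T @ np.linalg.inv(S)               # Kalman gain for member j
    Z_a[j] = z_f[j] + K @ (d_pert[j] - g(z_f[j]))    # updated sample
```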
3 Approximate Gauss–Newton Sampling
As an alternative to the EKF solution from Sect. 6.5, we can minimize the cost function in Eq. (7.1) without introducing Approx. 5. We do this by using the Gauss–Newton method as in Sect. 3.4 for each of the cost functions in the ensemble. Taking the derivative of Eq. (3.10) while neglecting terms including second derivatives, we obtain an approximation to the Hessian
$$
\nabla^2 \mathcal{J}(\mathbf{z}_j^i) \approx \mathbf{C}_{\textit{zz}}^{-1} + {\mathbf{G}_j^i}^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1} \mathbf{G}_j^i. \qquad (7.4)
$$
We can then write a Gauss–Newton iteration for \({\mathbf {z}}\) as
$$
\mathbf{z}_j^{i+1} = \mathbf{z}_j^i - \Bigl(\mathbf{C}_{\textit{zz}}^{-1} + {\mathbf{G}_j^i}^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1} \mathbf{G}_j^i\Bigr)^{-1}\Bigl(\mathbf{C}_{\textit{zz}}^{-1}\bigl(\mathbf{z}_j^i - \mathbf{z}_j^\mathrm{f}\bigr) + {\mathbf{G}_j^i}^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1}\bigl(\mathbf{g}(\mathbf{z}_j^i) - \mathbf{d}_j\bigr)\Bigr), \qquad (7.5)
$$

where we have defined the gradient of the observation operator at iteration i and for ensemble member j as
$$
\mathbf{G}_j^i = \frac{\partial \mathbf{g}(\mathbf{z})}{\partial \mathbf{z}}\bigg|_{\mathbf{z} = \mathbf{z}_j^i}. \qquad (7.6)
$$
In this formulation, each realization uses the tangent-linear model \({{\mathbf {G}}_{j}^{i}}\) evaluated at the current iterate of realization j at iteration i. Thus, each realization has a model sensitivity that is independent of the other realizations. This approach, and any other method that minimizes the cost functions in Eq. (7.1), will correctly sample the posterior distribution in the Gauss-linear case. Still, for a non-Gaussian posterior distribution, Approx. 6 applies. Hence, we can use any of the methods discussed in Chaps. 3, 4, and 5 to solve for the minimizing solution of each cost-function realization.
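Continuing the same sketch, the ensemble of Gauss–Newton iterations in Eq. (7.5) can be written as below. The fixed number of iterations and the full step length are simplifying assumptions; in practice one would monitor convergence and possibly reduce the step.

```python
# Sketch of the ensemble of Gauss-Newton iterations in Eq. (7.5),
# continuing the example above.
def rml_gauss_newton(j, n_iter=10):
    z = z_f[j].copy()                                # start from the prior sample
    for _ in range(n_iter):
        Gij = G_jac(z)                               # TLM for member j at iterate i
        H = C_zz_inv + Gij.T @ C_dd_inv @ Gij        # Gauss-Newton Hessian, Eq. (7.4)
        grad = C_zz_inv @ (z - z_f[j]) + Gij.T @ C_dd_inv @ (g(z) - d_pert[j])
        z = z - np.linalg.solve(H, grad)             # Gauss-Newton step, Eq. (7.5)
    return z

Z_a_gn = np.array([rml_gauss_newton(j) for j in range(N_ens)])
```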
4 Least-Squares Best-Fit Model Sensitivity
There are two aspects of the solutions defined in Eqs. (7.3) and (7.5) that require our attention. First, we assume we know the tangent-linear model \({{\mathbf {G}}_{j}^{i}}\) and its adjoint, \({{\mathbf {G}}_{j}^{i}}^\mathrm {T}\), which is not always the case. The other aspect relates to the storage and inversion of \({{\mathbf {C}}_{\textit{zz}}}\), a huge matrix.
In cases when we do not have access to a tangent-linear model or the adjoint operator, we can use a statistical representation of the model sensitivity. Rather than computing different tangent linear operators \({{\mathbf {G}}_{j}^{i}}\) for each sample, we represent them by a statistical least-squares best-fit model sensitivity \({{\mathbf {G}}^{i}}\) common for all realizations (Chen & Oliver, 2013; Evensen, 2019; Reynolds et al., 2006), and we introduce the following approximation.
Approximation 7 (Best-fit ensemble-averaged model sensitivity)
Interpret \({{\mathbf {G}}_{j}}\) in Eq. (7.3) and \({{\mathbf {G}}_{j}^{i}}\) in Eq. (7.5) as the sensitivity matrix in linear regression and represent them using the definition
$$
\mathbf{G}^i \triangleq \mathbf{C}_{yz} \mathbf{C}_{\textit{zz}}^{-1}. \qquad (7.7)
$$
Note that we have dropped the subscript j for the realizations. Hence, we approximate the individual model sensitivities with a common averaged sensitivity used for all realizations.
A consequence of this approximation is that we slightly alter the gradient in Eq. (7.2) and thus also the minimizing solution that the Kalman filter updates or the Gauss–Newton iterations would provide.
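Continuing the sketch, a best-fit sensitivity in the spirit of Eq. (7.7) can be estimated by linear regression over the ensemble itself, with no tangent-linear code. Using the prior ensemble here is a simplification of the example; in the iterative schemes below, the same regression would be recomputed from the ensemble at each iterate i.

```python
# Sketch of the least-squares best-fit sensitivity in Eq. (7.7), estimated
# from the (prior) ensemble itself; continues the example above.
Y = np.array([g(z) for z in z_f])                    # predicted measurements, (N_ens, nd)
A = z_f - z_f.mean(axis=0)                           # state anomalies
B = Y - Y.mean(axis=0)                               # predicted-measurement anomalies
C_zz_ens = A.T @ A / (N_ens - 1)                     # ensemble estimate of C_zz
C_yz_ens = B.T @ A / (N_ens - 1)                     # ensemble estimate of C_yz
G_avg = C_yz_ens @ np.linalg.pinv(C_zz_ens)          # Eq. (7.7): G = C_yz C_zz^{-1}
```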
By introducing the averaged model sensitivity from Eq. (7.7), we can rewrite the Gauss–Newton iteration in Eq. (7.5) as
$$
\mathbf{z}_j^{i+1} = \mathbf{z}_j^i - \Bigl(\mathbf{C}_{\textit{zz}}^{-1} + {\mathbf{G}^i}^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1} \mathbf{G}^i\Bigr)^{-1}\Bigl(\mathbf{C}_{\textit{zz}}^{-1}\bigl(\mathbf{z}_j^i - \mathbf{z}_j^\mathrm{f}\bigr) + {\mathbf{G}^i}^\mathrm{T} \mathbf{C}_{\textit{dd}}^{-1}\bigl(\mathbf{g}(\mathbf{z}_j^i) - \mathbf{d}_j\bigr)\Bigr) \qquad (7.8)
$$
$$
= \mathbf{z}_j^\mathrm{f} + \mathbf{C}_{\textit{zz}} {\mathbf{G}^i}^\mathrm{T}\bigl(\mathbf{G}^i \mathbf{C}_{\textit{zz}} {\mathbf{G}^i}^\mathrm{T} + \mathbf{C}_{\textit{dd}}\bigr)^{-1}\Bigl(\mathbf{d}_j - \mathbf{g}(\mathbf{z}_j^i) + \mathbf{G}^i\bigl(\mathbf{z}_j^i - \mathbf{z}_j^\mathrm{f}\bigr)\Bigr), \qquad (7.9)
$$
where we have used the corollaries from Eqs. (6.9) and (6.10).
A rather tricky issue with Eq. (7.9) is the appearance of products between the averaged model sensitivity \({{\mathbf {G}}^{i}}\), evaluated at iteration i, and the prior covariance matrix \({{\mathbf {C}}_{\textit{zz}}}\). Chen and Oliver (2013) provided an alternative approach that evaluates the state covariance in the Hessian at the current iterate. This modification does not impact the final solution, but it alters the update step. They introduced various strategies for solving Eqs. (7.8) and (7.9) using ensemble representations for the state covariances. The next chapter will present a recent and efficient algorithm that searches for the solution in the ensemble subspace.
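Continuing the sketch, the iteration in the form of Eq. (7.9) could be implemented as follows. For illustration, the example keeps the full prior covariance C_zz and recomputes the regression-based sensitivity at every iterate; the ensemble-subspace formulation of the next chapter avoids forming these products explicitly.

```python
# Sketch of the Gauss-Newton iteration with the averaged sensitivity,
# written in the form of Eq. (7.9); continues the example above.
def rml_gn_averaged(n_iter=10):
    Z = z_f.copy()                                        # all members at the current iterate
    for _ in range(n_iter):
        Yi = np.array([g(z) for z in Z])
        Ai = Z - Z.mean(axis=0)
        Bi = Yi - Yi.mean(axis=0)
        G = (Bi.T @ Ai / (N_ens - 1)) @ np.linalg.pinv(Ai.T @ Ai / (N_ens - 1))  # Eq. (7.7)
        S = G @ C_zz @ G.T + C_dd
        K = C_zz @ G.T @ np.linalg.inv(S)                 # common gain for all members
        Z_new = np.empty_like(Z)
        for j in range(N_ens):
            innov = d_pert[j] - g(Z[j]) + G @ (Z[j] - z_f[j])
            Z_new[j] = z_f[j] + K @ innov                 # Eq. (7.9)
        Z = Z_new
    return Z
```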
Recall that \({\mathbf {y}}= {\mathbf {g}}({\mathbf {z}})\) is the model equivalent of the observed state and \({\mathbf {C}}_{yz}\) is the covariance between the state vector \({\mathbf {z}}\) and the predicted measurements \({\mathbf {y}}\). The operator \({\mathbf {G}}\), defined in Eq. (7.7), is the linear regression between \({\mathbf {y}}\) and \({\mathbf {z}}\), and we have
$$
\mathbf{C}_{\textit{zz}} \mathbf{G}^\mathrm{T} = \mathbf{C}_{\textit{zz}} \mathbf{C}_{\textit{zz}}^{-1} \mathbf{C}_{zy} = \mathbf{C}_{zy} \qquad (7.10)
$$
and
$$
\mathbf{G} \mathbf{C}_{\textit{zz}} \mathbf{G}^\mathrm{T} = \mathbf{C}_{yz} \mathbf{C}_{\textit{zz}}^{-1} \mathbf{C}_{\textit{zz}} \mathbf{C}_{\textit{zz}}^{-1} \mathbf{C}_{zy} = \mathbf{C}_{yz} \mathbf{C}_{\textit{zz}}^{-1} \mathbf{C}_{zy}. \qquad (7.11)
$$
We will use these expressions further in the following chapter. For now, we note that we can use the EKF update Eq. (7.3) to formulate an ensemble of Kalman-filter updates without using the tangent-linear operator, as
$$
\mathbf{z}_j^\mathrm{a} = \mathbf{z}_j^\mathrm{f} + \mathbf{C}_{zy}\bigl(\mathbf{C}_{yz} \mathbf{C}_{\textit{zz}}^{-1} \mathbf{C}_{zy} + \mathbf{C}_{\textit{dd}}\bigr)^{-1}\bigl(\mathbf{d}_j - \mathbf{g}(\mathbf{z}_j^\mathrm{f})\bigr). \qquad (7.12)
$$
It is common to replace the term \({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}= {\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy}\) with \({\mathbf {C}}_{yy}\). However, most data-assimilation practitioners are unaware that this replacement introduces another approximation if \({\mathbf {g}}({\mathbf {z}})\) is nonlinear. In the following chapter, we will come back to this issue when discussing a low-rank ensemble approximation of the prior covariance matrix that leads to efficient ensemble-data-assimilation methods.
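Continuing the sketch, the following compares the update in Eq. (7.12) with the common variant that replaces \({\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy}\) by \({\mathbf {C}}_{yy}\); with a nonlinear g the two give slightly different results.

```python
# Sketch comparing the update in Eq. (7.12) with the common C_yy replacement;
# continues the example above. All covariances are ensemble estimates.
Y = np.array([g(z) for z in z_f])                    # predicted measurements from the prior
A = z_f - z_f.mean(axis=0)
B = Y - Y.mean(axis=0)
C_zy = A.T @ B / (N_ens - 1)                         # cov(z, y)
C_yz = C_zy.T
C_yy = B.T @ B / (N_ens - 1)                         # cov(y, y)
C_zz_e = A.T @ A / (N_ens - 1)

# Eq. (7.12): uses C_yz C_zz^{-1} C_zy in the innovation covariance.
S_712 = C_yz @ np.linalg.pinv(C_zz_e) @ C_zy + C_dd
# Common practice: replace C_yz C_zz^{-1} C_zy with C_yy, which is an extra
# approximation whenever g is nonlinear.
S_common = C_yy + C_dd

Z_a_712 = np.array([z_f[j] + C_zy @ np.linalg.solve(S_712, d_pert[j] - Y[j])
                    for j in range(N_ens)])
Z_a_common = np.array([z_f[j] + C_zy @ np.linalg.solve(S_common, d_pert[j] - Y[j])
                       for j in range(N_ens)])

print("max difference between the two variants:", np.abs(Z_a_712 - Z_a_common).max())
```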