In the following, we derive methods for sampling the posterior conditional pdf in Eq. (3.8). We aim to estimate the full pdf, not just to find its maximum. In this chapter, we use an approach named randomized maximum likelihood (RML) sampling. The name is not precise, as the method attempts to sample the posterior pdf and not just the likelihood; nevertheless, we will continue to use the name RML when referring to the technique. RML provides a highly efficient approach for approximate sampling of the posterior pdf and lays the groundwork for many popular ensemble methods.

1 RML Sampling

To introduce randomized-maximum-likelihood sampling, let’s define an ensemble of cost functions where the prior vectors \({\mathbf {z}}_j^\mathrm {f}\) are samples from the Gaussian distribution in Eq. (3.5), and we introduce the perturbed measurements \({\mathbf {d}}_j={\mathbf {d}}+ \boldsymbol{\epsilon }_j\) where the perturbations \(\boldsymbol{\epsilon }_j\) are samples from Eq. (3.6),

Ensemble of cost functions

$$\begin{aligned} \mathcal {J}({\mathbf {z}}_j) =\frac{1}{2}\bigl ({\mathbf {z}}_j-{\mathbf {z}}_j^\mathrm {f}\bigr )^\mathrm {T}{{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}_j-{\mathbf {z}}_j^\mathrm {f}\bigr ) +\frac{1}{2}\bigl ({\mathbf {g}}({\mathbf {z}}_j)-{\mathbf {d}}_j\bigr )^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1}\bigl ({\mathbf {g}}({\mathbf {z}}_j)-{\mathbf {d}}_j\bigr ), \end{aligned}$$
(7.1)

as proposed by Kitanidis (1995) and Oliver et al. (1996). These cost functions are independent of each other and differ from the cost function in Eq. (3.9) by the introduction of the random samples \({\mathbf {z}}_j^\mathrm {f}\sim \mathcal {N}({\mathbf {z}}^\mathrm {f}, {{\mathbf {C}}_{\textit{zz}}})\) and \({\mathbf {d}}_j \sim \mathcal {N}({\mathbf {d}}, {\mathbf {C}}_\textit{dd})\).
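As a concrete, purely illustrative example, the following NumPy sketch sets up the prior samples, the perturbed measurements, and the ensemble of cost functions in Eq. (7.1). The dimensions, the placeholder nonlinear operator g, and all variable names are hypothetical choices made only for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(42)

n, m, N_ens = 5, 3, 200                       # state size, data size, ensemble size (toy values)
z_f = np.zeros(n)                             # prior mean z^f
C_zz = np.eye(n)                              # prior covariance C_zz
C_dd = 0.1 * np.eye(m)                        # measurement-error covariance C_dd
d = np.array([0.3, -0.1, 0.5])                # the given measurement vector d

def g(z):                                     # placeholder weakly nonlinear observation operator
    return np.tanh(z[:m])

# Prior samples z_j^f ~ N(z^f, C_zz) and perturbed data d_j = d + eps_j with eps_j ~ N(0, C_dd)
Z_f = rng.multivariate_normal(z_f, C_zz, size=N_ens)             # shape (N_ens, n)
D = d + rng.multivariate_normal(np.zeros(m), C_dd, size=N_ens)   # shape (N_ens, m)

C_zz_inv = np.linalg.inv(C_zz)
C_dd_inv = np.linalg.inv(C_dd)

def cost(z, j):
    """Cost function J(z_j) from Eq. (7.1), evaluated for ensemble member j."""
    dz = z - Z_f[j]
    r = g(z) - D[j]
    return 0.5 * dz @ C_zz_inv @ dz + 0.5 * r @ C_dd_inv @ r
```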

One might ask why we need to perturb the measurements when Bayes' theorem tells us that the measurements are already given in the data-assimilation problem. Indeed, Van Leeuwen (2020) discusses in detail why it is more consistent to perturb the predicted measurements \({\mathbf {g}}({\mathbf {z}}_j)\) with a draw from the measurement-error pdf. In practice, however, neither choice has an advantage: the cost function in Eq. (7.1) contains only the difference between the predicted and actual measurements, and the Gaussian is symmetric in its arguments. In this chapter, we will use the conventional “perturbed measurements” formalism.

Approximation 6 (RML sampling)

In the weakly nonlinear case, we can approximately sample the posterior pdf in Eq. (3.8) by minimizing the ensemble of cost functions defined by Eq. (7.1). \(\square \)

In the Gauss-linear case, the minimizing solutions of these cost functions sample the posterior conditional pdf in Eq. (3.8) exactly. Furthermore, with an infinite number of samples, the sample mean and covariance converge to the KF solution given by Eqs. (6.28) and (6.33). When we introduce nonlinearity into the problem, the samples deviate from the pdf in Eq. (3.8), but in many weakly nonlinear cases this approximation is acceptable. Interestingly, nobody knows precisely which distribution the method samples in the nonlinear case. Note also that we can minimize each of the cost functions independently of the others using the Gauss–Newton method described in Chap. 3.
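To make the Gauss-linear claim concrete, the following toy sketch minimizes each cost function in closed form; in the linear case this reduces to a Kalman update of each pair \(({\mathbf {z}}_j^\mathrm {f}, {\mathbf {d}}_j)\), and the sample mean and covariance can be compared with the exact posterior. The linear operator, dimensions, and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, N_ens = 4, 2, 50_000                    # toy sizes; large N_ens to show convergence
z_f = np.zeros(n)
C_zz = np.eye(n)
C_dd = 0.05 * np.eye(m)
G = rng.normal(size=(m, n))                   # linear observation operator g(z) = G z
d = np.array([1.0, -0.5])

# Prior samples and perturbed measurements
Z_f = rng.multivariate_normal(z_f, C_zz, size=N_ens)
D = d + rng.multivariate_normal(np.zeros(m), C_dd, size=N_ens)

# In the linear case each cost function in Eq. (7.1) has the explicit minimizer
# z_j^a = z_j^f + K (d_j - G z_j^f), with the usual Kalman gain K.
S = G @ C_zz @ G.T + C_dd
K = C_zz @ G.T @ np.linalg.inv(S)
Z_a = Z_f + (D - Z_f @ G.T) @ K.T             # update all members at once

# Exact posterior mean and covariance for comparison
z_a_exact = z_f + K @ (d - G @ z_f)
C_a_exact = C_zz - K @ G @ C_zz

print(np.abs(Z_a.mean(axis=0) - z_a_exact).max())   # small, and shrinks as N_ens grows
print(np.abs(np.cov(Z_a.T) - C_a_exact).max())      # small, and shrinks as N_ens grows
```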

Similarly to Eq. (3.11), we now have an ensemble of gradients that we set to zero to minimize the ensemble of cost functions in Eq. (7.1),

Ensemble of gradients set to zero

$$\begin{aligned} {{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}_j - {\mathbf {z}}_j^\mathrm {f}\bigr ) + \nabla _{{\mathbf {z}}} {\mathbf {g}}\bigl ({\mathbf {z}}_j \bigr ) {\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {g}}({\mathbf {z}}_j) - {\mathbf {d}}_j \bigr ) = 0. \end{aligned}$$
(7.2)
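For completeness, a sketch of this gradient follows. It assumes a finite-difference Jacobian as a stand-in for \(\nabla _{\mathbf {z}}{\mathbf {g}}\) when no tangent-linear or adjoint code is available; the helper names are illustrative. The returned gradient can then be handed to any standard gradient-based or Gauss–Newton minimizer.

```python
import numpy as np

def jacobian_fd(g, z, eps=1e-6):
    """Finite-difference Jacobian of g at z, shape (m, n); a stand-in for the
    tangent-linear model when no adjoint code is available."""
    y0 = g(z)
    J = np.zeros((y0.size, z.size))
    for k in range(z.size):
        dz = np.zeros_like(z)
        dz[k] = eps
        J[:, k] = (g(z + dz) - y0) / eps
    return J

def grad_cost(z, z_f_j, d_j, C_zz_inv, C_dd_inv, g):
    """Gradient of member j's cost function, the expression set to zero in Eq. (7.2)."""
    J = jacobian_fd(g, z)
    return C_zz_inv @ (z - z_f_j) + J.T @ C_dd_inv @ (g(z) - d_j)
```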

2 Approximate EKF Sampling

The simplest way to solve Eq. (7.2) for an ensemble of realizations is to apply the Kalman-filter update in Eq. (6.44) to each sample, \(j=1, \dots , N_{ens}\),

Ensemble of Kalman-filter updates

$$\begin{aligned} {\mathbf {z}}_j^\mathrm {a}= {\mathbf {z}}_j^\mathrm {f}+ {{\mathbf {C}}_{\textit{zz}}}{{\mathbf {G}}_{j}}^\mathrm {T}\bigl ({{\mathbf {G}}_{j}}{{\mathbf {C}}_{\textit{zz}}}{{\mathbf {G}}_{j}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}_j - {\mathbf {g}}({\mathbf {z}}_j^\mathrm {f})\bigr ). \end{aligned}$$
(7.3)

However, as we noted in the previous chapter, these equations are only valid in the linear case or for modest updates in the nonlinear case.
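A minimal per-member sketch of this update follows, written in the gain form of Eq. (7.3) and assuming a numerically evaluated tangent-linear model (for instance, the finite-difference helper sketched earlier); the function and argument names are illustrative.

```python
import numpy as np

def ekf_rml_update(Z_f, D, C_zz, C_dd, g, jacobian):
    """One Kalman-filter-style update per sample, as in Eq. (7.3): linearize g
    at each prior member z_j^f and apply the corresponding gain to (d_j - g(z_j^f))."""
    Z_a = np.empty_like(Z_f)
    for j, (z_fj, d_j) in enumerate(zip(Z_f, D)):
        G_j = jacobian(g, z_fj)                         # tangent-linear model at z_j^f
        S_j = G_j @ C_zz @ G_j.T + C_dd                 # innovation covariance
        K_j = C_zz @ G_j.T @ np.linalg.inv(S_j)         # member-specific gain
        Z_a[j] = z_fj + K_j @ (d_j - g(z_fj))
    return Z_a
```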

3 Approximate Gauss–Newton Sampling

As an alternative to the EKF solution from Sect. 6.5, we can minimize the cost functions in Eq. (7.1) without introducing Approx. 5. We do this by applying the Gauss–Newton method from Sect. 3.4 to each of the cost functions in the ensemble. Taking the derivative of Eq. (3.10) while neglecting terms containing second derivatives of \({\mathbf {g}}\), we obtain an approximation to the Hessian

$$\begin{aligned} \nabla _{\mathbf {z}}\nabla _{\mathbf {z}}\mathcal {J}({\mathbf {z}}_j) \approx {{\mathbf {C}}_{\textit{zz}}^{-1}}+ \nabla _{\mathbf {z}}{\mathbf {g}}({\mathbf {z}}_j) {\mathbf {C}}_\textit{dd}^{-1} \bigl (\nabla _{\mathbf {z}}{\mathbf {g}}({\mathbf {z}}_j)\bigr )^\mathrm {T}. \end{aligned}$$
(7.4)

We can then write a Gauss–Newton iteration for each \({\mathbf {z}}_j\) as

Ensemble of GN iterations

$$\begin{aligned} {\mathbf {z}}_j^{i+1} = {\mathbf {z}}_j^{i} - \gamma \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {{\mathbf {G}}_{j}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {{\mathbf {G}}_{j}^{i}}\Bigr )^{-1} \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}_j^{i} - {\mathbf {z}}_j^\mathrm {f}\bigr ) + {{\mathbf {G}}_{j}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {g}}({\mathbf {z}}_j^{i}) - {\mathbf {d}}_j\bigr )\Bigr ), \end{aligned}$$
(7.5)

where \(\gamma \in (0,1]\) is a step length, and we have defined the gradient of the observation operator at iteration i for ensemble member j as

$$\begin{aligned} {{\mathbf {G}}_{j}^{i}}\triangleq \bigl (\nabla _{\mathbf {z}}{\mathbf {g}}({\mathbf {z}}_j^{i})\bigr )^\mathrm {T}. \end{aligned}$$
(7.6)

In this formulation, each realization uses the tangent-linear model \({{\mathbf {G}}_{j}^{i}}\) evaluated at the solution for realization j at iteration i. Thus, each realization has a model sensitivity that is independent of the other realizations. This approach, and any other method that minimizes the cost functions in Eq. (7.1), will correctly sample the posterior distribution in the Gauss-linear case; for a non-Gaussian posterior distribution, Approx. 6 applies. Hence, we can use any of the methods discussed in Chaps. 3, 4, and 5 to find the minimizing solution of each cost-function realization.
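The following sketch implements the per-realization iteration of Eqs. (7.4)–(7.6) for a toy setup. The step length gamma, the iteration count, and the absence of a convergence test are illustrative simplifications, and the `jacobian` argument again stands in for the tangent-linear model.

```python
import numpy as np

def gauss_newton_rml(Z_f, D, C_zz, C_dd, g, jacobian, gamma=1.0, n_iter=20):
    """Per-realization Gauss-Newton iterations, Eq. (7.5), where each member
    uses its own tangent-linear model G_j^i evaluated at its current iterate."""
    C_zz_inv = np.linalg.inv(C_zz)
    C_dd_inv = np.linalg.inv(C_dd)
    Z = Z_f.copy()                                       # start each member at z_j^f
    for _ in range(n_iter):
        for j in range(Z.shape[0]):
            G = jacobian(g, Z[j])                        # G_j^i, as in Eq. (7.6)
            grad = C_zz_inv @ (Z[j] - Z_f[j]) + G.T @ C_dd_inv @ (g(Z[j]) - D[j])
            hess = C_zz_inv + G.T @ C_dd_inv @ G         # approximate Hessian, Eq. (7.4)
            Z[j] -= gamma * np.linalg.solve(hess, grad)  # damped Gauss-Newton step
    return Z
```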

4 Least-Squares Best-Fit Model Sensitivity

Two aspects of the solutions defined in Eqs. (7.3) and (7.5) require our attention. First, we assume we know the tangent-linear model \({{\mathbf {G}}_{j}^{i}}\) and its adjoint, \({{\mathbf {G}}_{j}^{i}}^\mathrm {T}\), which is not always the case. Second, both solutions require storing and inverting \({{\mathbf {C}}_{\textit{zz}}}\), which is typically a huge matrix.

In cases where we do not have access to a tangent-linear model or the adjoint operator, we can use a statistical representation of the model sensitivity. Rather than computing different tangent-linear operators \({{\mathbf {G}}_{j}^{i}}\) for each sample, we represent them by a statistical least-squares best-fit model sensitivity \({{\mathbf {G}}^{i}}\) common to all realizations (Chen & Oliver, 2013; Evensen, 2019; Reynolds et al., 2006), and we introduce the following approximation.

Approximation 7 (Best-fit ensemble-averaged model sensitivity)

Interpret \({{\mathbf {G}}_{j}}\) in Eq. (7.3) and \({{\mathbf {G}}_{j}^{i}}\) in Eq. (7.5) as the sensitivity matrix in linear regression and represent them using the definition

$$\begin{aligned} {{\mathbf {G}}_{j}}&\approx {\mathbf {G}}\triangleq {\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}. \end{aligned}$$
(7.7)

Note that we have dropped the subscript j for the realizations. Hence, we approximate the individual model sensitivities with a common averaged sensitivity used for all realizations.

A consequence of this approximation is that we slightly alter the gradient in Eq. (7.2) and thus also the minimizing solution that the Kalman filter updates or the Gauss–Newton iterations would provide.
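In an ensemble setting, the regression coefficient in Eq. (7.7) is typically estimated from sample covariances. The sketch below does this from ensemble anomalies, using a pseudo-inverse because the sample \({{\mathbf {C}}_{\textit{zz}}}\) is rank deficient whenever the ensemble is smaller than the state dimension; the function name is an illustrative choice.

```python
import numpy as np

def best_fit_sensitivity(Z, Y):
    """Least-squares best-fit ensemble sensitivity G ~ C_yz C_zz^{-1}, Eq. (7.7),
    estimated from ensemble anomalies. Z has shape (N_ens, n), Y has shape (N_ens, m)."""
    A_z = (Z - Z.mean(axis=0)).T          # state anomalies, shape (n, N_ens)
    A_y = (Y - Y.mean(axis=0)).T          # predicted-measurement anomalies, shape (m, N_ens)
    # C_yz is proportional to A_y A_z^T and C_zz to A_z A_z^T with the same 1/(N_ens-1)
    # factor, so the factor cancels in the regression below.
    return A_y @ A_z.T @ np.linalg.pinv(A_z @ A_z.T)

# Usage with the earlier toy ensemble: Y = np.array([g(z) for z in Z_f])
# G_bar = best_fit_sensitivity(Z_f, Y)
```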

By introducing the averaged model sensitivity from Eq. (7.7), we can rewrite the Gauss–Newton iteration in Eq. (7.5) as

$$\begin{aligned} {\mathbf {z}}_j^{i+1}&= {\mathbf {z}}_j^{i} - \gamma \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}+ {{\mathbf {G}}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} {{\mathbf {G}}^{i}}\Bigr )^{-1} \Bigl ({{\mathbf {C}}_{\textit{zz}}^{-1}}\bigl ({\mathbf {z}}_j^{i} - {\mathbf {z}}_j^\mathrm {f}\bigr ) + {{\mathbf {G}}^{i}}^\mathrm {T}{\mathbf {C}}_\textit{dd}^{-1} \bigl ({\mathbf {g}}({\mathbf {z}}_j^{i}) - {\mathbf {d}}_j\bigr )\Bigr ) \end{aligned}$$
(7.8)
$$\begin{aligned} &= {\mathbf {z}}_j^{i} - \gamma \Bigl ({\mathbf {z}}_j^{i} - {\mathbf {z}}_j^\mathrm {f}- {{\mathbf {C}}_{\textit{zz}}}{{\mathbf {G}}^{i}}^\mathrm {T}\bigl ({{\mathbf {G}}^{i}}{{\mathbf {C}}_{\textit{zz}}}{{\mathbf {G}}^{i}}^\mathrm {T}+ {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({{\mathbf {G}}^{i}}\bigl ({\mathbf {z}}_j^{i} - {\mathbf {z}}_j^\mathrm {f}\bigr ) - \bigl ({\mathbf {g}}({\mathbf {z}}_j^{i}) - {\mathbf {d}}_j\bigr )\bigr )\Bigr ), \end{aligned}$$
(7.9)

where we have used the corollaries from Eqs. (6.9) and (6.10).

A rather tricky issue with Eq. (7.9) is the appearance of products of the averaged model sensitivity \({{\mathbf {G}}^{i}}\), evaluated at iteration i, with the prior covariance matrix \({{\mathbf {C}}_{\textit{zz}}}\). Chen and Oliver (2013) provided an alternative approach that evaluates the state covariance in the Hessian at the current iterate. This modification does not affect the final solution, but it alters the update steps. They introduced various strategies for solving Eqs. (7.8) and (7.9) using ensemble representations of the state covariances. The next chapter presents a recent and efficient algorithm that searches for the solution in the ensemble subspace.
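A sketch of one such iteration, written in the measurement-space form of Eq. (7.9) so that only an m-by-m matrix needs to be inverted, is given below. Here G_bar denotes the averaged sensitivity (for instance, from the regression sketch above), and the step length is an illustrative choice.

```python
import numpy as np

def averaged_sensitivity_step(Z, Z_f, D, C_zz, C_dd, g, G_bar, gamma=0.6):
    """One Gauss-Newton step in the form of Eq. (7.9): the same averaged
    sensitivity G_bar is used for every realization, and the inversion is done
    in measurement space, so C_zz^{-1} is never formed."""
    S = C_dd + G_bar @ C_zz @ G_bar.T                    # (m, m) innovation covariance
    CG = C_zz @ G_bar.T                                  # (n, m)
    Z_new = np.empty_like(Z)
    for j in range(Z.shape[0]):
        dz = Z[j] - Z_f[j]
        resid = G_bar @ dz - (g(Z[j]) - D[j])
        Z_new[j] = Z[j] - gamma * (dz - CG @ np.linalg.solve(S, resid))
    return Z_new
```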

Recall that \({\mathbf {y}}= {\mathbf {g}}({\mathbf {z}})\) is the model equivalent of the observed state and \({\mathbf {C}}_{yz}\) is the covariance between the predicted measurements \({\mathbf {y}}\) and the state vector \({\mathbf {z}}\). The operator \({\mathbf {G}}\), defined in Eq. (7.7), is the linear regression of \({\mathbf {y}}\) on \({\mathbf {z}}\), and we have

$$\begin{aligned} {\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}= {\mathbf {C}}_{yz}, \end{aligned}$$
(7.10)

and

$$\begin{aligned} {\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}= {\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy} . \end{aligned}$$
(7.11)

We will use these expressions in the following chapter. For now, we note that we can use the EKF update in Eq. (7.3) to formulate an ensemble of Kalman-filter updates without using the tangent-linear operator, as

$$\begin{aligned} {\mathbf {z}}_j^\mathrm {a}= {\mathbf {z}}_j^\mathrm {f}+ {\mathbf {C}}_{zy} \bigl ({\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy} + {\mathbf {C}}_\textit{dd}\bigr )^{-1} \bigl ({\mathbf {d}}_j - {\mathbf {g}}({\mathbf {z}}_j^\mathrm {f})\bigr ). \end{aligned}$$
(7.12)

It is common to replace the term \({\mathbf {G}}{{\mathbf {C}}_{\textit{zz}}}{\mathbf {G}}^\mathrm {T}= {\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy}\) with \({\mathbf {C}}_{yy}\). However, this replacement introduces another approximation when \({\mathbf {g}}({\mathbf {z}})\) is nonlinear, a point that many data-assimilation practitioners are unaware of. In the following chapter, we will return to this issue when discussing a low-rank ensemble approximation of the prior covariance matrix that leads to efficient ensemble-data-assimilation methods.
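The following toy sketch illustrates this point numerically: for a nonlinear g, the sample-based \({\mathbf {C}}_{yy}\) and the regression term \({\mathbf {C}}_{yz} {{\mathbf {C}}_{\textit{zz}}^{-1}}{\mathbf {C}}_{zy}\) differ, so substituting one for the other changes the update in Eq. (7.12). The operator and dimensions are again hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N_ens = 4, 2, 100_000

Z = rng.normal(size=(N_ens, n))                   # prior samples, so the sample C_zz is close to I
Y = np.tanh(Z[:, :m]) + 0.2 * Z[:, :m] ** 2       # a nonlinear g applied member-wise

A_z = (Z - Z.mean(0)).T / np.sqrt(N_ens - 1)      # scaled state anomalies
A_y = (Y - Y.mean(0)).T / np.sqrt(N_ens - 1)      # scaled predicted-measurement anomalies

C_zz = A_z @ A_z.T
C_zy = A_z @ A_y.T
C_yz = C_zy.T
C_yy = A_y @ A_y.T

# G C_zz G^T with G = C_yz C_zz^{-1}, i.e., the term appearing in Eq. (7.12)
regression_term = C_yz @ np.linalg.inv(C_zz) @ C_zy
print(C_yy - regression_term)   # nonzero for this nonlinear g: replacing it by C_yy is a further approximation
```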