Assimilation of multiple linearly dependent data vectors

Assimilation of a sequence of linearly dependent data vectors, {dl}l=1L\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\{d_{l}\}^{L}_{l=1}$\end{document} such that dl=BldLl=1L−1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${d_{l} = B_{l}d_{L}}^{L-1}_{ l=1}$\end{document}, is considered for a parameter estimation problem. Such a data sequence can occur, for example, in the context of multilevel data assimilation. Since some information is used several times when linearly dependent data vectors are assimilated, the associated data-error covariances must be modified. I develop a condition that the modified covariances must satisfy in order to sample correctly from the posterior probability density function of the uncertain parameter in the linear-Gaussian case. It is shown that this condition is a generalization of the well-known condition that must be satisfied when assimilating the same data vector multiple times. I also briefly discuss some qualitative and computational issues related to practical use of the developed condition.


Introduction
Consider a parameter estimation problem defined by d = g (m) + η, where d denotes the data vector, g the forwardmodel operator, m the unknown parameter vector and η a realization of the measurement error. The Bayesian framework for parameter estimation includes uncertainty quantification as an integral part. Markov-chain Monte Carlo (MCMC) methods can sample correctly from the posterior probability density function (PDF) for the parameter vector also for nonlinear problems, but are prohibitively computationally expensive for many realistic problems, such as reservoir history matching. Ensemble-based methods, such as the ensemble Kalman filter (EnKF) [4] and the ensemble smoother (ES) [12], sample correctly from the posterior PDF when the forward-model operator is linear, the prior PDF is Gaussian and the ensemble size approaches infinity (linear-Gaussian case). In the general case, these methods can be seen as computationally tractable, approximate alternatives to MCMC methods. A practical disadvantage of the EnKF with respect to the ES for nonlinear Trond Mannseth trma@norceresearch parameter estimation problems is that it requires a rerun of the forward simulator from initial time to the current assimilation time to avoid statistical inconsistencies [18]. If such reruns are performed, however, the EnKF can be expected to outperform the ES, at least for weakly nonlinear problems [5,6].
The nonlinearity of the reservoir history matching problem has triggered the development of popular data assimilation (DA) methods such as iterative ES [1] and ES with multiple data assimilations (ES-MDA) [3]. As shown in [15], ES-MDA is related to annealed importance sampling [14]. With ES-MDA, d is assimilated L times (L is a predetermined number), and a compensation for utilizing the same information L times must then be devised. In assimilation cycle no. l, the original data-error covariance matrix, C, is therefore replaced by an inflated data-error covariance matrix, α l C, where α l > 1.
The motivation for introducing methods such as the ES-MDA is to better handle nonlinearities by taking L smaller steps instead of a single large step. Still, it is highly desirable for any ensemble-based method to sample correctly from the posterior PDF in the linear-Gaussian case. If the inflation coefficients {α l } L l=1 satisfy a certain equationtermed the MDA condition in this paper-the ES-MDA will sample correctly in the linear-Gaussian case [3].
I consider assimilation of multiple linearly dependent data vectors, which can be seen as a generalization of MDA. To illustrate this, let {d l } L l=1 denote a sequence of linearly dependent data vectors to be assimilated, that is, denotes a sequence of matrices. In the special case when {B l = I L } L−1 l=1 , where I L denotes the identity matrix, all L data vectors equal d L . Hence, MDA is a special case of assimilation of multiple linearly dependent data vectors. In addition to being of interest as a generalization of MDA, assimilation of multiple linearly dependent data vectors occur, for example, in the context of multilevel data assimilation (see the Appendix).
In this paper, I develop a condition that any data assimilation method has to fulfill in order to sample correctly in the linear-Gaussian case when multiple linearly dependent data vectors are assimilated. This condition reduces to the MDA condition in the special case where all the involved data vectors are identical. While MDA utilizes exactly the same information multiple times, the situation is more subtle when considering the generalization to assimilation of multiple linearly dependent data vectors. Clearly, part of, but not necessarily all of, the available information is used multiple times. For this reason, and to emphasize the relation to MDA, I denote assimilation of multiple linearly dependent data vectors partially multiple data assimilation (PMDA). The condition that has to be fulfilled in order to sample correctly in the linear-Gaussian case when multiple linearly dependent data vectors are assimilated will be denoted the PMDA condition.
The main emphasis in the paper is on the derivation of the PMDA condition, leading up to the main result, (19) and (20), and on the relation to the MDA condition, but I will also briefly discuss practical use of the PMDA condition. More work in this direction is, however, required, but that should preferably be performed when applying the PMDA condition to a nonlinear model in a real application with linearly dependent data vectors, which is outside the scope of the current paper.
I first (Section 2.1) consider the standard linear-Gaussian case with a single data vector, and write up expressions for the mean and covariance of the (Gaussian) posterior PDF. Next (Section 2.2), I introduce multiple data vectors, and write up expressions for the mean and covariance of the posterior PDF for that variant of the linear-Gaussian case. I also derive some expressions that will serve as a common starting point for the sketch of the derivation of the wellknown MDA condition, where the data vectors are identical (Section 3), and for the derivation of the novel PMDA condition, where the data vectors are linearly dependent (Section 4). In addition to the derivation of the PMDA condition, Section 4 also discusses its practical use.
Finally, my motivation for deriving the PMDA condition for linearly dependent data vectors stems from my interest in multilevel data assimilation. I therefore provide a brief motivation for multilevel data assimilation in the Appendix, where I also point out the connection to linearly dependent data vectors.

Linear-Gaussian case
In [2, Section 4.1 and Section 4.2], the authors gave two alternative derivations of the MDA condition in the setting of a joint parameter-state estimation problem, both assuming equal inflation coefficients. The derivation in [2, Section 4.2] is sample based, and explicitly shows that one needs to resample the measurement perturbations in each update cycle to obtain the correct posterior. This derivation was later generalized to allow for unequal inflation coefficients [3] in the setting of a parameter estimation problem. The derivation in [2, Section 4.1] utilizes only the Kalman filter equations and linear algebra.
The purpose of Section 2 is to facilitate a unified derivation of the MDA condition (Section 3) and the PMDA condition (Section 4). The main steps in the analysis of the linear-Gaussian case in Sections 2, 3, and 4 follow those in [2, Section 4.1], but in the setting of a parameter estimation problem, and allowing for unequal modification of the data-error covariances in the different update cycles.

Single data vector and model operator
Consider a Bayesian parameter estimation problem with parameter, m ∈ R M , data vector d L ∈ R N L and linear forward operator G L ∈ R N L ×M . Assume that the prior PDF is N m pr , C pr and the data-error PDF is N (0, C L ), where N denotes the multi-Gaussian distribution. It is well known (see, e.g., [16,Chap. 3

Multiple data vectors and model operators
Let {d l ∈ R N l } L l=1 denote a sequence of data vectors with data-error covariance matrices {C l } L l=1 , and {G l ∈ R N l ×M } L l=1 a sequence of linear forward operators, such that d l and G l m denote data and simulated output for entry number l, respectively. Furthermore, let N = L l=1 N l , and define the quantities δ ∈ R N , ∈ R N ×M , and ∈ R N ×N , by Consider a Bayesian parameter estimation problem with forward operator , data vector δ, prior PDF N m pr , C pr , and data-error PDF N (0, ). The posterior PDF for m is then N (m mu , C mu ), where (4) and (5) with (1) and (2), respectively, it is seen that where (7) assumes that (6) is fulfilled. Inserting from (3) into the definitions for F and f gives Jointly, (6), (7), (8), and (9) serve as a common starting point for the sketch of the derivation of the well-known MDA condition, where the data vectors are identical (Section 3), and for the derivation of the novel PMDA condition, where the data vectors are linearly dependent (Section 4).

Multiple data assimilation-MDA condition
In [2] and [3], the authors considered L assimilations of the same data vector with an inflated data-error covariance matrix in each assimilation cycle. In the linear-Gaussian case, this situation can be described by (3) with the following additional definitions, and then inflating the data-error covariances by C l → α l C l , where α l > 1; l ∈ [1, L]. The inflated assembled data-error covariance for MDA can then be written For reasons that will become evident in Section 4.1.3, I have not substituted C L for C l , and I have introduced a superscript on MDA indicating the involved diagonal blocks.
Note that the inflated covariance, α l C l , is the true covariance of the stretched measurement perturbation in d l .

Multiple linearly dependent data vectors-PMDA condition
Consider the case where data vectors as well as forward models are linearly dependent, where B L equals the identity matrix, I L . It follows that Since the data vectors {d l } L l=1 are dependent, use of the entire sequence as data requires some modification of {C l } L l=1 . Inspired by the use of inflated covariance matrices in Section 3, one could replace the expression for C l by α l B l C L B T l . Inserting the latter expression and (15) into (8) and (9) yields From (6), (7), (16), and (17), it follows that C mu = C po and In general, there is, however, no sequence {α l } L l=1 that satisfies this matrix equation.
Replacing the inflation of covariances, C l → α l C l = Since the PMDA condition reduces to the MDA condition when {B l = I L } L l=1 and {A l = α 1/2 l I L } L l=1 , it is a generalization of the latter. The transformed data-error covariance with multiple linearly dependent data vectors becomes where A = diag (A 1 . . . A L ), and where one should recall that {C l = B l C L B T l } L l=1 . Note that the transformed covariance, A l C l A T l , is the true covariance of the transformed measurement perturbation in d l .

Specification of {A
There are more scalar unknowns in the PMDA condition than there are scalar equations, which allows for free specification of some of the unknown before the PMDA condition is invoked to determine the remaining unknowns. In this section, a certain choice will be made regarding which unknowns that will be specified freely. I emphasize that this particular choice, while advantageous for the present exposition, is certainly not mandatory when applying the PMDA condition.

Existence
It is not clear if there exist solutions of (19) that will ensure that all the diagonal blocks in PMDA become proper covariance matrices, that is symmetric and positive semidefinite. It is easy to show that they are symmetric, so the real issue is positive semi-definiteness. Before discussing this issue for the PMDA condition, it is perhaps useful to reiterate the condition ensuring that all the diagonal blocks in MDA become positive semi-definite. To this end, the MDA condition will be expressed in a form dictated by what is advantageous for the discussion of the PMDA condition.

MDA condition
The MDA condition can be reformulated as where λ def = α 1/2 . It is evident that for {λ l C L λ l } L l=1 to be positive semi-definite, {λ l } L l=1 must be real numbers. One may select {λ l } L−1 l=1 such that {λ l C L λ l } L−1 l=1 become positive semi-definite, but then λ L cannot be selected freely since (21) must be satisfied. To ensure that λ L C L λ L becomes positive semi-definite, it is therefore necessary to further restrict the selection of {λ l } L−1 l=1 . Cancelling out C L from (21), it is evident that λ L becomes a real number iff {λ l } L−1 l=1 is a sequence of real numbers fulfilling L−1 l=1 λ −2 l < 1.

PMDA condition
Since B L = I L , (19) can be reformulated as The situation is similar as for the MDA condition in that one can select {A l } L−1 l=1 such that {A l C l A l } L−1 l=1 become positive semi-definite, while A L cannot be selected freely since (22) must be satisfied. But the situation also differs from that of the MDA condition in that it is not possible to cancel out C L from (22). This contributes to making it very challenging to derive a condition that the matrices in {A l } L−1 l=1 should satisfy to ensure that A L C L A T L becomes positive semidefinite. The issue is, of course, important, and researchers applying the PMDA condition will have to consider it for their particular application (i.e., when C L and {B l } L−1 l=1 are fully specified), but it is beyond the scope of this paper.

Computation of A L C L A T L
Defining the matrix sequence {X l } L−1 l=1 by and rearranging the right-hand side of (22), one may write To alleviate the computational burden in a practical application of the PMDA condition to a nonlinear problem where N L is large, it is an option to utilize approximate matrix factorization for the matrix inversions. It is also an option to use truncated Neumann series to approximately evaluate (I L − Q L ) −1 if Q L < 1. (Here · denotes an operator norm.) Obviously, invoking such options will result in only approximately correct sampling in the linear-Gaussian case.
where the matrix sequence {X l } L−1 l=1 is now given by Hence, the first L − 1 diagonal blocks are inflated as for the MDA condition, while the PMDA condition is resolved by calculating block number L. It is perhaps useful to illustrate the simplistic alternative with a small, simple example. To this end, let L = 2, N 1 = 1, N 2 = 2, C 2 = I 2 ∈ R 2×2 , and let B 1 = 1 2 (1, 1), that is, B 1 is the arithmetic averaging operator on R 2 (see Appendix A.2 for why averaging operators are relevant). It follows that which is a proper covariance matrix iff α 1 > 1, that is, if C 1 is inflated. Finally, I will briefly discuss what needs to be considered when specifying {α l } L−1 l=1 for the simplistic alternative in general. Selecting large entries in {α l } L−1 l=1 will be beneficial with respect to the existence issue, since (I L − Q L ) −1 C L ≈ C L for large α l 's. It would also facilitate use of a truncated Neumann series to approximate (I L − Q L ) −1 . Selecting too large α l 's would, however, result in Q L ≈ 0, and so that only d L would influence the estimation results. In that case there would be no point in including {d l } L−1 l=1 and the corresponding forward-model sequence in the modeling. Further investigations into these issues should be conducted for each application separately, and such investigations are beyond the scope of this paper.
Funding information Partial financial support was provided by the finalized research project "4D Seismic History Matching," funded by Eni, Petrobras, the Research Council of Norway (Petromaks 2), and Total EP NORGE, and by the ongoing research project "Assimilating 4D seismic data: big data into big models," funded by Aker BP, Equinor, Lundin, Repsol, the Research Council of Norway (Petromaks 2), and Total EP NORGE.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.

Appendix A: Multilevel data assimilation-motivation
Multilevel data assimilation has two main components; multilevel simulations and multilevel data. I briefly motivate use of multilevel simulations (Appendix A.1) and multilevel data (Appendix A.2), and point out that multilevel data vectors is a special case of linearly dependent data vectors.

A.1 Multilevel simulations
Statistical sampling quality and computational cost of ensemble-based methods both increase with ensemble size. To alleviate negative sampling effects of using a small or moderately large ensemble, localization [11] is routinely applied for field cases. It can, however, be challenging to design proper localization regions for porous-media flow problems. Alternative methods for alleviating effects of sampling errors, which can also be used in conjunction with localization, thereby making localization less challenging, has started to appear. One type of methods performs forward simulations on a single coarsened grid [7,13] or on a multilevel sequence of coarsened temporal [10] or spatial [8,9] grids. Another alternative is to perform most of the forward simulations with a linearized forward model [17]. The main idea behind all these methods is to allow for a substantially larger ensemble size (thereby increasing sampling quality) without increasing the total computational cost. Roughly speaking, numerical accuracy in the forward simulations is traded for statistical accuracy in the final results.

A.2 Multilevel data
Various types of spatial data appear in many areas of geoscience, for example, rainfall and temperature in climate and land-use models and inverted seismic data in reservoir modeling. When spatial data are applied with a numerical simulator, some kind of averaging has already taken place since subgrid variations in the data are not represented. When spatial data are used in conjunction with multilevel simulations, multilevel averaging of the observed data, that is the data vector on the finest level, will take place in order for the averaged data to "fit" the simulated output at grids of multiple scales. Clearly, the elements in the resulting sequence of data at different scales will not be independent. With arithmetic averaging of data from one scale to the next coarser scale, there will be linear dependencies between data vectors at the different scales, and rank (B 1 ) < rank (B 2 ) < · · · < rank (B L ).
The PMDA condition derived in Section 4 (and the MDA condition derived in Section 3) builds on the assumption that is block diagonal (see (3)). Use of the PMDA condition in the context of assimilation of multilevel data therefore requires that the data-error vectors in the different assimilation cycles are uncorrelated. This is the same requirement as with use of the MDA condition when performing multiple data assimilation (see the first paragraph of Section 2).