Correlation effect of transformed or corrected data inversion

Before the inversion is performed, the original measured data set is often transformed (corrected, smoothed, Fourier-transformed, interpolated, etc.). These preliminary transformations may make the originally statistically independent noisy measurement data correlated. The noise correlation of the transformed data must be taken into account in the parameter-fitting procedure (inversion) through a proper derivation of the likelihood function. The covariance matrix of the transformed data system is no longer diagonal, so the likelihood-based metric that governs the fitting process changes, and so do the results of the inversion. In practice, these changes are often neglected and the "customary" estimation procedure (the simple least-squares method) is used, resulting in incorrect uncertainty estimates and sometimes biased results. In this article the consequence of neglected correlation is studied and discussed by decomposing the inversion functional into a "customary" part and an additional part that represents the effect of correlation. The ratio of the two components demonstrates the importance of, and the justification for, the modification of the inversion method.


Introduction: source of correlation
Transformation or correction of the original data system (y) may have an effect on the error model, since the error component is also transformed. As a consequence of the preliminary transformation, the statistically independent measurement data may become more or less correlated. This correlation can have two basic sources:
1. The transformation (e.g. correction) function may incorporate random variable(s); all the transformed data then contain these variable(s), which become the origin of the correlation. (E.g. the Bouguer correction of gravity data embodies the Bouguer plate density, the static correction of seismic data contains the equivalent upper-layer velocity, the shale-effect correction in well-logging interpretation requires the shale parameters, etc.)
2. During the transformation several original data points are used to compute one transformed datum (convolution, DFT, interpolation, etc.); hence the elements of the transformed system also have statistically common parts, which likewise result in data correlation.
Eventually the statistically common part causes correlation between the elements of the transformed data (z). The degree of this effect also depends on the applied error model. In this study the most widely used error model was applied and examined: zero-mean, additive Gaussian noise ($\Delta\mathbf{y} \in N(0, \sigma_y)$). Hence the measured and transformed data vectors are

$$\mathbf{y} = \mathbf{s} + \Delta\mathbf{y} \quad (1)$$

$$\mathbf{z} = T(\mathbf{y}) = T(\mathbf{s} + \Delta\mathbf{y}) \quad (2)$$

where s is the noiseless signal vector and T(.) denotes the transformation. In most of the customary preliminary data-processing steps this is a linear transformation.
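As a minimal numerical sketch (with invented values, not the paper's data), independent noise passed through a simple linear preprocessing step, here a 3-point moving average, acquires the covariance $\sigma_y^2\,T T^T$, whose off-diagonal elements are clearly nonzero:

```python
import numpy as np

# Sketch: independent Gaussian noise becomes correlated after a linear
# preliminary transformation z = T(y). All numbers are illustrative.
rng = np.random.default_rng(0)

n = 5          # data points per realization
m = 200_000    # Monte Carlo realizations
sigma_y = 1.0

# Originally independent noise: covariance is sigma_y^2 * I.
dy = rng.normal(0.0, sigma_y, size=(m, n))

# A simple linear transformation: 3-point moving average (edges truncated).
T = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):
        T[i, j] = 1.0 / 3.0

dz = dy @ T.T                       # transformed noise realizations
C_dz = np.cov(dz, rowvar=False)     # sample covariance of transformed noise

# Interior diagonal tends to 3*(1/3)^2 = 1/3; immediate neighbours share
# two raw samples, so their covariance tends to 2*(1/3)^2 = 2/9.
```

Neighbouring elements of z share raw data points, which is exactly the second source of correlation described above.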
The purpose of this paper is to analyse the effect of this correlation in inversion: the correlation often appears in the preprocessed data and changes its covariance structure.

Correlated data inversion
The maximum likelihood or the Bayesian approach (if used consistently) automatically handles the data-correlation problem. The joint probability density function (f) cannot be written as a product, as it can be in the case of an independent data system (Tarantola 1987; Szatmáry 2002). For correlated zero-mean Gaussian noise the well-known likelihood function is

$$f(\mathbf{z} \mid \mathbf{p}, \mathbf{C}_{\Delta z}) = \frac{1}{(2\pi)^{k/2}\,|\mathbf{C}_{\Delta z}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{z}-\mathbf{z}_0)^T \mathbf{C}_{\Delta z}^{-1} (\mathbf{z}-\mathbf{z}_0)\right) \quad (3)$$

where C_Δz is the covariance matrix of the correlated Gaussian random vector variable z, (z − z₀) is the error term (Δz) of the vector z, z₀ is the centre of the distribution, and k is the dimension of the measurement space. Here T denotes the transpose of a vector and |C_Δz| is the determinant of the covariance matrix. The function in Eq. (3) is a conditional probability density function, conditioned on the parameters (p) and the error covariance matrix.
The quadratic form with the inverse of the covariance matrix in the exponent is the basis of the metric for the parameter fitting: the well-known Mahalanobis distance (Mahalanobis 1936). The nonzero off-diagonal elements of C_Δz (normalized by the standard deviations) indicate the degree of statistical dependence between the preprocessed random variables (the elements of z). This matrix can be transformed into a correlation matrix to study the correlation.
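The difference between the proper metric and the "customary" one can be made concrete with a small sketch (invented numbers): the Mahalanobis distance with the full covariance versus the same quadratic form with only the diagonal of C kept, i.e. with correlation neglected.

```python
import numpy as np

# Fitting metric of Eq. (3): dz^T C^{-1} dz, compared with the "customary"
# metric that keeps only the diagonal of C (neglecting the correlation).
dz = np.array([0.3, -0.1, 0.4])            # illustrative error vector

C = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.6],
              [0.2, 0.6, 1.0]])            # correlated-error covariance

d_mahalanobis = dz @ np.linalg.inv(C) @ dz
d_customary   = dz @ np.linalg.inv(np.diag(np.diag(C))) @ dz

# The two functionals differ, so the minimum found by the inversion
# (and the estimated parameter uncertainties) differ as well.
```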

Case of correction
In the data-evaluation process, the model of the whole direct problem is often divided into two parts, and the effect of the part of no geological interest is separated out as a correction (Bouguer correction, seismic static correction, etc.) in order to enhance the essential information. Sometimes the correction involves a random parameter (such as the Bouguer plate density or the upper-zone velocity), and after the correction these random variables are built into the new corrected data set as a statistically common part (causing correlation in the error term of the corrected data).

Effect of subtractive correction
The effect of correlation resulting from a correction will be demonstrated with a simple subtractive correction of a 1D noisy data system (y). The corrected data points are

$$z_i = y_i - c\,\mu\,h_i \quad (4)$$

where the constant c represents all non-random multipliers, μ is the equivalent petrophysical property of the correction (such as the Bouguer plate density or the equivalent seismic velocity above the datum level), and h is the geometric parameter (e.g. the thickness of the layer) appearing in the correction.
The error part of the corrected data set (Δz) is composite, containing the independent measurement error vector (Δy), the error of the correction parameter (Δμ) and the error of the layer thicknesses (Δh). The error model is additive for all variables, with the associated centres of the distributions (s_i, h_{0,i}, μ₀):

$$y_i = s_i + \Delta y_i, \qquad h_i = h_{0,i} + \Delta h_i, \qquad \mu = \mu_0 + \Delta\mu$$

Then the detailed error term (Δz) of an element of the corrected data vector (z) is the following:

$$\Delta z_i = \Delta y_i - c\,\mu_0\,\Delta h_i - c\,h_{0,i}\,\Delta\mu - c\,\Delta\mu\,\Delta h_i \quad (5)$$

The last three error terms appear as a consequence of the correction. To build the covariance matrix from the expectation value of the Δz dyadics, the different error terms in Eq. (5) are considered independent and zero mean, so

$$E[\Delta y_i] = E[\Delta h_i] = E[\Delta\mu] = 0, \qquad E[\Delta y_i\,\Delta h_j] = E[\Delta y_i\,\Delta\mu] = E[\Delta h_i\,\Delta\mu] = 0 \quad (6)$$

After taking the expectation value we get the covariance matrix of the corrected data:

$$\mathbf{C}_{\Delta z} = E[\Delta\mathbf{z}\,\Delta\mathbf{z}^T] = \mathbf{C}_{\Delta y} + c^2\sigma_\mu^2\,\mathbf{h}_0\mathbf{h}_0^T + c^2\sigma_h^2(\mu_0^2 + \sigma_\mu^2)\,\mathbf{I} \quad (7)$$

Therefore the covariance matrix of the corrected data is not diagonal, because of the second term of Eq. (7), which has dyadic form. The diagonal elements are also modified, because of the third term of Eq. (7). To derive the functional to be minimized in the inversion, the inverse of C_Δz is needed for the calculation of the Mahalanobis distance.
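The covariance structure of the subtractive correction can be verified by Monte Carlo simulation. The sketch below uses invented parameter values; the sample covariance of the composite error reproduces the diagonal-plus-dyadic form.

```python
import numpy as np

# Monte Carlo check of the corrected-data covariance:
# C_dz = sigma_y^2 I + c^2 s_mu^2 h0 h0^T + c^2 s_h^2 (mu0^2 + s_mu^2) I.
rng = np.random.default_rng(1)

n, m = 4, 400_000
c, mu0 = 2.0, 1.5
s_y, s_mu, s_h = 0.5, 0.2, 0.1
h0 = np.array([1.0, 2.0, 3.0, 4.0])

dy  = rng.normal(0.0, s_y,  size=(m, n))
dh  = rng.normal(0.0, s_h,  size=(m, n))
dmu = rng.normal(0.0, s_mu, size=(m, 1))   # one common value per realization

# Composite error of the subtractive correction z_i = y_i - c*mu*h_i.
dz = dy - c * mu0 * dh - c * h0 * dmu - c * dmu * dh

C_mc = (dz.T @ dz) / m                     # sample covariance
C_th = (s_y**2 * np.eye(n)
        + c**2 * s_mu**2 * np.outer(h0, h0)
        + c**2 * s_h**2 * (mu0**2 + s_mu**2) * np.eye(n))
```

The dyadic term couples every pair of data points through the common correction-parameter error Δμ.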
This inverse, which is the metric tensor in the inversion functional, can be calculated using the Sherman-Morrison theorem (Sherman and Morrison 1949). This theorem gives the inverse of special matrices that can be decomposed into a diagonal and a dyadic part. Writing Eq. (7) as C_Δz = D + uuᵀ, with the diagonal matrix D = C_Δy + c²σ_h²(μ₀² + σ_μ²)I and the vector u = cσ_μ h₀, the metric tensor for the Mahalanobis distance becomes

$$\mathbf{C}_{\Delta z}^{-1} = \mathbf{D}^{-1} - \frac{\mathbf{D}^{-1}\mathbf{u}\mathbf{u}^T\mathbf{D}^{-1}}{1 + \mathbf{u}^T\mathbf{D}^{-1}\mathbf{u}} \quad (8)$$

where I is the identity matrix. The inverse of the covariance of the original vector y was

$$\mathbf{C}_{\Delta y}^{-1} = \frac{1}{\sigma_y^2}\,\mathbf{I} \quad (9)$$

The importance of the correlation correction [the second part of Eq. (8)] is characterized by the weight

$$w = \frac{\mathbf{u}^T\mathbf{D}^{-1}\mathbf{u}}{1 + \mathbf{u}^T\mathbf{D}^{-1}\mathbf{u}} \quad (10)$$

which depends on the ratio of the error-term variances. The decomposition derived above [Eq. (8)] shows how the diagonal variances are modified and how the different measuring points become coupled in the exponent of Eq. (3).
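The Sherman-Morrison identity for a diagonal-plus-dyadic matrix is easy to check numerically; the sketch below (invented D and u) compares it with a direct matrix inversion.

```python
import numpy as np

# Numerical check of the Sherman-Morrison inverse used for the metric tensor:
# (D + u u^T)^{-1} = D^{-1} - (D^{-1} u u^T D^{-1}) / (1 + u^T D^{-1} u).
D = np.diag(np.array([1.0, 1.5, 2.0, 2.5, 3.0]))   # diagonal part
u = np.array([0.5, 1.0, 1.5, 2.0, 2.5])            # dyadic part (e.g. c*sigma_mu*h0)

D_inv = np.diag(1.0 / np.diag(D))
C = D + np.outer(u, u)

C_inv_sm = D_inv - (D_inv @ np.outer(u, u) @ D_inv) / (1.0 + u @ D_inv @ u)
C_inv_np = np.linalg.inv(C)
```

Because D is diagonal, the Sherman-Morrison form needs no general matrix inversion at all, which is the practical advantage of the decomposition.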

Effect of a correction that is a nonlinear function of the random parameter
If the form of the correction is not linear, the modified covariance matrix (C_Δz) can be approximated by linearization. The corrected data vector elements are in this case

$$z_i = y_i - g(h_i, \mu) \quad (11)$$

where g(h_i, μ) is the correction function. (E.g. the static correction in seismic preprocessing has this form, because it depends on the reciprocal of the upper-zone velocity.) Expressing Eq. (11) with the associated error parts, to first order:

$$\Delta z_i = \Delta y_i - \left(\frac{\partial g}{\partial h}\right)_{0,i}\Delta h_i - \left(\frac{\partial g}{\partial \mu}\right)_{0,i}\Delta\mu \quad (12)$$

After taking the expectation value of the ΔzΔzᵀ dyadics in the first-order approximation [Eq. (12)], C_Δz becomes

$$\mathbf{C}_{\Delta z} = \mathbf{C}_1 + \sigma_\mu^2\,\mathbf{u}\mathbf{u}^T \quad (13)$$

where the elements of the diagonal matrix C₁ and of the vector u are the following:

$$C_{1,ij} = \delta_{ij}\left[\sigma_y^2 + \left(\frac{\partial g}{\partial h}\right)_{0,i}^2\sigma_h^2\right], \qquad u_i = \left(\frac{\partial g}{\partial \mu}\right)_{0,i} \quad (14)$$

where δ is the Kronecker symbol. Using the Sherman-Morrison equation again, the inverse of C_Δz is

$$\mathbf{C}_{\Delta z}^{-1} = \mathbf{C}_1^{-1} - \frac{\sigma_\mu^2\,\mathbf{C}_1^{-1}\mathbf{u}\mathbf{u}^T\mathbf{C}_1^{-1}}{1 + \sigma_\mu^2\,\mathbf{u}^T\mathbf{C}_1^{-1}\mathbf{u}} \quad (15)$$

The second part of Eq. (15) expresses the statistical dependence of the corrected variables, which may modify the results of the inversion.
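The quality of the linearization can be checked numerically. As a hypothetical example (not taken from the paper), take g(h, μ) = h/μ, a static-correction-like form where μ plays the role of the upper-zone velocity; for small errors the first-order covariance matches a Monte Carlo estimate of the exact composite error.

```python
import numpy as np

# Linearized covariance of a nonlinear correction z_i = y_i - g(h_i, mu),
# with the invented example g(h, mu) = h / mu. All numbers are illustrative.
rng = np.random.default_rng(2)

n, m = 3, 500_000
mu0 = 2.0
h0 = np.array([1.0, 1.5, 2.0])
s_y, s_h, s_mu = 0.05, 0.02, 0.04

dy  = rng.normal(0.0, s_y,  size=(m, n))
dh  = rng.normal(0.0, s_h,  size=(m, n))
dmu = rng.normal(0.0, s_mu, size=(m, 1))

# Exact composite error: dz_i = dy_i - [g(h0+dh, mu0+dmu) - g(h0, mu0)].
dz = dy - ((h0 + dh) / (mu0 + dmu) - h0 / mu0)
C_mc = (dz.T @ dz) / m

# First-order approximation following the C_1 + s_mu^2 u u^T structure:
dg_dh = 1.0 / mu0          # d(h/mu)/dh  at (h0, mu0)
u     = -h0 / mu0**2       # d(h/mu)/dmu at (h0, mu0), one element per point
C1    = np.diag(np.full(n, s_y**2 + dg_dh**2 * s_h**2))
C_lin = C1 + s_mu**2 * np.outer(u, u)
```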

Effect of product type correction
Sometimes the correction has a product form, such as the seismic amplitude correction or the multiplication by a sonde coefficient to transform the raw measured data into apparent values, etc.
In the case of a product-type correction with a random parameter, the form of a corrected data vector element is the following:

$$z_i = g(\mu)\,y_i \quad (16)$$

where g(μ) is now the correction function (with a random parameter). To first order, the covariance matrix of the error part is

$$\mathbf{C}_{\Delta z} = g(\mu_0)^2\,\mathbf{C}_{\Delta y} + g'(\mu_0)^2\,\sigma_\mu^2\,\mathbf{s}\mathbf{s}^T \quad (17)$$

Writing A = g(μ₀)² C_Δy and v = g′(μ₀) σ_μ s, the Sherman-Morrison formula determines the inverse matrix:

$$\mathbf{C}_{\Delta z}^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{v}\mathbf{v}^T\mathbf{A}^{-1}}{1 + \mathbf{v}^T\mathbf{A}^{-1}\mathbf{v}} \quad (18)$$

This inverse matrix also separates into a diagonal and a dyadic part, as before. During the estimation procedure the s vector can be approximated by the y vector.
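The product-type covariance structure can also be checked by simulation. As an invented example take g(μ) = μ, a simple random gain factor; the sample covariance of the corrected data reproduces the diagonal-plus-dyadic form built on the signal vector s.

```python
import numpy as np

# Monte Carlo sketch of the product-type correction z_i = g(mu) * y_i,
# with the invented choice g(mu) = mu. All numbers are illustrative.
rng = np.random.default_rng(3)

n, m = 3, 500_000
mu0, s_mu, s_y = 1.2, 0.05, 0.1
s = np.array([2.0, 3.0, 4.0])           # noiseless signal

y  = s + rng.normal(0.0, s_y, size=(m, n))
mu = mu0 + rng.normal(0.0, s_mu, size=(m, 1))

z  = mu * y
dz = z - mu0 * s                        # error around the centre g(mu0)*s
C_mc = (dz.T @ dz) / m

# First-order covariance: g0^2 sigma_y^2 I + g'0^2 s_mu^2 s s^T with
# g0 = mu0 and g'0 = 1 for this example.
C_th = mu0**2 * s_y**2 * np.eye(n) + s_mu**2 * np.outer(s, s)
```

Note that the dyadic part scales with the signal itself, so strong signals are the most strongly coupled by the common gain error.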

Case of linear transformation
A typical preliminary-phase operation in data processing is convolutional filtering, or filtering in the spectral range after the Discrete Fourier Transform (DFT). In general, a linear transformation can be represented by a transformation matrix (F). The transformed data vector is

$$\mathbf{z} = \mathbf{F}\mathbf{y} \quad (19)$$

The covariance matrix of the data vector z, obtained by linear transformation of originally independent (and uniform-variance) data (y), is

$$\mathbf{C}_{\Delta z} = \mathbf{F}\mathbf{C}_{\Delta y}\mathbf{F}^T = \sigma_y^2\,\mathbf{F}\mathbf{F}^T \quad (20)$$

The structure of the matrix FFᵀ determines the covariance structure, which reflects the possible statistical dependence of the elements of z. Substituting C_Δz [Eq. (20)] into the Mahalanobis distance formula (for invertible F):

$$(\mathbf{z}-\mathbf{z}_0)^T\mathbf{C}_{\Delta z}^{-1}(\mathbf{z}-\mathbf{z}_0) = \frac{1}{\sigma_y^2}(\mathbf{y}-\mathbf{y}_0)^T\mathbf{F}^T(\mathbf{F}\mathbf{F}^T)^{-1}\mathbf{F}\,(\mathbf{y}-\mathbf{y}_0) = \frac{1}{\sigma_y^2}(\mathbf{y}-\mathbf{y}_0)^T(\mathbf{y}-\mathbf{y}_0) \quad (21)$$

It can be seen from Eq. (21) that if the proper Mahalanobis metric is used (taking into account the possible data correlation), the criterion of the parameter fitting does not change. So the parameter variances (estimated in the inversion) also remain unchanged. This shows that the applied linear transformation does not increase the information content of the data set and cannot reduce the Cramér-Rao bound for the estimated parameter variance (Cramér 1946; Rao 1945).
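The invariance of the fitting criterion under an invertible linear transformation can be verified directly; the sketch below (invented F and residual) compares the raw-space least-squares functional with the transformed-space Mahalanobis functional.

```python
import numpy as np

# Check that an invertible linear preprocessing step F leaves the
# Mahalanobis-based fitting criterion unchanged.
rng = np.random.default_rng(4)

n = 4
sigma_y = 0.3
F  = rng.normal(size=(n, n))   # generic (almost surely invertible) transform
dy = rng.normal(size=n)        # residual y - s(p) in the raw data space

# Raw-space criterion: simple weighted least squares.
d_raw = dy @ dy / sigma_y**2

# Transformed-space criterion with the proper covariance C_dz = sigma^2 F F^T.
dz = F @ dy
C_dz = sigma_y**2 * F @ F.T
d_transformed = dz @ np.linalg.solve(C_dz, dz)
```

Using a diagonal weight matrix on the transformed residual instead of C_dz would give a different (and misleadingly smaller) functional value.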
If the "traditional" simple least-squares estimation is used instead, neglecting the correlation effect, a lower apparent parameter variance is obtained, violating the Cramér-Rao bound. (The proper maximum likelihood estimation, in contrast, is known to be asymptotically efficient: the variance of the estimated parameters reaches the Cramér-Rao bound asymptotically.)

Covariance of filtered data
During the preliminary data processing, convolutional filtering is widely applied for noise suppression, or for extraction of a low-frequency trend or of a high-frequency local effect, etc. This procedure may also be a source of noise correlation, since the filtered data elements are compositions of several raw measurement data, and hence their errors are inherited. In the case of additive noise, the filtered data (z) and their decomposition can be expressed as

$$z_i = \sum_{j=-N}^{N} w_j\,y_{i+j} = \sum_{j=-N}^{N} w_j\,s_{i+j} + \sum_{j=-N}^{N} w_j\,\Delta y_{i+j} \quad (22)$$

where w is the vector of convolution weights in the filtering window of length 2N + 1. In the simpler matrix form (the matrix W is made up of the filter weights w):

$$\mathbf{z} = \mathbf{W}\mathbf{y} = \mathbf{W}\mathbf{s} + \mathbf{W}\Delta\mathbf{y} \quad (23)$$

The noise covariance matrix [in the same form as in Eq. (20)] is

$$\mathbf{C}_{\Delta z} = \mathbf{W}\mathbf{C}_{\Delta y}\mathbf{W}^T = \sigma_y^2\,\mathbf{W}\mathbf{W}^T \quad (24)$$

In this case C_Δz is a band matrix, which may have nonzero off-diagonal elements. The diagonal elements of C_Δy are partially redistributed among the off-diagonal elements according to the w vector. Therefore the trace of the covariance matrix, which represents the overall variance of the data vector, decreases as a consequence of the filtering (compared with the variance of the raw data system).
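The band structure and the reduced trace can both be seen in a small sketch (invented sizes and weights) using a 3-point smoothing filter:

```python
import numpy as np

# Filtered-noise covariance C_dz = sigma^2 W W^T for a 3-point moving
# average with weights (0.25, 0.5, 0.25); window edges are simply truncated.
n = 8
sigma_y = 1.0
w = np.array([0.25, 0.5, 0.25])          # smoothing weights, sum = 1

# Convolution matrix W: each row is a shifted copy of w.
W = np.zeros((n, n))
for i in range(n):
    for k, wk in zip((-1, 0, 1), w):
        if 0 <= i + k < n:
            W[i, i + k] = wk

C_dy = sigma_y**2 * np.eye(n)
C_dz = W @ C_dy @ W.T

# C_dz is a band matrix: near neighbours are correlated, distant points are
# not, and the trace (overall variance) shrinks under the smoothing.
```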
In the second part of the paper, the covariance structure of linearly transformed data was examined and the consequences of the transformation were studied.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.