Accounting for model errors in iterative ensemble smoothers

In the strong-constraint formulation of the history-matching problem, we assume that all model errors relate to a selection of uncertain model input parameters. One does not account for additional model errors that could result from, e.g., excluded uncertain parameters, neglected physics in the model formulation, the use of an approximate model forcing, or discretization errors resulting from numerical approximations. If parameters with significant uncertainties are unaccounted for, there is a risk of an unphysical update of some uncertain parameters that compensates for errors in the omitted parameters. This paper gives the theoretical foundation for introducing model errors in ensemble methods for history matching. In particular, we explain procedures for practically including model errors in iterative ensemble smoothers like ESMDA and IES. Also, we demonstrate the impact of adding (or neglecting) model errors in the parameter-estimation problem.


Introduction
It is standard to assume the model to be perfect when we use ensemble smoothers for solving inverse problems. This paper addresses the problem of consistently including the additional effect of stochastic model errors in different ensemble smoothers. In particular, we consider methods such as the Ensemble Smoother (ES) (Evensen and van Leeuwen, 1996; Evensen, 2009a), the Iterative ES (IES) (Chen and Oliver, 2012, 2013), and the Ensemble Smoother with Multiple Data Assimilation (ESMDA) (Emerick and Reynolds, 2013).
There is a vast literature on solving the data-assimilation problem in the presence of model errors (see, e.g., the reviews by Carrassi and Vannitsem, 2016; Harlim, 2017). We traditionally characterize the data-assimilation problem as being either a weak-constraint or a strong-constraint problem, depending on whether we include the dynamical model as a weak or a strong constraint in the cost function. The classical book by Bennett (1992) is entirely devoted to solving the weak-constraint inverse problem, and it illustrates that in the weak-constraint case we need to simultaneously invert for the state vector as a function of both space and time.
In Eknes and Evensen (1997), the variational formulation was solved for a weak-constraint parameter- and state-estimation problem using the representer method by Bennett (1992), and in Evensen (2003), different methods, including ensemble methods, were used to solve a weak-constraint state- and parameter-estimation problem. A conclusion from these works is that, if model errors are present, we need to increase the dimension of the problem, either by updating the model solution as a function of space and time or by estimating the actual model errors and thereafter solving for the model solution. It is also interesting to realize that the simultaneous state-parameter estimation problem becomes nonlinear even if the model itself is linear (except in a very trivial case); see for instance Evensen (2003). Also, when using the Ensemble Kalman Filter (EnKF) by Evensen (1994) or ES, it is relatively easy to include stochastic model errors as long as we update both the parameters and the state variables simultaneously (Evensen, 2003, 2009b). At this point, we should mention that we do not distinguish between model errors and model bias. In fact, with time-correlated stochastic model errors, if the correlation becomes perfect (equal to one), then the model errors become equivalent to a constant bias. Fortunately, the procedure outlined in this paper can be used to estimate both model errors and model bias.
In a recent paper by Sakov et al (2018), the Iterative EnKF was reformulated to allow for additive model errors. However, for the history-matching problem, we need to account for more general representations of the error term since the solution of the reservoir simulator depends nonlinearly on the errors in the rate data used to force the model.
We will start by defining the standard strong-constraint history-matching problem, and after that, we move on to the general weak-constraint formulation. We then formally derive the ES, ESMDA, and IES, in the presence of model errors. The different smoother methods are used in a simple example to demonstrate the consistency of the formulation used and to illustrate the impact of model errors on the parameter-estimation problem.

Standard history-matching problem
The strong-constraint formulation given by Evensen (2018) is attractive because it simplifies the inverse problem and allows for efficient ensemble smoothers. The first fundamental assumption is that we have a perfect deterministic forward model where the prediction y only depends on the input model parametrization x,

y = g(x). (1)

In a reservoir history-matching problem, the model operator is the reservoir simulation model, which predicts the observed production of oil, water, and gas from the reservoir. Thus, given the true parametrization x, the true prediction y is precisely determined by the model in Eq. (1). Also, we have measurements d of the true y given as

d = y + e. (2)

From evaluating the model operator g(x), given a realization of the uncertain model parameters x ∈ ℜ^n, we uniquely determine a realization of predicted measurements y ∈ ℜ^m (corresponding to the real measurements d ∈ ℜ^m). Here n is the number of parameters and m the number of measurements, and we want to use the measurements to estimate the parameters x. The measurements are assumed to contain random errors e ∈ ℜ^m. In history matching, it is common to define a prior distribution for the uncertain parameters since we usually have more degrees of freedom in the parameters than we have independent information in the measurements. Bayes' theorem with a perfect model gives the joint posterior pdf for x and y as

f(x, y | d) ∝ f(d | y) f(y | x) f(x). (3)

In the case of no model errors, the transition density f(y | x) becomes the Dirac delta function, and we can write

f(x, y | d) ∝ f(d | y) δ(y − g(x)) f(x). (4)

We are interested in the marginal pdf for x, which we obtain by integrating Eq. (4) over y, giving

f(x | d) ∝ f(d | g(x)) f(x). (5)

When introducing the normal priors

x ∼ N(x^f, C_xx), (6)
e ∼ N(0, C_dd), (7)

we can write Eq. (5) as

f(x | d) ∝ exp{ −(1/2)(x − x^f)^T C_xx^{−1}(x − x^f) − (1/2)(g(x) − d)^T C_dd^{−1}(g(x) − d) }. (8)

Note that the posterior pdf in Eq. (8) is non-Gaussian due to the nonlinear model g(x). Maximizing f(x | d) is equivalent to minimizing the cost function

J(x) = (x − x^f)^T C_xx^{−1}(x − x^f) + (g(x) − d)^T C_dd^{−1}(g(x) − d). (9)

Most methods for history matching apply the assumptions of a perfect model and Gaussian priors, and they attempt to solve either one of Eqs. (8) or (9).
For this strong-constraint problem, Evensen (2018) explained how Eqs. (8) or (9) can be approximately solved using the ES, ESMDA, and IES. We can interpret these methods as approximately sampling the posterior pdf in Eq. (8), and we can derive them as an ensemble of minimizing solutions of the cost function in Eq. (9), written for each realization as

J(x_j) = (x_j − x_j^f)^T C_xx^{−1}(x_j − x_j^f) + (g(x_j) − d_j)^T C_dd^{−1}(g(x_j) − d_j). (10)

This approach relates to the papers on randomized maximum likelihood (Kitanidis, 1995; Oliver et al, 1996). Note that the minimizing solutions will not precisely sample the posterior non-Gaussian distribution.

ES solution
We obtain the ES solution by first sampling the prior parameters x_j^f and the perturbed measurements d_j from

x_j^f ∼ N(x^f, C_xx), (11)
d_j ∼ N(d, C_dd). (12)

We then compute the model predictions y_j^f from

y_j^f = g(x_j^f). (13)

We define the covariance matrix between two vectors x and y as the expectation

C_xy = E[(x − E[x])(y − E[y])^T], (14)

while in the ensemble methods we introduce the sample mean and the sample covariance between two arbitrary vectors x and y as

x̄ = (1/N) Σ_{j=1}^{N} x_j, (15)
C_xy ≈ (1/(N − 1)) Σ_{j=1}^{N} (x_j − x̄)(y_j − ȳ)^T. (16)

Evensen (2018) derived the ES update equations for the parameters x_j^a and the predictions y_j^a as

x_j^a = x_j^f + C_xy (C_yy + C_dd)^{−1} (d_j − y_j^f), (17)
y_j^a = g(x_j^a), (18)

but see also the alternative derivation in Sec. 3.3. Thus, the ES solution is computed starting from an initial ensemble of parameters x_j^f and perturbed measurements d_j. First, a prior ensemble prediction y_j^f is generated by evaluating the model in Eq. (13) for each realization x_j^f. Then the prior ensemble covariance between the parameters and the predicted measurements, C_xy ∈ ℜ^{n×m}, and the ensemble covariance of the predicted measurements, C_yy ∈ ℜ^{m×m}, are computed and used in the ES update in Eq. (17). Finally, the model prediction is recomputed using the updated parameters x_j^a in Eq. (18). As an alternative to solving the model in Eq. (18) for y_j^a, it is possible to compute the updated prediction directly from the update equation

y_j^a = y_j^f + C_yy (C_yy + C_dd)^{−1} (d_j − y_j^f), (19)

and in the case of a linear model, the result would be identical to solving Eq. (18). However, due to the nonlinearity of the deterministic model, Eqs. (18) and (19) will give different results for y_j^a. Eq. (19) is just the standard ES update of y given the prior forecast ensemble y_j^f and the perturbed measurements d_j of y. We will later show that, for nonlinear models, there may be a benefit in computing y_j^a indirectly through integration of the model in Eq. (18), initialized with x_j^a, rather than using the direct update in Eq. (19). Also, the indirect update in Eq. (18) allows for the use of iterations.
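As a concrete illustration, the ES update in Eq. (17) can be sketched in a few lines of NumPy. This is a minimal sketch with variable names of our own choosing, not the paper's implementation:

```python
import numpy as np

def es_update(X, Y, D, Cdd):
    """Ensemble Smoother update of the parameter ensemble.

    X   : (n, N) prior parameter realizations x_j^f
    Y   : (m, N) predicted measurements y_j^f = g(x_j^f)
    D   : (m, N) perturbed measurements d_j
    Cdd : (m, m) measurement-error covariance
    """
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)   # parameter anomalies
    B = Y - Y.mean(axis=1, keepdims=True)   # predicted-measurement anomalies
    Cxy = A @ B.T / (N - 1)                 # sample cross-covariance C_xy
    Cyy = B @ B.T / (N - 1)                 # sample covariance C_yy
    # x_j^a = x_j^f + C_xy (C_yy + C_dd)^{-1} (d_j - y_j^f)
    return X + Cxy @ np.linalg.solve(Cyy + Cdd, D - Y)
```

For a linear scalar model g(x) = x with prior N(1, 1) and an observation d = −1 with error variance 1, the updated ensemble mean and variance approach the Bayesian posterior values (0 and 0.5) as N grows.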

General weak constraint problem
Let us now look at a formulation where we assume that the model depends nonlinearly on the model errors q as well as the parameters x, i.e.,

y = g(x, q). (20)
The model operator can be rather general, including a recursion over time steps, and the model error is a vector of noise components that could represent, e.g., the time-correlated noise in rate data used to force a reservoir simulation model over a specific period. Thus, there is a significant difference between x and q. Firstly, x is a static parameter that does not change with time; once the parameters x are estimated, we can use them in a future prediction simulation. The model errors q, on the other hand, vary in time and are only estimated for the period of the inverse calculation. Only in the case of time-correlated errors can the estimated q be used, to some extent, as a forcing in the future prediction.
We assume that we have prior pdfs f(x) and f(q) for the parameters and the model errors. It is common, but not necessary, to assume that the model errors and parameters are independent, so we can write the joint prior pdf as f(x, q) = f(x) f(q). The transition density for the model evolution is

f(y | x, q) = δ(y − g(x, q)), (21)

and we obtain the joint pdf by multiplying the prior with the transition density,

f(x, q, y) = f(y | x, q) f(x) f(q). (22)

The likelihood for the measurements is f(d | y); thus, the posterior conditional pdf becomes

f(x, q, y | d) ∝ f(d | y) f(y | x, q) f(x) f(q). (23)

Since y is given by the model in Eq. (20) as soon as we know x and q, we can compute the marginal density

f(x, q | d) ∝ f(d | g(x, q)) f(x) f(q). (24)

Gaussian priors
We now assume the Gaussian priors

x ∼ N(x^f, C_xx), (25)
q ∼ N(q^f, C_qq), (26)

where q^f would normally be zero. The likelihood for the measurements becomes

f(d | y) ∝ exp{ −(1/2)(y − d)^T C_dd^{−1}(y − d) }, (27)

and we write the marginal posterior as

f(x, q | d) ∝ exp{ −(1/2)(x − x^f)^T C_xx^{−1}(x − x^f) − (1/2)(q − q^f)^T C_qq^{−1}(q − q^f) − (1/2)(g(x, q) − d)^T C_dd^{−1}(g(x, q) − d) }. (28)

Maximizing the posterior pdf in Eq. (28) is equivalent to minimizing the cost function

J(x, q) = (x − x^f)^T C_xx^{−1}(x − x^f) + (q − q^f)^T C_qq^{−1}(q − q^f) + (g(x, q) − d)^T C_dd^{−1}(g(x, q) − d). (29)

As in the strong-constraint case, we sample the priors and define a cost function for each sampled realization, and we obtain the weak-constraint analog of the strong-constraint cost function in Eq. (10), i.e.,

J(x_j, q_j) = (x_j − x_j^f)^T C_xx^{−1}(x_j − x_j^f) + (q_j − q_j^f)^T C_qq^{−1}(q_j − q_j^f) + (g(x_j, q_j) − d_j)^T C_dd^{−1}(g(x_j, q_j) − d_j). (30)

Stationarity condition
To develop a consistent set of equations for the different methods, it is simpler to rewrite the cost function for an augmented variable z^T = (x^T, q^T). We can then define the covariance

C_zz = [[C_xx, C_xq], [C_qx, C_qq]], (31)

where we allow for correlations between x and q, since this correlation becomes important in the iterative methods, and we can rewrite the cost function in Eq. (30) as

J(z_j) = (z_j − z_j^f)^T C_zz^{−1}(z_j − z_j^f) + (g(z_j) − d_j)^T C_dd^{−1}(g(z_j) − d_j). (32)

This cost function is slightly more general than the one in Eq. (30) since we do not require independence between q and x. We will now derive the minimizing solution of this cost function. We start by requiring that the variation of the cost function with respect to z is of order δz²,

J(z + δz) − J(z) = O(‖δz‖²). (33)

By evaluating this expression and using the Taylor expansion

g(z + δz) = g(z) + G_z δz + O(‖δz‖²), (34)

where G_z = ∂g(z)/∂z is the model gradient, we obtain the following closed system of nonlinear equations for z:

C_zz^{−1}(z − z^f) + G_z^T C_dd^{−1}(g(z) − d) = 0. (35)

In the case of a linear model, the solution is the standard Kalman-filter update equation for the mean. However, the presence of the nonlinear function g(z) makes the solution more elaborate. ES, ESMDA, and IES are all developed to solve this system of equations.

Derivation of ES update equations
To derive the ES solution, we will use another Taylor expansion and approximation, which allows us to obtain a closed-form solution for each realization z_j, i.e.,

g(z_j) ≈ g(z_j^f) + G_{z_j^f}(z_j − z_j^f), (36)

where we have defined

G_{z_j^f} = ∂g(z)/∂z |_{z = z_j^f}. (37)

Thus, we approximate the nonlinear function g(z_j) with its linearization in Eq. (34) around z = z_j^f and in addition evaluate the gradient in Eq. (35) at z_j^f. We then obtain

C_zz^{−1}(z_j − z_j^f) + G_{z_j^f}^T C_dd^{−1}( g(z_j^f) + G_{z_j^f}(z_j − z_j^f) − d_j ) = 0. (38)

By solving for z_j and using the matrix identity

(C^{−1} + G^T D^{−1} G)^{−1} G^T D^{−1} = C G^T (G C G^T + D)^{−1}, (39)

which can be derived from the Woodbury identity, where we substitute G_{z_j^f} for G, C_zz for C, and C_dd for D, we obtain the solution for z_j as

z_j = z_j^f + C_zz G_{z_j^f}^T (G_{z_j^f} C_zz G_{z_j^f}^T + C_dd)^{−1} (d_j − g(z_j^f)). (40)

These equations still include the gradient of the model, G_{z_j^f}, and we wish to replace this term with sample covariance matrices.
In Evensen (2018) we used a Taylor expansion of g(z_j) around the ensemble mean z̄ = z̄^f, and we could then replace the analytical gradients evaluated at z_j with the gradient G_z̄ evaluated at the ensemble mean z̄. We then showed that C_zy ≈ G_z̄ C_zz (Evensen, 2018, Eq. 19), and we replaced the analytical gradients with ensemble covariances.
We will here use an interpretation based on linear regression (see also Reynolds et al, 2006; Chen and Oliver, 2013), where we start by defining Thus, we view G_z as the sensitivity matrix in a linear regression between y and z as When we introduce the ensemble-anomaly matrices we can write Eq. (42) as where the approximation is the use of a finite ensemble size. From Eq. (41) we have where the superscript + denotes the pseudo inverse. The unbiasedness of G can be shown, i.e., with G_z being defined in (41). Also, we have If we approximate the update equation (40), with the individual gradients G_{z_j^f} replaced by the averaged gradient G_z̄, we have Eq. (40) in the form Thus, we can then replace the gradients G_z̄ in the update equation (50), using Eqs. (42) and (49) approximated using sample covariance matrices. We then obtain the update equation for the finite ensemble of N realizations as Interestingly, from Eq. (49) we have, when defining G_z^T = (G_x^T, G_q^T), that the ES update equation (51) is still valid in the case of dependencies between x and q. This validity is essential when we use the iterative methods since the updates introduce dependencies between x and q even when the priors are independent.
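The regression interpretation can be checked numerically. Writing Z′ and Y′ for the ensemble-anomaly matrices, and defining the sample covariances C_zz = Z′(Z′)^T/(N−1) and C_zy = Z′(Y′)^T/(N−1), the estimate G = Y′(Z′)^+ recovers the exact model gradient for a linear model, and C_zy = C_zz G^T holds for the sample covariances. A minimal sketch (our own notation, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 500                     # state size, measurement size, ensemble size
G_true = rng.standard_normal((m, n))    # exact gradient of the linear model g(z) = G z

Z = rng.standard_normal((n, N))         # ensemble of augmented states z_j
Y = G_true @ Z                          # linear model predictions

A = Z - Z.mean(axis=1, keepdims=True)   # anomalies Z'
B = Y - Y.mean(axis=1, keepdims=True)   # anomalies Y'

G = B @ np.linalg.pinv(A)               # regression estimate G = Y'(Z')^+
Czz = A @ A.T / (N - 1)
Czy = A @ B.T / (N - 1)
# For the linear model the relations are exact: G == G_true and C_zy == C_zz G^T
```

For a nonlinear model, G becomes the best-fit linear sensitivity in the least-squares sense, which is the averaged-gradient interpretation used in the text.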

ES algorithm
To summarize, we start by sampling the Gaussian prior variables for the parameters x_j^f, the model errors q_j^f, and the measurement perturbations e_j,

x_j^f ∼ N(x^f, C_xx), (53)
q_j^f ∼ N(q^f, C_qq), (54)
e_j ∼ N(0, C_dd). (55)

The vector q contains all stochastic model errors over the time interval of the model integration, and the errors can also have correlations in time. We obtain the model prediction from the model written in the form

y_j^f = g(x_j^f, q_j^f), (56)

where the model operator depends nonlinearly on the model-error term. Next, we compute the sample covariances C_yy^f, C_xy^f, and C_qy^f from the ensembles of y_j^f, x_j^f, and q_j^f. Eq. (51) defines the final update equations for x_j^a and q_j^a, which become

x_j^a = x_j^f + C_xy^f (C_yy^f + C_dd)^{−1} (d_j − y_j^f), (57)
q_j^a = q_j^f + C_qy^f (C_yy^f + C_dd)^{−1} (d_j − y_j^f). (58)

We then rerun the model using the updates x_j^a and q_j^a to get

y_j^a = g(x_j^a, q_j^a). (59)

Alternatively, we can also compute the update of the predicted measurements directly from

y_j^a = y_j^f + C_yy^f (C_yy^f + C_dd)^{−1} (d_j − y_j^f), (60)

and in the case of a linear model the result would be identical to that obtained by integrating the model in Eq. (59).
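The joint update of the parameters and the model errors can be implemented compactly by stacking them into the augmented vector z = (x; q) and applying one ES update to the augmented ensemble. A minimal sketch with our own variable names:

```python
import numpy as np

def weak_constraint_es(X, Q, Y, D, Cdd):
    """Joint ES update of parameters X and model errors Q via z = (x; q).

    X : (nx, N) prior parameters x_j^f
    Q : (nq, N) prior model errors q_j^f
    Y : (m, N) predictions y_j^f = g(x_j^f, q_j^f)
    D : (m, N) perturbed measurements d_j
    """
    nx, N = X.shape
    Z = np.vstack([X, Q])                    # augmented ensemble
    A = Z - Z.mean(axis=1, keepdims=True)
    B = Y - Y.mean(axis=1, keepdims=True)
    Czy = A @ B.T / (N - 1)                  # covariances of both x and q with y
    Cyy = B @ B.T / (N - 1)
    Za = Z + Czy @ np.linalg.solve(Cyy + Cdd, D - Y)
    return Za[:nx], Za[nx:]                  # updated x_j^a and q_j^a
```

Rerunning the model with the returned x_j^a and q_j^a then gives the updated prediction. For the linear scalar model y = x + q with x ∼ N(1, 1), q ∼ N(0, 0.25), d = −1, and C_dd = 1, the posterior means are 1/9 for x and −2/9 for q, so the rerun prediction mean is −1/9.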

Iterative smoothers in the presence of model errors
The critical approximations used in the derivation of ES are, firstly, the linearization in Eq. (36) of the model about z_j^f, meaning that large updates will have large errors, and secondly, that an averaged statistical ensemble gradient replaces the exact analytic gradient. Only a single linear update step is computed, and with strong nonlinearities these approximations may lead to poor results, as was discussed in Evensen (2018).
The minimization problem in Eq. (10) can be solved using iterative methods like IES, ESMDA, and the IEnKF. The iterative ensemble smoother (IES) by Chen and Oliver (2012, 2013) and the IEnKF solve the same kind of problem as given by the marginal conditional pdf in Eq. (5) or the cost function in Eq. (9). In a recent paper by Sakov et al (2018), the IEnKF was extended to account for additive model errors, and the method should also be applicable to the history-matching problem in the presence of additive model errors.
In the following, we will present variants of ESMDA and IES that take more general model errors into account as is required when solving the weak constraint history-matching problem.

ESMDA
As explained in Evensen (2018), ESMDA solves the standard ES update equations using a predefined number of recursive steps. In each step, the measurement error covariance and associated measurement perturbations are inflated to reduce the impact of the measurements. With correctly chosen inflation factors and a linear model and observation operators, the ESMDA update precisely replicates the ES update. When the model or observation operators are nonlinear, it turns out that the use of multiple short update steps reduces the errors and improves the solution as compared to using one long update step in ES.
From the previous discussion, it is clear that, in the presence of model errors, we need to update both the parameters and the model errors recursively. It is easiest to derive ESMDA by using a tempering of the likelihood function (Neal, 1996), which leads to a recursive minimization of a sequence of N_mda cost functions (Stordal and Elsheikh, 2015; Evensen, 2018),

J_i(z_j) = (z_j − z_{j,i})^T (C_zz^i)^{−1}(z_j − z_{j,i}) + (g(z_j) − d_j)^T (α_i C_dd)^{−1}(g(z_j) − d_j), (61)

where we evaluate C_zz^i at the ith iterate z_i, and we must have

Σ_{i=1}^{N_mda} 1/α_i = 1. (62)

Similarly to the derivation of the ES in the previous section, we obtain the recursive update equations for ESMDA given by Eqs. (67) and (68) in the algorithm below. As in ES, the update direction is computed based on a linearization around the prior realizations of each update step. Thus, we can interpret the ES update as taking one long Euler step of length ∆τ = 1 in pseudo time τ, while in ESMDA we take a predefined number of shorter Euler steps of length ∆τ_i = 1/α_i that satisfy Eq. (62).

ESMDA algorithm
We start by sampling the initial ensembles from Eqs. (53) and (54) to initialize the recursion in ESMDA,

x_{j,1} = x_j^f, (63)
q_{j,1} = q_j^f. (64)

Then the model is integrated according to Eq. (56) to obtain the prior ensemble prediction for the first ESMDA step (i.e., i = 1),

y_{j,1} = g(x_{j,1}, q_{j,1}), (65)

and we compute recursively the following for each iteration i = 1, . . . , N_mda: We construct the sample covariances C_yy^i, C_xy^i, and C_qy^i from the ensembles of y_{j,i}, x_{j,i}, and q_{j,i}, and we sample the measurement perturbations

e_{j,i} ∼ N(0, α_i C_dd). (66)
We then compute the updates

x_{j,i+1} = x_{j,i} + C_xy^i (C_yy^i + α_i C_dd)^{−1} (d + e_{j,i} − y_{j,i}), (67)
q_{j,i+1} = q_{j,i} + C_qy^i (C_yy^i + α_i C_dd)^{−1} (d + e_{j,i} − y_{j,i}), (68)

and rerun the model to obtain the updated prediction for step i,

y_{j,i+1} = g(x_{j,i+1}, q_{j,i+1}). (69)

We repeat this procedure until i = N_mda, which results in the ESMDA solution for x_j, q_j, and y_j. Note that we must rerun the model ensemble in each iteration, and not only update y_{j,i+1} directly using an update equation similar to Eq. (60), which would lead to the same result as the non-iterative ES.
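The ESMDA recursion above can be sketched as follows. This is a minimal NumPy version with our own names; `g` stands for the model operator mapping (x, q) to predicted measurements:

```python
import numpy as np

def esmda(X, Q, g, d, Cdd, alphas, rng):
    """ESMDA on the augmented ensemble Z = (x; q).

    alphas : inflation factors; their inverses must sum to one.
    """
    assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-12
    nx, N = X.shape
    Z = np.vstack([X, Q])
    L = np.linalg.cholesky(Cdd)
    for a in alphas:
        Y = g(Z[:nx], Z[nx:])                     # rerun the model each step
        E = np.sqrt(a) * L @ rng.standard_normal((len(d), N))
        D = d[:, None] + E                        # perturbations ~ N(0, a*Cdd)
        A = Z - Z.mean(axis=1, keepdims=True)
        B = Y - Y.mean(axis=1, keepdims=True)
        # joint update of x and q with inflated measurement-error covariance
        Z = Z + (A @ B.T) @ np.linalg.solve(B @ B.T + (N - 1) * a * Cdd, D - Y)
    return Z[:nx], Z[nx:]
```

In the linear case this replicates the single ES update in the large-ensemble limit, which is the consistency check used in the examples below.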

IES
In the IES we use a gradient-based minimization method, and we need to evaluate the first- and second-order derivatives of the cost function in Eq. (32) with respect to z. The gradient of the cost function in Eq. (32) is already derived above as Eq. (35),

∇_z J(z_j) = 2 C_zz^{−1}(z_j − z_j^f) + 2 G_{z_j}^T C_dd^{−1}(g(z_j) − d_j). (70)

An approximation to the Hessian of the cost function is obtained by operating again with ∇_z on the gradient in Eq. (70), to obtain

∇_z ∇_z J(z_j) ≈ 2 C_zz^{−1} + 2 G_{z_j}^T C_dd^{−1} G_{z_j}, (71)

where we have neglected the second derivatives, or Hessian, of the vector function g(z), i.e., ∇_z ∇_z g(z). We can then write a Gauss-Newton iteration

z_{j,i+1} = z_{j,i} − γ ∆z_{j,i}, (72)

where we define ∆z_{j,i} as the gradient normalized by the approximate Hessian,

∆z_{j,i} = (C_zz^{−1} + G_{j,i}^T C_dd^{−1} G_{j,i})^{−1} ( C_zz^{−1}(z_{j,i} − z_j^f) + G_{j,i}^T C_dd^{−1}(g(z_{j,i}) − d_j) ), (73)

with

G_{j,i} = ∂g(z)/∂z |_{z = z_{j,i}} (74)

the gradient of the model, evaluated at iteration i and for ensemble member j. Eqs. (72) and (73) define the ensemble randomized maximum likelihood (EnRML) method (Kitanidis, 1995; Oliver et al, 1996). Since we are not computing the analytical gradient of the model, we need to approximate the ensemble of gradients with an averaged gradient like G from Eq. (47), or we can evaluate the gradient at the ensemble average for the local iterate, G_z̄, as was explained by Evensen (2018). The same model gradient is then used for all realizations, and this leads to a different solution than the solution of the originally posed problem.
Eq. (73) is exactly Eq. (2) in Chen and Oliver (2013). Also, Chen and Oliver (2013) suggested using the state covariance in the Hessian evaluated at the local iterate to simplify further computations, since changing the Hessian does not change the gradient and thus the final converged solution (although it changes the step lengths in each iteration).
Thus, we can rewrite Eq. (73) with the averaged model gradient and introduce the state covariance for the local iterate in the Hessian. Then, using the corollaries derived from the Woodbury identity, we can rewrite Eq. (75) in terms of the sample covariances from Eqs. (42) and (49). The numerical solution method for this equation is discussed in more detail by Chen and Oliver (2013). It is clear that it is the introduction of low-rank ensemble representations of the covariances that makes it possible to compute the update steps ∆z_{j,i}, and the computation requires the use of singular-value decompositions and pseudo inversions, as explained in Appendices A and B.
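For concreteness, the damped Gauss-Newton iteration with an averaged ensemble gradient can be sketched as below. This is our own simplified variant, which keeps the prior covariance in the Hessian rather than the paper's exact subspace implementation, and all names are assumptions:

```python
import numpy as np

def ies(Zf, g, D, Cdd, n_iter=15, gamma=0.7):
    """Damped Gauss-Newton IES iteration on the augmented ensemble z = (x; q).

    Zf : (n, N) prior realizations z_j^f
    g  : model operator mapping (n, N) -> (m, N)
    D  : (m, N) perturbed measurements d_j
    """
    N = Zf.shape[1]
    Af = Zf - Zf.mean(axis=1, keepdims=True)
    Czz = Af @ Af.T / (N - 1)               # prior covariance (kept fixed)
    Z = Zf.copy()
    for _ in range(n_iter):
        Y = g(Z)
        A = Z - Z.mean(axis=1, keepdims=True)
        B = Y - Y.mean(axis=1, keepdims=True)
        G = B @ np.linalg.pinv(A)           # averaged model gradient at the iterate
        S = G @ Czz @ G.T + Cdd
        resid = Y - D - G @ (Z - Zf)
        # Gauss-Newton step, rewritten with the Woodbury identity
        step = (Z - Zf) + Czz @ G.T @ np.linalg.solve(S, resid)
        Z = Z - gamma * step                # damped update with step length gamma
    return Z
```

For a linear model the fixed point of this iteration is the per-realization ES solution, so the converged ensemble reproduces the Bayesian posterior in the linear Gaussian case.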

Examples
To verify the new algorithms, we will use the scalar example from Evensen (2018). The example resembles the use of conditioning methods in history matching, i.e., there is a parameter x that serves as input to a forward model predicting y = g(x, q). We assume an initial state x and a prediction y given by the model Here β is a parameter that determines the nonlinearity of the model. In the current example, we have used β = 0.0 for the linear cases and β = 0.2 for the nonlinear cases. The model error is additive to keep the linear case completely linear. If the model contained a product of x and q, the problem would be nonlinear, and we could not test whether the different methods converge to the same solution in the entirely linear case; this convergence constitutes our proof of consistency. The model error q is a random variable sampled from N(0, C_qq), with C_qq = 0.0 in the case with no model errors and C_qq = 0.25 in the case including model errors.
In all four cases we sample the prior ensemble for x from a Gaussian distribution with mean x^f = 1 and variance C_xx = 1, and the observation of y is sampled from a Gaussian error distribution with mean d = −1 and variance C_dd = 1. Thus, in the current example, x represents the initial state or the model parameter, while y is the predicted observation. The goal is to estimate x given a measurement of y and then to recompute the correct prediction of y, subject to model errors, consistently with Bayes' theorem.
In this example, we use a sufficiently large number of samples, i.e., 10^7, to generate accurate estimates of the probability density functions, and this allows us to work directly with the pdfs and to examine the converged solutions of the methods.

Figure 1. The upper row is the case with zero model error, while the lower row shows the result when we include a model error with standard deviation equal to 0.5. In all the plots, the legends, e.g., MDA 7 004 0.5, denote ESMDA with 10^7 ensemble members, 4 MDA steps, and a model-error standard deviation equal to 0.5. Note that, in the linear case, all the methods give results that are identical to the Bayesian posterior, and the posterior pdfs are indistinguishable.

Results from the linear case
In Figs. 1 and 2 we show the results from the linear cases without and with model errors. In Fig. 1 we plot the joint pdfs for the prior and the updated solutions, and in Fig. 2 we plot the corresponding marginal pdfs.
The joint pdfs illustrate the effect of including stochastic model errors. Without model errors, there is a unique mapping from x to y, and the pdf is zero except along the curve (or line in the linear case) defined by the model function y = g(x). The prior joint pdf has a maximum value located at (x, y) = (1, 1) while the posterior joint pdf has shifted the maximum value to (x, y) = (0, 0) for all the methods. When we introduce the model errors, we notice that we obtain a stronger update in y and weaker update in x, than in the case without model errors. Still, we observe that all the smoother methods give a result that is identical to the Bayesian update. We can better visualize these results when we examine the marginal pdfs plotted in Fig. 2.
In the case without model errors, we see that the prediction pdf for y and the measurement pdf have the same variance and only differ in the value of the means. The measurement is at y = −1 while the mean prediction is located at y = 1. The update from ES, ESMDA, and IES, exactly matches the Bayesian update in this case and is centered between the measurement and prediction pdf as we would expect. When we include model errors, the effect is that the prediction gets a higher variance, although the mean is the same (in this particular case). Due to the higher variance, we give more weight to the measurement in the update and the update for y is stronger than in the case without model errors. On the other hand, the update for x is weaker in this case, since the addition of model errors reduces the correlation between the predicted measurement and the prior.
So, how can the update for y be shifted towards the observation in this case? After all, we compute y as a prediction from x. Here the inclusion of the model errors in the inversion plays a vital role. We simultaneously update the ensemble for x and the estimate of the model errors q. In Fig. 3 we see how the model errors are shifted towards negative values. Thus, when we integrate the model forward from the updated x, the forcing from q compensates for the weaker update of x and provides the additional shift of y towards the measured value.
This example illustrates how model errors impact the updates of x and y, and how we need to include the model errors as parameters in the estimation and then use them in the prediction to obtain the correct estimate of y. Finally, we also demonstrate that, in the linear case with and without model errors, ES, IES, and ESMDA all converge to the Bayesian posterior.

Results from the nonlinear case
From the joint pdfs, we notice that, in the nonlinear case, the various smoother methods give different results both with and without the inclusion of model errors, although the general shape and locations of the pdfs are reasonably consistent with the theoretical solution as given by Bayes' theorem.
We get a clearer picture from the marginal pdfs in Fig. 5. As in the linear case, we get a weaker update of x and a stronger update of y. We also notice that the introduction of model errors is handled well by the iterative methods, and the results are somewhat better and more consistent with the theoretical solution than in the case without model errors. ES is still the poorest estimator, and the iterative smoothers provide a significant improvement in the estimate, also in the case including model errors. We show the corresponding updates of the model error q in Fig. 6, where the different smoother methods all give slightly different results.
The dashed green line in the plots for y in Fig. 5 is the direct ES update of y using the predicted ensemble for y and the measurement. It is clear that the update of x followed by an integration of the model to obtain y gives a better result than a direct update of y. Furthermore, the additional use of iterations improves the estimate of y even further. This result is the motivation for introducing the IEnKF in sequential data assimilation (Sakov et al, 2012, 2018) and also the iterative ensemble Kalman smoother by Bocquet and Sakov (2013, 2014). In history matching, we are primarily interested in estimating x, and the prediction y is just the result given the model parameters in x. But also in history matching, the ultimate value comes from accurate predictions of y.

Summary
In this paper, we have given consistent formulations of iterative ensemble smoothers that include model errors. We demonstrate the consistency by showing that the ensemble smoothers all converge to the Bayesian posterior in the linear case. The main result is that the model errors need to be treated as another set of unknown parameters and estimated in the same way as the input parameters to the model. The proposed approach allows for the inclusion of general nonlinear errors that can be both red and white in time. Thus, we can apply the methods for history matching of reservoir models forced with uncertain rate data having time-correlated errors, as well as in an iterative EnKF for sequential data assimilation with general model-error terms.

Figure 4. The upper row is the case with zero model error, while the lower row shows the result when we include a model error with standard deviation equal to 0.5. In all the plots, the legends, e.g., MDA 7 004 0.5, denote ESMDA with 10^7 ensemble members, 4 MDA steps, and a model-error standard deviation equal to 0.5. The legends ES 7 0.0D and ES 7 0.5D refer to an ES case where the prediction y_j^a is updated directly by Eq. (60).
We demonstrate that the inclusion of model errors leads to a weaker update of the input parameters of the model, but a stronger update of the measured model prediction. Vice versa, neglecting model errors that should be present will lead to a too strong update of the model input parameters, with an associated underestimated uncertainty, and a too weak update of the prediction.
Thus, the results open the way for a more consistent solution of the history-matching problem, given that significant model errors have been neglected in previous history-matching applications with iterative ensemble smoothers.

A IES gradient
It is instructive to write out the terms of the IES gradient in Eq. (79) for x and q. We have used the notation C_zz = C_zz^0 = C_zz^f for the prior covariance, and similarly for C_xx and C_qq. The two terms in the gradient in Eq. (75) then separate into a prior-mismatch term and a data-mismatch term. Numerically, it makes sense to invert the prior covariances C_qq and C_xx separately rather than inverting C_zz. For the scalar model used in the example, we just replace all matrices and vectors with corresponding scalars in the previous equations.

B Pseudo inversion
The pseudo inverse of a matrix C_xx = X′(X′)^T/(N − 1), where X′ ∈ ℜ^{n×N} is the matrix of N realizations of ensemble anomalies, is best computed from the singular value decomposition X′ = UΣV^T, where U ∈ ℜ^{n×N} contains the left singular vectors, V ∈ ℜ^{N×N} contains the right singular vectors, and Σ ∈ ℜ^{N×N} is the diagonal matrix of a maximum of N − 1 positive singular values. In theory, U has n columns, but since we have fewer than N nonzero singular values, we only store the first N singular vectors. The product ΣΣ^T ∈ ℜ^{N×N} is then a diagonal matrix containing the squared singular values (i.e., the eigenvalues of C_xx) on the diagonal. The computational cost of the SVD when N singular vectors are retained is O(nN²). The pseudo inverse can then be written as

C_xx^+ = (N − 1) U_p (Σ_p Σ_p^T)^{−1} U_p^T,

where the first p < N significant singular vectors and values are retained. Thus, the products involving the inverses C_xx^{−1} = (N − 1) U_x (Σ_x Σ_x^T)^{−1} U_x^T and C_qq^{−1} = (N − 1) U_q (Σ_q Σ_q^T)^{−1} U_q^T are evaluated as sequences of matrix products, introducing the intermediate results X_3, X_4, and X_5. The other terms involving either C_xx^{−1} or C_qq^{−1} are treated similarly. The computational cost of Eq. (85), when we define x ∈ ℜ^{n_x}, q ∈ ℜ^{n_q}, and y ∈ ℜ^m, is as follows:

X_3 ∈ ℜ^{n_x×N}: n_x N² operations;
X_4 = (X′)^T X_3, X_4 ∈ ℜ^{N×N}: n_x N² operations;
X_5 = Y′ X_4, X_5 ∈ ℜ^{m×N}: m N² operations.

Thus, there are no computations larger than O(max(m, n_x, n_q) N²). The inverse (C_yy^i + C_dd)^{−1} can be computed using the subspace pseudo-inversion algorithm from Evensen (2009b, Chap. 14).
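The SVD-based pseudo inversion can be sketched as follows. This is a minimal NumPy version; the relative truncation threshold is an assumption on our part:

```python
import numpy as np

def pseudo_inverse(Xp, tol=1e-8):
    """Pseudo inverse of C = Xp Xp^T / (N-1) from the anomaly matrix Xp (n, N).

    Uses the thin SVD Xp = U S V^T, so that C = U (S S^T) U^T / (N-1), and
    inverts only the significant singular values.
    """
    N = Xp.shape[1]
    U, s, _ = np.linalg.svd(Xp, full_matrices=False)  # cost O(n N^2)
    p = int(np.sum(s > tol * s[0]))                   # retained singular values
    Up, sp = U[:, :p], s[:p]
    # (N-1) * U_p diag(1/s_p^2) U_p^T, with the diagonal applied by broadcasting
    return (N - 1) * (Up / sp**2) @ Up.T
```

For a full-rank sample covariance (p = n), the result coincides with the exact inverse; when the ensemble size is smaller than the state dimension, it gives the inverse restricted to the ensemble subspace.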