Sequential multilevel assimilation of inverted seismic data

We consider estimation of absolute permeability from inverted seismic data. Large amounts of simultaneous data, such as inverted seismic data, amplify the negative effects of Monte Carlo errors in ensemble-based Data Assimilation (DA). Multilevel (ML) models consist of a selection of models with different fidelities. Multilevel Data Assimilation (MLDA) attempts to obtain better statistical accuracy at a small sacrifice in numerical accuracy. Spatial grid coarsening is one way of generating an ML model. It has been shown that coarsening the spatial grid results in a problem with weaker nonlinearity, and hence in a less challenging problem than the one on the original fine grid. Accordingly, formulating a sequential MLDA algorithm which uses the coarser models in the first steps of the DA, followed by the finer models, helps to find an approximation to the solution of the inverse problem in the first steps and to gradually converge toward the solution. We present two variants of a sequential MLDA algorithm and compare their performance with both conventional DA algorithms and a simultaneous MLDA algorithm (i.e., one using all the models on the different grids simultaneously) in numerical experiments. Both posterior parameters and posterior model forecasts are compared qualitatively and quantitatively. The results from the numerical experiments suggest that all MLDA algorithms generally perform better than the conventional DA algorithms. In estimation of the posterior parameter fields, the simultaneous MLDA algorithm and one of the variants of sequential MLDA (SMLES-H) perform similarly and slightly better than the other variant (SMLES-S). In estimation of the posterior model forecasts, however, SMLES-S clearly performs better than both the simultaneous MLDA algorithm and SMLES-H.


Introduction
Sound decision making in petroleum reservoir management depends on reliable production forecasts from reservoir models, including accurate estimates of the uncertainty in the forecasts. The reliability is increased by utilizing available data for calibration of the models, through the process known as history-matching. Ensemble-based Data Assimilation (DA) methods, using statistically correct formulations, have become popular for automated reservoir history-matching [1][2][3][4][5][6][7]. We consider estimation of absolute permeability from inverted seismic data.
Ensemble-based methods have a limited number of degrees of freedom, which, in conjunction with massive amounts of data, e.g., time-lapse seismic data, can result in over-fitting. There have been several efforts to balance the degrees of freedom of the problem against the information content of the data, including use of localization [8], reduction of data using machine-learning techniques [9,10], reduction of data size using the correlation between the data and the wells' cumulative production [11], sparse representation of data using a wavelet transform [12], assimilation of only the saturation front or transformation of the data into positions of fluid fronts [13][14][15], combination of coarsened data and coarse model simulations [16], and projection of data into the ensemble subspace in combination with local analysis [17].
Monte Carlo approximations play a crucial role in ensemble-based DA. Due to computational-cost limitations, the ensemble size is limited to roughly one hundred. Using straightforward ensemble-based DA, the degrees of freedom of the problem would equal the ensemble size, and such an approach would result in significant Monte Carlo errors. The negative effects of Monte Carlo errors are enlarged in the presence of large amounts of simultaneous data, such as inverted seismic data, resulting in underestimation of variance, and in more severe cases ensemble collapse.
The most widely used treatment for Monte Carlo errors is distance-based localization [18]. The basic assumption underlying distance-based localization is that true correlations between a parameter and a datum decrease as the distance between their respective locations increases, and disappear if the distance exceeds a critical distance. This assumption may not always hold for subsurface problems. Different localization functions and their utilization in DA can be found in [19][20][21]. A proper choice of localization function, and of the critical distance in particular, depends on parameter and data types as well as on other problem settings. This reduces the robustness of distance-based localization, also for problems where its basic assumption does hold. Papers using ensemble-based methods for assimilation of seismic data [22][23][24] typically use localization methodologies originally developed for meteorological science and later adapted to petroleum problems with production data.
Simply increasing the ensemble size will reduce Monte-Carlo errors, but it will also increase the computational cost. Utilization of a lower-cost and lower-fidelity model renders the possibility of increasing the ensemble size without increasing the total computational cost. Use of a lower-fidelity reservoir model will, however, introduce modeling errors in addition to those already present in conventional-fidelity simulation results. The underlying assumption when applying lower-fidelity models in DA is therefore that the gain in reducing Monte Carlo errors is larger than the loss in numerical simulation accuracy. DA using various types of lower-fidelity models has been applied to several inverse problems, e.g., within petroleum reservoir modeling [25][26][27] and atmospheric science [28]. Note that since lower-fidelity simulations are applied to the forecast step and localization is applied to the analysis step, the two techniques can be combined, if desired.
Lower-fidelity models can have large numerical errors, and choosing an optimal level of fidelity for such models is not straightforward. Multilevel (ML) simulations instead utilize a selection of models that form hierarchies of both computational accuracy and computational cost (an ML model). Multilevel Data Assimilation (MLDA) [29][30][31][32][33][34][35][36][37] utilizes an ML model in the forecast step of the DA; the levels of the multilevel model have different numerical accuracies. By allocating some of the computational power to the models with lower fidelity, MLDA tries to achieve a low total error by keeping a balance between numerical and statistical errors. Some conventional DA methods (i.e., single-level ensemble-based DA methods), like the ensemble smoother (ES) [38], the ES with multiple DA (ESMDA) [2], and the iterative ES (IES) [1], assimilate data simultaneously, i.e., assimilate all data over a certain period at once, while other conventional methods, like the ensemble Kalman filter (EnKF) [39] and the EnKF with multiple DA (EnKFMDA) [40], assimilate data sequentially. In the MLDA domain, a Simultaneous MLDA (SiMLDA) algorithm was developed for assimilation of inverted seismic data [33]. Numerical experiments show that this algorithm, the Multilevel Hybrid Ensemble Smoother (MLHES), whose formulation uses a hybrid Kalman gain based on several levels, outperforms the conventional DA counterparts with which it was compared [33,37,41]. However, strong nonlinearity can affect its performance negatively. Hence, another SiMLDA algorithm, the iterative version of MLHES (IMLHES) [37], was designed to handle this problem. Numerical experiments show that use of iterations improves the performance of MLHES [37], but IMLHES is not without limitations: it is not obvious how to optimally formulate convergence criteria for the different levels of IMLHES, which can cause failed iterations and additional computational cost.
There are indications that sequential MLDA (SeMLDA) algorithms can benefit from certain properties of the problem of estimating permeability from inverted seismic data. Firstly, analytical [42] and numerical [43] results show that sequential DA is expected to outperform simultaneous DA for weakly nonlinear problems. Secondly, for a class of problems (including the problem considered here) where the model forecast can be seen as a spatially integrated response to a spatially varying parameter field, there exists a correlation between small-scale oscillations in the parameter domain and the degree of nonlinearity of the mapping from parameter field to model forecast, see, e.g., [44,45]. This correlation is such that coarsening the simulation grid and upscaling the associated parameters will generally result in weaker nonlinearity in the coarser forward models compared to the finer ones. Taking advantage of this, we consider several resolutions of inverted seismic data and develop a SeMLDA scheme which first assimilates the coarsest resolution of the data, corresponding to the coarsest forward model, followed by the data at higher resolutions, corresponding to more nonlinear models. This construction corresponds to the optimal ordering of data suggested by [42,43]. Note that, here, the term 'sequential' pertains to a sequence of data with different resolutions, not to data at different times.
In this work, we will introduce two variants of a SeMLDA algorithm and assess their performance in comparison with a conventional sequential DA algorithm, a conventional simultaneous DA algorithm, and a SiMLDA algorithm. This will be done with the help of numerical experiments. The rest of this paper is organized as follows. Since the SeMLDA methods developed in this paper are partially inspired by the ESMDA, the ESMDA will be briefly presented in Section 2. Section 3 describes MLDA in general, and introduces two variants of a novel SeMLDA algorithm. Section 4 explains the test models used for numerical investigation. In Section 5 we describe the numerical investigations, which are followed by their results in Section 6. Finally, in Section 7 we summarize and conclude the paper.

Ensemble smoother with multiple data assimilation
The forward models used in the parameter estimation process are often nonlinear. In the case of assimilation of inverted seismic data, the nonlinearity comes from both the fluid-flow equations and the rock-physics modeling. This nonlinearity has motivated the development of several DA algorithms, including ESMDA [2]. This algorithm assimilates the same data $N_a$ times while inflating the data-error covariance matrix. In this way, several small assimilation steps are taken instead of one big assimilation step, which helps to better account for the nonlinearity of the problem.
At step $i$ of ESMDA, an ensemble of $N_e$ realizations $\{z^{pri}_{i,j}\}_{j=1}^{N_e}$ from the prior parameter vector $Z^{pri}_i$ is considered. The ESMDA update consists of three steps.
Firstly, running the forward simulator for every realization, $z^{pri}_{i,j}$, $1 \le j \le N_e$, we have
$$ y_{i,j} = \mathcal{M}\big(z^{pri}_{i,j}\big), $$
where $y_{i,j}$ is a realization from the model-forecast random vector at assimilation step $i$, $Y_i$. Secondly, the original observation data-error model, $D \sim \mathcal{N}\big(E(D), C(D)\big)$, is slightly manipulated such that the random observation-data vector at step $i$ is given as $D_i \sim \mathcal{N}\big(E(D), \alpha_i C(D)\big)$. The data-error covariance matrix is inflated using the scalar value $\alpha_i$ so that multiple assimilations of the same data do not violate Bayes' rule. Hence, the updated parameter vector of an arbitrary ensemble member is given by
$$ z^{post}_{i,j} = z^{pri}_{i,j} + K_i\big(d_{i,j} - y_{i,j}\big), $$
where $d_{i,j}$ is a realization of $D_i$, and the Kalman gain, $K_i$, is given as
$$ K_i = C\big(Z^{pri}_i, Y_i\big)\Big(C(Y_i) + \alpha_i C(D)\Big)^{-1}. $$
The terms $C(Y_i)$ and $C\big(Z^{pri}_i, Y_i\big)$ denote the covariance of $Y_i$ and the cross-covariance between $Z^{pri}_i$ and $Y_i$, respectively.
Finally, while $i < N_a$, the prior ensemble at step $i+1$ is set equal to the posterior ensemble at step $i$:
$$ z^{pri}_{i+1,j} = z^{post}_{i,j}, \qquad 1 \le j \le N_e. $$
In general, the likelihood can be written as a product of inflated likelihoods. This process, known as tempering, fulfills Bayes' theorem exactly; ESMDA is a special case of tempering. If the forward model, $\mathcal{M}$, is linear and the distributions of the prior parameters and the data errors are Gaussian, there exists a condition (denoted the MDA condition) on the $\alpha_i$ which ensures that ESMDA samples correctly from the posterior distribution. This condition is given as
$$ \sum_{i=1}^{N_a} \frac{1}{\alpha_i} = 1. $$
It is, however, unclear how important this condition is for problems where normality or linearity does not hold.
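As an illustration, the three ESMDA steps above can be sketched for a generic forward model. This is a minimal sketch, not the implementation used in the paper; the function name `esmda` and the perturbed-observation form are our own choices.

```python
import numpy as np

def esmda(z_prior, forward, d_obs, C_d, alphas, rng=None):
    """One full ESMDA run: assimilate the same data len(alphas) times,
    with the data-error covariance inflated to alpha_i * C_d at step i.
    z_prior: (n_param, n_ens) prior ensemble; forward: maps (n_param,) -> (n_data,)."""
    rng = rng or np.random.default_rng(0)
    # The MDA condition: the inverse inflation factors must sum to one.
    assert np.isclose(sum(1.0 / a for a in alphas), 1.0), "MDA condition violated"
    z = z_prior.copy()
    n_ens = z.shape[1]
    for alpha in alphas:
        # Forecast step: run the simulator for every realization
        y = np.column_stack([forward(z[:, j]) for j in range(n_ens)])
        # Sample anomalies for the covariance estimates
        zc = z - z.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        C_zy = zc @ yc.T / (n_ens - 1)
        C_yy = yc @ yc.T / (n_ens - 1)
        K = C_zy @ np.linalg.inv(C_yy + alpha * C_d)   # Kalman gain
        # Perturb observations with the inflated error covariance, then update
        d = rng.multivariate_normal(d_obs, alpha * C_d, size=n_ens).T
        z = z + K @ (d - y)
    return z
```

For a linear forward model and Gaussian prior, this recovers the exact posterior as the ensemble size grows.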

Multilevel data assimilation
In MLDA the forecast step is performed using a set of models with different costs and fidelities. Here, we define $\mathcal{M}_L := \mathcal{M}$, and $\{\mathcal{M}_l\}_{l=1}^{L-1}$, where $\mathcal{M}_l$ is an approximation to $\mathcal{M}_L$ with increasing accuracy and computational cost as $l$ increases. We will denote $\{\mathcal{M}_l\}_{l=1}^{L}$ an ML model.

Multilevel model
Members of an ML model form hierarchies of both accuracy and computational cost. One can think of several schemes for devising the hierarchy, including, but not limited to, coarsening the spatial grid of the forward model, increasing the length of the forward-model time steps, and relaxing the convergence criteria in the iterative linear solvers. All of these methods reduce the computational cost of the models and increase their numerical error. Coarsening the spatial grid and performing simulations on such grids is chosen for the current work. (Note, however, that the parameters we estimate are kept on the fine grid, meaning that upscaling the parameters is considered part of the ML forward models.)
Fossum and Mannseth [32] proposed a robust technique for grid coarsening, which was also used in [33,37,41]. In each coarsening step, neighboring cells of the grid at the previous step are merged into a coarser cell unless they are deliberately kept fine. A representation of the grid-coarsening process for an 8 × 8 example grid can be found in Fig. 1. The figure illustrates that coarsening occurs in a uniform manner across both directions, except for the cells that are deliberately kept fine.
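The merging rule can be sketched as follows. This is a hypothetical implementation of one coarsening step, for illustration only; the criterion for flagging cells as keep-fine (e.g., proximity to wells) is problem-dependent and left to the caller.

```python
import numpy as np

def coarsen_labels(nx, ny, keep_fine):
    """Assign each fine cell of an nx-by-ny grid to a coarse cell.
    2x2 blocks of neighboring cells are merged unless any cell in the
    block is flagged keep-fine, in which case all cells of the block
    stay separate. Returns an integer label array of shape (nx, ny)."""
    labels = -np.ones((nx, ny), dtype=int)
    next_label = 0
    for i in range(0, nx, 2):
        for j in range(0, ny, 2):
            block = [(a, b) for a in (i, i + 1) for b in (j, j + 1)
                     if a < nx and b < ny]
            if any(keep_fine[a, b] for a, b in block):
                for a, b in block:           # keep every cell of the block fine
                    labels[a, b] = next_label
                    next_label += 1
            else:                            # merge the block into one coarse cell
                for a, b in block:
                    labels[a, b] = next_label
                next_label += 1
    return labels
```

Applying the function repeatedly to the coarse labeling yields the hierarchy of grids.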

Transformation of model forecasts
The differing coarseness of the ML grids means that the spatially distributed model forecasts are at different resolutions for the different levels. Therefore, to be able to compute the ML sample statistics of the model forecasts, a robust transformation scheme must be devised for converting a model forecast from the resolution at one level to another.
In the problem at hand, transformation of the model forecast requires either upscaling or downscaling. A standard volume-weighted arithmetic averaging technique is used for upscaling. Since the grids of the ML model are nested, downscaling is performed by assigning the value of a coarse cell to all of the finer cells it contains.
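Both transformations can be sketched for a 2-D field with uniform coarsening. This is an illustrative sketch; the assumption of nested grids with an integer coarsening factor is ours.

```python
import numpy as np

def upscale(fine, factor, volumes=None):
    """Volume-weighted arithmetic average of a 2-D field over
    factor-by-factor blocks (uniform cell volumes assumed by default)."""
    nx, ny = fine.shape
    if volumes is None:
        volumes = np.ones_like(fine)
    # Sum weighted values and volumes over each coarse block, then divide
    block = lambda a: a.reshape(nx // factor, factor,
                                ny // factor, factor).sum(axis=(1, 3))
    return block(fine * volumes) / block(volumes)

def downscale(coarse, factor):
    """Piecewise-constant injection: each coarse value fills its block
    of fine cells (valid for nested grids)."""
    return np.kron(coarse, np.ones((factor, factor)))
```

Note that `downscale(upscale(f, 2), 2)` returns a block-averaged version of `f`, so the pair is a projection rather than an exact inverse.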

Upscaling of observation data
As part of the DA process, the mismatch between the model forecasts and the observation data needs to be calculated. Here, it is assumed that the inverted seismic data are given in the resolution of the finest simulation grid, level $L$. Therefore, for each level, either the observation data must be upscaled to the resolution of the model forecasts or the model forecasts must be downscaled to the resolution of the observation data. We take the former approach. Since the observation data are at the resolution of the finest model, applying the same transformation functions as those designed for the model forecasts to the fine observation data results in upscaling of the observation data to the required resolution. Hence, the transformed random vector of observation data at level $l$ is given as
$$ \tilde{D}_l = T_{L\to l}(D), $$
where $T_{L\to l}$ denotes the transformation from the resolution at level $L$ to that at level $l$.

Fig. 2: The schematic of two MLDA algorithms compared with ESMDA.

Sequential multilevel data assimilation
The idea of sequential assimilation of spatially distributed data resonates well with the nature of the ML model used. Since coarsening the model is expected to result in weaker nonlinearity [44,45], utilization of coarse models in the first steps of the DA, followed by more nonlinear fine models in the subsequent steps, can help to gradually zoom in on the solution of the inverse problem. Therefore, in SeMLDA the observation data are upscaled to several levels corresponding to the levels of the ML model. Afterwards, the data are assimilated sequentially, starting from the coarsest resolution and proceeding to the finer ones.

Sequential multilevel ensemble smoother
In this section we discuss two versions of the Sequential Multilevel Ensemble Smoother (SMLES). This algorithm draws on the ESMDA algorithm [2], the MLHES algorithm [33], and constraints associated with assimilation of linearly dependent data [46]. Initially, based on the available computational resources, the number of simulations to be performed on each level must be decided. Since simulations are cheaper on the coarser levels, a larger number of simulations is performed there. Based on this resource allocation, a sample of $N_1$ ensemble members is generated from the prior information. Afterwards, at step $l$ the model pertaining to level $l$ is used to assimilate the data transformed to match the resolution of the model forecasts at that level. Running the forward simulator for every realization, $z^{pri}_{l,j}$, $1 \le j \le N_l$, from the prior parameter random vector at level $l$, $Z^{pri}_l$, we have
$$ y_{l,j} = \mathcal{M}_l\big(z^{pri}_{l,j}\big), $$
where $y_{l,j}$ is a realization from the model-forecast random vector at step $l$, $Y_l$. In the analysis step, realizations $\tilde{d}_{l,j}$ from the random observation-data vector at level $l$, $\tilde{D}_l$, are used; the form of $C(\tilde{D}_l)$ will be discussed in Section 3.4.2. The updated parameter vector of an arbitrary ensemble member is then given by
$$ z^{post}_{l,j} = z^{pri}_{l,j} + K^{*}_l\big(\tilde{d}_{l,j} - y_{l,j}\big). $$
Here, $K^{*}_l$ is defined generically, with $*$ as a wildcard, and we introduce two flavors of the algorithm.
The straightforward flavor (SMLES-S) utilizes the data error and the model at level $l$ in the formulation of the Kalman gain at step $l$. Accordingly, the Kalman gain is given as
$$ K^{S}_l = C\big(Z^{pri}_l, Y_l\big)\Big(C(Y_l) + C(\tilde{D}_l)\Big)^{-1}, $$
where $C(Y_l)$ and $C\big(Z^{pri}_l, Y_l\big)$ denote the covariance of $Y_l$ and the cross-covariance between $Z^{pri}_l$ and $Y_l$, respectively.
In SMLES-S only the realizations which have been simulated by all the models are considered for the final statistics. With a subtle manipulation of the data-error covariance in the update for the realizations at level $l$ which are not simulated by $\mathcal{M}_{l+1}$, the resulting posterior realizations would, in the linear-Gaussian case, sample correctly from the posterior distribution up to the data at level $l$. The condition that should hold at any level $l$ is explained in Section 3.4.2. Figure 2 presents a schematic of SMLES-S in comparison with the ESMDA and MLHES algorithms. As can be seen from the figure, the ensemble size of SMLES-S shrinks as the level increases, due to limited computational resources. Accordingly, at level $L$ the ensemble size becomes small, and one may end up in exactly the situation that MLDA was designed to avoid in the first place. A possible treatment would be utilization of localization at the finer levels, but that would reduce the robustness of the algorithm. Another alternative is to allow transfer of information between the levels by utilization of ML statistics. By doing so, a hybrid version of the SMLES algorithm is formulated, similar to MLHES and IMLHES. The hybrid Kalman gain can be formulated as
$$ K^{H}_l = C^{H}_l\big(Z, Y\big)\Big(C^{H}_l(Y_l) + C(\tilde{D}_l)\Big)^{-1}, $$
where $C^{H}_l(Y_l)$ and $C^{H}_l(Z, Y)$ denote the ML covariance of $Y_l$ and the ML cross-covariance between $Z$ and $Y$, respectively. The ML statistics will be discussed in Section 3.4.3.
The SMLES-H algorithm takes advantage of both conditioning all the realizations to the data up to the last level on which they are simulated and the ML statistics. A depiction of the difference between SMLES-S and SMLES-H is presented in Fig. 3. Pseudo-codes of SMLES-S and SMLES-H are presented in Appendices A and B, respectively.
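The sequential structure of SMLES-S can be sketched as follows, under simplifying assumptions: a generic ensemble Kalman update, pre-specified sub-ensemble sizes `n_sims`, and caller-supplied inflated data-error covariances in place of the PMDA-consistent inflation discussed in Section 3.4.2.

```python
import numpy as np

def smles_s(z_prior, forwards, upscale_data, d_obs, C_d_levels, n_sims, rng=None):
    """Sketch of SMLES-S: assimilate upscaled data level by level,
    coarsest model first, with a shrinking sub-ensemble.
    forwards[l]: level-l forward model; upscale_data[l]: maps the
    finest-resolution data to level l; C_d_levels[l]: (inflated)
    data-error covariance at level l; n_sims[l]: sub-ensemble size,
    with n_sims[0] >= ... >= n_sims[-1]."""
    rng = rng or np.random.default_rng(0)
    z = z_prior[:, :n_sims[0]].copy()
    for l, (fwd, T, C_d) in enumerate(zip(forwards, upscale_data, C_d_levels)):
        z = z[:, :n_sims[l]]                 # shrink to the level-l sub-ensemble
        n = z.shape[1]
        y = np.column_stack([fwd(z[:, j]) for j in range(n)])
        zc = z - z.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        # K = C_zy (C_yy + C_d)^{-1}, with the 1/(n-1) factors cancelled
        K = (zc @ yc.T) @ np.linalg.inv(yc @ yc.T + (n - 1) * C_d)
        d = rng.multivariate_normal(T(d_obs), C_d, size=n).T
        z = z + K @ (d - y)
    return z                                 # members simulated on every level
```

SMLES-H differs in that the discarded members are still conditioned to the data at their last level, and the Kalman gain is built from ML statistics.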

Partially multiple data assimilation condition
For the convenience of the reader, we present the main result from [46], omitting the derivation. Just as there is a condition for assimilating a set of data multiple times such that the updated ensemble samples correctly from the posterior in the linear-Gaussian case, there is a condition which should hold if the data are linearly dependent, known as the Partially Multiple Data Assimilation (PMDA) condition [46].
In the problem at hand, the data at any coarser level, $\tilde{D}_c$, is a linear transformation of the data at any finer level, $\tilde{D}_f$,
$$ \tilde{D}_c = T_{f\to c}\,\tilde{D}_f, $$
where the linear upscaling operators compose as $T_{f\to c} = T_{l\to c}\, T_{f\to l}$ for any $\{c, l, f\}$, $1 \le c < l < f \le L$. Hence, we can define the PMDA condition for the data up to any arbitrary level $\bar{L}$, $1 \le \bar{L} \le L$.
Consider the data-error covariance matrix at level $l < \bar{L}$ to be given as
$$ C(\tilde{D}_l) = A_l\,\tilde{C}_l, $$
where $\tilde{C}_l = T_{L\to l}\, C(D)\, T_{L\to l}^{\mathsf{T}}$ is the upscaled data-error covariance and $A_l$ is an inflation matrix. Assuming that the prior distribution is Gaussian, that the forward model $\mathcal{M}_L$ is linear, and that the data at any coarser level $c$ are obtained through linear upscaling as above, the straightforward formulation of the SMLES algorithm, given in Section 3.4.1, samples correctly from the posterior distribution of the parameter random vector up to the data at level $\bar{L}$ provided the inflation matrices $\{A_l\}_{l=1}^{\bar{L}}$ satisfy the PMDA condition derived in [46].

Multilevel statistics
Assuming we have approximations of the model forecasts, $Y$, being a function of the unknown parameter vector, $Z$, on several levels, a scheme for approximating ML statistics of $Y$ is required. As for MLDA, the mean and the covariance of the model forecast are of foremost interest. Assuming the model with the highest fidelity, $\mathcal{M}_L$, to be exact, and $\{\mathcal{M}_l\}_{l=1}^{L-1}$ to be approximations to $\mathcal{M}_L$ with decreasing numerical error, [29] proposed a formulation for ML statistics. In this formulation, inspired by Bayesian Model Averaging, the statistics are computed based on reliability weights $w_l$ for each of the levels $l$. This formulation is, by definition, a biased scheme for computation of ML moments; however, it is a useful technique for problems in which variance error dominates bias, which is often the case for petroleum reservoir problems [29]. Using this formulation and transformations of the forecasts, [33] proposed a formulation of ML statistics for spatially distributed model forecasts based on the sample means $E(Y_k)$ of the model forecasts at the levels $k$. At step $l$ the model forecasts pertaining to levels finer than $l$ are not available, since the proposed algorithm is sequential. Hence, with a small change to the abovementioned formulation, the formulation used in this work is based on the levels $1 \le k \le l$,
$$ E^{H}_l(Y) = \sum_{k=1}^{l} \tilde{w}_k\, T_{k\to l}\big(E(Y_k)\big), \qquad \tilde{w}_k = \frac{w_k}{\sum_{m=1}^{l} w_m}. $$
Using the law of total variance, the ML approximation of the covariance of the model forecast at level $l$ is formulated as
$$ C^{H}_l(Y_l) = \sum_{k=1}^{l} \tilde{w}_k\, C\big(T_{k\to l}(Y_k)\big) + \sum_{k=1}^{l} \tilde{w}_k \Big(T_{k\to l}\big(E(Y_k)\big) - E^{H}_l(Y)\Big)\Big(T_{k\to l}\big(E(Y_k)\big) - E^{H}_l(Y)\Big)^{\mathsf{T}}. $$
In addition, the parameter-forecast cross-covariance can be written as
$$ C^{H}_l(Z, Y) = \sum_{k=1}^{l} \tilde{w}_k\, C\big(Z_k, T_{k\to l}(Y_k)\big) + \sum_{k=1}^{l} \tilde{w}_k \big(E(Z_k) - E^{H}(Z)\big)\Big(T_{k\to l}\big(E(Y_k)\big) - E^{H}_l(Y)\Big)^{\mathsf{T}}, $$
where $E^{H}(Z)$ is the ML formulation of the parameter-vector mean. This statistic is formulated using the same weights as in the forecast ML statistics, but since the parameters are at the same resolution on all levels, no transformation is needed:
$$ E^{H}(Z) = \sum_{k=1}^{l} \tilde{w}_k\, E(Z_k). $$
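One plausible reading of the weighted ML moments, assumed here for illustration, is that they are the moments of a mixture of the per-level ensembles with weights $w_k$; the exact formulation is given in [33]. Under that assumption, the forecast statistics can be sketched as:

```python
import numpy as np

def ml_mixture_stats(ys, weights):
    """ML mean and covariance of model forecasts, treating the per-level
    ensembles ys[k] (each of shape (n_data, N_k), already transformed to
    a common resolution) as a mixture with reliability weights w_k.
    The covariance follows the law of total variance for a mixture."""
    w = np.asarray(weights, float)
    w = w / w.sum()                                    # normalized weights
    means = [y.mean(axis=1) for y in ys]
    mu = sum(wk * m for wk, m in zip(w, means))        # hybrid mean
    cov = np.zeros((len(mu), len(mu)))
    for wk, y, m in zip(w, ys, means):
        yc = y - m[:, None]
        cov += wk * (yc @ yc.T) / (y.shape[1] - 1)     # within-level covariance
        dm = (m - mu)[:, None]
        cov += wk * (dm @ dm.T)                        # between-level spread
    return mu, cov
```

With identical per-level ensembles the between-level term vanishes and the ordinary sample covariance is recovered, a useful sanity check.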

Test models
Three different reservoir models are set up for the numerical investigations. These reservoir models share some properties: they are two-dimensional with Cartesian grids, and for all of them compressible two-phase flow (oil and water), no-flow boundary conditions, and standard equations for capillary pressure and relative permeability are considered. All the wells in these reservoir models are controlled by their bottom-hole pressure; the injectors at 300 bar and the producers at 110 bar. A description of the other shared general properties of the reservoir models is given in Table 1. Unique features of the reservoir models are explained separately in Sections 4.1–4.3.
The forward models consist of two segments. A reservoir flow model is used to predict the state variables, i.e. the pressure and saturation of the reservoir fluids, in time, and a petro-elastic model is utilized for computing the elastic rock properties from parameters and predicted state variables.
The flow segment of the forward model is derived by substituting Darcy's law into the mass-conservation equation for each of the fluid phases, resulting in [47]
$$ \nabla \cdot \left[ \frac{k\, k_{r*}}{\nu_{*} B_{*}} \big( \nabla p_{*} - \rho_{*}\, g\, \nabla h \big) \right] + q_{*} = \frac{\partial}{\partial t}\left( \frac{\phi S_{*}}{B_{*}} \right), $$
where $h$ denotes depth and $\phi$ porosity, and the system is closed by
$$ S_o + S_w = 1, \qquad p_{cow} = p_o - p_w. $$
In these equations, $k$ denotes absolute permeability, and $k_{r*}$ denotes the relative permeability of the corresponding phase, with $*$ a wildcard notation. $k_{r*}$ is a function of the saturation of that phase, $S_{*}$. The pressure of a phase is denoted by $p_{*}$, and the capillary pressure, $p_{cow}$, is a function of $S_w$. Furthermore, $g$ denotes the gravitational constant; $\nu_{*}$, $B_{*}$, and $\rho_{*}$ are the viscosity, the formation volume factor, and the density of the corresponding phase; and $q_{*}$ denotes the sink or source term of the corresponding continuity equation. The flow segment of the forward models is run using Eclipse-100 [48]. Coarsening the grid is done using the Eclipse keyword COARSEN, which merges groups of pre-defined neighboring cells to form a coarser grid. The upscaling of permeabilities is performed using pore-volume-weighted arithmetic averaging, and transmissibilities between two neighboring coarse cells in each direction are calculated by harmonic averaging in that direction and summing over the other directions [48].
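The two upscaling rules for flow properties can be sketched as follows. This is only an illustration of the averaging rules described above; the exact behavior of COARSEN is defined by the simulator documentation.

```python
import numpy as np

def upscale_perm(k_fine, pv_fine, factor):
    """Pore-volume-weighted arithmetic average of fine-cell
    permeabilities over factor-by-factor coarse blocks."""
    nx, ny = k_fine.shape
    blk = lambda a: a.reshape(nx // factor, factor,
                              ny // factor, factor).sum(axis=(1, 3))
    return blk(k_fine * pv_fine) / blk(pv_fine)

def coarse_trans_x(t_fine_rows):
    """Transmissibility between two neighboring coarse cells in x:
    harmonic average of the fine transmissibilities along each fine
    row in x, summed over the rows in y (a sketch of the rule above)."""
    return sum(len(row) / np.sum(1.0 / np.asarray(row, float))
               for row in t_fine_rows)
```

The harmonic average penalizes low-transmissibility cells along the flow direction, while summing across rows adds the parallel flow paths.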
As for the petro-elastic segment of the forward model, an in-house model based on standard rock-physics equations [49], [50, Report 1] was used.

Reservoir model I
This model has a 50 × 50 grid and two wells: one producer (P) at the center of the east edge and one injector (I) at the center of the west edge. The model is designed to evaluate the performance of the different algorithms in parameter estimation for an oil reservoir with a relatively long correlation range in the permeability field.

Reservoir model II
This model has a 64 × 64 grid and a five-spot well pattern: four injectors at the corners and a producer at the center of the field. The model is designed to assess the performance of the different algorithms in history-matching of a field with a relatively short correlation range.

Reservoir model III
This model has a 70 × 70 grid and two wells: an injection well in the southwest corner and a production well in the northeast corner. The model has two permeability zones, one with a long correlation range and one with a short correlation range, with a smooth transition from one zone to the other.

Numerical investigation
Three numerical experiments are conducted, one for each of the three reservoir models discussed in Section 4.
The unknown parameter field in all the experiments is the logarithmic permeability field, which has a different distribution in each of the experiments.
The observation data are two sets of time-lapse bulk-impedance data based on a baseline (day zero of production) and two vintages, which differ between the experiments and are described separately below. These observation data are generated from the simulation results of a random draw from the prior parameter distribution. As inverted seismic data typically are spatially correlated, we use a non-diagonal covariance matrix for the data error, based on isotropic spherical variograms with mean 0. The ranges of the variograms differ between the experiments so that the robustness of the algorithms with respect to variogram range is assessed. The marginal standard deviation of each observation value is given as
$$ \sigma = r\,\max\big(|\delta|,\ \eta\big), $$
where $r = 0.1$, $\delta$ is the value of the observation data at a certain location, and $\eta$ is a threshold introduced to avoid placing too much certainty in observation data whose absolute values are very small. This threshold is defined as the first percentile of the absolute values of the observation data.
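Assuming the max-with-threshold form of the rule above, the per-datum standard deviations can be computed as:

```python
import numpy as np

def obs_std(d, r=0.1):
    """Marginal standard deviation per observation: a fraction r of its
    magnitude, floored at the first percentile of |d| so that near-zero
    data are not assigned unrealistically small errors."""
    eta = np.percentile(np.abs(d), 1)      # threshold from the 1st percentile
    return r * np.maximum(np.abs(d), eta)
```

The resulting vector forms the diagonal scaling of the variogram-based data-error covariance.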
For comparison of SMLES with standard ESMDA, three algorithms are run: SMLES-S, SMLES-H, and ESMDA with localization (ESMDA-LOC). What we want to estimate is the posterior distribution of the parameters and of the model forecasts. The correct estimates would be given by Markov chain Monte Carlo with an exceedingly large chain length. However, the focus of this work is comparing the novel MLDA algorithms with algorithms of the same class that are widely used in reservoir history-matching, i.e., ensemble-based DA algorithms. Hence, ESMDA and IES with exceedingly large ensemble sizes are selected to remove the Monte Carlo errors and serve as the gold standards for comparison.
In order for these comparisons to be fair, an "equal computational power" constraint is imposed. As the dominant cost of the DA process pertains to simulations of the forward models, where iterative linear solvers dominate the computational costs for large problems, the computational cost of simulating each ensemble member using forward model $\mathcal{M}_l$ is assumed proportional to $G_l^{\gamma}$, where $G_l$ is the number of active grid cells in the forward model at level $l$, with $\gamma \in [1.25, 1.5]$ [51]. Here, we take $\gamma = 1.35$. Accordingly, the computational power associated with each algorithm run can be written as
$$ P = \sum_{l=1}^{L} n_l\, G_l^{\gamma}, $$
where $n_l$ is the total number of simulations using $\mathcal{M}_l$. Using this expression, the computational costs of the algorithm runs are set to be equal. The iterative algorithms do not always have a fixed computational cost, and many iterations may be performed to satisfy the convergence criteria without considerable improvement in the objective function. Therefore, more computational power (approximately twice that of the other algorithms) was allowed for these algorithms (IMLHES and IES-LOC).
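The equal-cost bookkeeping can be sketched directly; the sub-ensemble sizes in the usage below are hypothetical, not those of Table 2.

```python
import numpy as np

def level_costs(grid_cells, gamma=1.35):
    """Per-simulation cost at each level, proportional to G_l**gamma."""
    return np.asarray(grid_cells, float) ** gamma

def total_power(n_sims, grid_cells, gamma=1.35):
    """Computational power of a run: sum over levels of n_l * G_l**gamma."""
    return float(np.dot(n_sims, level_costs(grid_cells, gamma)))

def single_level_budget(n_sims, grid_cells, gamma=1.35):
    """Ensemble size a single-level run on the finest grid could afford
    for the same total power (rounded down)."""
    return int(total_power(n_sims, grid_cells, gamma)
               // level_costs(grid_cells, gamma)[-1])
```

For example, with the Experiment I grids of 157, 259, 685, and 2500 cells, a multilevel budget translates into a considerably smaller single-level fine-grid ensemble.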
With a fixed computational cost, there remain $L - 1$ degrees of freedom in the specification of $\{N_l\}_{l=1}^{L}$ for the ML algorithms. No optimization was performed for this specification; the only aim was to let the sub-ensemble sizes increase as the model cost decreases. Several other, similar settings that were tried resulted in similar DA outcomes.
The convergence criterion for the iterative algorithms was that improvements in the relative data mismatch should be smaller than 0.0001.
The localization scheme in ESMDA-LOC was based on the covariance structure given in [52] with a spherical variogram, and the DA update was performed using the subspace-inversion method proposed by [53]. As for IES-LOC, the localization scheme was likewise based on the covariance structure given in [52] and a spherical variogram.
For SMLES-H and IMLHES, there is a possibility of improving the results by tuning the weights in the calculation of the ML statistics for specific cases, but here we use the simplest choice, $w_l = 1/L$ for $1 \le l \le L$. Both ESMDA-LOC and ESMDA-REF are run with 6 steps, with $\alpha_i = 6$, $1 \le i \le 6$, in all steps. Runs with a higher number of steps were conducted, but no major improvement in the DA results was observed.
As for SMLES-S, the inflation matrices are set as $A_l = L\, I_{\zeta_l}$ for $1 \le l < L$, where $I_{\zeta_l}$ is the identity matrix of size $\zeta_l$. For $\bar{L} = L$, the PMDA condition is solved to calculate $A_L$. Unlike satisfying the MDA condition, which is computationally trivial, satisfying the PMDA condition is computationally very expensive for real field problems and becomes practically infeasible, due to the costly matrix inversions it involves. However, since the PMDA condition only secures correct sampling for linear-Gaussian problems, approximately satisfying it may suffice for real field cases.
Regarding SMLES-H, the ensemble at level $l$ is divided into two sub-ensembles: (I) the realizations which are simulated using $\mathcal{M}_{l+1}$ after the analysis step, $\{z^{pri}_{l,j} \mid 1 \le j \le N_{l+1}\}$, and (II) those that are not, $\{z^{pri}_{l,j} \mid N_{l+1} < j \le N_l\}$. For Sub-ensemble (I) the inflation matrix is set as $A_l = L\, I_{\zeta_l}$. For Sub-ensemble (II), the PMDA condition is solved for $\bar{L} = l$, so that $A_l$ is calculated. This is done to ensure that, in the linear-Gaussian case, the realizations that are not propagated to finer levels sample correctly from the posterior up to the data at level $l$.

Experiment I
This experiment is conducted on Reservoir Model I. The ML algorithms have four levels, corresponding to 157, 259, 685, and 2500 grid cells, respectively. A summary of the resource allocation for the different runs carried out in this experiment can be found in Table 2. All the numbers in the table are per assimilation step or iteration, except for SMLES-S and SMLES-H, for which the total number of simulations performed at each level is reported. This also holds for Tables 3 and 4, pertaining to Experiment II and Experiment III, respectively. The observation data for this experiment are generated based on seismic vintages at 2500 and 5000 days after production starts. The range of the variogram used for the data-error covariance in this experiment is 15 grid cells.
The prior unknown logarithmic permeability field is based on an exponential variogram with a constant mean of 5 and variance of 1, an anisotropy angle of 80 degrees, an anisotropy ratio of 0.7, and a range of 20 grid cells. Randomly selected realizations from this logarithmic permeability field can be found in Fig. 4.
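Prior realizations of this kind can be sampled in a few lines. The sketch below draws an anisotropic exponential-covariance field by dense Cholesky factorization on a small grid; the grid size, seed, and the use of the practical-range convention (covariance exp(−3d/range)) are our assumptions, not the paper's geostatistical setup:

```python
import numpy as np

def exp_cov_field(nx, ny, mean=5.0, var=1.0, ang_deg=80.0, ratio=0.7,
                  rng_cells=20.0, seed=0):
    """Sample a log-permeability field with an anisotropic exponential
    covariance via dense Cholesky factorization (small grids only)."""
    x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    pts = np.column_stack([x.ravel(), y.ravel()]).astype(float)
    # Rotate into the principal axes of anisotropy, stretch the minor axis.
    t = np.deg2rad(ang_deg)
    R = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
    pr = pts @ R.T
    pr[:, 1] /= ratio
    d = np.linalg.norm(pr[:, None, :] - pr[None, :, :], axis=-1)
    C = var * np.exp(-3.0 * d / rng_cells)   # practical-range convention
    Lc = np.linalg.cholesky(C + 1e-10 * np.eye(len(pts)))  # jitter for stability
    z = mean + Lc @ np.random.default_rng(seed).standard_normal(len(pts))
    return z.reshape(nx, ny)

field = exp_cov_field(20, 20)
print(field.shape)  # (20, 20)
```

Dense Cholesky scales as O(n³) in the number of cells, so a dedicated geostatistical simulator (e.g. spectral or sequential methods) would be used at the grid sizes of the experiments.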

Experiment II
This experiment is conducted on Reservoir Model II. The ML algorithms have four levels, corresponding to 265, 433, 1147, and 4096 grid cells, respectively. A summary of the resource allocation for the different runs carried out in this experiment can be found in Table 3.
The observation data for this experiment are generated based on seismic vintages at 4000 and 8000 days after production starts. The range of the variogram used for data-error covariance in this experiment is 10 grid cells.
The prior unknown logarithmic permeability field is based on a spherical variogram with a constant mean of 5 and variance of 1, an anisotropy angle of −30 degrees, an anisotropy ratio of 0.7, and a range of 10 grid cells. Randomly selected realizations from this logarithmic permeability field can be found in Fig. 5.

Experiment III
This experiment is conducted on Reservoir Model III. The ML algorithms have three levels, corresponding to 390, 1261, and 4900 grid cells, respectively. A summary of the resource allocation for the different runs carried out in this experiment can be found in Table 4.
The observation data for this experiment are generated based on seismic vintages at 4000 and 8000 days after production starts. The range of the variogram used for data-error covariance in this experiment is 5 grid cells.
The unknown logarithmic permeability field is based on two different variograms in two zones of the field, one encompassing the northeastern part and one encompassing the southwestern part, with a smooth transition between them. The details of the variograms that define the distribution of the unknown parameters can be found in Table 5. Randomly selected realizations from this logarithmic permeability field can be found in Fig. 6.

Numerical results
The results from the numerical experiments are assessed both qualitatively and quantitatively, using the posterior parameters and forecasts.
For the qualitative analysis, the mean and variance of the posterior parameter fields obtained by the different algorithms are first compared visually. The conventional DA algorithms (ESMDA-LOC and IES-LOC) were tested with several localization ranges (critical distances), and the best results are presented for each of the experiments.
The quantitative analysis is performed using measures suggested by [54] for comparing DA results obtained by different schemes with a reference case. Consider ν to be a vector of interest for the quantitative analysis, e.g. the parameter estimate vector or the model forecast vector. The following two metrics are computed for each algorithm run [54]:

Mean metric: ‖Mean(ν_*^pos) − Mean(ν^pos)‖_2 / ‖Mean(ν^pri) − Mean(ν^pos)‖_2,

Var metric: ‖Var(ν_*^pos) − Var(ν^pos)‖_2 / ‖Var(ν^pos)‖_2,

where Mean and Var represent the empirical mean and variance; superscripts pri and pos denote the prior and the reference posterior, respectively; the subscript * is a wildcard for the algorithm of interest; and distances are measured in the 2-norm. Here, the reference posterior is calculated from the results obtained by ESMDA-REF for all the experiments. The Mean metric is close to 0 for algorithms that perform similarly to the reference and close to 1 for algorithms whose estimate of ν remains close to the prior estimate. The Var metric measures the difference between the variance obtained by the algorithm of interest and the reference variance, normalized by the norm of the reference variance, so smaller values are preferred.
We will compute these two metrics for both the posterior parameter estimates and the second vintage of the posterior model forecasts.
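The two metrics can be written out in code as follows. This is a sketch of our reading of the verbal definitions from [54] given in the text; the array layout (one column per ensemble member) and the function names are ours:

```python
import numpy as np

def mean_metric(nu_alg, nu_ref, nu_pri):
    """||Mean_alg - Mean_ref|| / ||Mean_pri - Mean_ref||: 0 if the algorithm
    matches the reference posterior, 1 if it stayed at the prior."""
    m_alg = nu_alg.mean(axis=1)   # ensembles stored as (n_dim, n_ens)
    m_ref = nu_ref.mean(axis=1)
    m_pri = nu_pri.mean(axis=1)
    return np.linalg.norm(m_alg - m_ref) / np.linalg.norm(m_pri - m_ref)

def var_metric(nu_alg, nu_ref):
    """||Var_alg - Var_ref|| / ||Var_ref||: smaller is better."""
    v_alg = nu_alg.var(axis=1, ddof=1)
    v_ref = nu_ref.var(axis=1, ddof=1)
    return np.linalg.norm(v_alg - v_ref) / np.linalg.norm(v_ref)

# Sanity check: an "algorithm" identical to the reference scores 0 on both.
rng = np.random.default_rng(1)
ref = rng.normal(size=(50, 100))
pri = rng.normal(2.0, 2.0, size=(50, 100))
print(mean_metric(ref, ref, pri), var_metric(ref, ref))  # 0.0 0.0
```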

Results of experiment I
Visual analysis of the mean permeability fields in Fig. 7 shows that the SMLES-S and SMLES-H results are more similar to the ESMDA-REF result than the ESMDA-LOC result is. It also shows that IMLHES performs more similar to IES-REF than IES-LOC does. There does not seem to be any considerable advantage in performance of a specific ML algorithm when the mean permeability fields obtained by the ML algorithms are compared with the IES-REF result. However, due to the convergence issues mentioned above, the SiMLDA algorithm used more computational resources.

Checking the variance fields in Fig. 8 confirms that both SMLES-S and SMLES-H perform better than ESMDA-LOC. It further confirms that IMLHES performs more similar to IES-REF than IES-LOC does. However, in this figure it is evident that SMLES-S underestimates the uncertainty in the posterior parameters while SMLES-H and IMLHES overestimate it. No superiority of either SMLES-H or IMLHES over the other is noticeable in the variance fields, and both perform slightly better than SMLES-S. It is worth noting that changing the color scale of the plot of the variance field for SMLES-S (denoted by SMLES-S * in the figure) shows that, in spite of the underestimation of uncertainty, the structure of the variance field is predicted accurately by this algorithm.
The superiority of the ML algorithms over the conventional DA algorithms is further confirmed by the statistics of the model forecasts in Figs. 9 and 10. There is no clear indication of an advantage of either SMLES-H or IMLHES over the other. However, SMLES-S performs better than both of them, particularly in estimating the variance field of the model forecasts.
Quantitative measures given in Tables 6 and 7 also confirm that the ML algorithms generally perform more similar to ESMDA-REF than the conventional DA algorithms. Among the ML algorithms, SMLES-H performs slightly better in estimation of both the mean posterior parameters and the mean model forecasts, while IMLHES performs slightly better in estimation of the variance of the posterior parameter field, and SMLES-S performs significantly better in estimation of the variance of the posterior model forecasts.
The ESMDA-LOC and IES-LOC results presented here are based on the localization range of 40 grid cells (1200 meters).
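Distance-based localization of this kind is typically implemented by tapering the Kalman gain with a compactly supported correlation function. A common choice (the paper does not state which taper is used, so this is an assumption) is the fifth-order Gaspari-Cohn polynomial, sketched below with half the support distance as its parameter:

```python
import numpy as np

def gaspari_cohn(d, c):
    """Fifth-order piecewise-polynomial taper of Gaspari & Cohn (1999).
    Returns 1 at distance 0 and decays smoothly to 0 at distance 2*c."""
    r = np.abs(np.asarray(d, dtype=float)) / c
    taper = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    taper[near] = (-0.25 * r[near]**5 + 0.5 * r[near]**4
                   + 0.625 * r[near]**3 - 5.0 / 3.0 * r[near]**2 + 1.0)
    taper[far] = (r[far]**5 / 12.0 - 0.5 * r[far]**4 + 0.625 * r[far]**3
                  + 5.0 / 3.0 * r[far]**2 - 5.0 * r[far]
                  + 4.0 - 2.0 / (3.0 * r[far]))
    return taper

d = np.array([0.0, 40.0, 80.0, 120.0])  # distances in grid cells
print(gaspari_cohn(d, 40.0))            # 1 at distance 0, 0 beyond 80 cells
```

The taper is multiplied element-wise with the Kalman gain, which is exactly the regularization that, as discussed in the conclusions, disregards true long-range correlations between inverted seismic data and the parameters.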

Results of experiment II
Visual analysis of the mean permeability fields in Fig. 11 shows that the SMLES-S and SMLES-H results are more similar to the ESMDA-REF result than the ESMDA-LOC result is. It also shows that IMLHES performs more similar to IES-REF than IES-LOC does. There does not seem to be any considerable advantage in performance of a specific ML algorithm when the mean permeability fields obtained by ML algorithms are compared with the IES-REF result.
Checking the variance fields in Fig. 12 confirms that both SMLES-S and SMLES-H perform better than ESMDA-LOC. It further confirms that IMLHES performs more similar to IES-REF than IES-LOC does. However, in this figure it is evident that SMLES-S underestimates the uncertainty in the posterior parameters while SMLES-H and IMLHES overestimate it. No superiority of either SMLES-H or IMLHES over the other is noticeable in the variance fields, and both perform slightly better than SMLES-S. As for Experiment I, changing the color scale of the plot of the variance field for SMLES-S (denoted by SMLES-S * in the figure) shows that, in spite of the underestimation of uncertainty, the structure of the variance field is predicted accurately by this algorithm.
The superiority of the ML algorithms over the conventional DA algorithms is further confirmed by the statistics of the model forecasts in Figs. 13 and 14. There is no clear indication of an advantage of either SMLES-H or IMLHES over the other. However, SMLES-S performs better than both of them, particularly in estimating the variance field of the model forecasts.
Quantitative measures given in Tables 6 and 7 also confirm that the ML algorithms generally perform more similar to ESMDA-REF than the conventional DA algorithms. All the MLDA algorithms perform reasonably similarly in estimation of the mean of the posterior parameters, while IMLHES and SMLES-H perform slightly better in estimation of the variance of the posterior parameter field. As for estimation of the statistics of the posterior model forecasts, SMLES-S performs best, and its superiority is most evident in estimation of the variance of the posterior model forecasts.
The ESMDA-LOC and IES-LOC results presented here are based on the localization range of 60 grid cells (1800 meters).

Results of experiment III
Visual analysis of the mean permeability fields in Fig. 15 shows that the SMLES-S and SMLES-H results are more similar to the ESMDA-REF result than the ESMDA-LOC result is. It also shows that IMLHES performs more similar to IES-REF than IES-LOC does. There does not seem to be any considerable advantage in performance of a specific ML algorithm when the mean permeability fields obtained by ML algorithms are compared with the IES-REF result.

Summary and conclusions
In this work, two variants of a novel sequential MLDA algorithm, SMLES-S and SMLES-H, were introduced. Their performance was assessed in comparison with two conventional DA algorithms and a simultaneous MLDA algorithm. In doing so, three experiments were performed on three reservoir models. The three experiments were designed such that the performance of the algorithms was evaluated in different settings for the prior parameter fields (different variogram types; different anisotropies; and various correlation lengths, including long-range correlation, short-range correlation, and a mixture of long-range and short-range correlations) and different ranges for the variograms used for the data-error covariance. Results of the experiments were assessed both qualitatively and quantitatively.
For the qualitative evaluation of the numerical results, the mean and the variance of the posterior parameter fields were first generated and assessed visually. The relative performance of the different methods was similar for all three experiments. The assessments showed that both SMLES-S and SMLES-H performed more similar to ESMDA-REF than ESMDA-LOC did in estimation of the posterior parameter mean field. Regarding estimation of the variance fields, SMLES-H overestimated the variance while SMLES-S underestimated it. The superiority of both SeMLDA algorithms over ESMDA-LOC was evident, also for the variance fields. Among the iterative algorithms, IMLHES performed more similar to IES-REF than IES-LOC did. There was no indication of superior performance of either SMLES-H or IMLHES over the other in any of the experiments when their performance was compared to the IES-REF results. However, IMLHES used more computational resources than either of the SeMLDA algorithms. Both IMLHES and SMLES-H performed slightly better than SMLES-S.
Additionally, fine-scale simulations were run for all the posterior ensembles obtained by the different algorithms in all the experiments.

Two metrics were adopted for quantitative comparison of the DA results obtained by the different algorithms, for estimation of both the mean and the variance of the posterior parameters and model forecasts. The metrics indicated that the ML algorithms generally performed better than the conventional DA algorithms in estimation of both the mean and the variance of the posterior parameters. They also indicated that SMLES-H and IMLHES performed slightly better than SMLES-S in estimation of the variance of the posterior parameters, and that all the MLDA algorithms performed better than IES-LOC in estimation of the mean and variance of the posterior model forecasts. SMLES-S also consistently outperformed ESMDA-LOC in estimation of the mean and variance of the posterior model forecasts, while this was not observed for IMLHES and SMLES-H. Among the ML algorithms, SMLES-S clearly performed best in estimation of the variance of the model forecasts. Neither of the other two algorithms consistently performed better than the other.
There were significant differences between the results from the MLDA algorithms and those from the conventional DA algorithms in all the experiments. Simultaneous assimilation of large amounts of data into ensembles of small size partly explains the under-performance of the conventional algorithms. In the case of inverted seismic data, we observed true long-range correlations between the data and the parameters. Regularization of the Kalman gain using distance-based localization, which disregards these true correlations and distorts the update formula, is another cause of the significant difference between the results.

Several issues in this work can be investigated further. As satisfying the PMDA condition is not feasible in real field cases, robust techniques for approximately satisfying this condition can be studied. To optimize both SeMLDA algorithms, several of their characteristics can be researched further: the optimal extent of grid coarsening, the number of levels and the allocation of resources between them, the weights for the different levels in the ML statistics, and the formulation of the posterior statistics, to name a few. Additionally, generalizations of the SMLES algorithms can be studied, e.g. assimilating the data more than once at some of the levels using more strongly inflated data-error covariance matrices. Finally, as realistic reservoir cases are more complex than the fields investigated in this work, extensive grid coarsening may result in large numerical error and model bias. This limitation, together with the increased dimensionality of the parameters, may call for combining localization with the proposed MLDA algorithms.
Funding Open Access funding provided by NORCE Norwegian Research Centre AS. The authors acknowledge financial support from the NORCE research project "Assimilating 4D Seismic Data: Big Data Into Big Models" which is funded by industry partners, Equinor Energy AS, Lundin Energy Norway AS, Repsol Norge AS, Shell Global Solutions International B.V., TotalEnergies EP Norge AS, and Wintershall Dea Norge AS, as well as the Research Council of Norway (PETROMAKS2). We also thank Schlumberger for providing academic software licenses to ECLIPSE.

Declarations
Financial interests There are no financial interests to disclose.

Non-financial interests
The fourth author is an editorial board member in Computational Geosciences. Otherwise, there are no non-financial interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.