Flexible iterative ensemble smoother for calibration of perfect and imperfect models

Iterative ensemble smoothers have been widely used for calibrating simulators of various physical systems due to the relatively low computational cost and the parallel nature of the algorithm. However, iterative ensemble smoothers have been designed for perfect models under the main assumption that the specified physical models and subsequent discretized mathematical models have the capability to model the reality accurately. While significant efforts are usually made to ensure the accuracy of the mathematical model, it is widely known that the physical models are only an approximation of reality. These approximations commonly introduce some type of model error which is generally unknown and when the models are calibrated, the effects of the model errors could be smeared by adjusting the model parameters to match historical observations. This results in a bias estimated parameters and as a consequence might result in predictions with questionable quality. In this paper, we formulate a flexible iterative ensemble smoother, which can be used to calibrate imperfect models where model errors cannot be neglected. We base our method on the ensemble smoother with multiple data assimilation (ES-MDA) as it is one of the most widely used iterative ensemble smoothing techniques. In the proposed algorithm, the residual (data mismatch) is split into two parts. One part is used to derive the parameter update and the second part is used to represent the model error. The proposed method is quite general and relaxes many of the assumptions commonly introduced in the literature. We observe that the proposed algorithm has the capability to reduce the effect of model bias by capturing the unknown model errors, thus improving the quality of the estimated parameters and prediction capacity of imperfect physical models.


Introduction
Bayesian inversion is a generic inference framework that is widely adopted for calibration of mathematical models while accounting for different types/sources of uncertainties. In the Bayesian framework, the prior model parameters belief or probability is updated by integrating the observed data to obtain the posterior probability. Several algorithms could be used to generate samples from the posterior distribution of the model parameters. Among those, Muzammil Hussain Rammay mhr3@hw.ac.uk; muzammil.h.rammay@uis.no Ahmed H. Elsheikh a.elsheikh@hw.ac.uk 1 Heriot-Watt University, Edinburgh, UK 2 Total Geoscience Research Centre, Aberdeen, UK Markov chain Monte Carlo (MCMC) is an exact method for sampling. However, MCMC can be computationally expensive due to the large number of iterations (number of sampling steps) needed to reach convergence and the sequential nature of the method. Ensemble-based methods have been gaining popularity in the last two decades for data assimilation and calibration of simulation models of various physical systems [5]. The main advantage of ensemble-based methods is the low computational cost for higher dimensional data assimilation and inverse problems with some trade-off between the convergence speed/dimensionality and Gaussian approximation of the posterior distribution. Ensemble smoother (ES) is one of the ensemble-based techniques similar to the ensemble Kalman filter [6]. ES updates the model parameters by simultaneous assimilation of all available data. However, ES fails to solve highly non-linear inverse problems. A number of iterative ensemble-based methods have been proposed for parameter estimation of non-linear mathematical models. The ensemble smoother with multiple data assimilation (ES-MDA) [5], Levenberg-Marquardt ensemble randomized maximum likelihood method (LM-EnRML) [3] and iterative ensemble smoother (IES) [18,27] are some of these techniques. In ensemble-based methods, the prior ensemble members of the model parameters are computed from the initial statistical distribution of the unknown model parameters and the objective is to find an approximate posterior distribution (i.e., posterior ensemble) of the model parameters conditioned to observation data. After calibration, the posterior ensemble of the model parameters is used for making predictions. Ensemble-based methods are designed with the assumption that utilized mathematical model provides a complete representation of real physical systems and that the model errors are small enough that it could be neglected during the calibration process. This assumption might introduce bias in the estimated parameter distribution [1,26] and as a consequence results in bad quality predictions using the calibrated models.
A large number of studies have been conducted to explore the possible ways to account for the model error during the calibration process. In a Bayesian inversion context, three broad lines of research have emerged in the published literature. In the first line of research, the prior model error statistics were computed using pairs of high-fidelity and low-fidelity models. These error statistics were utilized during the calibration by using different types of algorithmic frameworks [10,16,17,20,22]. These frameworks vary according to the behavior of different physical systems and could perform poorly in highdimensional problems with variable boundary conditions and when the distributions of the model error statistics are complex or multi-modal. The problem of complex statistics of model error was addressed by Köpke et al. [13] and Köpke et al. [14], where the authors accounted for the model error component using orthonormal basis generated from the difference between pairs of high-fidelity and lowfidelity models. These bases were evaluated locally at each update iteration as well as the model pairs (high and low fidelity) were re-run during the calibration process. More recently, an input/output independent formulation of model error was introduced to handle high-dimensional parameter estimation problems as well as handling problems with time-varying boundary conditions [23,24]. However, this line of research relies on the availability of high-fidelity models such that we could learn the statistical properties of the model errors by evaluating both the high-and lowfidelity models at the same set of model parameters. This assumption might not be valid for a wide range of applied problems where we only have access to one model.
The second line of research for addressing model error/bias during calibration is related to joint calibration of physical models with a second model that is assumed to represent the model error. Different parameterization for the model bias/error have been proposed, for example Gaussian process regression [12] or autoregressive error models [8,9]. However, without any prior knowledge of the error model (unknown model error), the joint calibration may be prone to break the physical constraints of the systems and as a consequence might fail to improve the predictive capacity of the calibrated physical model [25]. Evin et al. [7] compared the joint calibration approach to a post-processing approach of accounting for the model error and concluded that the joint calibration approach was found to be less robust. This leads us to the third line of research, where model error is generally estimated from the residual (data mismatch obtained from difference between reality and simulation models). For example, Evin et al. [7] estimated the model error by using normalized residuals and autoregressive model with linear heteroscedasticity. However, this formulation has limitation in the scenarios where model error exhibits strong structural features and non-linear heteroscedasticity. Along the same line, Sargsyan et al. [25] utilized approximate Bayesian computation (ABC) to capture the model error uncertainty using residuals. However, the use of ABC in ensemblebased methods is not clearly understood. Recently, Oliver and Alfonzo [21] estimated the correlated structure of an approximate model error by computing a covariance matrix of the total residual obtained after one-round of calibration. This covariance matrix was utilized to estimate the model error statistics and then a re-calibration step is introduced in order to compensate the model error effects until a termination criterion is satisfied. This approach while novel requires multiple re-calibration iterations and has the limitation to handle non-Gaussian behavior of residuals. Furthermore, if the imperfect model is flexible enough, the total residual might vanish after one calibration step and then the model error/bias will be underestimated.
In this paper, a flexible iterative ensemble smoother is introduced to calibrate both perfect and imperfect models. The algorithm is simple to implement and has negligible computational overhead over the standard ES-MDA algorithm. ES-MDA is reformulated by splitting of data-residual (mismatch between model output and observation) into two parts by estimating a split parameter (scalar value) during the calibration process. The first part of the residual is used to update the model parameters and the second part is assumed to represent the model error. The objective of the proposed algorithm is to reduce the model bias by capturing the unknown model error uncertainty during the calibration of imperfect models in order to improve the prediction capability of the calibrated physical model. Furthermore, the proposed algorithm could be used as a diagnostic tool to check the reliability of the physical models and the need for a model refinement step. In this work, three test cases have been used to observe the performance of the proposed algorithm. These test cases are related to the calibration of polynomial functions, simple machine and an imperfect reservoir model. In the first test case, cubic function is considered as the perfect model and imperfect models are represented by quadratic and linear functions. The second test case is related to estimation of efficiency of the simple machine model which lacks physics in terms of friction component. The third test case is related to calibration of imperfect reservoir model, which has blurred channelized geological patterns. The imperfect reservoir model has two sources of modeling errors, i.e., simplified geological representation and upscaling errors.
The outline of this paper is as follows. In Section 2 the formulation of flexible iterative ensemble smoother is described. Following that, three test cases related to calibration of polynomial functions, simple machine model and imperfect reservoir model are described with results in Section 3. Section 4 is related to the conclusions of the paper.

Formulation of flexible iterative ensemble smoother
Standard data assimilation and parameter estimation procedure often rely on an implicit assumption that the model errors are generally small and could be neglected (i.e., the mathematical model is perfect). Mathematically, if a perfect model is utilized, the observed data is formulated as, where g(.) is a function representing the perfect model, m true is the true model parameters, d is the measurement errors which is usually assumed to follow a normal distribution N (0, C D ) and C D is the covariance matrix of the measurement errors. Generally, physical models are treated as if they are perfect during the calibration process and subsequently parameter mismatch/errors are assumed as the main source of the observation/data mismatch. However, it is widely known that physical models are imperfect and are only an approximation of reality in terms of description, scale, assumption or complexity. These approximations commonly introduce model errors that are often neglected during calibration. The mismatch between historical data (i.e., the true system response) g(m true ) and simulated data from an imperfect modelg(m) is a result of both parameter error and model error. Mathematically, if an imperfect model is utilized, the observed data could be formulated as, where m is the model error andg(.) is a function representing the imperfect mathematical model. Failing to account for the model error effects during the model calibration process can result in significant bias in the calibrated model parameters and consequently might result in inaccurate predictions. Generally, the magnitude of the model error is unknown before calibration and data assimilation (i.e., we do not know the extent of the imperfection of the mathematical model). In this section, an iterative ensemble smoother is formulated with an approximate method to quantify the model errors during the calibration process in order to reduce and in some cases eliminate any bias in the estimated model parameters. Furthermore, we aim to develop an iterative ensemble smoother that is flexible enough to match the observed data to the level of measurement noise in the case of utilizing a perfect mathematical model for inversion. In order to achieve this objective, we first present the standard ES-MDA, followed by the proposed flexible formulation that could handle the cases where the model errors cannot be neglected. In standard settings, ES-MDA update equation can be written as [5], where M = [m 1 , m 2 , m 3 , . . . , m N e ] ∈ R N m ×N e is an ensemble of model parameters of size N e , m r is a realization r of the model parameters of size N m , D ∈ R N d ×N e is an ensemble of model outputs of size N d generated by a perfect model denoted by an operator g (i.e., D = g(M)), D uc ∈ R N d ×N e is the ensemble of perturbed observations (d obs ∈ R N d ×1 ), α is the noise inflation parameter and C D is the covariance matrix of the measurement error/noise. The covariances C MD and C DD , representing approximate sensitiveness of the model response to changes in the model parameters, are defined using the following equations: where M mean ∈ R N m ×1 is the ensemble mean of model parameters, D mean is the ensemble mean of model outputs and − → 1 N e ∈ R 1×N e is a row vector of ones. In ES-MDA, non-linear inverse problem is solved iteratively with an inflated noise covariance matrix and the inflation factor α is normally set to the total number of data assimilations/iterations N a .
For the proposed Flexible ES-MDA, the output of the imperfect model is related to the perfect model output using the following equation: where D is an ensemble of model output generated from the imperfect model denoted by the operator g using D = g(M) and E is an ensemble of the model error. Similarly, the mean of the previous equation over the ensemble outputs resulted the following equation: For the approximate sensitivities, substituting Eqs. 6 and 7 into Eqs. 4 and 5 resulted in the following modified covariances: where, Using these sensitivities and substitution of Eqs. 6, 8 and 9 into Eq. 3, the Flexible ES-MDA update equation while accounting for the model error can be written as: In general, the ensemble of modeling errors E is unknown. However, the residual of data mismatch includes the model error effects and we postulate that this dataresidual could be split between a parameter update component and a modeling error component. In this study, we approximate the ensemble of model errors E using a fraction (percentage) of the data-residual ensemble as shown in the following equation: where R = d obs − → 1 N e − D ∈ R N d ×N e is the ensemble of residuals obtained from the differences between the observation and the ensemble of imperfect model outputs and s p is a scaler split parameter. The prior (initial) split parameter is computed based on the ratio of the Euclidean norm of mean residual (mean deviation from observed data) and Euclidean norm of maximum residual (maximum absolute deviation from observed data).
where σ m = mean(R) ∈ R N d ×1 is the mean residual (mean deviation from observed data) and σ max = max(abs(R)) ∈ R N d ×1 is the maximum residual (maximum absolute deviation from observed data). During calibration (data assimilation) this split parameter is updated based on the ratio of the Euclidean norm of mean residuals obtained in current iteration i and previous iteration i − 1.
The statistical interpretation of the split parameter (Eqs. 18,19) can be attributed to the correspondence of the mean residual to the model error/bias. Equation 18 is used to normalize the initial mean residual in the range of [0, 1] and Eq. 19 is used to represent the change in the mean residual during the calibration/data assimilation process. We note that if model response/output consists of multiple time series, the split parameter should be calculated individually for each time series.
Following the introduction of the ensemble of approximate model errors as defined in Eq. 17, we can rewrite Eq. 16 as the following: We note that the proposed algorithm introduces an assumption about the model error form as defined in Eq. 17. This form is based on several numerical observations and is validated by extensive numerical testing. However, there is no guarantee that this form is optimal for all imperfect models. For those case where Eq. 17 fails to approximate the true model errors, utilizing the full update equation (Eq. 20) might result in invalid updates (and subsequently divergence) because of the corrupted sensitivity (crosscovariance) terms (i.e., C M E , C D E and C ED ). Therefore, we adopted a relatively robust approach by retaining only the model error components with regularization effects (C E E ). This results in a flexible ES-MDA formulation that is able to fit high-quality models with negligible model errors (a.k.a. perfect models) as well as suitable for calibration of imperfect models where model errors cannot be neglected.
Therefore, Eq. 20 can be simplified to the following form: Equation 21 represents the final update form for flexible iterative ensemble smoother. We also note that after Algorithm 1 Flexible ensemble-based algorithm for perfect and imperfect models.

Test cases
In this work, three test cases have been used to observe the performance of the proposed algorithm. These test cases are related to the calibration of polynomial functions, simple machine and imperfect reservoir model. For comparison purpose, calibration is performed using both the standard ES-MDA [5] and the proposed Flexible ES-MDA algorithm. An ensemble of 100 members is used with 8 number of iterations for the calibration of all test cases. In the first test case, cubic polynomial function is considered as perfect model and imperfect models are represented by quadratic and linear functions. The objective is to test the performance of the proposed algorithm for calibration of perfect and imperfect models. The second test case is related to the estimation of efficiency of the simple machine model which lacks physics in terms of friction component and used [1] as a test case for a joint calibration framework with the model error parameterized by Gaussian process regression. In this work, the objective is to test the calibration and prediction improvement of simple machine model using the proposed algorithm without joint calibration. The third test case is related to calibration of imperfect reservoir model, which has blurred channelized geological patterns. The imperfect reservoir model has two sources of modeling errors, simplified geological representation and up-scaling errors.

Test case 1: the polynomial functions
In this test case, the data is generated from a cubic polynomial functions and perturbed with an additive measurement noise d ∼ N (0, C D ) using Eq. 22. The complexity of the calibrated models vary from a first order polynomial to a third order polynomials as shown in Eqs. 23, 24 and 25. The objective is to test the performance of the proposed algorithm on both scenarios where the calibrated model complexity matches the data generating process (i.e., perfect model) and when the calibrated model is less parameterized than the data generating process (i.e., imperfect models). The domain of x lies within the interval [−1, 1]. We calibrate three models (cubic, quadratic and linear models) to obtain the posterior distributions of each model parameters vector λ and the corresponding models outputs. The objective is to evaluate the flexibility of the proposed algorithm in quantifying the model error uncertainty for imperfect models as well as the ability to match the data for the perfect model case. Prior parameters are sampled from a standard normal distribution with zero mean (μ = 0) and standard deviation σ = 10.
The calibration of polynomial functions are performed using two levels of measurement noise. In the first level, negligible measurement error (i.e., order of magnitude 10 −12 ) is considered. Figure 1 shows the posterior distribution of cubic, quadratic and linear models outputs obtained by ES-MDA and Flexible ES-MDA algorithms. We observe that both algorithms manage to match the observations for cubic model (perfect model case), without any bias as shown in Fig. 1a and b. This shows that the  Model output (f) Linear model proposed algorithm did not introduce any bias and has the flexibility to match the data for the perfect model case with negligible measurement errors. The posterior distributions of cubic model output ( Fig. 1a and b) are appeared as the point estimate (i.e., exact data match) due to the effect of negligible measurement error (i.e., order of magnitude 10 −12 ). Figure 1c and d show the posterior distribution of the quadratic model output. It is clear that the standard ES-MDA fails to match the data, while the calibrated model using the Flexible ES-MDA results in model outputs that account for model error uncertainty and provides a good coverage for the observations as shown in Fig. 1d. Similar results are observed for the linear model case in Fig. 1e and f, where a better coverage is obtained by the proposed Flexible ES-MDA. However, we note that the coverage is not as good for the quadratic model case. This shows that the capacity as well as the limitation of the proposed algorithm in accounting for the modeling errors during the calibration.
For further validation of the proposed algorithm, we compare the best split factor (computed using the true model error ensemble E actual ) with the approximated split factor s p estimated using the formulation proposed in this manuscript. The best split is computed by minimizing the Frobenius norm of the difference between true model error ensemble and the residual ensemble, i.e., min E actual − s p R . This simple minimization problem is solved using differential evolution algorithm [28]. Figure 2 shows both the best split factor s p when the true model error is known and the approximated split versus the number of ES-MDA iterations. At iteration 0 (i.e., prior), both the best and proposed split factors show very low values, because the prior residual is relatively larger than the model error. For subsequent iterations, the split parameter is increasing with respect to the iteration number, which is attributed to the convergence of the proposed flexible algorithm to the level of model error uncertainty. At iteration 2, the proposed split is very close to best split value for both linear and quadratic imperfect models, which show the robustness of the proposed equation in approximating the split factor. After iteration 2, the proposed split parameter approached 1, which show that all the data mismatch (residual ensemble) is treated as a model error ensemble and further reduction of uncertainty is not possible due to the limited capacity of both the linear and quadratic models in matching the data. These results demonstrate the ability of the proposed algorithm in capturing the unknown model error uncertainties during the calibration of imperfect models.
Using the same problem setup, we add a measurement error of 5% of true function value (Eq. 22). The objective of this experiment is to evaluate the proposed algorithm when measurement errors are present while calibrating perfect and imperfect models. Figure 3 shows the posterior outputs of the calibrated cubic, quadratic and linear models. For the cubic models case, both the standard ES-MDA and the Flexible ES-MDA algorithms managed to match the structural feature of the data as shown in Fig. 3a and b. However, the proposed Flexible ES-MDA algorithm performs significantly better in terms of capturing the noisy features of the data (Fig. 3b). This effect shows that the split formulation of proposed algorithm act as a adaptive regularizer for the perfect model scenario. Similar results (w.r.t negligible measurement noise case) are observed for both the quadratic and the linear models in the presence of both measurement and model errors (Fig. 3c, d, e     Fig. 4a, b and c except for coefficient of x, (i.e., λ 1 ) of the quadratic model. We use the prediction interval coverage probability (PICP) of posterior distributions as one of the calibration metric. PICP is estimated by counting the number of observations in the confidence intervals (10% to 99%) of the posterior distribution normalized by the total number of observations [29]. PICP is a very useful metric for quantifying both under-estimation and/or over-estimation of uncertainties in the obtained posterior distributions [29]. The mathematical description of PICP is shown in Appendix 1. PICP values close to 45 • line (dashed red line in Fig. 5) indicates a perfect posterior distribution. Figure 5 shows the PICP of the prior and posterior distribution of the linear, quadratic and cubic models outputs. We observe that the posterior distribution of cubic (perfect) model output obtained by the standard ES-MDA underestimates the uncertainty. However, the proposed Flexible ES-MDA shows more robust uncertainty quantification as shown in Fig. 5a. This effect is due to the efficient coverage of the noisy features of the observation data by the proposed algorithm (Fig. 3b). PICP of posterior distribution obtained from ES-MDA for the imperfect linear and quadratic models show severe under-estimation of uncertainties ( Fig. 5b and c) as the model error uncertainties are missing in the formulation. Further, we observe a more robust PICP values obtained from the proposed algorithm for calibrating the imperfect linear and quadratic models as shown in Fig. 5b and c.
Mean continuous ranked probability score (CRPS) is one of the most useful calibration and prediction metrics to evaluate the precision and accuracy of the posterior ensemble. Less accurate or poor quality results are indicated by the higher values of CRPS. The complete detail of CRPS for ensemble prediction system can be found in [11]. The mathematical descriptions of mean CRPS along with  Mean square error (MSE) are presented in Appendix 1. Figure 6 shows the comparison of mean CRPS and MSE of the posterior ensemble of models outputs obtained by the ES-MDA and the proposed algorithm. Calibration results from the proposed algorithm show lower CRPS values for linear, quadratic and cubic models (Fig. 6a), which indicates more reliable results in terms of precision and accuracy. In Fig. 6b, MSE shows more wide distribution of posterior ensemble due to the coverage of unknown model error uncertainty using Flexible ES-MDA; however, lower values of MSE is also observed using the proposed algorithm due to significant reduction in model bias.

Test case 2: the simple machine
In this test case, the simple machine model is calibrated with the observation data obtained from the unknown complex machine. The main objective is to test the flexibility of proposed algorithm for the cases, where the model lacks some physics. The imperfect model for simple machine is shown in Eq. 26.
where y is the work obtained from machine, x is the effort on machine and θ is the efficiency of machine. True complex machine T (x) includes the friction effect, which is unknown. The observed data d obs is generated using the function of true complex machine using: where a = 20 and d is the measurement error, which is taken as 5% of true function. In this test case, the task is to estimate the efficiency of the machine θ and the corresponding output of the calibrated model. The prior distribution for the model parameters is assumed to follow a standard normal distribution, i.e., mean μ = 0 and standard deviation σ = 1. The domain of effort variable x lies within the interval [0, 6]. Total number of points are 61, i.e., x contains values from 0 to 6 with the difference of 0.1 between two consecutive points, where 40% are used for calibration/parameter estimation purpose and 60% are used for predictions. In this test case, an ensemble of 100 members is used with 8 number of iterations for the calibration of simple machine model. Figure 7 shows the posterior distribution of efficiency (parameter) of the machine and model output obtained by ES-MDA and Flexible ES-MDA algorithms. Both algorithms match the historical/training data; however, predictions obtained from the calibrated model using ES-MDA are overconfident, inaccurate and generally unreliable (Fig. 7a). This is due to the biased posterior distribution of efficiency (parameter) of machine as shown in Fig. 7d. This bias in the estimated efficiency parameter of the machine is reduced significantly and posterior distribution covers the true efficiency of the machine using proposed Flexible ES-MDA (Fig. 7d). Due to this effect more reliable predictions are obtained from the calibrated simple machine model using the proposed Flexible ES-MDA algorithm (Fig. 7b). Figure 8 shows the PICP of training data and predictions. The proposed algorithm shows more robust uncertainty quantification of training data as shown in Fig. 8a (Fig. 8b), which shows the increase in predictions reliability from calibrated imperfect models (which lacks some physics). Similar results are observed in terms of mean CRPS and MSE metrics for both historical/training and prediction data as shown in Fig. 9.

Test case 3: the imperfect reservoir model
This test case is related to the calibration of imperfect model of subsurface oil reservoir with channelized geological patterns of log-permeability field. The subsurface reservoir has dimensions of 7500 ft × 7500 ft × 20 ft in the x, y and z directions, respectively. The reservoir contains oil and water phases with in-compressible flow in porous media. The reservoir has uniform porosity of 20% and the initial reservoir pressure is 5000 psi. Figure 10 shows the true model, which consists of 75 × 75 grid blocks, along with different wells open/shut schedule (Fig. 10b) and controls (Fig. 10c). The true model is used to generate the observed data and the observations are perturbed by an additive measurement noise, which is taken as 5% of true model response. The reservoir contains one injector well (I1) and three production wells (P1, P2, P3). The production wells are operated under constant bottom hole pressure constraint of 4500 psi and the injector well is operated with timevarying constraint of constant injection rate as shown in Fig. 10c. The relative permeability is represented by Corey's power law model, which is described along with parameter values and fluid properties in Appendix 2. The capillary pressure and gravitational effects are neglected. The reservoir is simulated using a 2-D grid with the MATLAB Reservoir Simulation Toolbox (MRST) [15]. Well P3 is used in the prediction period in order to assess the prediction capabilities of the calibrated reservoir model for future/new wells. The reference and prior fine scale channelized log-permeability fields are generated using a two facies training image as an input to a direct sampling version of MPS algorithm [19]. The calibrated reservoir model is an up-scaled version of true model with a size of 15 × 15 grid block with no parameterization of the geological features (parameters corresponds to grid block values). The up-scaled model (a.k.a. imperfect reservoir model) contains two sources of modeling errors, simplified geological representation and up-scaling errors. Figure 10a shows the log-permeability field of true model with the channelized features and Fig. 11a shows the corresponding up-scaled log-permeability field where the channelized features are blurred due to harmonic up-scaling of the grid properties. Except for permeability, all other inputs to the reservoir simulator take the same value for both the reference fine model and the up-scaled models used in history matching, for example relative permeability and porosity.
In this test case, an ensemble of 100 members (i.e., one hundred realizations of log-permeability field) is used with 8 number of iterations for the calibration of imperfect reservoir model. Figure 11 shows the mean of the posterior distribution of the log-permeability field obtained by standard ES-MDA and the proposed Flexible ES-MDA algorithms. For the standard ES-MDA, we observe overshooting (red and blue values) in the mean log-permeability field at a large number of grid blocks. In this case, ES-MDA aggressively tried to adjust the model parameters to match the observation while neglecting the model errors. For the Flexible ES-MDA, we obtained relatively smooth logpermeability fields as shown in Fig. 11c, as the algorithm   only updates the model parameters using a percentage of the data mismatch defined by the split parameter s p to account for unknown model error effects. Figure 12 shows the standard deviation of log-permeability field obtained using both the standard ES-MDA and the proposed Flexible ES-MDA algorithms. We observe relatively high values of standard deviation from Flexible ES-MDA (Fig. 12b) as compared to ES-MDA (Fig. 12a), due to the effect of additional uncertainty of unknown model error. Figure 13 shows the prior and posterior distribution of oil rates from the production wells. ES-MDA performs well in terms of matching the observations as shown in Fig. 13a; however, poor-quality predictions are observed for wells P2 and P3. For well P1, the predictions start to deviate after 4.2 years. These overconfident and inaccurate predictions are due to the over-shooting of log-permeability values around production wells (Fig. 11b). This is a common problem in the petroleum industry, where the historical data is usually matched and often the calibrated models suffers from severe predictability problem [2]. One of the primary reasons of this problem is failing to account for model error effects during the history matching/calibration process as evident in these results. In contrast, the proposed Flexible ES-MDA algorithm performs relatively better in terms of predictions quality for production wells P1, P2 and P3 as shown in Fig. 13b. This could be easily attributed to the smooth mean log-permeability fields of posterior ensemble as shown in Fig. 11c.
Similar results are observed in Fig. 14 for the water rates of production wells and injector well I1. However, we observe that the historical data for well I1 is not fully covered by the posterior distribution obtained by  Fig. 14b. This is due to the bad prior distribution (shown by brown lines) of the injection pressure for well I1. We note that this low quality prior is a consequence of modeling errors effect (up-scaling) and this can be avoided by using model/grid refinement around wells that are difficult to match. Figure 15 shows the PICP of parameter estimation, historical data and predictions. The Flexible ES-MDA shows very robust uncertainty estimates for the model parameters (log-permeability field) as the PICP values lie close to the reference line as shown in Fig. 15a. Additionally, the proposed algorithm shows improved uncertainty estimate of historical production data as shown in Fig. 15b, despite failing to match the historical data for well I1. The prediction PICP is also improved by the Flexible ES-MDA (Fig. 15c), which shows an increase in predictions reliability from calibrated imperfect reservoir models. Figure 16 shows the mean CRPS and MSE metrics for both historical and prediction data. We observe that mean CRPS and MSE of the historical data show lower values for ES-MDA due to relatively better matching and higher values for Flexible ES-MDA due to the uncovered historical data of well I1. However, the Flexible ES-MDA shows significant improvement for mean CRPS and MSE of prediction data. The good matching using ES-MDA can mislead us to overconfident and inaccurate predictions in case of imperfect models and this over-confidence in inaccurate predictions could be avoided using the proposed where CP is the coverage probability of the 99.  Figure 17 shows the posterior results of log-permeability field using the proposed Flexible ES-MDA with adaptive adjustment of the split parameter with respect to coverage probability. We note that the quality of the mean logpermeability field is slightly decreased (Fig. 17a) as compared to earlier results shown in Fig. 11c without any adaptive adjustment of the split parameter. Figure 18 shows the calibration and prediction results of well outputs obtained from Flexible ES-MDA with adaptive adjustment of the split parameter. In this figure, we only present the results that are affected by the adaptive adjustment of the split parameter. We observe a better data match of the historical data for well I1 (Fig. 18) due to the adaptive adjustment of the split parameter in Flexible ES-MDA as compared to without any adjustment of split parameter Oil prod rate (STB/day) and 99% confidence interval of prior distribution respectively. Solid blue lines and gray shaded area show p50 and 99% confidence interval of posterior distribution (Fig. 14b). In addition, we note that the confidence intervals of production wells P2 and P3 are reduced (Fig. 18) due to the relatively lower values of the standard deviation of the posterior ensemble (Fig. 17b) obtained from Flexible ES-MDA with adaptive split parameter as compared to Fig. 12b. Figure 19 shows the PICP of model parameters, historical data and predictions. We observe that PICP of model parameters is slightly decreased as compared to the results in Fig. 15a. This is due to the adjustment of split parameter in order to match bad prior time series of well I1. However, the PICP of historical data and predictions are improved as compared to Fig. 15b and c using Flexible ES-MDA. This effect shows some trade-off between quality of estimated parameters and corresponding outputs (specially bad prior) of imperfect reservoir model using   Injector BHP (psi) 10 4 Fig. 18 Calibration and prediction results of production and injection data using Flexible ES-MDA with adaptive adjustment of split parameter. The descriptions of lines and colors are same as in Fig. 13 adaptive adjustment of the split parameter in the proposed algorithm. Figure 20 shows the mean CRPS and MSE metrics for both historical and prediction data with adaptive adjustment of split parameter. We observe that mean CRPS and MSE of the historical and prediction data are improved using Flexible ES-MDA as compared to Fig. 16 due to matching of historical data of well I1 and reduction in confidence intervals of water production of well P2 and oil production of well P3 by adaptive adjustment of split parameter. These results show that the adaptive adjustment of split parameter in Flexible ES-MDA can be used as an alternative option if model refinement is not feasible and output time series cannot be matched due to bad prior.

Conclusions
In this paper, a flexible algorithm is proposed for calibration of perfect and imperfect models. This flexible algorithm builds on the ensemble smoother with multiple data assimilation, which has the assumption that models are perfect, i.e., accurate representation of the real systems. However, it is widely known that mathematical models are approximations of the real systems and some inherent errors always exist in computational models. Neglecting the model error during calibration causes bias in the estimated physical parameter and often results in unreliable predictions. In the proposed algorithm, the residual (data mismatch) is split into two parts. One part is used for parameter estimation/update and the second part is used to represent the model error. The initial split parameter is computed based on the ratio of norm of mean residual (mean deviation from observed data) and norm of maximum residual (maximum absolute deviation from observed data). During calibration (data assimilation) this split parameter is updated based on the ratio of norm of mean residuals obtained in current and previous iterations. In the proposed algorithm, we assumed that the data misfit/residual can be split into a parameter error and a model error according to the ratio of mean deviations between iterations. This split formulation shows very close correspondence to the best split value computed from true model error (see Test case 1).
In this work, three test cases have been used to observe the performance of the proposed algorithm. These test cases are related to the calibration of polynomial functions, simple machine and imperfect reservoir model.
In the first test case, cubic function is considered as perfect model and imperfect models are represented by quadratic and linear functions. We observe that if the model is perfect, the proposed algorithm exactly match the data and for imperfect models the algorithm has the flexibility to capture the unknown model error, which is very useful to avoid over-confidence and generally inaccurate predictions. The second test case is related to estimation the efficiency parameter of a simple machine model which lacks physics in terms of friction component. The calibration result from proposed algorithm shows more reliable estimation of the efficiency of the machine which provides significant improvement in prediction. The third test case is related to calibration of imperfect reservoir model, which has blurred channelized geological patterns. The imperfect reservoir model has two sources of modeling errors, simplified geological representation and up-scaling errors. We note that the proposed algorithm reduces the model bias in parameter estimation and as a consequence predictions are improved significantly from the calibrated imperfect models. We observe improvements in PICP, mean CRPS and MSE metrics after evaluating the parameter estimation, calibration and prediction performance of the proposed algorithm for three test cases of different physical systems. However, the well output which shows bad prior due to the model error effect can be difficult to match using the proposed algorithm. This can be avoided by adaptive adjustment of the split parameter in the proposed algorithm with some trade-off between quality of estimated parameters and corresponding model outputs.
We strongly recommend model improvement in terms of physics, assumptions, details and description, if a large magnitude of model error is indicated by the proposed algorithm. However, we argue that the proposed algorithm provides good indicators about the reliability of the calibrated models especially for the cases with unknown model error. Sometimes for effective decisionmaking we need to reduce the uncertainty to threshold value (i.e., minimum value which effects decision-making process). If the model error uncertainty is greater than the threshold value, we strongly recommend to improve the model in order to reduce model error uncertainty. In cases where model error uncertainty is less than acceptable threshold value and there is a time-constraint, the proposed algorithm provides a robust method for calibrating imperfect models that can be used for decision support. In future, we would try to address different types of uncertain parameters and simulation models with different forms of model errors in terms of complex physics, structural geology, fluid dynamics and phase behavior.
3 ) The notations of above equations are described in MRST manual [15]. The fluid data and Corey relative permeability model parameters used in the reservoir model are shown in Table 1.