Fast robust optimization using bias correction applied to the mean model

Ensemble methods are remarkably powerful for quantifying geological uncertainty. However, the use of the ensemble of reservoir models for robust optimization (RO) can be computationally demanding. The straightforward computation of the expected net present value (NPV) requires many expensive simulations. To reduce the computational burden without sacrificing accuracy, we present a fast and effective approach that requires only simulation of the mean reservoir model with a bias correction factor. Information from distinct controls and model realizations can be used to estimate bias for different controls. The effectiveness of various bias-correction methods and a linear or quadratic approximation is illustrated by two applications: flow optimization in a one-dimensional model and the drilling-order problem in a synthetic field model. The results show that the approximation of the expected NPV from the mean model is significantly improved by estimating the bias correction factor, and that RO with mean model bias correction is superior to both RO performed using a Taylor series representation of uncertainty and deterministic optimization from a single realization. Use of the bias-corrected mean model to account for model uncertainty allows the application of fairly general optimization methods. In this paper, we apply a nonparametric online learning methodology (learned heuristic search) for efficiently computing an optimal or near-optimal robust drilling sequence on the REEK Field example. This methodology can be used either to optimize a complete drilling sequence or to optimize only the first few wells at a reduced cost by limiting the search depths.


Introduction
In the development of a hydrocarbon field, optimization is an important process that can substantially improve the profitability through reduction in production and development costs or by increasing or accelerating the production of hydrocarbons. The increase in profitability of a field for a given change in controls is predicted using a reservoir simulation model, in which the geological uncertainties such posed by the high nonlinearity in the reservoir model; consequently, RO performed in a simplified model is likely to yield a suboptimal solution [11]. Techniques for reducing the number of simulations needed for optimization of the expected value of an objective function can be further classified into two main categories: improving the optimization algorithms for certain problems and reducing the cost of evaluating the expected value.
Ensemble-based optimization [6,14] is an efficient gradient-based approach for fairly general RO problems with continuous variables. The cost of evaluating the gradient of the expected value of the objective function is reduced by using the same ensemble to represent the reservoir model uncertainty and to compute the gradient of the expectation of the objective function with respect to control variable settings. One downside to a gradientbased approach is that it is easily possible to get stuck in a local minimum (or maximum). For optimization problems with discrete control variables, Wang and Oliver [40] proposed a nonparametric online learning methodology with heuristic controls to efficiently solve sequential decision-making problems. This approach can reduce the number of iterations required during the optimization process. Nevertheless, the amount of computation required for RO performed using sample average approximation (SAA) over a large ensemble could still be expensive since the cost increases linearly with the number of realizations required to represent the uncertainty in model properties.
There are several possible ways to compute the expected value at a lower cost than SAA. One is to reduce the representation of uncertainty through model selection, in which a subset of the ensemble members are used for optimization [22,32,37]. A small subset of model realizations may not span the uncertainties accurately, leading to a sub-optimal solution. To address this issue, Barros et al. [2] proposed an automated scenario reduction approach for selecting a subset that contains an optimal number of realizations that are able to capture the range of the uncertainties of the full ensemble. A much faster approach is to evaluate the objective function in the mean reservoir model. Chen et al. [7] obtained relatively good results by using the ensemble mean model updated from the ensemble Kalman filter (EnKF) data assimilation for production optimization. Their optimized design increased the expected NPV compared to the reference case, but was not as good as the optimized results obtained using SAA. One major drawback to direct use of the mean model is the large bias in predicted reservoir behavior that may result from the use of a model with reduced heterogeneity, in which case the mean model may offer a poor estimate of the expected NPV. To improve this approximation, one possible approach is to modify the representation of uncertainty using a Taylor series expansion of the objective function [1,3,[8][9][10]. The first term of Taylor expansion can be inexpensively obtained using the mean of reservoir model realizations. To accurately estimate the expected value, the Taylor series expansion generally includes higher-order terms (e.g., a quadratic or cubic term). The computation of higher-order derivatives is impractical for most real reservoir applications, however.
Instead of using higher-order terms to improve the estimation accuracy, we develop a fast and efficient approach to correcting the bias in the NPV obtained from the mean reservoir model by estimating a multiplicative bias correction factor based on the information from distinct controls and individual model realizations. To accurately estimate the expected NPV, we apply distancebased localization to estimate the bias for specific controls, considering the similarity between samples and control variables in terms of the bias correction factor. To avoid poor estimates caused by sampling error due to small sample sizes, a regularization term based on the average value and the variance of the bias correction factor is used to reduce the sensitivity of the estimates to the taper length, thereby allowing more accurate estimates to be generated for a wider range of taper lengths. The initial sampling of control and model realizations is used to create an ensemble of partial corrections factors. During the RO phase, when it is necessary to estimate the expected value of NPV for a control, we only require simulation of the control applied to the mean model. The bias correction factor is estimated from partial bias correction factors on similar controls. Hence, after creation of the initial ensemble, RO performed using bias-correction methods requires only one additional simulation of the mean reservoir model at each iteration, which is much less than the effort required in SAA. This bias-correction methodology can be applied to fairly general problems of optimizing the expected value of an objective function. But an appropriate distance metric to measure the similarity of controls in terms of the bias correction factor is required, which is specific to the problem at hand.
The performance of various bias-correction methods and a linear or quadratic approximation is investigated in two applications. The objective in the first example is to locate an injection well such that flow rate for a fixed pressure in a one-dimensional model is minimized; this problem is small enough that we can evaluate both linear and quadratic Taylor approximations of the objective function. The second example is to maximize the expected NPV in a synthetic field model by optimizing the drilling sequence of wells at fixed locations. Here we use wellposition based distance to measure the similarity of drilling sequences for the bias correction factor. We formulate the sequential drilling optimization problem as one of finding a path with the maximum reward in a decision tree, and apply learned heuristic search [40] with mean model bias correction (MMBC) to compute the RO drilling sequence under geological uncertainty and to optimize only the first few wells at a reduced cost by limiting the search depths. This paper is organized as follows. Section 2 describes the bias-correction methods for estimating the expected NPV and the learned heuristic search method for optimizing either a complete drilling sequence or only the first few wells. Sections 3 and 4 describe two numerical case studies (i.e., flow optimization in a one-dimensional model and drilling-order problem in a synthetic model). The conclusions are presented in Section 5.

Estimation of expected value
Correction factor In this work, we use maximization of the NPV as the objective for an optimal reservoir management problem. Because reservoir characterization is always incomplete, the optimization problem is based on a reservoir model with uncertainty in parameter values. An appropriate approach to account for the uncertainty is to use the expected NPV over an ensemble of reservoir models that have been sampled from the probability distribution for model parameters. The expected value of the objective can be written as: where x is a vector of control inputs; m ∈ R m is an m-dimensional vector of uncertain model parameters; j is the index of individual realizations; and N e indicates the number of reservoir models. This ensemble-based average value can provide a good approximation of the expected NPV if the ensemble of model realizations is sufficiently large [23,25]. RO performed using such straightforward estimation of the expected NPV, however, requires many expensive simulations when the ensemble size N e or the number of iterations needed for the optimization process is large. Instead of computing the expectation of the objective function by using SAA, one could consider optimizing the expectation for a linear or quadratic approximation of the objective function [1,3,9,10]. To second order, the Taylor expansion of the objective function is: where we have neglected higher-order terms in the expansion. f m and f mm are first and second derivatives of f with respect to uncertain model parameter m, respectively. If m is distributed as multivariate Gaussian with meanm and covariance C, then the expected value of the quadratic approximation of the objective can be shown [31] to be: A possible advantage of this approach to approximating the expected value of the objective function is that optimization of the expectation does not require evaluation of controls applied to a large number of Monte Carlo samples [1].
Computing f (x,m) in Eq. 3 will be easy as it only requires the mean of the realizations. Computing f mm (x,m) is more difficult as it requires the second derivative of the objective function with respect to model parameters. Although it might be possible in some cases to approximate the second derivative, computing second derivatives will be impractical for most reservoir applications. The linear approximation seems more likely to be feasible in practice: Assuming that m is distributed as Gaussian, the expectation over m of the linear approximation to the objective is simply a function ofm and x: The key point of this is that the mean of the ensemble can be used for the optimization, instead of performing the optimization on the ensemble of realizations.
Such an approach has a major advantage over SAA in that it significantly reduces the number of simulations required for evaluating the expected value. However, since the NPV is generally not a linear function of the uncertain model parameters m, NPV from the mean model is generally not the same as the expected value over uncertainty space, If we want to use the mean model for optimization problems without sacrificing the accuracy of the estimated expected NPV, we need a method for improving the approximation of expected NPV from the mean model.
In this paper, we compute a multiplicative correction factor between the ensemble average of valuesf (x i ) and the value obtained from the mean model f (x i ,m) at a fixed control x i : where α(x i ) is the correction factor for a fixed control x i . Instead of directly computing the ensemble average of values of the objective function, we develop an approximation of expected valuef (x) by estimating the correction factor α(x i ) of control x i . If it were feasible to compute the value of the NPV at control x i for all samples of model parameters, then α(x i ) could be computed in the following way: where f (x i , m j ) is the economic value at control x i of an individual realization m j . For each individual model realization m j and control x i , we define a partial correction factor β(x i , m j ,m): The correction factor α(x i ) at control x i of all ensemble realizations (7) can be written in terms of the partial correction factors: We also define the mean value of the correction factor α(x i ): where N x is the number of relevant controls. Straightforward application of Eq. 10 requires N x × (N e + 1) evaluations of f (x, m) to compute the average correction factorᾱ from N x different controls on an ensemble of N e model realizations.
A Monte Carlo estimate ofᾱ can be obtained at a much lower cost by sampling control x j uniformly from the space of all possible controls, and sampling reservoir realizations m k from the space of conditional realizations. Thenᾱ can be estimated using the following formula: where β(x j , m j ,m) is the observed value at control x j , which requires two simulations, i.e., apply control x j to a random individual realization m j and apply control x j to the mean reservoir modelm. Therefore, it would require 2 × N x simulations to obtain a set of observations β from N x distinct controls.
For N e reservoir model realizations, we can sample N e distinct controls to obtain observations of β so that each realization will provide one observed value of β. In that case, the number of observed values of β is the same as the ensemble size (i.e., N x = N e ). In some cases, more observations might be needed to obtain a good approximate ofᾱ (e.g., obtain two observations of β from each realization (i.e., N x = 2N e ) or use a larger ensemble size). If the value of β is similar for most realizations, we can use information from a subset of the ensemble members and a smaller number of distinct controls (i.e., N x < N e ) to estimate the bias.
Although an estimate ofᾱ can be efficiently obtained from the observed values of β, the accuracy of an estimate of the expected NPV obtained usingᾱ for correction is limited by the variability in α. Use of the bias correction factor obtained by averaging samples of β (11) will result in the same correction factor being used for all controls, even though the correct values of α for some of the controls may be far fromᾱ. In such a case, the accuracy level of the estimates from different controls is limited by the actual α value. Nevertheless, if the variability in α(x i ) is small as control x i is varied, then it is possible thatᾱ can provide a useful approximation to α(x i ) for estimation of the value of f (x i ) from the value of f (x i ,m).

Distance-based localization
In general, however, we expect that an estimate of the correction factor will be better if it is primarily based on information from similar control variables. Thus, we expect that a weighted estimate will be better than an unweighted estimate. In our work, we use weighted linear estimation. Suppose that N e distinct controls are applied to N e individual realizations and the mean model for generating a set of observations β. The weighted linear estimateα(x i ) at control x i is defined as: where β(x j , m j ,m) is the observed correction factor at a random control x j applied to an individual realization m j and the mean modelm, and ω(x i , x j ) is the weight for β(x j , m j ,m). The weights, ω(x i , x j ), should depend on a measure of similarity, or distance measure, between controls x i and x j .
With an appropriate measure of distance between control sequences, weights are assigned such that β(x j , m j ,m) at shorter distances will have higher weights while partial correction factors for controls that are more dissimilar will have smaller weights. Lacking information about the correlation of β with distance between control sequences, we use the Gaspari-Cohn taper function [16] to compute distance-dependent weights: where ρ (δ, L) is the distance-based weight and varies from 0 and 1; δ is the distance between controls; L is the taper length determining the distance at which the weighting drops to approximately 0.2, and 2L is the critical distance, beyond which the weighting is zero. For the drilling-order problem, the control variables x have no physical locations but are permutations of sequences of possible actions, in which case an orderbased encoding is appropriate. We have chosen to use the permutation encoding [34] of the drilling sequence. In this encoding, each integer value in the vector encodes the relative ordering of the drilling of a specific well. Consider, for example, two possible control sequences x i and x j in which four wells are drilled, i.e., and S x j = [W 3 , W 1 , W 4 , W 2 ]. The permutation encodings for these two sequences are P x i = [1, 2, 3, 4] T and P x j = [2, 4, 1, 3] T , respectively.
Distance between two control sequences is then measured by the distance between the vectors P x i and P x j . Appropriate distance measures for ordering problems include the "edit" distance [30], which is the minimum number of operations required to transform one sequence to another sequence, and the "swap" or Jaro-Winkler distance [21], which counts the minimum number of swaps of two elements required to transform one sequence to another. Because computation of swap and edit distances are relatively expensive, it is common to use fitness-distance measures as surrogates for the permutation distances [35]. The Hamming distance [17] between two sequences of equal length is the number of positions at which the corresponding actions are different, i.e., the number of wells that have different positions in the drilling sequence. The Manhattan distance (also known as the "position-based distance") measures the sum of the absolute differences between positions of the elements. In terms of the permutation encoded vectors, the Manhattan distance is: where the kth elements in P x i and P x j are P x i ,k and P x j ,k , respectively, which are the positions of a fixed well W k . N w is the number of wells. In addition to the Hamming and Manhattan distance measures for permutation encodings, we initially considered the use of two standard distance metrics on vector spaces. The Euclidean distance is defined as: (15) and the cosine distance [26,36] is a correlation-based distance measure defined as: When the lengths of all sequences are identical, as they are when the sequences are all perturbations of a base sequence, then the cosine distance is simply a scaled version of the Euclidean distance: consequently, there is no need to consider both the Euclidean and cosine distance measures. The choice of an appropriate distance measure to use as a measure of similarity is problem specific. In the application section, we show that the Manhattan, Euclidean, and cosine distance metrics are all very similar when applied to the well ordering problem. They are all superior to the Hamming distance in explaining similarity of drilling-order sequences in terms of the bias correction factor.

Taper window selection
The performance of localization for estimation of correction factor depends not only on the choice of distance measure but also on the taper parameter L which affects the weights and the effective sample size [24] used for computation of correction factor. A good distance measure will effectively identify control variables with similar correction factors so that the number of realizations used for estimation is maximized and sampling error is reduced.
Suppose that N e is the number of observed values of β, then: is a common approximation of the effective sample size [12,33]. In this equation, ω j is the weight on the j th partial correction factor. If all weights are identical, then the effective sample size is equal to N e , while if one of the normalized weights is equal to one and all others are zero, the effective sample size is 1. n eff is a measure of the equivalent number of equally weighted samples. The weights are determined by the distance between the estimation location and the j th drilling sequence. The accuracy of the estimate of the correction factor for control variable sequence, x i , is influenced both by the effective sample size, n eff , and the bias resulting from the use of partial correction factors based on random control variables with different values of α. Reducing the taper length will decrease the bias by only including values from control variable sequences with very similar values of the correction factor, but will also increase the sampling error by decreasing the effective sample size. Because the optimal taper length is not known a priori, we generally apply regularization to reduce the effect of a non-optimal choice of taper length.

Regularization
The major disadvantage of pure distancebased localization is that use of a taper length that is smaller than optimal one may result in an estimate of α that is far from the correct value due to sampling error resulting from the small number of samples within a small distance of the estimation point. Instead of using a long taper length to avoid such a situation, it is generally possible to improve the accuracy of estimated α value by adding a regularization term based on the average value and the variance of correction factor to reduce the sensitivity of the estimate to taper length while still generating accurate estimates.
A regularized estimate of α(x i ) is obtained by minimizing an objective function with both a local and a global term: In Eq. 19, σ 2 α is an estimate of the variance of α over the domain of interest, σ 2 β is the variance of β, and n eff is the effective sample size for the observations of β (18). The regularized estimate,α r (x i ), is obtained by solving ∇ α S = 0, obtaining: Note that the regularized objective function is a weighted average of the localized estimateα loc (x i ) from Eq. 12 and the mean value of α. When the effective sample size is large compared to the ratio σ 2 β /σ 2 α , the regularized estimate will be based primarily on the local samples of β.
Estimation with regularized localization has a major advantage over an approach that relies only on localized estimation: by improving the accuracy of the estimated values that are obtained with an inappropriate distance measure or taper length, regularized estimateα r is potentially more accurate thanᾱ for a wider range of taper lengths. When the variance of α is unknown, it might be difficult to select the optimal value, but as shown in experiments, results are not strongly sensitive to the exact choice.
Optimal weights Here we show how the optimal weights can be estimated if the covariance of partial correction factors is known. In that case, an estimate of α at a fixed control x 0 is calculated based on a linear combination of observations β ij with weights w i . In vector notation, the estimate is written as: where elements in vector b are the observed values of β jk from random controls and realizations. The collection of observations will be denoted by the vector b, i.e.: b = β 11 β 22 · · · β NN T .
Although the notation is different, Eq. 21 is identical to Eq. 12. The quantity α(x 0 ), that is to be estimated, is defined to be the linear combination of β 0j at the estimation location: where the jth element of b 0 is β 0j at a fixed control x 0 of an individual realization m j that is sampled from the probability distribution for model parameters, i.e.: Imposing the constraint w T 1 = 1 provides an unbiased estimate for which the expected error is 0. The optimal weights for estimating α(x i ) from a set of random observations β are obtained by minimizing the expected variance of the estimate, constrained to the unbiasedness condition. For estimation of α(x 0 ), the variance of the expected error is: To minimize the variance in estimation error, subject to the constraint that w T 1 = 1, we define a Lagrangian function: where λ is the Lagrangian parameter and S w (w) is the variance of the estimator error (23). The optimal weights are then obtained by solving for w that minimizes S(w, λ): Straightforward computation shows that: and Then the weights w can be found from the following systems of linear equations for ∇ w,λ S = 0, which can be written in matrix form: where cov(b, b) denotes the covariance of the variables β in b, and each element in cov(b, b 0 ) is the covariance function between the corresponding observed values of β in b and b 0 (Appendices 1 and 2).

Learned heuristic search
Heuristic function Heuristic search [19] is an efficient approach for solving sequential decision-making problems by repeatedly expanding the partial path with the largest estimated value until a complete path for which the true objective-function value is higher than the estimated values of all evaluated partial paths is found. The estimated value of a partial path could be obtained using an evaluation function f (n s ), which estimates the value of objectivefunction for the optimal complete path constrained to the previous actions. This estimated objective-function value consists of two elements: where g(n s ) is the true reward from the initial state to a specific state n s through a set of selected actions, and h(n s ) is a heuristic function that estimates the maximum reward from current state n s to a goal state. For drilling-order problem, the objective of robust optimization is maximization of expected NPV over uncertainty by optimizing the drilling sequence of wells. The expected NPV computed using mean model bias correction (MMBC) can be mathematically represented as: where qm o,j , qm w,j , and qm wi,j denote the rates of produced oil, produced water, and injected water, respectively, from the mean model in m 3 /day; r o , r w , and r wi are the oil price, water production cost, and water injection cost, respectively; T represents the number of time steps; t j is the cumulative time in days up to time step j ; Δt j is the time interval in days; b is the discount rate for a certain reference time τ (365 days); N w is the total number of drilling wells; W n denotes the cost of drilling the nth well; t n is the cumulative time in days up to the open time for each well; α(x i ) is the bias correction factor for the corresponding control x i . The cost of finding a strong and admissible heuristic for drilling-problem can be prohibitive since the evaluation of heuristic function requires simulations. In this paper, we use a heuristic function in which all remaining wells are drilled simultaneously at the next step [28,40], as this estimate can be obtained inexpensively and generally provides an overestimate of the NPV. Such a heuristic is guaranteed to find the true optimal drilling order. However, it might lead to an exhaustive search due to large estimated values.

Online learning techniques
To efficiently find a solution that is optimal or near optimal, the evaluation function f (n) in Eq. 27 should be close to the true maximum value f * (n). It is difficult to design a heuristic function that is accurate in all situations, but a crude heuristic function can be improved by online-learning techniques, i.e., estimate the error of the initial approximate value by learning the observations from previous decision steps. From a set of available online-learning mechanisms Φ 1 , Φ 2 , Φ 3 , . . . , Φ n , the best-improved evaluation functionf Φ (n) with multiple online-learning techniques is defines as: which might not be the most accurate value, but it is more likely to overestimate the actual maximum value and guide the heuristic search close to the optimal solution.
Wang and Oliver [40] have proposed two possible online-learning techniques (i.e., single-step adjustment and multiple-time-periods learning) for improving the initial approximate values obtained from heuristic sequences (i.e., all the remaining wells are drilled simultaneously and opened at the next time step) by estimating a set of forecast errors of f (n) and h(n) for the remaining decision steps.
The single-step adjustment is defined as: where d(n s ) is the number of remaining actions at n s , and γ f n is the forecast error of f (n) in future decision step n, which is estimated by using the ratio γ f n s associated with f (n s ) and f (n s−1 ) and the mean single-step ratioμ n s of γ f n 1 , γ f n 2 , . . . , γ f n s along the current optimal path. Multiple-time-periods learning is calculated by correcting the heuristic values of various time periods simultaneously: To summarize, multi-learned heuristic search with space reduction (MLHS-SR) based on the economic indicator [15] and improved evaluation functionf Φ (n) (29) can be used to find a solution to an optimization problem faster without losing quality [40]. Figure 1 shows the flowchart of using MLHS-SR with the bias-corrected mean model for the drilling-order optimization problem under geological uncertainty. To estimate the bias correction factor α for different drilling sequences, we first sample N x distinct controls and apply them to individual model realizations to obtain the initial observations of β. This step requires 2 × N x simulations. Both bias-correction methods (i.e., improve the estimates of expected NPV) based on the observations from distinct controls and model realizations, and online learning techniques (i.e., improve the estimates of maximum expected NPV) based on the observations from previous drilling steps do not require any simulations. Hence, we only need to perform one additional simulation in the mean model at each iteration for evaluating the expected NPV of one specific control. The use of bias correction applied to the mean model allows the application of fairly general optimization methods. During the optimization process, it requires only simulations in the mean model for obtaining initial estimates of expected NPV. For an ensemble with N e realizations, we assume that the information from N e distinct controls is used to estimate the bias correction factor, and the optimization process requires N iter iterations (i.e., N iter different controls have to be evaluated to obtain the optimal solution). Then, the total number of simulations required in RO is N tot = 2N e + N iter , in which 2N e simulations are performed to obtain N e observed values of β and N iter simulations are  performed in the mean reservoir model to obtain the initial approximations of expected NP for N iter different controls. If N iter different controls were to be evaluated using SAA, the cost in RO would be N tot = N e × N iter , which is much more expensive and will increase linearly with the ensemble size N e and the number of iterations N iter .

Depth-limited search
The reservoir model will almost certainly be updated based on information obtained from drilling the first few wells. After model updating, the optimal order of the remaining wells will differ from the initial estimate of optimal order. It is therefore more important to correctly identify the first few wells to be drilled rather than provide an entire drilling sequence. One advantage of using learned heuristic search for the drillingorder problem is that a partial solution of the first wells can be obtained at a reduced cost by cutting off the search at a specified depth.
The most straightforward approach to partial sequence optimization is through the depth-limited search (DLS) [27], in which the learned heuristic search is terminated at a certain depth. To find the solution of the first N s wells, we prefer to terminate the search at the first-visited best partial path with N s + 1 selected wells, because online learning techniques with more observations along a longer path can further improve the approximations and potentially generate a better solution. Note that the last well along the first selected partial path of N s + 1 wells might not be the optimal well because the search might change direction after evaluating its extended paths.
A faster approach to finding the partial solution is to use iterative depth-limited search (IDLS). It works by iteratively optimizing the next well based on the first selected partial path with two more wells until a solution of N s + 1 selected wells is found. Because only the partial paths extended from previous decisions are considered, this approach can avoid evaluating unnecessary paths along other directions caused by underestimated values. Learned heuristic search with accurately estimated values generally will not change direction frequently, so that the optimized sequence of the first few wells with a limited depth is likely to be near the final optimized drilling sequence.

Case study 1: Flow optimization in one-dimensional model
The purpose of this simple example is to thoroughly investigate methods of efficient optimization on a flow problem for which the dependence of the objective function on the model parameters is highly nonlinear. The problem is chosen to be small enough that we can evaluate both linear and quadratic Taylor series approximations of the objective function, and can evaluate the correct optimal solution. The objective in this example is to locate an injection well, operating at fixed pressure in a one-dimensional flow domain with uncertain permeability, such that the total flow rate out of the reservoir through fixed pressure boundaries is minimized. The reservoir is discretized with 150 grid cells (i.e., 150 possible injector locations), each of length Δx, and cross-sectional area A. The permeability k i is distributed as log-gaussian. Pressures at both ends of the grid are fixed at 0. An injector is located at the interface between cells i = i w − 1 and i = i w , where the pressure is fixed at P = P w . Instead of permeability, we use the log-permeability θ i as a parameter in the problem, k i = exp(θ i ).
In this notation, the total flow rate is the sum of the flow to the left and flow to the right: where all variables are in consistent units. Because the permeability is uncertain, we minimize the expected value of flow rate q, by adjusting the location of the injection well, i w . To avoid a problem in which the optimal injector location is at the center of the reservoir because of symmetry, we modify the prior probability for θ by assuming observations of θ at four locations (x = 10, 30, 50, 70). The mean of the posteriori distribution for θ is shown as the orange curve in Fig. 2a, along with the true log-permeability (blue). Ten realizations of the logpermeability field from the posteriori distribution are shown in Fig. 2b. The most straightforward approach to approximating the expected value of the objective function is through the use of SAA or averaging of Monte Carlo samples from the posterior distribution for θ. We use SAA with 400 samples for each control location as a benchmark for other methods. Hence, the SAA method uses 150×400 function evaluations to generate the expected value of q(i w , θ) for optimization.
Taylor series expansions of the objective function provide much less expensive approximations of the expected value of q(i w , θ). For the linear approximation, the expected value of q(i w , θ) is approximated using the mean of the log-permeability field (5). The quadratic approximation is considerably more expensive as it requires computation of the second derivative of the objective function with respect to well location. For this 1D steady flow problem, that is still a manageable computation. Figure 3 compares the linear and quadratic Taylor series approximations of the expected value of flow rate to the sample average approximation.
(a) True log-permeability field and best approximation after observations (b) Samples of the log-permeability distribution after conditioning. Fig. 2 a, b Light blue curve shows the true θ that generated observations. Blue squares show observations of θ with noise added. Correlation range of prior distribution for log-permeability is 26 Although there is clearly a large bias in the values from both the linear and quadratic approximations, the shapes are quite similar to the SAA, and the location of the minimum for each curve is also approximately the same. For this particular problem (minimizing total flow rate), a uniform bias does not affect the optimization result.
To correct the nonuniform bias in α, we use a regularized localization approach (20) in which higher weights are given to samples that are closer to the control variable for which the bias correction is being estimated. In this optimization problem, the actual distance between well locations is an appropriate measure of similarity. The variability of values of β is quite large (Fig. 4a) because the rate is strongly affected by the occurrence of low permeability values between the injector and the boundary. The variability in α is much smaller than the variability in Fig. 3 The expected value of the total injection rate, E[q (x, θ], computed using the sample average approximation, and two levels of Taylor series approximation β (Fig. 4b). Unfortunately, while the variability in β can be estimated from samples, the variability in α will generally be unknown. Because it is computed by averaging over samples of β, however, we should expect it to generally be smaller than the variability in β, so that the ratio σ 2 α /σ 2 β which appears in Eq. 20 will generally be substantially smaller than 1.
In Figure 5, we compare the optimal well locations obtained using three different methods. In the SAA approach, we use N e samples of the permeability field for each well location between 35 and 115 to compute an approximation of the expected value of flow rate. In the linear approximation, we simply compute the flow rate using the mean of the log-permeability field, and for the bias-correction approach, we used regularized localization with γ = n effσ 2 α /σ 2 β and a taper length of 20 to obtain a biascorrection to the linear approximation of expected value of flow rate. Each experiment was repeated 100 times to reduce the effect of sampling error. For both ensemble sizes, the bias-correction approach give better results than the linear approximation, although the difference is more apparent at larger ensemble size because the spread in the results is reduced in that case.
The most important criterion for judging success of the methodology is the ability to actually minimize the flow rate. Figure 6 compares the distribution of total flow rates obtained from the optimal well locations applied to the truth case for each of the methods. The test was repeated 100 times so there are 100 different truth cases and 100 "optimal" locations for each method. Note that the bias-correction method is now clearly superior to the linear approximation, even in this example in which the optimization depends only on relative differences in the value of the objective function. Optimization of the well α for Fig. 4 a, b Variability of the partial bias correction β and the bias correction factor α for the 1D flow problem location based on single realizations from the posterior distribution for model parameters gives very poor results (Fig. 6b).

Reservoir model
In this example, we design a set of experiments to study the bias properties of the drilling-order problem applied to the synthetic REEK Field model [18,29,40], and the performances of various methods for estimating the bias for both complete and heuristic sequences by using the information from distinct controls and individual realizations. The objective of the robust optimization problem is maximization of the 10-year expected NPV with respect to the drilling schedule of wells. Learned heuristic search applied to the mean model with bias correction is used to optimize the drilling sequence under geological uncertainty. To illustrate the quality of the robust optimal The REEK model is a three-phase black-oil reservoir model with 40 × 64 × 14 grid cells, of which 34,770 are active cells. An ensemble of 100 geologically consistent model realizations is used to empirically represent uncertainty in the porosity field, permeability field, and fault transmissibility multipliers. Table 1 shows the reservoir properties and control variables in REEK field. For the test problem, we assume that eight vertical fully penetrating wells (5 producers and 3 injectors) with fixed locations need to be drilled sequentially, and the first well is drilled at the beginning of simulation. The assumed drilling period is 6 months for all wells and wells begin operating immediately after drilling. Figure 7 shows the well locations and initial oil saturation of one randomly chosen model realization. The injectors are positioned around the oil-water contact and controlled by a maximum injection rate of 10,000 m 3 /day and a maximum BHP of 320 bars. The producers are distributed throughout the oil-containing area and controlled by a maximum production rate of 6000 m 3 /day and a minimum BHP of 250 bars.

Estimation of correction factor
Properties of α and β We randomly select 5 different complete drilling sequences and apply them to the entire ensemble of realizations and the reservoir mean model to obtain the actual values of the multiplicative bias correction factor α (7) and their partial correction factor β of individual realizations (8). Figure 8 shows the observed values in α and β from 5 random complete drilling sequences. Symbols that are in the same color indicate the partial correction factor β(x i , m j ,m) at a fixed drilling sequence x i applied (a) Injection rates on 400 "true" permeability fields for three robust optimization methods.
(b) Injection rates for three robust optimization methods and a non-robust method. to different reservoir models and the mean model. The xaxis displays the index j of a single reservoir model m j . We observe that β(x i , m j ,m) at a fixed control changes significantly with geological uncertainty, but β(x i , m j ,m) at a fixed reservoir model changes slightly with different controls. The horizontal dashed lines represent the bias correction factors α i for the 5 drilling sequences, computed by averaging over samples of β. The variability in α i for these 5 drilling sequences is very small and the mean for each is close to 1.1.
To obtain more statistically reliable results, we sample 560 different drilling sequences for studying the bias properties. For 5 producers and 3 injectors, there are 56 possible combinations of drilling sequences based on the types of wells (e.g., [P, P, P, P, W, P, W, W], [P, P, W, P, P, P, W, W]). We randomly select 10 distinct drilling sequences from each combination and compute their actual values of α and β. Figure 9 shows the distributions of α and β values obtained from these 560 random complete drilling sequences. α has a value near 1.108 for most of the drilling sequences (Fig. 9a), indicating that SAA of NPV is almost 10% higher than the NPV computed using the mean model. To accurately estimate the expected NPV, it is necessary to correct the bias in the initial approximation that is obtained in the reservoir mean model. Out of the 560 drilling sequences, only 22 (approximately 4%) sequences have α values lower than 1.09, whereas β changes between 0.6 and 1.5 with a larger variability (Fig. 9b). Interestingly, almost all of the drilling sequences with atypical bias correction factor α are from two extreme drilling sequences, i.e., either all producers or all injectors are drilled first. The results of these two combinations are plotted in yellow. This finding was unexpected and suggests that to more accurately estimate the bias for general drilling sequences, we should avoid sampling controls from these two extreme combinations since they seem to provide less useful information and vice versa. Figure 10 compares the distributions of variance in β at a fixed reservoir model and a fixed control after eliminating two extreme combinations (i.e., [P, P, P, P, P, W, W, W] and [W, W, W, P, P, P, P, P]). As previously observed in Fig. 8, the variability of β among different drilling sequences at a fixed reservoir model is smaller than the variability of β among different reservoir models at a fixed drilling sequence. The average value of α can be estimated by averaging all observed values in β from distinct controls. In this example, the variance in α is only 0.000058 for general drilling sequences. The estimates of expected NPV with a bias correction factorᾱ could be accurate for most of the drilling sequences, but this will not always be the case in other problems. For the case of large variability in α, we should consider the similarity between the samples and control variables for estimating the bias for specific controls. In this case, the variability in β is much larger than  the variability in α for general complete drilling sequences, and the ratio σ 2 β /σ 2 α is almost 267. For a small number of samples, local estimatesα loc based on β (12) might be away from the actual bias correction factor. The error of local estimates caused by sampling error due to small sample size could be as large as 40%. Hence, it is critical for this problem to use a taper length with a relatively large effective sample size for pure distance-based localization or apply regularized localization to avoid generating estimates that are far away fromᾱ.
To measure the similarity of drilling sequences for the bias correction factor, we investigate the use of four different distance metrics (i.e., the Manhattan, Euclidean, cosine, and Hamming distances). Figure 11a shows the empirical variograms of α for these four distance metrics. We observe that all four variograms show an approximately linear increase with distance over most of their ranges.
Overall, each of these four distance metrics can measure the similarity of drilling sequences for expected NPV. For the purpose of identifying similar drilling sequences, a measure of distance is better than another if it correlates better over larger numbers of samples. Figure 11b shows the cumulative fraction of drilling sequences within an upper bound on the variogram of α. Among a set of random controls, very few of them will have small Hamming distances. We note that, compared with the other three distance metrics, the Hamming distance yields fewer similar drilling sequences at a fixed threshold variance of α. In other words, when a fixed fraction of samples is considered for local estimation, drilling sequences extracted by the Hamming distance will potentially have more significant variability in α. By contrast, the Manhattan, Euclidean, and cosine distance metrics have similar behaviors and all perform better than (a) Empirical variogram of the bias correction factor α as obtained using different distance metrics (b) Cumulative fraction of sequences within a certain upper bound on the variogram of α Fig. 11 a, b Performance comparison of four distance metrics applied in the drilling-order problem the Hamming distance for local estimation based on the observations from randomly selected drilling sequences. In our subsequent experiments, we use the Manhattan distance to measure the distance between two drilling sequences for estimating α values for specific controls.
Accuracy of estimates of α In this experiment, we compare various methods for estimating the bias correction factor α of different drilling sequences. We first sample a number (N x ) of distinct general drilling sequences to obtain a set of observations of β, which requires 2 × N x simulations.
To evaluate the quality of each of the three approaches (i.e., average bias correction factor, pure-distance localization, and regularized localization), we use each method to estimate α for 540 different drilling sequences. Estimation of α for various drilling sequences using these approaches does not require additional samples of β, but to estimate the expected NPV of a specific drilling sequence, we need to perform one additional simulation using the mean model to obtain the initial approximate value of expected NPV. In other words, the amount of computation required for computing the expected NPV of 540 sequences by using the information from 100 distinct controls and individual realizations is 2 × 100 + 540 = 740 simulations, which is much less than the cost for SAA (54,000 simulations). In this experiment, we use the accuracy of the estimated α as a criterion for judging the quality of the expected NPV based on the reservoir mean model with bias correction. Figure 12 shows RMSE of the estimates of the 540 values of α based on a set of 100 observed values in β. The black horizontal line indicates the result from a fixed estimated α obtained by averaging samples of β (11). Pure distance-based localization and regularized localization are applied with different taper lengths between 1 and 200 (for reference, the maximum Manhattan distance between complete sequences with eight wells is 32). We observe that purely local estimatesα loc are potentially more accurate than the estimates computed by averaging over samples of β when a taper length of L > 26 is used. In this case, the effective sample size is relatively large (n eff > 80), and the best value of the taper length in this test is approximately 34. When the taper length extends over 170, all samples are assigned high weights and the performance is close to the approximation ofᾱ. On the contrary, the quality of local estimates is very sensitive to the taper length because of sampling error when the taper length is small (e.g., n eff ≈ 10 at L = 15). The error of estimated values resulting from the small number of samples is significant.
Since our estimate of optimal weighting in regularized estimation of α depends on the value of σ 2 α , which we are unlikely to know accurately, we investigated the sensitivity of results to variation in the magnitude of the weighting term. Three different levels of the ratio of the variance of α and β (i.e., λ = 0.1 · σ 2 β /σ 2 α n eff , σ 2 β /σ 2 α n eff , and 10 · σ 2 β /σ 2 α n eff ) were evaluated. For λ = 10 · σ 2 β /σ 2 α n eff , the regularized estimatesα r are close toᾱ and the RMSE is similar. For λ = 0.1 · σ 2 β /σ 2 α n eff , the effect of regularization is small; consequently, the estimates at a small taper length are inaccurate due to sampling errors. However, the results show that λ = σ 2 β /σ 2 α n eff reduces the sensitivity of the estimates to the taper length, while still generating more accurate values even for cases with a small number of samples. As a result, the corresponding regularized estimates are potentially more accurate for a wider range of taper lengths.
To reduce the influence of sampling error on the conclusions, we repeat each experiment 100 times, each time using a fresh sample of controls to obtain observations of β. For every single test, we apply three different sample sizes (N x = 100, 200, and 400) to investigate the effect of Fig. 12 Comparison of the RMSEs of estimates of α obtained using different approaches based on 100 observed values of β the sample size on the accuracy of the estimated α and the best value of the taper length for distance-based localization. Figure 13 shows the RMSEs of α estimates obtained using different approaches with the three different sample sizes after repeating each test 100 times. In general, the estimates are improved by using more samples. The improvement, however, is small for estimates obtained without localization (horizontal dashed lines) as the estimate of the mean for β is already quite accurate for N x = 200.
Distance-based localization offers a further increase in accuracy, and the improvement is more significant with a larger sample size. In this example, we observe that the RMSE of the local estimates at the optimal taper length is close to that of the estimates computed from the optimal weights based on the covariance of the partial correction factors (26). For the drilling-order problem, the wellposition based distance distribution obtained from a fixed set of random controls changes slightly when the control variable for estimating the bias correction factor is varied. In such a case, the distributions of elements in covariance matrix cov(b, b 0 ) (42) in terms of the distance are stable. The variance of the correction factor increases almost linearly with distance, making the optimal weight based on the covariance of partial correction factors decreases with the distance of the samples. As a result, the optimal weights would be close to the local weights at the optimal taper length for the drilling-order problem.
As observed previously, regularization with λ = σ 2 β /σ 2 α n eff appears to always produce more accurate estimates of α than those obtained by averaging of all values of β. When the taper length is greater than a certain threshold, it seems that pure distance-based localization slightly outperforms regularized localization. The reason for this is that local estimates are potentially more accurate thanᾱ with large effective sample sizes; however, the regularization term reduces the effect of localization and results in estimates that are closer toᾱ. Both the variance of α and the best value of the taper length are generally unknown. Therefore, instead of using regularized localization based on an uncertain value of σ 2 α /σ 2 β , for this problem, we use pure distance-based localization with a relatively large effective sample size to estimate α values for different controls.
We investigated the effect of taper length on the error in the estimate of bias correction factor, α, for three different sample sizes. For each sample size, we repeated the experiment 100 times to obtain reliable estimates. The results show that the optimal taper length for local estimates of α decreases as the sample size increases (Fig. 13). As the optimal taper length is also subject to sampling error, we investigated the variability. Figure 14 compares the frequency distributions of the optimal taper lengths for each ensemble size. When only 100 samples are used to estimate α, the best value of taper length is shown to be highly sensitive to the initial samples (histogram in blue). In approximately 1/3 of the cases, the optimal taper length is larger than 200, which means that the average value of all β will provide a better estimate than a purely distance-based weighting with a small taper length. For some controls, the actual α values might be very close toᾱ, such that more samples will be required for local estimation to provide any further improvement in accuracy. When N x is increased from 100 to 400, the optimal taper length is less variable, varying between 25 and 30 in most cases (histogram in green). It seems that for cases with sample sizes larger than the ratio σ 2 β /σ 2 α , the best value of the taper length is likely to lie near a point where the value of σ 2 β /σ 2 α n eff is approximately 1. In that case, the variability in the local Fig. 13 RMSEs of estimates of α with three different sample sizes Fig. 14 Distribution of the optimal taper length for the pure distancebased localization estimates will be close to the variability in the actual α values. When the variance of α is unknown, it might be difficult to select the best value of the taper length. However, the results show that the local estimates are not highly sensitive to the taper length when the effective sample size is relatively large, in which case local estimates are still more accurate than estimates obtained using the average value of β.

Correction factors for heuristic sequences
During the heuristic search process, a set of heuristic sequences with different numbers of selected wells is used in combination with online learning techniques to estimate the maximum value among the possible complete drilling sequences constrained by previous wells. Hence, we need to compute the expected NPVs of some specific heuristic sequences to optimize either an entire drilling sequence or only the first few wells when applying a learned heuristic search. In this experiment, we study the bias properties of heuristic sequences and the possibility of using mean model bias correction to accurately estimate their expected NPVs. Figure 15a and b show the distributions of α and β from 550 random heuristic sequences, 400 of which have fewer than four selected wells (i.e., all possible heuristic sequences with N s ≤ 3 are considered). The histogram in orange represents the bias from the heuristic sequences in which either all producers or all injectors are drilled sequentially first. Figure 15c compares the distributions of the variance of β for a fixed reservoir model and a fixed heuristic sequence after the elimination of controls from those two extreme combinations. Relative to the observations for complete drilling sequences, the bias for heuristic sequences has some similar properties: (1) thē α for heuristic sequences is close to that for complete sequences, but the variability in α is smaller; (2) heuristic sequences in which either all producers or all injectors  (3) β values for heuristic and complete sequences have similar distributions, and the variability in β is much larger than the variability in α; and (4) the variance of β among different heuristic sequences for a fixed model is much smaller than that among different realizations for a fixed heuristic sequence. We observe that the variance of α for heuristic sequences is only 0.0000168. In this case, the expected NPV that is estimated from the mean model withᾱ will have a high level of accuracy. Distance-based localization might require a large number of samples to improve the estimates, and even then, the improvement will be small. Figure 16 shows the empirical variogram of α for heuristic sequences based on the Manhattan distance. The number of selected wells constrains the distance between heuristic sequences. For eight wells, the maximum Manhattan distance between heuristic sequences with fewer than four selected wells is only 12. For the sequences in which at least half of the wells are drilled simultaneously (N s ≤ 4), while the variance of α increases with distance, the correlation of α is still strong even for the sequences with the largest distance. For N s ≤ 5 and N s ≤ 6, however, the variance of α decreases beyond a certain distance. The main reason for this is that heuristic sequences with a significant gap in N s generally have a large distance, but they might also have similar values of α. If we sample heuristic sequences with a large N s (i.e., in which most of the wells are drilled sequentially) to estimate the bias for sequences with a small N s (i.e., in which most of the wells are drilled simultaneously), distance-based localization might not significantly improve the estimates even with a large number of samples, since most of the samples will be at long distances, but may still have similar α values. Therefore, to ensure the effectiveness of distancebased localization, we should use samples with a small gap in N s to estimate the bias for specific heuristic sequences with N s selected wells. Figure 17 shows the RMSEs of the estimated α values for heuristic sequences with different numbers of selected wells obtained by using the information of distinct heuristic controls. As in the previous section, to reduce the sampling error, we repeat each experiment 100 times. As expected, the results show that the average value of β is a good approximation of α for most heuristic sequences due to the small variability in α. The error in the estimates can be only slightly reduced from 0.0034 to 0.0032 by increasing the sample size from 100 to 400, but at the cost of 600 additional simulations. In this case, it is not necessary to use a larger sample size to improve the estimates. The estimated α based on 100 observed values of β is already very close to the actual α values, although this might not always be the case in other problems. The effect of distance-based localization is small even if we use N x = 400, which means that more samples would be needed for distance-based localization to improve the estimates further. As observed in the case of complete drilling sequences, the regularized estimates obtained from heuristic sequences also prove that a regularization term based onᾱ with a parameter of σ 2 β /σ 2 α n eff reduces the sensitivity of the estimates to the taper length, such that the estimates are potentially more accurate for a wider range of taper lengths. The results for both complete and heuristic sequences illustrate the effectiveness of using MMBC to estimate the expected NPV. Although the effect of localization on improving the estimates is relatively small in this example, this might not always be the case in other problems. For cases with considerable variability in α, the improvement in accuracy achieved by using distance-based localization with an appropriate taper length or regularized localization will be more significant.

Robust optimization under geological uncertainty
Optimization of complete sequences In this experiment, we apply learned heuristic search with MMBC to compute the RO drilling sequence under geological uncertainty. A set of 100 random heuristic drilling sequences with different numbers of selected wells is used to generate 100 observations of the partial correction factor β. This requires  200 simulations, as the NPV must be computed for the drilling sequence run using the mean model, and for a model realization. Because the variance of α is unknown, we apply pure distance-based localization to estimate the bias for different drilling sequences. To avoid inaccuracy in the local estimates caused by sampling error due to a small sample size, we choose a taper length near 50, which is large enough to ensure that the effective sample size is relatively large. To evaluate the effectiveness of the various approaches, we compute the "gold standard" RO strategy based on SAA and the NO strategies of individual realizations. The performance of RO/NO optimal strategies is investigated by comparing their NPV distributions in the entire ensemble of 100 realizations with that of 560 random complete drilling sequences. Figure 18 shows MLHS-SR search strategy applied to the mean model with bias correction. To speed up the search process while maintaining the highest possible solution quality after space reduction, we consider at least half of the remaining wells with high economic indicator values as the next possible actions (i.e., prior space reduction) and preserve three partial paths with the highest estimated expected NPVs for the remaining search process (i.e., posterior space reduction). For nodes marked in orange, the paths through them have the highest estimated expected NPV for complete sequences and have been expanded in the most promising directions. For each selected direction, only the partial paths go through blue nodes (or orange nodes if the paths have been extended), and black nodes are evaluated after prior space reduction. However, the paths with black nodes are pruned in the process of posterior space reduction due to their lower estimated values and will not be considered during the search process. In this case, the search evaluates 46 paths through 13 decision steps to optimize an entire drilling sequence: [OP 3, WI 1, WI 2, OP 1, OP 4, WI 3, OP 2, OP 5]. Considering the cost of obtaining the partial correction factor β (200 simulations), only 246 simulations are required to compute the RO drilling sequence for eight wells. For comparison, when MLHS-SR is used with SAA, it is necessary to perform more than 3000 simulations to obtain a solution with the same quality. By more extensively reducing the search space, we can find the same solution faster (235 simulations). However, excessive space reduction (e.g., searching along only one direction) will lead to a drilling sequence with a less-thanoptimal value (− 0.71%). Because a large fraction of the cost is a result of estimation of the bias correction factor, the cost reduction achieved by evaluating fewer controls might not be very significant, especially for small fields with large ensemble sizes. In such cases, it is not advisable to prune the space significantly to reduce the cost, because the computational cost might be reduced only slightly while yielding a solution with a suboptimal value.
The percentage presented near each node in Fig. 18 represents the error of the estimated expected NPV of each partial path (i.e., heuristic sequences) compared with SAA. The average error of the estimated expected NPV from the mean model is reduced from − 9.20 to 0.21% by estimation of the bias correction factor. The maximum error on the estimated expected NPV is only 0.54%. In this example, by using the information from only 100 samples of the drilling sequence and individual model realizations, the expected NPVs of different drilling sequences can be accurately estimated. The optimal drilling order from the learned heuristic search is computed by repeatedly expanding the search in the directions with the highest estimated values. Because the bias correction factor for the drilling-order problem in the REEK Field is relatively stable, adding such a multiplicative factor does not significantly change the initial ranking of the heuristic sequences in the mean model. Hence, we observe that MLHS-SR performed directly in the mean model (i.e., linear approximation) finds the same solution obtained using the bias-corrected mean model. However, for the cases with considerable variability in α, a learned heuristic search with MMBC can potentially find a better drilling sequence than that obtained using the linear approximation. Figure 19 shows the error on the maximum expected NPV of the complete drilling sequence along the optimal path that is estimated by using heuristic sequences with MMBC and online learning techniques. We first remove the bias in the initial approximations of the expected NPV obtained from the mean model by estimating a bias correction factor α (green curve → blue curve), and we then correct the estimates of the maximum expected NPV obtained from heuristic sequences through online learning Fig. 19 Error of estimates of maximum NPV along the optimal path mechanisms (blue curve → orange curve). We notice that the initial estimates of the maximum expected NPV from heuristic sequences (blue curve) always overestimate the actual maximum value for the complete sequence and show an exponentially decaying trend with an increasing decay factor throughout the search process. However, the accuracy of the estimated maximum expected NPV is significantly improved by learning the errors on approximate values obtained from previous drilling steps (orange curve). In this case, the average error along the optimal path is only 0.22%. With such accurate estimates, the final optimized sequence is very likely to have a high expected NPV and to be near the true optimal sequence. When MLHS is applied without any space reduction, the average NPV of the optimized sequence is 0.60% lower than that of the solution obtained with space reduction. The main reason for this is that some estimated values underestimate the actual maximum value (e.g., the error for the partial path with three selected wells is − 1.03%), meaning that if the expected NPV of the current solution is higher than all of these underestimated values, the search will terminate with the current solution. In this experiment, an appreciable amount of space reduction (e.g., at least half of the remaining wells are considered as the next possible action) is shown to effectively prune paths that are less likely to correspond to the optimal drilling order, leading to a better solution with a lower cost. Figure 20 compares the NPV distributions for RO/NO sequences and a set of 560 random drilling sequences. The results show that the average NPV is significantly increased by optimizing the drilling sequence with a learned heuristic search. RO sequence has a higher average NPV than either the NO solutions or the set of random drilling sequences, being as much as 25.86% higher than the average NPVs of some of the random drilling sequences. Table 2 summarizes the optimization results for each of the approaches. At the cost of approximately 200 additional simulations, the RO strategy obtained from the mean model with bias correction is shown to be significantly better than the deterministic solutions for individual realizations. We observe many of the NO strategies have similar NPV distributions over a set of 100 realizations. It seems that in the REEK model, the optimal drilling sequence does not vary significantly with the geological uncertainty.
Although the geological properties vary significantly from realization to realization, the optimized deterministic well sequences for 100 model realizations have common characteristics. In particular, the optimized sequence for each single realization always starts with one producer and one injector, in that order. For 92 of the 100 individual realizations, the last well drilled in the optimal sequence is a producer. In most cases, Producer OP 3 and Injector WI 1 are drilled first, and Producer OP 5 is the last well drilled. In the REEK Field case, with five producers and three injectors, the optimal wells for these three drilling positions appear to be nearly independent of the geological properties. As a result, the RO solution is likely to be near several deterministic optimal solutions for this field. It is clear that this will not always be the case for other fields.
Partial optimization Wang and Oliver [40] have shown that learned heuristic search is a viable means of solving the sequential drilling-order problem for complete sequences. In many cases, however, only the first few wells in the drilling sequence may be needed. As shown in Fig. 18, the optimal partial path will not always extend in only one direction during the search. In such cases, it is not guaranteed that the first few wells can always be optimized along the final optimal complete sequence by limiting the depth of the search path. In this experiment, we investigate the possibility of optimizing the first few wells without   finding the entire optimal sequence by performing learned heuristic search with a limited depth (i.e., a depth-limited search and an iterative depth-limited search). Figure 21 shows the search strategies for the first two and three wells using MLHS-SR with depth-limited search (DLS). To optimize the first few wells, we terminate the search with the first selected partial path that includes one additional well. Figure 21a shows that the first optimal partial path with three selected wells is [OP 3, WI 1, OP 2], which indicates that the optimized sequence of the first two wells is [OP 3, WI 1]. The path through Producer OP 3 and Injector WI 1 has been previously expanded as the most promising direction in the second decision step; however, the search changes direction after the evaluation of the extended partial paths due to their underestimated values. Seven additional decision steps are required to guide the search back to the path along the final optimized complete sequence. In this case, if the search were to be stopped immediately with the first selected partial path with two wells (i.e., [OP 3, WI 1] ), then the same solution for the first two wells could be obtained faster (i.e., 26 simulations would be eliminated). However, this will not always be the case for other search depths. Figure 21b shows that although the first expanded direction with three selected wells is [OP 3, WI 1, OP 2], the path with four wells [OP 3, WI 1, WI 2, OP 1] provides a better solution for the first three wells (i.e., [OP 3, WI 1, WI 2]) since with more observations from longer paths, online learning mechanisms can improve the estimated values and guide the search closer to the true optimal solution. A total of 234 and 240 simulations are required to optimize the first two and three wells, respectively, under geological uncertainty, of which 34 and 40 simulations, respectively, are performed during the search process. Figure 22 shows the search strategies for the first two and three wells using MLHS-SR with iterative depth-limited search (IDLS). In each decision step, we consider only the paths extended from the previous decisions and optimize the next well based on the first selected partial path with two more wells. We optimize the first well based on the firstvisited best partial path with two selected wells (i.e., [OP 3, WI 1]), then we compute the second well based on the paths starting with OP 3. In contrast to the search strategies obtained using DLS, as shown in Fig. 21, the paths starting with Injector WI 1 and Producer OP 4 are pruned, thus avoiding 14 simulations. Figure 23 shows the numbers of (a) Depth-limited search strategy of the first two wells (b) Depth-limited search strategy of the first three wells  simulations needed to optimize the first few wells by using MLHS-SR with DLS and IDLS. The results for this example show that both methods can be used to optimize the first few wells along the final optimized complete sequence by controlling the search depth, but IDLS finds a solution faster by eliminating some unnecessary node evaluations caused by underestimated values. The performance of DLS and IDLS depends on the online learning techniques applied. When the estimated values are sufficiently accurate such that the search process will expand the path in only one direction, both DLS and IDLS can optimize the first few wells along the final optimal complete drilling sequence at the same cost. However, the estimated values are generally less accurate earlier in the search process due to the lack of previous sequential observations, meaning that the search might later change direction. Compared with DLS, IDLS is more likely to optimize the first few wells faster without any loss of the solution quality since paths with low NPV generally will not continue to be extended when appropriate online learning mechanisms are used. In this case, because only 246 simulations are needed to optimize an entire sequence for eight wells via MLHS-SR, the cost reduction achieved by optimizing only the first few wells is not very significant. For larger fields, however, MLHS-SR with IDLS can be used to efficiently optimize only the first few wells in the sequence.

Conclusions
In this paper, we presented a methodology for robust optimization of the expected value of an expensive-tocompute objective function for which the uncertainty is characterized by an ensemble of model realizations. A straightforward approach using sample averaging to approximate the expected value is prohibitively expensive in most real reservoir applications. In our approach, we compute a multiplicative bias correction to the value of the objective function computed from the first term of the Taylor series expansion. Computation of the bias correction requires two function evaluations for each random sample of the control variables: one evaluation using the mean reservoir model and a second evaluation using a random model realization. Our results show that the information from distinct controls and individual realizations can be used to accurately estimate the bias in the approximation of expected NPV obtained from the mean model and that robust optimization can be performed with good approximations of the expected NPV while requiring many fewer simulation runs than SAA. Based on our experiments, we arrive at the following conclusions: -Approximations of the expected NPV obtained using the mean reservoir model are generally poor compared with estimates obtained using SAA, but approximations from the mean model can be significantly improved by estimating a multiplicative bias correction factor, which is estimated by simulating a modest number of randomly selected controls on both reservoir realizations and the mean reservoir model. -Distance-based localization can improve the estimated values of the correction factor. The performance of this approach the estimation of the bias-correction factor depends on the distance measure and the taper length, both of which must be estimated. -Regularized localization with an appropriate parameter based on the variance of the bias correction factor reduces the sensitivity of the estimates to the taper length; consequently, regularized estimates are accurate for a wider range of taper lengths. When the variance of the bias correction factor is known, regularized localization is the preferred method. -The RO solutions obtained using MMBC and SAA are of similar quality and are both superior to the deterministic solutions for individual realizations and the mean reservoir model (i.e., linear approximation).
-A learned heuristic search can be performed either to optimize complete drilling sequences or to optimize only the first few wells in the sequence at a reduced cost by limiting the search depth. Compared with DLS approach, a solution of the same quality can be found faster by performing an IDLS with effective online learning techniques.
The major advantage of using mean model bias correction is that we only need to perform simulation using the mean model to obtain an initial approximate value of expected NPV, for which the bias will be corrected by a multiplicative correction factor estimated based on the observations of partial corrections from similar controls. Although the methodology was applied to the problem of determining optimal drilling order, the methodology is fairly general and does not require that the objective function be general or that the control variables be continuous. The effect of localization of the estimate could be significant in the case of large variability in the bias correction factor, especially when the ensemble size is large such that more observations from distinct controls and individual realizations could be used for bias estimation. However, to ensure the effectiveness of distancebased localization for bias correction, we need to design appropriate distance measures for identification of sets of similar controls. For the drilling-order problem, a distance metric based on the well position can effectively measure the similarity between drilling sequences. In this work, we presented a method for efficiently estimating the expected value of an objective function by applying a multiplicative bias correction to the value obtained from the mean model. It would not be difficult to modify the bias-correction method for use in multiple objective optimizations under uncertainty (e.g., standard deviation, percentiles, expected value). For example, it would be possible to compute an estimate of the variance of the objective-function value, by squaring the value of the objective function from the mean model and multiplying it by the variance of partial corrections, which can also be estimated by using information from distinct controls and individual model realizations.
Funding Open Access funding provided by NORCE Norwegian Research Centre AS. This research was supported through the DIGIRES project by the Research Council of Norway and industry partners Equinor, Petrobras, Aker BP, Neptune Energy, Winter-shallDEA, Lundin Norway, and Vår Energy. Access to Eclipse licenses was provided by Schlumberger.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material where C β (h x ) is the covariance function for β at fixed model realization; h x is distance between observations; C β (h m ) is the covariance function for β at fixed control variable; h m is the distance of realizations corresponding to the observed controls, i.e., h m = 0 if observed values are from the same realization, h m = 1 if observed values are from different realizations.
C β (h x ) can be obtained from the average variogram of β of all ensemble realizations: where σ 2 β (m j ) and γ (h x , m j ) are the variance and variogram of β at a fixed realization m j , respectively.
For fixed control variable, the values of β ij and β ij will be correlated for model realizations m j and m j at a small distance. Here, we assume that the model realizations are far enough apart such that the observed values of β from different realizations are independent. Then the covariance function for β for fixed control variable can be described as: where σ 2 β (x k ) is the variance of observed β from different realizations at a fixed control x k . cov(bb T ) = ⎡ ⎢ ⎢ ⎢ ⎣ cov(β 11 , β 11 ) cov(β 11 , β 22 ) · · · cov(β 11 , β NN ) cov(β 22 , β 11 ) cov(β 22