A natural Hessian approximation for ensemble based optimization

A key challenge in reservoir management and other fields of engineering involves optimizing a nonlinear function iteratively. Due to the lack of available gradients in commercial reservoir simulators the attention over the last decades has been on gradient free methods or gradient approximations. In particular, the ensemble-based optimization has gained popularity over the last decade due to its simplicity and efficient implementation when considering an ensemble of reservoir models. Typically, a regression type gradient approximation is used in a backtracking or line search setting. This paper introduces an approximation of the Hessian utilizing a Monte Carlo approximation of the natural gradient with respect to the covariance matrix. This Hessian approximation can further be implemented in a trust region approach in order to improve the efficiency of the algorithm. The advantages of using such approximations are demonstrated by testing the proposed algorithm on the Rosenbrock function and on a synthetic reservoir field.


Introduction
We consider the unconstrained optimization problem where J (x) is a differentiable function in R n . The focus is on reservoir management where J typically represents the Net-Present-Value (NPV). However, the methodology presented here is applicable to any type of optimization problem of moderate size (number of controls not exceeding O(10 2 )). Production optimization in petroleum science plays a significant role in the return on investment, being of two constituents of closed-loop reservoir management. This usually requires a model-based optimization technique since the workflow involves the prediction of future production [2,5,13]. There are a significant number of papers that use the gradient obtained through the adjoint technique [9]. Sarma et al. [13] proposed an approximate feasibledirection algorithm with the help of the adjoint method to handle production optimization problems with nonlinear path inequality constraints. Likewise, Zandvliet et al. [17] used the adjoint method for well placement optimization. This method calculates the gradients of the objective function and the gradients are used subsequently to approximate directions. Forouzanfar et al. [8] developed a gradient-based optimization algorithm, where the gradient is computed by combining the adjoint method and an analytical method for linear constraints to estimate the optimal well location and target rates in the reservoirs. Although the adjoint method has been proven as an efficient way to calculate the gradient, it is a nearly impractical method since it requires access to commercial simulator source code. Over the last decade stochastic gradient approximations have gained a lot of interest in the petroleum community. The ensemble based optimization (EnOpt) was introduced into the petroleum community by Chen et al. [2] and Lorentzen et al. [10] as an optimization equivalent to the ensemble Kalman filter [6], where correlations between random Gaussian input controls and their output are used to approximate a gradient. Do and Reynolds [4] applied a similar technique, the simultaneous perturbation stochastic approximation (SPSA), to estimate optimal well controls in production optimization. The efficiency of EnOpt was improved in [7] by deploying covariance matrix adaption [11], in which the covariance matrix is allowed to change according to the best samples from the ensemble of controls. A theoretical evaluation of EnOpt was presented in [15] where it was shown that EnOpt is a special case of a natural evolution strategy [1], an optimization framework where the standard gradient is replaced by a natural gradient w.r.t. parameters of a Gaussian search distribution. Furthermore, it was shown that for robust optimization (e.g. optimizing the mean NPV over an ensemble of reservoir models) the strategy of pairing one random control with one reservoir realization is theoretically sound in the natural evolution framework. This is a key point for real applications as the number of simulation runs is reduced to the number of reservoir models, and not multiple runs per reservoir model. This makes EnOpt as computationally efficient as the adjoint method for robust optimization.
In this article the work of Stordal et al. [15] is extended and a Hessian approximation is derived using second order natural gradient information. With this second order information we implement a trust-region strategy for EnOpt where we use a preconditioned conjugate gradient method to solve the subproblem of the trust-region algorithm known as Steihaug's approach [14]. Steihaug's approach is based on the preconditioned conjugate gradient method and may be regarded as a generalized dogleg technique where the quasi-Newton step is supplied.
The paper starts with background information about ensemble-based optimization and shows how EnOpt is a special case of a natural evolution strategy. Using the natural evolution strategy, the development of approximate Hessian is also presented in Section 2. As the Hessian matrix is needed in the trust-region approach, Section 3 gives a brief description of the approach. Next the practicalities for the proposed method are presented in Section 4 along with applications on the Rosenbrock function and a reservoir model. Concluding remarks are found in Section 5.

Ensemble-based optimization
The EnOpt algorithm starts, as any numerical optimization algorithm, with a vector μ 0 ∈ R n as the initial values for the control and a covariance matrix Σ that needs to be specified.
denotes the vector of the optimization variables at each step, where n is the total number. Initially, an ensemble of control vectors {X i 0 } N i=1 , are drawn from a multivariate Gaussian density (μ 0 , ) and their objective function values, , are evaluated. Originally, the EnOpt algorithm updated the mean control vector μ k at each iteration as where k denotes the iteration number, μ k is the current control, β k is the step size and Σ is the covariance matrix of the Gaussian distribution where the ensemble is drawn from. The preconditioned gradient is then approximated as where . It was shown in [15] that asymptotically, as N goes to infinity, the right hand side of Eq. 4 converges to which is the natural gradient [1] of the objective function In general, natural evolution searches for the minimum of the expected objective function w.r.t. a location parameter of the search distribution. That is where f is a probability density that depends on the parameter μ. Both [1] and [16] pointed out that the ordinary gradient of Eq. 5 w.r.t μ does not account for gradient uncertainty and depends on the particular parameterization of the distribution, leading to unstable updates. The essence of the natural gradient is to remove this dependence on the parameterization by multiplying the gradient with the inverse of the Fisher information matrix, which is Σ in our case. From the natural evoluation point of view, Stordal et al. [15] redefined the EnOpt algorithm, whithout Σ as a pre-conditioner in Eq. 3, and with an additional update equation for Σ. Given a parametric family of multivariate Gaussian distributions (x|μ, ), the objective function in the natural evolution is defined as with corresponding natural gradients where the natural gradient is defined as the inverse of the Fisher information matrix multiplied with the traditional gradient. For details of the derivation the reader is referred to [1].
Unfortunately, the expectations in Eqs. 6 and 7 are not available analytically in general. Instead, an approximation of the natural search gradients are obtained from Monte The EnOpt in a natural evoluation setting evolves as where β 1 k and β 2 k are the step sizes. We also note that often Furthermore, it is also quite common to normalize the gradients in order to avoid tuning the step sizes on a case by case basis.
The above Monte Carlo estimates of the covariances suffer from sampling errors due to the limited sample size computationally available for reservoir simulations (typically 50-100). Unlike data assimilation methods, where a type of covariance localization is used to reduce the impact of sampling error (typically via a tapering function or domain localization), the NPV objective function used in the EnOpt algorithm does not have this "local" feature. This renders standard localization techniques from the data assimilation literature useless for EnOpt. Fortunately, the EnOpt gradient consists of a single vector of covariance estimates (not a matrix) of order O(10 2 ) or less for most reservoir problems so the need for reducing the sampling error is not nearly as severe as for data assimilation problems where the covariance matrices can be of order O(10 6 ) and higher. There are, however, a few approaches that can help reducing the sampling errors in Eqs. 8 and 9. For the gradient w.r.t. μ, a pre-conditioning matrix can be used to smooth the gradient estimate and hopefully average out some of the errors [2]. Another possible approach is to use truncation, where all estimates below a certain threshold in absolute value are set to zero. For the gradient w.r.t. Σ, a classical shrinkage towards the identity matrix is one approach. The second approach, which is implemented in the reservoir example presented later, is to only update the variances (the diagonal of Σ) while keeping the correlation structure fixed. The latter approach significantly reduces the number of covariances to be estimated.
We now extend the EnOpt by including a natural Hessian approximation that can be applied in both backtracking and trust-region algorithms to show later how the performance of an optimization can be improved by having second order gradient information.

A natural Hessian approximation
In view of the natural evolution theory presented above, a formulation of a natural Hessian w.r.t. μ is presented.
From Eq. 6 Then using the chain rule and the log trick Note that this is the exact same expression as the natural gradient w.r.t. Σ in Eq. 7. And so a Monte Carlo approximation of the natural Hessian matrix is given by Eq. 9 and is already available in the EnOpt algorithm. No extra computation is needed. A more detailed description of EnOpt with backtracking strategy and trust-region strategy utilizing the natural Hessian approximations are described later in Algorithms 1 and 2, respectively. The trust-region approach is described in more details in the next section.

Trust-region approach
We formulate the trust-region approach in terms of minimization, hence we seek to minimize the negative NPV instead of maximizing the NPV. In trust-region approaches, the objective function, J , is replaced by a quadratic form m k at each iteration k using the first two terms of the Taylor-series expansion of J around μ k . Let where p is the trial step, J k = J (μ k ), g k is the gradient of J at the current point and B k is a real symmetric n × n matrix (typically the Hessian). At the current iterate point the trial step is computed by solving the sub-problem min p∈R n m k (p) = J k +g k p+ where k > 0 is the trust-region radius. Having determined the trial step, the objective function is now computed at μ k + p and then compared to the value predicted by the approximate model at this point. By doing so, this information is further used to adapt the trust region per iteration.
Here, the Steihaug approach [14] is used where the algorithm terminates when it exits the trust-region p > or when it encounters a direction of negative curvature in B. The approximate derivatives are then used in the algorithm to update the search direction. In particular, after the first iteration, we have the Cauchy point, the point lying on the gradient that minimizes the quadratic model. By iteratively finding the Cauchy point the local minimum can be found. Having determined the trial step, p k , the objective function is now computed at If the reduction predicted by the approximate model m k (μ k + p k ) is realized by the objective function J (μ k + p k ), the trial point p k is accepted and the trust-region is updated.
Up to this point all necessary theories are presented, we will demonstrate the advantages of using the proposed method in the next section.

Numerical experiments
In this section two examples to show the performance of the natural Hessian approximation for ensemble-based optimization are presented. The first example, a twodimensional test function, is to demonstrate the improved convergence rate by comparing three different versions of EnOpt with a more conventional gradient based method; while the second example presents a large-scale reservoir optimization problem.

Two-dimensional valley-shaped function
The objective function to be minimized is the two- where the global minimum, J (x * , y * ) = 0 at (x * , y * ) = (1, 1), is inside a long, narrow, parabolic shaped almost flat valley. This valley is trivial to find, however, convergence to the global minimum is difficult.
Four different optimization scenarios are: (i) trust-region strategy with true gradient and Hessian of Rosenbrock function (TR-grad); (ii) trust-region strategy with natural gradient and Hessian (TR-ens); (iii) backtracking strategy with natural gradient and Hessian (BT H ); (iv) simple backtracking without Hessian, i.e., steepest descent line search method without Hessian (BT). Results are in Fig. 1. Each scenario is simulated 100 times with 100 different starting points uniformly distributed in the function domain interval [−5, 10]. The covariance matrix Σ is initially set as a diagonal matrix with 0.01 on the diagonal. The steps sizes are set to β 0 = 0.001 for the trust-region approach and β 1 0 = 0.1 and β 2 0 = 0.001 for the backtracking. The trust-region parameters are set to γ 1 = 0.5, γ 2 = 2, η 1 = 0.25, η 2 = 0.75 and 0 = 3. The algorithms are run until convergence, that is until μ k − (1, 1) < 10 −3 . All ensemble based methods use the same initial ensemble.
Not surprisingly, among these four scenarios, the one using the analytic gradients has the best result. The least favorable scenario turns out to be the traditional EnOpt that only uses simple backtracking. The results are presented in Figs. 1 and 2 including second order derivative information in EnOpt. In terms of iterations, the trust-region strategy significantly outperforms backtracking. Figure 2 shows the summary of the results in a box plot. In order to address the sensitivity of the initial choice for Σ, the EnOpt with trust region and EnOpt with backtracking including the Hessian were rerun for ten different starting values. For each value, the starting point for μ was kept fixed and each method was run one hundred times to avoid any Monte Carlo effects. The average number of iterations are reported in Fig. 3 along with the value for the variance. The results are similar to the discussion in [15], i.e. the variance should not be set too large initially. However, since the variance is updated at each iteration, the initial value is not as sensitive as for the original EnOpt where the variance is kept fixed.

Synthetic reservoir model
A synthetic field designed by TNO, Brugge, is used as a benchmark study to test the combined use of waterflooding optimization and history matching methods in a closed-loop workflow [12]. The model is discretized on a 139 × 48 × 9 grid lattice with a total of 60048 cells. The model has a complex geological structure that contains five facies types formed in different depositional environments. The field is equipped with 30 wells, 20 producers and 10 injectors as shown in Fig. 4.
The objective function is defined as the Net-Present-Value (NPV), which can be calculated as where the oil production, q o,j , water production, q wp,j , and water injection rates, q wi,j , at time t j change as a function of the control variable x. The oil price is denoted v o . The costs of water disposal and water injection are denoted v wp and v wi , respectively. The discount factor is denoted by r and t j is the difference between two timesteps (measured in days).
In this example, the control variables are the sum of the entering reservoir volumes of oil, gas, water for each producer, and the injected water rates. The price of oil is set to $80/stb. The cost of both water injection and water disposal is $5/stb. The discount rate is 10% per year. The time frame for the optimization is 20 years and the control settings are modified every year, so the number of control steps is 20. Thus, the total number of control parameters is 600, which is the product of the number of wells and the number of control steps. As a reference case, the producers are set to produce at a rate of 2500 stb/day, with a minimum bottom hole pressure of 725 psia. The injectors have an upper limit in bottom hole pressure of 2610 psia and are operated at a rate of 3500 stb/day.
The performance of EnOpt using Hessian approximations in both trust-region (TR) method and backtracking (BT) method are compared with the original EnOpt method in which the steepest decent direction is applied without using Hessian approximations. In addition, we show two The same initial ensemble is used for all algorithms. We construct Σ using an autoregressive model of order 1 for each well. The autocorrelation function for each well at time t is given by: where h is an interger s.t. h ∈ [0, T − t] and T is the total number of control steps and ρ = 0.5. There is no correlation between producers and injectors. The variance is given by σ 2 P = 0.1 and σ 2 I = 0.1 for producers and injectors, respectively. The correlation structure is kept fixed during the optimization procedure, so that only the diagonal of Σ is updated.
One hundred initial ensemble members are generated from a Gaussian distribution with mean given by the reference case. The average NPV for the 100 initial ensemble members is $2.8227 × 10 9 . Figure 5 shows the change of the NPV with iterations for the four abovementioned scenarios. For the final simulated
It is noted that all variables are scaled based on the maximum rate of producers q p,max and injectors q i,max , respectively. In all four scenarios, the maximum rate is set to 3000 stb/day for producers and 4000 stb/day for injectors. Mathematically, scaled rates at time j of oil production rate, water production rate, and water injection rate are expressed as,  Of particular note is the importance of the initial trustregion radius k in Eq. 13 and the initial step size β 1 0 in Eq. 3. We followed guidelines of [3] on choosing parameters for the trust-region method. In this example, k is set to 0.1. Two other parameters, γ 1 and γ 2 , as described in Algorithm 2 are set to 0.9 and 2, respectively. Finally, η 1 and η 2 are set to 0.001 and 0.1, respectively.
For the backtracking method with Hessian approximations the initial step size β 1 0 is set to 0.05. This scenario was initially compared with the backtracking method without Hessian approximations scenario for which the initial step size was kept the same. However the latter scenario could not find further improvement after the second iteration as shown in Fig. 5. One possible explanation could be that the initial step size was set too large. To address this issue, we introduced the scenario where we reduced the initial step size of backtracking from 0.05 to 0.025. For all three cases, the step size for Σ was set to 0.01.
The results presented in Fig. 5 shows the same trend as for the Rosenbrock function, that including a Hessian approximation in EnOpt is beneficial. And again, the trustregion approach outperforms the backtracking.
Of particular interest are the oil field cumulative production, water field cumulative production, and water field cumulative injection as shown in Figs. 6, 7, and 8, respectively. Trustregion method with Hessian approximations has the highest cumulative oil production among four scenarios as shown in Fig. 6. The ideal result would be that the strategy ends up with more oil production, and less water production and water injection. However, as shown in Figs. 7 and 8 trust-region method with Hessian approximations results in the highest water injection and production among the four scenarios. These unwanted higher water injection and production values are compensated by more oil production for a better economic value.

Conclusions
In this article we have introduced a natural Hessian approximation for ensemble-based optimization and its  Field cumulative water production for four scenarios derivation in the context of trust-region methods. In particular, the case using Steihaug's approach for solving the trust-region subproblem. The natural Hessian can be approximated with a Monte Carlo approach which makes it suitable for the ensemble-based optimization in reservoir management, and within the natural evolution in other fields of engineering. The presented methodology showed improved convergence rate for the Rosenbrock function when compared to more standard EnOpt algorithms. Furthermore, it achieved higher Net-Present-Value than other EnOpt algorithms in the Brugge reservoir test case. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.