# History Matching Through a Smooth Formulation of Multiple-Point Statistics

## Abstract

We propose a smooth formulation of multiple-point statistics that enables us to solve inverse problems using gradient-based optimization techniques. We introduce a differentiable function that quantifies the mismatch between the multiple-point statistics of a training image and those of a given model. We show that, by minimizing this function, any continuous image can be gradually transformed into an image that honors the multiple-point statistics of the discrete training image. The solution to an inverse problem is then found by minimizing the sum of two mismatches: the mismatch with the data and the mismatch with the multiple-point statistics. As a result, in the Bayesian framework, such a solution belongs to a region of high posterior probability. The methodology, while applicable to any inverse problem with a training-image-based prior, is especially beneficial for problems that require expensive forward simulations, such as history matching. We demonstrate the applicability of the method on a two-dimensional history matching problem. Starting from different initial models, we obtain an ensemble of solutions that fit the data and the prior information defined by the training image. Finally, we propose a closed-form expression for calculating the prior probabilities using the theory of multinomial distributions, which allows us to rank the history-matched models according to their relative posterior probabilities.

### Keywords

History matching, Multiple-point statistics, Optimization, Inverse problems

## 1 Introduction

History matching is the task of inferring knowledge about subsurface models of oil reservoirs from production data. It is a strongly underdetermined problem: with data available in only a limited number of wells, one needs to estimate rock properties throughout the whole reservoir model. The problem has infinitely many solutions, and, in addition, most of them are not geologically plausible. Furthermore, the heavy computational work needed to simulate the data further increases the complexity. To address these challenges, we develop a probabilistic framework that integrates complex a priori information while aiming to reduce the number of forward simulations needed to find solutions. We propose a smooth formulation of the inverse problem with a discrete-facies prior defined by a multiple-point statistics model. This allows us to use gradient-based optimization methods to search for feasible models.

In probabilistic inverse problem theory (Tarantola 2005), the solution of an inverse problem is represented by its a posteriori probability density function (PDF). Each possible state in the model space is assigned a number, the a posteriori probability density, which reflects how well the model honors the data and the a priori information (knowledge about the model parameters that is independent of the data). The a posteriori PDF of high-dimensional, underdetermined inverse problems such as history matching may feature isolated islands of significant probability and low probability everywhere else. Therefore, when a full description of the posterior PDF is not available, the goal is to locate and explore the islands of significant posterior probability.

One may explore the a posteriori PDF in several ways. Monte Carlo methods (Mosegaard and Tarantola 1995; Cordua et al. 2012) allow, in principle, sampling of the a posteriori PDF. However, for large-scale nonlinear inverse problems, there is a risk of detecting only a single island of significant posterior probability. In addition, sampling is not feasible for inverse problems with computationally expensive forward simulations, such as history matching. Other methods rely on optimization (Caers and Hoffman 2006; Jafarpour and Khodabakhshi 2011) to determine a collection of models that fit the data and the a priori information. However, these methods do not describe the a posteriori variability of the models, since the weighting of prior information against data information (likelihood) is not taken into account.

Regardless of the chosen strategy, most of the research community favors advanced prior information that helps to significantly shrink the space of allowed models (Caers 2003; Jafarpour and Khodabakhshi 2011; Hansen et al. 2012). For instance, a priori information borrowed from a training image (Guardiano and Srivastava 1993; Strebelle 2002) permits only models of a specific configuration defined by the statistical properties of the image. Ideally, training images reflect expert knowledge about geological phenomena (facies geometry, contrast in rock properties, location of faults) and play the role of vital additional information, drastically restricting the solution space (Hansen et al. 2009).

Our strategy for exploring the a posteriori PDF, which is especially suitable for inverse problems with expensive forward simulations (e.g. history matching), is to obtain a set of models with high posterior values and then rank the solutions according to their relative posterior probabilities. We integrate complex a priori information represented by multiple-point statistics inferred from a training image. One of the challenges here is to define a closed-form expression for the prior probability that, multiplied by the likelihood function, yields the a posteriori probability. It is not sufficient to perturb the model consistently with the training image until the dynamic data are matched, as is done in the probability perturbation method (Caers and Hoffman 2006). As noted by Hansen et al. (2012), the fit to the prior information is not quantified in this method, so it finds models of maximum likelihood with non-zero prior probability, not models of maximum posterior probability; the resulting model may resemble the training image very poorly and therefore may have a low posterior value.
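For concreteness, the relation between prior, likelihood and posterior invoked above can be written in the notation of Tarantola (2005) as follows. The Gaussian form of the likelihood, with data covariance \(C_D\), is an assumption stated here for illustration; it is consistent with the quantity used to rank the solutions in Sect. 3.

\[
\sigma(\mathbf{m}) = k\,\rho(\mathbf{m})\,L(\mathbf{m}), \qquad
L(\mathbf{m}) \propto \exp\!\left(-\tfrac{1}{2}\,\lVert g(\mathbf{m}) - \mathbf{d}^{\mathrm{obs}} \rVert^{2}_{C_D}\right),
\]

\[
\log \sigma(\mathbf{m}) = \log \rho(\mathbf{m}) - \tfrac{1}{2}\,\lVert g(\mathbf{m}) - \mathbf{d}^{\mathrm{obs}} \rVert^{2}_{C_D} + \text{const},
\qquad \lVert \mathbf{x} \rVert^{2}_{C_D} = \mathbf{x}^{T} C_D^{-1} \mathbf{x},
\]

where \(\rho\) is the prior density, \(g\) the forward operator (here, the reservoir simulator), \(\mathbf{d}^{\mathrm{obs}}\) the observed data and \(k\) a normalization constant. Since the constant is shared by all models, differences of \(\log \sigma\) suffice to compare solutions.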

Lange et al. (2012) were the first to estimate prior probabilities when solving inverse problems with training images. Their frequency matching (FM) method quantifies the prior probability of a proposed model and hence iteratively guides it towards a high-posterior solution. Specifically, Lange et al. (2012) solve a combinatorial optimization problem, perturbing the model in a discrete manner until it explains both the data and the a priori information. In practice, this requires many forward simulations and can be prohibitive for the history matching problem. While following the philosophy of the frequency matching method, we aim to minimize the number of forward simulations needed to reach a model of high posterior probability. As in the FM method, we minimize the sum of data and prior misfits. However, the new smooth formulation of the objective function allows us to apply gradient-based optimization and significantly cut down the number of reservoir simulations. After convergence, the model reproduces the multiple-point statistics of the training image and simultaneously fits the data. Starting from several, possibly very different, initial models, we obtain different solutions of the inverse problem and detect regions of high posterior probability. For the history matching problem, starting models obtained from seismic data interpretation would probably be of most practical use.

To our knowledge, gradient-based techniques were first coupled with training images in the work of Sarma et al. (2008) by means of kernel principal component analysis (PCA). The authors were the first to use kernel PCA for geological model parametrization. Kernel PCA generates differentiable (smooth) realizations of the training image that maintain its multiple-point statistics and, as a result, reproduce geological structures. The differentiable formulation of Sarma et al. (2008) enables the use of gradient-based methods; however, the quality of the solution in terms of consistency with the prior information is not estimated. In this work, we derive a closed-form expression for the prior probability. This allows us to quantify the relative posterior probabilities of the solutions and therefore to assess their importance.

This paper is organized as follows. In Sect. 2, we introduce the smooth formulation of multiple-point statistics. The proposed formulation makes it possible to measure the mismatch between the multiple-point statistics of the training image and those of any, possibly continuous, model. As a result, we are able to generate realizations of the training image from any starting model using gradient-based optimization (Sect. 2.4). Combining the proposed measure with the data misfit then allows us to search for a solution to an inverse problem with a training-image-based prior by minimizing a single differentiable objective function (Sect. 2.5). In Sect. 3, we demonstrate the applicability of the method by solving a two-dimensional history matching problem. Finally, we rank the solutions according to their relative posterior probabilities using the derivations of Sect. 2.3. Section 4 summarizes our findings.

## 2 Methodology

In this work, we use a probabilistic formulation of inverse problems, integrating complex a priori information (a training image) and data into a single differentiable objective function. By solving the optimization problem for an ensemble of starting models, we obtain a set of solutions that honor both the observations and the multiple-point statistics of the training image. We start with a definition of the inverse problem.

### 2.1 Inverse Problems with Training Image-Defined Prior

### 2.2 The Smooth Formulation of Multiple-Point Statistics

Notation

| Notation | Description |
|---|---|
| \(\mathbf{TI}\) | Training image, categorical |
| \(\mathbf{m}\) | Model (test image), can contain continuous values |
| \(\mathbf{T}\) | Scanning template |
| \(H^{\mathrm{d}, \mathbf{m}}\) | Pseudo-histogram of \(\mathbf{m}\) |
| \(H^{\mathrm{d}, \mathbf{TI}}\) | Pseudo-histogram of \(\mathbf{TI}\) |
| \(N^{\mathbf{m}}\) | Number of patterns in \(\mathbf{m}\) |
| \(N^{\mathbf{TI}}\) | Number of patterns in \(\mathbf{TI}\) |
| \(N^{\mathbf{TI},\text{un}}\) | Number of unique patterns in \(\mathbf{TI}\) |
| \(\mathrm{pat}^{\mathbf{m}}_i\) | Pixel values of the \(i\)th pattern in \(\mathbf{m}\) |
| \(\mathrm{pat}^{\mathbf{TI}}_i\) | Pixel values of the \(i\)th pattern in \(\mathbf{TI}\) |
| \(\mathrm{pat}^{\mathbf{TI}, \text{un}}_j\) | \(j\)th unique pattern in \(\mathbf{TI}\) |

The smooth histogram computed for the discrete Image A (Fig. 2a) is shown in light blue in Fig. 2c, while its original frequency distribution is shown in dark blue. The categories of discrete patterns, whose contributions are calculated using Eq. 8, are shown below the \(x\)-axis. Figure 2b shows a continuous image, and its histogram, defined in the smooth sense, is shown in orange in Fig. 2c. Notice the small counts everywhere: according to Eq. 9, this image does not contain patterns sufficiently similar to those observed in the training image. For visualization purposes, the parameters of Eq. 8 are set to \(A=50\), \(k=2\) and \(s=2\). These values apply after \(t_{ij}\) has been normalized by the maximum possible Euclidean distance between discrete patterns.
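The behavior described for Fig. 2 can be reproduced on a toy image with the sketch below. It rests on assumptions that are ours, not the paper's: the pattern similarity function is taken to be \(1/(1 + A\,t^k)^s\), which gives a contribution of 1 for an exact match that decays with the normalized distance \(t_{ij}\); each pseudo-histogram bin is taken to be the sum of these contributions over all patterns of \(\mathbf{m}\); facies are coded 0/1; and the helper names (`extract_patterns`, `similarity`, `pseudo_histogram`) are hypothetical.

```python
import math


def extract_patterns(image, th, tw):
    """Scan a 2-D image (list of rows) with a th x tw template; return flattened patterns."""
    rows, cols = len(image), len(image[0])
    return [
        tuple(image[r + dr][c + dc] for dr in range(th) for dc in range(tw))
        for r in range(rows - th + 1)
        for c in range(cols - tw + 1)
    ]


def similarity(t, A=50.0, k=2.0, s=2.0):
    # ASSUMED form of the pattern similarity function: 1 at t = 0, decaying with t
    return 1.0 / (1.0 + A * t ** k) ** s


def pseudo_histogram(model, ti, th=2, tw=2, A=50.0, k=2.0, s=2.0, vmin=0.0, vmax=1.0):
    """Smooth count of each unique TI pattern in `model` (hypothetical helper)."""
    ti_unique = sorted(set(extract_patterns(ti, th, tw)))
    model_pats = extract_patterns(model, th, tw)
    # Normalization: largest Euclidean distance between two discrete patterns
    # whose pixel values lie in [vmin, vmax]
    t_max = math.sqrt(th * tw) * (vmax - vmin)
    hist = []
    for u in ti_unique:
        hist.append(sum(similarity(math.dist(p, u) / t_max, A, k, s) for p in model_pats))
    return ti_unique, hist


if __name__ == "__main__":
    ti = [[0, 0, 1], [0, 0, 1], [1, 1, 1]]   # toy binary training image
    gray = [[0.5] * 3 for _ in range(3)]     # toy continuous image
    print(pseudo_histogram(ti, ti)[1])       # each bin slightly above its true count of 1
    print(pseudo_histogram(gray, ti)[1])     # small counts in every bin
```

Applied to the training image itself, every bin stays close to its true count, because exact matches contribute 1 and dissimilar patterns contribute little; a uniform gray image yields small counts in every bin, mirroring the orange histogram in Fig. 2c.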

The choice of the parameters \(A\), \(k\) and \(s\) in Eq. 8 is very important: on the one hand, they determine how well the pseudo-histogram approximates the true frequency distribution; on the other hand, they control the amount of smoothing and, consequently, the convergence properties. Figure 3 shows how different values of \(k\) and \(s\), with \(A=100\) fixed, influence the shape of the pattern similarity function (Eq. 8). Empirically, we found the values \(A=100\), \(k=2\), \(s=2\) to be optimal. Compare them (Fig. 3) with the extreme case \(A=100\), \(k=1\), \(s=2\), where the majority of patterns have a close-to-zero contribution.
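To make the effect of \(k\) concrete, the sketch below tabulates the similarity contribution at a few normalized distances for \(A=100\), \(s=2\) and \(k \in \{2, 1\}\). As in the previous illustration, the functional form \(1/(1 + A\,t^k)^s\) is an assumption standing in for Eq. 8, and the function names are hypothetical.

```python
def similarity(t, A=100.0, k=2.0, s=2.0):
    # ASSUMED form of Eq. 8: contribution of a pattern at normalized distance t in [0, 1]
    return 1.0 / (1.0 + A * t ** k) ** s


def similarity_curve(distances, A, k, s):
    """Evaluate the assumed similarity function at each normalized distance."""
    return [similarity(t, A, k, s) for t in distances]


distances = [0.0, 0.05, 0.1, 0.2, 0.5]
for k in (2.0, 1.0):
    values = similarity_curve(distances, A=100.0, k=k, s=2.0)
    print(f"k={k:g}:", " ".join(f"{v:.4f}" for v in values))
# k=2: 1.0000 0.6400 0.2500 0.0400 0.0015
# k=1: 1.0000 0.0278 0.0083 0.0023 0.0004
```

Under this assumed form, switching from \(k=2\) to \(k=1\) makes even near-identical patterns (\(t=0.05\)) contribute less than 3%, which is the "close-to-zero contribution" regime described above and leaves the objective with little gradient information.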

### 2.3 Relation of the Dissimilarity Measure to Prior Probability

Consider \(N\) independent trials, each of which results in exactly one of \(K\) categories, where each category has a fixed probability of success \(p_i\). By definition, each element \(H_i\) of the frequency distribution \(\mathbf{H}\) indicates the number of times the \(i\)th category has appeared in the \(N\) trials (the number of patterns observed in the test image). Then the vector \(\mathbf{H} = (H_1,\ldots, H_K)\) follows the multinomial distribution with parameters \(N\) and \(\mathbf{p}\), where \(\mathbf{p} = (p_1,\ldots, p_K)\).
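For reference, the multinomial probability mass function is \(P(\mathbf{H}) = \frac{N!}{H_1!\cdots H_K!}\prod_{i=1}^{K} p_i^{H_i}\) with \(N = \sum_i H_i\). The sketch below evaluates its logarithm with `math.lgamma`, which avoids overflowing factorials for the large pattern counts of a full image. Estimating \(p_i\) as relative frequencies of the training-image histogram, and the optional additive pseudo-count for patterns that never occur in the training image, are illustrative assumptions, not necessarily the estimator used in this section; the function names are hypothetical.

```python
import math


def multinomial_log_pmf(counts, probs):
    """log P(H) for H ~ Multinomial(N = sum(counts), p = probs)."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log N!
    for h, p in zip(counts, probs):
        if h == 0:
            continue  # contributes log(p**0 / 0!) = 0, even when p == 0
        if p <= 0.0:
            return float("-inf")  # a category with zero probability was observed
        log_p += h * math.log(p) - math.lgamma(h + 1)
    return log_p


def ti_probabilities(ti_counts, pseudo_count=0.0):
    """Relative frequencies from training-image counts.

    The additive pseudo-count is a HYPOTHETICAL smoothing choice that keeps
    unseen categories from receiving zero probability.
    """
    total = sum(ti_counts) + pseudo_count * len(ti_counts)
    return [(c + pseudo_count) / total for c in ti_counts]


if __name__ == "__main__":
    p = ti_probabilities([30, 10, 0], pseudo_count=1.0)
    print(multinomial_log_pmf([3, 1, 0], p))
```

Without a pseudo-count, any test image containing a pattern absent from the training image gets \(\log P = -\infty\); some form of smoothing is therefore usually needed before the expression can serve as a finite prior term.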

### 2.4 Generating Near-Maximum A Priori Models

### 2.5 Solving Inverse Problems

As in Sect. 2.4, we apply the logarithmic transformation (Eq. 22) to the model and to the training image. To solve problem (25), we suggest quasi-Newton methods, which are known to be efficient for history matching problems (Oliver et al. 2008). The gradient of the data misfit term is calculated by the adjoint method implemented in the reservoir simulator Eclipse (Schlumberger GeoQuest 2009). The gradient of the prior term is computed analytically (Appendix A). The algorithm stops when the values of the objective terms in the optimization problem (25) approach their target values. The computational cost of the algorithm grows with the number of categories in the training image and with the template size, since more Euclidean distances must be calculated.
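Quasi-Newton methods build a curvature model from successive gradients, so each iteration needs only one objective and gradient evaluation. The sketch below is a textbook limited-memory BFGS (the two-loop recursion described by Nocedal and Wright 2006) with a simple Armijo backtracking line search. It is applied to a hypothetical two-parameter objective that sums a linear "data misfit" and a quadratic stand-in for the prior term; the stand-in is our assumption for illustration and is not the paper's multiple-point prior misfit. In the real problem, the data-misfit gradient would come from the simulator's adjoint rather than a closed form.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs(f_and_grad, x0, memory=5, max_iter=200, tol=1e-8):
    """Minimize f with L-BFGS; f_and_grad(x) returns (f(x), grad f(x))."""
    x = list(x0)
    f, g = f_and_grad(x)
    s_hist, y_hist = [], []  # recent curvature pairs, oldest first
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: r approximates (inverse Hessian) * g
        q, coeffs = list(g), []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / dot(y, s)
            a = rho * dot(s, q)
            coeffs.append((rho, a))
            q = [qi - a * yi for qi, yi in zip(q, y)]
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        for s, y, (rho, a) in zip(s_hist, y_hist, reversed(coeffs)):
            b = rho * dot(y, r)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]
        # Armijo backtracking line search along the descent direction d
        step, slope = 1.0, dot(g, d)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            f_new, g_new = f_and_grad(x_new)
            if f_new <= f + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        s_vec = [xn - xo for xn, xo in zip(x_new, x)]
        y_vec = [gn - go for gn, go in zip(g_new, g)]
        if dot(s_vec, y_vec) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s_vec)
            y_hist.append(y_vec)
            if len(s_hist) > memory:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    return x, f


def toy_objective(m, w=1.0):
    # HYPOTHETICAL example: data misfit 0.5*(m0 + m1 - 2)^2 from one "observation"
    # plus a stand-in prior term 0.5*w*(m0^2 + (m1 - 1)^2)
    r = m[0] + m[1] - 2.0
    f = 0.5 * r * r + 0.5 * w * (m[0] ** 2 + (m[1] - 1.0) ** 2)
    grad = [r + w * m[0], r + w * (m[1] - 1.0)]
    return f, grad


if __name__ == "__main__":
    m_opt, f_opt = lbfgs(toy_objective, [0.0, 0.0])
    print(m_opt)  # close to the analytic minimizer (1/3, 4/3) for w = 1
```

Setting the gradient of the toy objective to zero gives \(m_0 = 1/(2+w)\) and \(m_1 = m_0 + 1\), which the iteration recovers; in practice one would more likely call an established implementation such as L-BFGS-B (Zhu et al. 1997) than hand-roll the recursion.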

## 3 History Matching Example

Reservoir model parameters

| Parameter | Value |
|---|---|
| Model size | 50 \(\times\) 50 cells |
| Cell size | 10 \(\times\) 10 m |
| Initial water saturation | 0.0 |
| Porosity | 0.3 (constant everywhere) |

Posterior ranking of the solutions

| Model no. | \(\log \rho(\mathbf{m}) - \frac{1}{2}\lVert g(\mathbf{m})-\mathbf{d}^{\mathrm{obs}}\rVert^2_{C_D}\) |
|---|---|
| 1 | \(-\)8122.0324 |
| 2 | \(-\)8134.6031 |
| 3 | \(-\)10383.1467 |
| 4 | \(-\)7860.2211 |
| 5 | \(-\)6568.7915 |
| 6 | \(-\)8900.5525 |
| 7 | \(-\)9781.7611 |
| 8 | \(-\)7107.3847 |
| 9 | \(-\)6734.4299 |
| 10 | \(-\)7608.2761 |
| True model | \(-\)7713.9272 |

For comparison, the last row gives the value calculated for the true model (Fig. 6). We conclude that models 5, 8 and 9 are the most preferable within this ensemble, while model 3 is the least preferable.
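Because the tabulated quantity equals the log posterior up to a constant common to all models, the ranking and the relative posterior probabilities follow from differences alone. The short sketch below reproduces the ordering stated above from the table values. Reporting ratios in log10 is our presentation choice, since exponentiating differences of hundreds of log units would underflow; the function names are hypothetical.

```python
import math

# Values of log rho(m) - 0.5*||g(m) - d_obs||^2_{C_D} copied from the table above;
# they equal the log posterior up to a constant shared by all models.
LOG_POSTERIOR = {
    1: -8122.0324, 2: -8134.6031, 3: -10383.1467, 4: -7860.2211, 5: -6568.7915,
    6: -8900.5525, 7: -9781.7611, 8: -7107.3847, 9: -6734.4299, 10: -7608.2761,
}


def rank_models(log_post):
    """Return model ids sorted from highest to lowest log posterior."""
    return sorted(log_post, key=log_post.get, reverse=True)


def log10_ratio_to_best(log_post):
    """log10 of sigma(m) / sigma(m_best); exp() of the raw differences would underflow."""
    best = max(log_post.values())
    return {n: (v - best) / math.log(10.0) for n, v in log_post.items()}


ranking = rank_models(LOG_POSTERIOR)
ratios = log10_ratio_to_best(LOG_POSTERIOR)
for n in ranking:
    print(f"model {n:2d}: log10 posterior ratio to best = {ratios[n]:9.2f}")
```

The resulting order places models 5, 9 and 8 first and model 3 last, matching the conclusion drawn from the table.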

## 4 Conclusions

We presented an efficient method for solving the history matching problem with a gradient-based optimization technique that integrates complex a priori information in the form of a training image. History matching is a severely underdetermined inverse problem, and the existence of multiple solutions is a direct (and unfortunate) consequence of this property. However, production data contain valuable information about rock properties, such as porosity and permeability, and inverting them is necessary for constructing reservoir models that can be used for prediction. Geological information, if available, can drastically decrease the size of the solution space, hence reducing the non-uniqueness of the solution. One way of applying the methodology is to explore the solution space. Since we can start from any smooth model, in many cases we can detect solutions that have high posterior values and look very different because they belong to different islands of high probability. Quantifying the relative posterior probabilities allows us to rank the solutions and choose the most reliable ones.

The algorithm needs a starting guess and, as in any gradient-based optimization, its convergence properties depend on it. In the history matching problem, the choice of the starting guess is particularly important. The sensitivity of the production data to the rock properties decreases non-linearly with distance from the wells. Therefore, it is hard to invert for model parameters in areas with poor well coverage. The situation could be greatly improved by integrating seismic data or, at least, by using the results of seismic inversion as starting guesses. This is a topic of our future research.

## Notes

### Acknowledgments

The present work was sponsored by the Danish Council for Independent Research-Technology and Production Sciences (FTP Grant No. 274-09-0332) and DONG Energy.

### References

- Caers J (2003) History matching under training-image-based geological model constraints. SPE J 8:218–226
- Caers J, Hoffman T (2006) The probability perturbation method: a new look at Bayesian inverse modeling. Math Geol 38:81–100
- Chen SF, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13:359–394
- Cordua KS, Hansen TM, Lange K, Frydendall J, Mosegaard K (2012a) Improving multiple-point-based a priori models for inverse problems by combining sequential simulation with the frequency matching method. Paper presented at the 82nd annual meeting of the Society of Exploration Geophysicists (SEG 2012), Las Vegas, NV, United States
- Cordua KS, Hansen TM, Mosegaard K (2012b) Monte Carlo full waveform inversion of crosshole GPR data using multiple-point geostatistical a priori information. Geophysics 77:H19–H31
- Gao G, Reynolds AC (2006) An improved implementation of the LBFGS algorithm for automatic history matching. SPE J 11(1):5–17
- Guardiano F, Srivastava RM (1993) Multivariate geostatistics: beyond bivariate moments. In: Soares A (ed) Geostatistics Troia, vol 1. Kluwer Academic
- Hansen TM, Mosegaard K, Cordua KS (2009) Reducing complexity of inverse problems using geostatistical priors. In: Proceedings of IAMG 09
- Hansen TM, Cordua KS, Mosegaard K (2012) Inverse problems with non-trivial priors: efficient solution through sequential Gibbs sampling. Comput Geosci 16:593–611
- Honarkhah M (2011) Stochastic simulation of patterns using distance-based pattern modeling. PhD thesis, Stanford University
- Jafarpour B, Khodabakhshi M (2011) A probability conditioning method (PCM) for nonlinear flow data integration into multipoint statistical facies simulation. Math Geosci 43:133–164
- Lange K, Frydendall J, Cordua KS, Hansen TM, Melnikova Y, Mosegaard K (2012) A frequency matching method: solving inverse problems by use of geologically realistic prior information. Math Geosci 44:783–803
- Marler RT, Arora JS (2004) Survey of multi-objective optimization methods for engineering. Struct Multidisc Optim 26:369–395
- Mosegaard K, Tarantola A (1995) Monte Carlo sampling of solutions to inverse problems. J Geophys Res 100:12431–12447
- Nocedal J, Wright SJ (2006) Numerical optimization. Springer, Berlin
- Oliver DS, Reynolds AC, Liu N (2008) Petroleum reservoir characterization and history matching. Cambridge University Press, New York
- Osyczka A (1978) An approach to multicriterion optimization problems for engineering design. Comput Meth Appl Mech Eng 15:309–333
- Sarma P, Durlofsky LJ, Aziz K (2008) Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math Geosci 40:3–32
- Schlumberger GeoQuest (2009) ECLIPSE reservoir simulator. Technical description
- Strebelle S (2000) Sequential simulation drawing structures from training images. PhD thesis, Stanford University
- Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–20
- Tarantola A (2005) Inverse problem theory and methods for model parameter estimation. Society for Industrial and Applied Mathematics, Philadelphia
- Zhu C, Byrd RH, Lu P, Nocedal J (1997) L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Softw 23(4):550–560

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.