Geostatistics Valencia 2016, pp 573–588
Machine Learning Methods for Sweet Spot Detection: A Case Study
Abstract
In the geosciences, sweet spots are defined as areas of a reservoir that represent the best production potential. From the outset, it is not always obvious which reservoir characteristics best determine the location, and influence the likelihood, of a sweet spot. Here, we will view detection of sweet spots as a supervised learning problem and use tools and methodology from machine learning to build data-driven sweet spot classifiers. We will discuss some popular machine learning methods for classification, including logistic regression, k-nearest neighbors, support vector machine, and random forest, and will highlight strengths and shortcomings of each method. In particular, we will draw attention to a complex setting and focus on a smaller real data study with limited evidence for sweet spots, where most of these methods struggle. We will illustrate a simple solution that aims at increasing the performance of these methods by optimizing for precision. In conclusion, we observe that all methods considered need some sort of preprocessing or additional tuning to attain practical utility. While the application of support vector machine and random forest shows a fair degree of promise, we still stress the need for caution in naive use of machine learning methodology in the geosciences.
1 Introduction
In petroleum geoscience, sweet spots are defined as areas of oil or gas reservoirs that represent best production potential. In particular, the term has emerged in unconventional reservoirs where the reserves are not restricted to traps or structures, but may exist across large geographical areas. In unconventional reservoirs the sweet spots are typically combinations of certain key rock properties. Total organic carbon (TOC), brittleness, and fractures are some of the properties influencing possible production. In identifying these sweet spots, the operators face the challenge of working with large amounts of data from horizontal wells and modeling the complex relationships between reservoir properties and production.
In general, a more data-driven approach for sweet spot detection allows for a more direct use of less costly reservoir data, such as seismic attributes. Moreover, such an approach may potentially avoid parts of the expensive reservoir modeling. In particular, the time-consuming computations needed to build a full reservoir model can be avoided. Fast and reliable classification of the sweet spots is of high significance, as it allows for focusing efforts toward the most productive areas of a reservoir. This makes machine learning algorithms desirable, since these are typically fast to train, often easy to regularize, and have the ability to adapt and learn complex relationships.
The use of machine learning methodology for predicting and detecting potential areas of interest is gaining attention and is not new to the geosciences. A multidisciplinary workflow in order to predict sweet spot locations is presented in Vonnet and Hermansen (2015). An example of support vector machine application on well data for prediction purposes is given in Li (2005). In Wohlberg et al. (2006), the support vector machine is demonstrated as a tool for facies delineation, and in Al-Anazi and Gates (2010), the method is applied for predicting permeability distributions.
In this paper we continue this exploration and view sweet spot detection in a machine learning setting, framed as a traditional supervised learning problem, i.e., classification. These are data-driven algorithms that aim to learn relationships between the reservoir properties and sweet spots from labeled well-log training data. We illustrate different popular machine learning algorithms through a case study, considering a real and challenging data set with a weak signal for sweet spots. The algorithms we consider and compare are logistic regression, k-nearest neighbor (kNN), support vector machines (SVMs), and random forest.
We will emphasize a more moderate and cautious approach to the often uncritical use of machine learning for classification, wherein awareness of what we can learn is significant for interpreting the results. The main challenge here is related to the low data quality and the limited evidence for sweet spots (see Sect. 2). In such cases, the focus should be on the confidence of evidence of sweet spots, despite a potentially low discovery rate. There is usually a high cost associated with exploration and development of a field. It is therefore generally better to sacrifice some sweet spots (i.e., detection rate) in order to gain accuracy and precision. This is our main focus, and we compare the ability of these machine learning algorithms to learn from a weak signal. We show how a simple modification can improve such methods, recovering more of their potential and providing sufficiently confident evidence of sweet spots. We also discuss the inadequacy of simple summary statistics for model validation and show that, in general, a more detailed investigation is needed to assess the actual performance.
In Sect. 2 we describe our real data set and set the sweet spot detection in a binary classification setting. Next, in Sect. 3, we discuss the machine learning algorithms used in this case study. Section 4 outlines the setup for training and validating the machine learning methods, before the numeric results are presented and discussed. Lastly, Sect. 5 concludes the case study.
The training and validation of machine learning methods and the predictions and numeric comparisons are carried out in R, using the packages e1071, class, and randomForest.
2 Data and the Problem
Table 1 Number of sweet spots and non-sweet spots in the four wells

| | Sweet spots | Non-sweet spots |
|---|---|---|
| Well 1 | 40 | 63 |
| Well 2 | 9 | 40 |
| Well 5 | 38 | 63 |
| Well 6 | 19 | 43 |
Note that the first four features (Vp, Vs, density, and AI) have been corrected for a depth trend. Thus, the a priori background model for the parameters has been removed, since this introduced a systematic bias in the predictions.
The S-wave velocity in the labeled data set appears to be artificially constructed from the P-wave velocity, as the estimated correlation between the two is above 0.99, which is also verified by plotting. Traditionally, we would be inclined to exclude one of these variables from the statistical analysis, e.g., to avoid collinearity. However, we will keep both features in our training data to test and illustrate the robustness of the (probabilistic) model-free machine learning methods.
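Near-perfect dependence of this kind is easy to screen for numerically. The sketch below (Python with NumPy, on synthetic stand-in data, since the case-study logs are not public) constructs an artificial S-wave velocity from the P-wave velocity and recovers a correlation above 0.99:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the logged velocities: Vs is constructed
# from Vp plus a little noise, mimicking the near-perfect dependence
# observed in the case-study data.
vp = rng.normal(3000.0, 200.0, size=500)          # P-wave velocity (m/s)
vs = 0.55 * vp + rng.normal(0.0, 5.0, size=500)   # S-wave velocity (m/s)

# Pearson correlation between the two features.
corr = np.corrcoef(vp, vs)[0, 1]
print(f"corr(Vp, Vs) = {corr:.3f}")  # close to 1, flagging near-collinearity
```

In practice one would run such a check pairwise across all features before fitting any model.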
| | Classified as non-sweet spot | Classified as sweet spot |
|---|---|---|
| True non-sweet spot | True negative (TN) | False positive (FP) |
| True sweet spot | False negative (FN) | True positive (TP) |
In the sweet spot setting, we argue that the TPR, the fraction of predicted sweet spots that are true sweet spots (TP/(TP + FP)), is of most importance, as an assurance of correct sweet spot predictions. On the other hand, a carefully balanced focus on the TDR, the fraction of true sweet spots that are detected (TP/(TP + FN)), will ensure that more sweet spots are found, at the cost of including misclassified sweet spots. Again, care is needed when tuning methods against these measures.
Moreover, we expect that there is an overrepresentation of sweet spots in the data. This seems obvious, since wells are not placed randomly in the field; rather, they are placed exactly where the developers expect the greatest potential for success, i.e., in the sweet spots. This suggests that there is most likely an unobserved confounding, or omitted, variable. The information and process underlying the positioning of wells can be thought of as an unobserved (and highly complex) variable influencing both the response and the explanatory variables. This may in turn result in an unbalanced data problem (too many sweet spots) and introduce potentially complex correlations among the explanatory variables and the response; see, among others, He and Garcia (2009) and King and Zeng (2001) for additional discussion. It is generally hard, or even impossible, to correct for such effects; see Li et al. (2011) for an attempt to correct the support vector machine. The logistic regression model is particularly sensitive; see also Mood (2010). As a final remark, if we consider the overall reservoir from which the well logs are collected, we expect sweet spots to be in the minority, causing an additional imbalance, this time in the opposite direction. Proper treatment of such effects and possible extensions are outside the scope of this paper.
3 Machine Learning Methods
In general, machine learning refers to algorithms and statistical methods for data analysis. Here, we will focus on machine learning methodology for prediction of binary class labels, i.e., two class problems. It should be pointed out that all methods discussed can easily be generalized to multiclass problems. We will consider four common and popular supervised learning algorithms, which are the logistic regression, random forest, k-nearest neighbor (kNN), and support vector machine (SVM).
3.1 Logistic Regression
Logistic regression is a classical and popular model-based classification algorithm. We refer the reader to, e.g., Hastie et al. (2009) or any introductory textbook in statistics for a general introduction. The logistic regression model provides estimates for the probability of a binary response as a function of one or more explanatory variables. Since it is model based, it is possible to obtain proper and valid statistical inference, e.g., statistical tests for feature selection. In addition, compared to some machine learning algorithms, e.g., kNN, SVM, or tree-based models, the outputs of a fitted logistic regression model can be interpreted as actual class probabilities under the model conditions. Most (model-free) machine learning algorithms only output class labels, and probabilistic proxies must be obtained and tuned from the raw outputs to mimic the output of a probabilistic model; see, for instance, Platt (1999) for an algorithm for obtaining class probabilities for SVMs.
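To make the contrast concrete, here is a minimal sketch (Python with scikit-learn on synthetic data; the study itself used R) in which a fitted logistic regression returns genuine class probabilities rather than bare labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic two-class data standing in for sweet spot / non-sweet spot
# labels against a handful of seismic attributes.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Unlike most model-free classifiers, the fitted model outputs proper
# class probabilities, not just hard labels.
proba = model.predict_proba(X[:5])[:, 1]
print(np.round(proba, 3))
```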
The logistic regression model has certain well-known challenges. Firstly, compared to simple machine learning algorithms, like kNN and SVMs, fitting a logistic regression model requires some form of semi-complex and iterative optimization algorithm (like gradient descent). On the other hand, since it is based on a low-dimensional parametric model (the number of parameters is essentially the number of features + 1), the fitted model is very efficient for predicting in large grids. Another challenge is that logistic regression is sensitive to collinearity and confounding; it is not particularly robust against outliers and may be hard to tune automatically (i.e., to select the appropriate number of features to use); see, among others, Menard (2002) for details on applied use of logistic regression.
3.2 Random Forest
Random forest is an ensemble of multiple decision or classification trees; see, e.g., Hastie et al. (2009). A decision tree is a greedy approach that recursively partitions the feature space. A single decision tree will easily overfit the training data and has potentially a large variance; in particular, with noisy data, the generalization of a single decision tree is poor. To avoid overfitting, the ensemble of decision trees, i.e., the random forest, averages multiple decision trees fitted to different resamplings of the training data. Each of the trees in the ensemble has potentially a high variance, and the averaging over the ensemble reduces this variance. In general, random forest is computationally efficient and easily interpreted. For more details, we refer the reader to Breiman (2001).
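A minimal illustration of the variance-reduction idea, in Python with scikit-learn on synthetic data (the paper itself used the R package randomForest):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic data with an interaction term, which single greedy trees
# tend to overfit on noisy samples.
X = rng.normal(size=(400, 6))
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.3, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of bootstrapped trees; averaging tames the high variance
# of any single deep tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"hold-out accuracy: {forest.score(X_te, y_te):.2f}")
```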
3.3 k-Nearest Neighbor (kNN)
The k-nearest neighbor (kNN) algorithm is one of the simpler and more robust supervised learning algorithms. An introduction can be found in any introductory textbook in machine learning. The algorithm classifies a new observation, or location, by comparing it with the k-nearest observations in the training set and classifies the new observation according to the dominant class. This algorithm is completely model-free and nonparametric. However, each new prediction needs a unique nearest neighbor search. This makes the algorithm less efficient for large data sets and prediction grids. The best choice of the number of neighbors, k, depends upon the data. In our case we perform a cross validation to find this parameter. In general, small values of k may result in noisier results. Larger values of k reduce the effect of noise, but make boundaries between classes less distinct. This algorithm will always improve with more data, and the method is known to work well in simpler classification problems; see also Beyer et al. (1999).
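The cross validation over k can be sketched as follows (Python with scikit-learn on synthetic data; the search range 0 < k < 30 mirrors the one used later in this study):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Cross-validate the number of neighbours over 1..29; small k gives
# noisy boundaries, large k smooths them out.
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": range(1, 30)}, cv=5)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"])
```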
3.4 Support Vector Machine (SVM)
In its basic form, the SVM is a binary classifier that seeks the hyperplane separating two linearly separable classes with the largest possible margin; the training points lying on the margin are called support vectors. For data sets that are not completely separable, the concept of a soft margin is introduced to allow some data to lie within the margin. The SVM then attempts to find a hyperplane that separates the data as cleanly as possible, without strictly enforcing that no data fall within the margin (hence the term soft margin). The soft margin is controlled through a regularization parameter, often referred to as C. A large value for this regularization parameter aims for a smaller soft margin and fewer misclassified points. On the other hand, a small value for the regularization parameter aims for a larger soft margin, allowing more points to be misclassified and yielding a smoother decision boundary. In Fig. 4b, three points have been interchanged between the blue and red classes, making the data set linearly inseparable. The figure shows the separating plane as the black line, with support vectors again marked with circles, and we observe that some points are allowed to fall within the margins (dashed lines).
The SVMs handle nonlinear classification by applying the so-called kernel trick, which allows for nonlinear decision boundaries while the algorithm for the linear SVM can still be applied to determine the hyperplane. The kernel trick can be thought of as mapping the observation points into some higher-dimensional space, in which an optimal separating hyperplane is found. Projecting the hyperplane back to the original space yields a nonlinear decision boundary. A typical choice for the kernel function is the radial basis function; see, e.g., Hastie et al. (2009). The radial basis function kernel is a scaled version of the Gaussian kernel, in which the squared Euclidean distance between two features is scaled by a free parameter. In the following, we will denote this kernel parameter γ. Adjusting the parameters C and γ allows the decision boundary to range from finely detailed to a coarser distinction between the classes. Figure 4c shows a nonlinear separating boundary.
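A minimal sketch of the kernel trick in action (Python with scikit-learn, synthetic ring-shaped data): an RBF-kernel SVM separates two classes that no linear boundary can.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# One class inside the unit circle, the other outside: linearly
# inseparable in 2-D, but separable after the implicit RBF mapping.
X = rng.normal(size=(400, 2))
y = (np.sum(X**2, axis=1) < 1.0).astype(int)

svm = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print(f"training accuracy: {svm.score(X, y):.2f}")
```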
The use of SVMs is of great interest for a sweet spot classifier, as the method is known to perform well in classification problems where the decision regions of the feature space are of a smooth geometric nature, as we expect to be the case in several applications in the geosciences. The SVM is often referred to as an “out-of-the-box” classifier; it is known for high accuracy and for its ability to deal with high-dimensional data, i.e., usually no preselection of features is needed.
For a more extensive introduction to SVMs, the reader is referred to Bishop (2006) and Cortes and Vapnik (1995).
4 Numeric Comparisons
In the following, we first outline the setup for validating the various machine learning methods. Next, we report results of several comparisons. Along with the discussion of the results, we present additional tuning of the methods to sharpen and balance the performances.
4.1 Training, Testing, and Validation of Methods
To evaluate the machine learning methods, we use the labeled data and carry out a fitting (training and testing) and validation setup. In each of 100 rounds of validation, we randomly split the labeled data set 30/70 %, assigning 30 % for validation and leaving the rest for fitting. In the validation, the fitted methods are applied to the validation data set, and the F1 score, True Prediction Rate (TPR), and True Detection Rate (TDR) are recorded. For fitting of the methods (training and testing), the fitting data are again split 30/70 %, with 30 % assigned (randomly) for testing and the rest used for training. In both training and testing, cross validation is used to obtain optimal parameters for the algorithms. Here we have focused mainly on maximizing the TPR value, but also various Fβ scores. After the 100 rounds of training, testing, and validating, we average the obtained performance measures.
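The repeated-splitting scheme can be sketched as below (Python with scikit-learn on synthetic data; reading the paper's TPR as precision and its TDR as recall, and the stated "30–70 %" as a 30/70 split, both of which are our interpretation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)

# 100 rounds of random 30/70 validation splits; record F1, TPR
# (precision), and TDR (recall) each round, then average.
scores = []
for seed in range(100):
    X_fit, X_val, y_fit, y_val = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    pred = (RandomForestClassifier(n_estimators=50, random_state=seed)
            .fit(X_fit, y_fit).predict(X_val))
    scores.append((f1_score(y_val, pred),
                   precision_score(y_val, pred),
                   recall_score(y_val, pred)))

f1, tpr, tdr = np.mean(scores, axis=0)
print(f"F1={f1:.2f}  TPR={tpr:.2f}  TDR={tdr:.2f}")
```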
Note that when validating the methods, we randomly choose the observations from all of the four wells. We also consider a more real-case predicting study, where we sequentially hold out one well, fitting the methods on the remaining three wells, and investigate performance on the held-out well.
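The well-wise hold-out can be expressed with a grouped cross-validation splitter; the sketch below (Python with scikit-learn) uses four synthetic "wells" with the same sample counts as Table 1:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(6)

# Synthetic stand-in: four "wells" with the sample counts of Table 1
# (103, 49, 101, and 62 observations).
wells = np.repeat([1, 2, 5, 6], [103, 49, 101, 62])
X = rng.normal(size=(len(wells), 4))
y = (X[:, 0] + rng.normal(scale=0.7, size=len(wells)) > 0).astype(int)

# Hold out one well at a time, fit on the remaining three, score on it.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=wells):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    held_out = wells[test_idx][0]
    print(f"Well {held_out}: accuracy {model.score(X[test_idx], y[test_idx]):.2f}")
```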
The optimal parameters, found by cross validation, refer to the parameters yielding, e.g., the largest TPR score. For the kNN we search for the optimal number of nearest neighbors in the range 0 < k < 30. For the SVM we cross validate the regularization parameter over 2^-5 < C < 2^10 and the kernel parameter over 2^-10 < γ < 2^5. For the random forest algorithm, ensembles of up to a few thousand trees were tested. Interestingly, we saw no significant change in performance for ensembles of more than 100 trees.
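These parameter searches map naturally onto a grid search; the sketch below (Python with scikit-learn, synthetic data, and a coarse stride over the stated ranges to keep the example cheap) tunes C and γ for an RBF SVM by cross validation:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

# Logarithmic grids over the stated ranges 2^-5 < C < 2^10 and
# 2^-10 < gamma < 2^5 (stride 3 in the exponent for speed).
grid = {"C": 2.0 ** np.arange(-5, 11, 3),
        "gamma": 2.0 ** np.arange(-10, 6, 3)}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5, scoring="precision")
search.fit(X, y)
print("best parameters:", search.best_params_)
```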
4.2 Results
Table 2 Summary of the performance of random forest, kNN (tuned k), and SVM (tuned C and γ)

| | Random forest | | kNN | | SVM | | Random |
|---|---|---|---|---|---|---|---|
| F1 | 0.24 | 0.23 | 0.51 | 0.50 | 0.50 | 0.49 | 0.40 |
| TPR | 0.33 | 0.32 | 0.35 | 0.34 | 0.36 | 0.34 | 0.33 |
| TDR | 0.21 | 0.19 | 1.00 | 1.00 | 0.88 | 0.89 | 0.50 |
From Table 2 we observe that all methods perform more or less on par with the random classifier, with quite low detection rates. Note that both kNN and SVM may seem to perform well, given their high detection rates. These rates, however, are a consequence of several cases where all predictions are classified as sweet spots, hence finding all of them at the cost of significant misclassification rates. This suggests that additional fine-tuning, or preprocessing, is needed to realize the potential of these methods.
In further tuning of the methods, we were able to obtain better measures for all reported methods. Specifically, the tuning of random forest consists of excluding the features 4D residual and average magnitude of reflectivity. These features have a negative variable importance measure score; see Liaw and Wiener (2002). It is interesting to note that the kNN algorithm was essentially (with unchanged scores) insensitive to this preprocessing. Furthermore, the SVM actually did worse on the reduced data set, suggesting that SVM is able to make the feature selection on its own.
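The exclusion rule can be mimicked with permutation importance, a close analogue of randomForest's variable importance measure; in the sketch below (Python with scikit-learn, synthetic data with two pure-noise columns standing in for the excluded features), features whose importance is non-positive are flagged for removal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)

# Five informative features plus two pure-noise columns, standing in
# for features like the 4D residual and the average magnitude of
# reflectivity in the case study.
X = rng.normal(size=(400, 7))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Features with non-positive permutation importance (shuffling them
# does not hurt, or even improves, hold-out accuracy) are candidates
# for exclusion.
imp = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
drop = np.where(imp.importances_mean <= 0)[0]
print("candidate features to drop:", drop)
```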
Therefore, to further improve performance, we included an additional fine-tuning parameter with the aim of obtaining a higher level of precision, i.e., TPR score, by increasing the threshold used by each algorithm to classify observations into the respective classes. This makes it harder, by requiring more evidence, to classify locations as sweet spots.
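A sketch of this threshold tuning (Python with scikit-learn, synthetic data): raising the probability threshold demands stronger evidence for the sweet spot class, trading detection rate for precision.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = (RandomForestClassifier(n_estimators=100, random_state=0)
         .fit(X_tr, y_tr)
         .predict_proba(X_te)[:, 1])

# Raising the classification threshold above the default 0.5 requires
# more evidence before a location is called a sweet spot.
for threshold in (0.5, 0.7, 0.9):
    pred = (proba >= threshold).astype(int)
    print(f"t={threshold}: "
          f"TPR={precision_score(y_te, pred, zero_division=0):.2f} "
          f"TDR={recall_score(y_te, pred):.2f}")
```

Note that recall (TDR) can only decrease as the threshold is raised, while precision (TPR) typically increases.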
Table 3 Summary of the performance for random forest (excluding the features 4D residual and average magnitude of reflectivity), kNN (also threshold tuned), and SVM (also threshold tuned)

| | Random forest | | kNN | | SVM | |
|---|---|---|---|---|---|---|
| F1 | 0.32 | 0.34 | 0.41 | 0.38 | 0.21 | 0.27 |
| TPR | 0.38 | 0.40 | 0.38 | 0.35 | 0.49 | 0.44 |
| TDR | 0.30 | 0.33 | 0.61 | 0.58 | 0.20 | 0.26 |
Comparing the results reported in Table 3 with Table 2, we see that (prior) feature selection for random forest increases both precision and detection, and random forest seems to be the winner among the three. Additional tuning provided no significant improvements to the kNN algorithm, suggesting that more sophisticated versions of kNN are required, e.g., the flexible-metric approach of Friedman (1994) or the more involved neighborhood components analysis of Goldberger et al. (2005). The SVM algorithm achieved a considerable increase in the TPR score, indicating good potential for additional fine-tuning of the SVM toward the most important properties (e.g., a predefined balance between TPR and TDR).
Table 4 Obtained performance measures when sequentially holding out one well at a time

| | Random forest | | SVM | | kNN | | |
|---|---|---|---|---|---|---|---|
| | TPR | TDR | TPR | TDR | β | TPR | TDR |
| Well 1 | 0.55 | 0.15 | 0.60 | 0.30 | 0.30 | 0.39 | 1.00 |
| Well 2 | 0.25 | 0.33 | 0.00 | 0.00 | 0.40 | 0.19 | 0.78 |
| Well 5 | 0.54 | 0.18 | 0.50 | 0.37 | 0.30 | 0.38 | 0.55 |
| Well 6 | 0.53 | 0.42 | 0.38 | 0.53 | 0.45 | 0.31 | 1.00 |
Next, Fig. 5b shows predictions in all four wells using kNN, with only tuning of the number of neighbors, k (as specified for Table 2). This poor performance is included to illustrate that seemingly “good” performance measures do not necessarily transfer to real field predictions. Although we might be led to believe in the predictive power of kNN from Table 2, here kNN is either useless (as in Wells 1 and 6) or yields quite noisy predictions (as in Wells 2 and 5). Also, note that for Well 5 the performance measures in Table 4 are indeed the same as one would expect from a random classifier.
Selecting an appropriate weight β for each well yields the predictions displayed in Fig. 5c. By appropriate we here refer to the weights that best balance TPR and TDR, typically at the point where the TPR and TDR curves cross in Fig. 6. Here, the “optimal” balance point is determined by inspection of Fig. 6. Table 4 reports the weight β used for each of the wells. In general, we now observe in Fig. 5c that detection has increased compared to random forest, while precision is kept at an acceptable level, indicating good generalization potential.
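Picking the crossing point of the TPR and TDR curves can also be done programmatically; the sketch below (Python with scikit-learn, synthetic data) sweeps the decision threshold and selects the one where precision and recall are closest:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = (RandomForestClassifier(n_estimators=100, random_state=0)
         .fit(X_tr, y_tr)
         .predict_proba(X_te)[:, 1])

# Sweep the threshold and pick the point where precision (TPR) and
# recall (TDR) cross, i.e. where |precision - recall| is smallest --
# a programmatic stand-in for reading the crossing off a plot.
precision, recall, thresholds = precision_recall_curve(y_te, proba)
best = np.argmin(np.abs(precision[:-1] - recall[:-1]))
print(f"balanced threshold ~ {thresholds[best]:.2f}")
```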
As pointed out earlier, four of the features in the data set have been corrected for a depth trend. Figure 5d displays the predictions obtained by applying random forest with depth included as an independent feature. We observe a seemingly good match, indicating a possibly spurious relationship. In the well data in our case study, the majority of the defined sweet spots are indeed located toward the bottom of the reservoir. However, none of the other methods performed acceptably with the depth trend included; results were indeed worse.
5 Conclusion
In this paper we have illustrated the application of machine learning methods to a small, but challenging, real field case study of sweet spot detection. The data set has weak evidence of sweet spots, and validation of the methods supports the difficulty of detection. To increase the performance of the methods, we illustrate and discuss a simple solution. As a concluding summary, random forest, given proper preprocessing and feature selection, seems a safe and simple choice, at least for the described data set. Next, SVM shows flexibility and good potential by responding well to tuning of parameters. SVM is able to obtain acceptable rates and proves transferable to field predictions. However, unguided use of SVM easily leads to poor performance. The simple kNN with the described tuning does not seem to yield trustworthy results, and logistic regression failed at the onset of these analyses and did not recover. In general, machine learning algorithms should be used with caution; proper preprocessing and guided tuning seem to be needed to obtain reasonable performance.
Acknowledgment
We thank Arne Skorstad and Markus Lund Vevle, both at Emerson Process Management Roxar AS, for the data set and for answering questions related to it.
Bibliography
- Al-Anazi A, Gates I (2010) A support vector machine algorithm to classify lithofacies and model permeability in heterogeneous reservoirs. Eng Geol 114(3–4):267–277
- Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Database theory – ICDT’99, vol 1540. Springer, Berlin, pp 217–235
- Bishop CM (2006) Pattern recognition and machine learning (Information science and statistics). Springer, New York
- Breiman L (2001) Random forests. Mach Learn 45(1):5–32
- Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
- Friedman J (1994) Flexible metric nearest neighbor classification. Technical report, Stanford University
- Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighborhood components analysis. Adv Neural Inf Process Syst 17:513–520
- Hastie TJ, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
- He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
- King G, Zeng L (2001) Logistic regression in rare events data. Polit Anal 9(2):137–163
- Li J (2005) Multiattributes pattern recognition for reservoir prediction. CSEG Natl Conv 2005:205–208
- Li L, Rakitsch B, Borgwardt K (2011) ccSVM: correcting support vector machines for confounding factors in biological data classification. Bioinformatics 27(13):i342–i348
- Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
- Menard S (2002) Applied logistic regression analysis. Sage, Thousand Oaks
- Mood C (2010) Logistic regression: why we cannot do what we think we can do, and what we can do about it. Eur Sociol Rev 26(1):67–82
- Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in large margin classifiers. MIT Press, Cambridge, pp 61–74
- Vonnet J, Hermansen G (2015) Using predictive analytics to unlock unconventional plays. First Break 33(2):87–92
- Wohlberg B, Tartakovsky D, Guadagnini A (2006) Subsurface characterization with support vector machines. IEEE Trans Geosci Remote Sens 44(1):47–57