Abstract
Acoustic impedance is the product of the density of a material and the speed at which an acoustic wave travels through it. Understanding this relationship is essential because low acoustic impedance values are closely associated with high porosity, which facilitates the accumulation of hydrocarbons. In this study, we estimate acoustic impedance from nine seismic-attribute inputs, in addition to depth and two-way travel time, using three supervised machine learning models: extra tree regression (ETR), random forest regression, and multilayer perceptron regression, implemented with the scikit-learn library. Our results show that the R2 of multilayer perceptron regression is 0.85, close to what has been reported in recent studies. However, the ETR method outperformed the results reported in the literature in terms of mean absolute error, mean squared error, and root-mean-squared error. The novelty of this study lies in achieving more accurate predictions of acoustic impedance for exploration.
Introduction
A useful definition of acoustic impedance comes from medical physics, where it is described as the resistance to the propagation of ultrasound waves through tissues (Suzuki et al. 2019); the Earth's layers can be likened to these biological tissues. Within each layer, acoustic impedance is derived by multiplying the density of the material by its acoustic velocity (g/cm3 × m/s). Based on the lithology of the layer, porosity is typically high in sand and low in shale, and an increase in acoustic impedance implies a decrease in porosity (Agbadze et al. 2022). Since low acoustic impedance and high porosity are conducive to accumulating hydrocarbons in a layer, sands are more likely than shales to accumulate hydrocarbons (Ali and Al-Shuhail 2018). Recognizing this, there is a growing emphasis on developing methods to determine acoustic impedance as an intrinsic property of rock layers. Current methodologies for estimating acoustic impedance can be broadly categorized into non-machine-learning and machine learning (ML)-based techniques. Examples of non-machine-learning methods include direct, iterative, and nonlinear inversion methods (Liu et al. 2018).
A review of the latest ML techniques for acoustic impedance estimation was conducted by Zeng et al. (2021). They introduced new ML methods to predict reservoir parameters using both post- and prestack seismic attributes. Hampson et al. (2001) utilized multiattribute transforms and neural networks to predict log characteristics from seismic data. Cracknell and Reading (2013) analyzed aircraft and satellite data using random forest (RF) and support vector machines to identify lithologic contact zones. Harris and Grunsky (2015) employed geophysical and geochemical data to predict lithology using RF. Zhang et al. (2018) conducted an experiment using deep neural networks and convolutional neural networks (CNNs) to predict seismic lithology. Biswas et al. (2019) and Das and Mukerji (2020) advocated for pre- and poststack seismic inversion using CNNs. Priezzhev et al. (2019) compared various machine learning-supported regression models, including random forest, nearest neighbor, neural network, and adaptive classifier-ensemble models. According to a recent review by Zeng et al. (2021), the random forest method is deemed one of the most effective methods for addressing highly nonlinear problems. Our research builds on the methodology of Mardani and Thrust (2020), which employs a backpropagation-trained multilayer neural network to estimate acoustic impedance using six distinct seismic-attribute inputs: amplitude, second derivative, trace gradient, quadrature amplitude, instantaneous frequency, and gradient magnitude.
This study aims to estimate acoustic impedance at well log resolution using various machine learning methods. These methods leverage an extended set of seismic-attribute inputs to achieve greater accuracy. Several techniques can assess the accuracy of our predictions, one notable measure being the coefficient of determination (R2). This research aims to achieve R2 values near one, signifying that our predictions closely align with the actual acoustic impedance values. To meet these objectives, we face several challenges: How can we determine the acoustic impedance on a continuous well log scale using seismic attribute inputs that differ from those reported earlier in the literature? For instance, how does the extra trees regressor algorithm compare to the random forest and the multilayer perceptron regression algorithms? Moreover, how can we effectively apply these methods to real-world data for interpretation?
Methodology
This study primarily explored the application of machine learning in predicting acoustic impedance, contrasting it with the conventional band-limited impedance (BLIMP) inversion method. The groundwork data for the traditional and ML approaches were described in a prior paper (Mardani and Thrust 2020). In our study, we utilized BLIMP acoustic impedance solely for comparative analysis.
The implementation of acoustic impedance estimation based on recursive inversion of poststack time-migrated seismic data and well logs yields a band-limited inversion, with a surface seismic frequency band of approximately 10–50 Hz (Mardani and Thrust 2020; Russell 1988).
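Recursive inversion recovers impedance layer by layer from reflection coefficients. The following is an illustrative numpy sketch of that recursion only, not the authors' BLIMP implementation (which additionally restores the low-frequency trend from well logs):

```python
import numpy as np

def recursive_inversion(reflectivity, z0):
    """Recover an impedance profile from reflection coefficients.

    Standard recursion: Z[i+1] = Z[i] * (1 + r[i]) / (1 - r[i]).
    With band-limited seismic reflectivity, the result is band-limited too.
    """
    z = np.empty(len(reflectivity) + 1)
    z[0] = z0
    for i, r in enumerate(reflectivity):
        z[i + 1] = z[i] * (1.0 + r) / (1.0 - r)
    return z

# Round-trip check: reflectivity derived from a known impedance profile
# should invert back to that profile exactly.
true_z = np.array([5000.0, 6000.0, 5500.0, 7000.0])
r = (true_z[1:] - true_z[:-1]) / (true_z[1:] + true_z[:-1])
np.testing.assert_allclose(recursive_inversion(r, true_z[0]), true_z)
```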
Machine learning methods
Machine learning methods are utilized through the scikit-learn 1.2.1 library (Pedregosa et al. 2011). In regard to training and prediction, the input consists of depth, two-way travel time, and various seismic attributes. These attributes encompass amplitudes, their integrals, trace gradients, quadrature amplitudes, second derivatives, gradient magnitudes, instantaneous frequencies, phases, and cosine phases. The primary target for these procedures is the well log acoustic impedance, commonly referred to as the true AI. The output generated is the predicted acoustic impedance at the well log resolution.
Algorithms
In this study, three models were employed: multilayer perceptron regression (MLPR), random forest regression (RFR), and extra tree regression (ETR). A brief description of each is given below.
Multilayer perceptron regression (MLPR)
The multilayer perceptron (MLP) is encapsulated within the MLPRegressor class, which employs a backpropagation-trained multilayer perceptron. Given that the output consists of a series of continuous numbers, the squared error serves as the loss function (Pedregosa et al. 2011). The settings of the MLPR function include a single hidden layer of 100 neurons, the rectified linear unit (ReLU) activation function (which outputs the input directly if it is positive and zero otherwise), the Adam optimizer, an automatic batch size, a constant learning rate, and a maximum iteration count of 1000, among others.
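These settings map directly onto MLPRegressor's constructor arguments. A minimal sketch on synthetic stand-in data (the scaling step is standard practice for MLPs; the paper does not state whether the authors scaled their inputs):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the eleven-feature input matrix.
X, y = make_regression(n_samples=2000, n_features=11, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100,),   # one hidden layer, 100 neurons
                 activation="relu",
                 solver="adam",
                 batch_size="auto",
                 learning_rate="constant",
                 max_iter=1000,
                 random_state=0),
)
mlp.fit(X_tr, y_tr)
print(f"test R^2 = {mlp.score(X_te, y_te):.3f}")
```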
Random forest regression (RFR)
A random forest operates as a meta-estimator aggregating numerous decision trees, each trained on a different subsample of the dataset. This aggregation aims to enhance the prediction accuracy and curtail overfitting (Pedregosa et al. 2011). By default, the estimator uses 100 trees (n_estimators = 100) and the squared error as its split criterion, among other settings.
The random forest regression procedure, as implemented in scikit-learn (Pedregosa et al. 2011), can be summarized step by step:

1. Initialize max_depth as the maximum depth of each tree and n_estimators as the number of trees in the forest.

2. For every tree in the forest, create a new dataset by randomly selecting a portion of the training data (with replacement), choose a random subset of the features to consider when splitting each tree node, and build a decision tree from the new dataset and selected features, with a maximum depth of max_depth.

3. To predict a new data point, obtain a prediction from each tree in the forest and average the predictions to form the final prediction.
The mathematical formulation of the random forest algorithm, as outlined by Pedregosa et al. (2011), is as follows:

1. Initialization
   - n_estimators: the number of trees in the forest.
   - max_depth: the maximum depth of each decision tree.

2. For each tree in the forest
   a. Dataset creation
      - Let D be the original dataset containing N samples.
      - D′, a new dataset of size N′ (where N′ ≤ N), is constructed by randomly selecting N′ samples from D with replacement.
   b. Feature selection
      - Let n be the total number of features in the dataset, and let n′ (with n′ ≤ n) be the number of features to consider when splitting each node in the tree.
      - Let F represent the set of all features in the dataset. F′, a new set of features, is constructed by randomly selecting n′ features from F without replacement.
   c. Decision tree construction
      - A decision tree, denoted T, is built using dataset D′ and feature set F′, ensuring that it does not exceed the specified maximum depth.

3. Prediction for a new data point
   a. Individual tree predictions
      - Let T1, T2, …, Tn be the n decision trees in the forest. For a new data point x, let yi be the prediction made by tree Ti for x.
   b. Final prediction
      - The final prediction y for x is the average of the predictions y1, y2, …, yn.
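The final averaging step can be verified directly against a fitted scikit-learn forest: predicting with each tree in `estimators_` and averaging reproduces the forest's own prediction. A short sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=11, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Individual tree predictions y_i = T_i(x) for one new point x ...
x_new = X[:1]
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])

# ... and the final prediction is their average.
manual = per_tree.mean()
np.testing.assert_allclose(manual, forest.predict(x_new)[0])
```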
Extra tree regressor
This class implements a meta-estimator that fits multiple randomized decision trees, often referred to as "extra trees," on various subsamples of the dataset and averages their predictions. This strategy aims to enhance the prediction accuracy and mitigate overfitting (Pedregosa et al. 2011) (Figs. 1, 2, 3, 4, 5).
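The scikit-learn class is ExtraTreesRegressor; a minimal sketch on the same kind of synthetic stand-in data follows. Extra trees draw split thresholds at random rather than searching for the best cut, which adds extra randomization on top of the forest's feature sampling:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=11, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

etr = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"test R^2 = {etr.score(X_te, y_te):.3f}")
```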
Results and discussion
Data visualization and analysis
We utilized the dataset from Mardani and Thrust (2020) to illustrate our selected methods. The data are depicted in Fig. 6, which comprises the extended set of nine seismic-attribute inputs in addition to depth and TWTT. The details and definitions of each attribute can be found in Appendix 1. A plot of the zero-offset seismic trace is shown in the first track on the left. Seismic traces for all wells are used to extract the following training input attributes: the amplitude of the seismic trace, its integral, second derivative, quadrature amplitude, trace gradient, gradient magnitude, instantaneous frequency, phase, and cosine of the phase as nine of the eleven input features, while the acoustic impedance at well log resolution is the target (Mardani and Thrust 2020).
The relationships between the attributes are visualized in the cross-correlation matrix presented in Fig. 7. Within this matrix, values closer to 1 indicate a positive correlation, while those nearing −1 signify a negative correlation. When splitting the data, we designate the AI log (acoustic impedance derived from the well log) as the target and output; all attributes other than AI_HRS_inv are treated as inputs. AI_HRS_inv represents the band-limited inversion from prior research, generated using Hampson-Russell software (Mardani 2020). According to the matrix, the strongest input–output relationship exists between Quadr and AI_Log.
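A correlation matrix of this kind is one pandas call. A hedged sketch on toy columns (names mirror the figure; values are synthetic, with AI_Log deliberately constructed to correlate strongly with Quadr):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
quadr = rng.normal(size=300)
df = pd.DataFrame({
    "Quadr": quadr,
    "AI_Log": 0.9 * quadr + 0.1 * rng.normal(size=300),  # strong correlation
    "Depth": rng.normal(size=300),                        # unrelated column
})

corr = df.corr()  # Pearson correlation; entries lie in [-1, 1]
assert corr.loc["Quadr", "AI_Log"] > corr.loc["Depth", "AI_Log"]
```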
To evaluate the performance of the selected methods, we present the prediction errors for RFR, MLPR, and ETR using the default parameters in Table 1. The comparison reveals that ETR consistently exhibits lower MAE, MSE, and RMSE values than the other methods. To illustrate the performance of the three selected methods, we compare the R2 values between the actual and predicted impedance values in Fig. 8. Notably, the R2 value for ETR surpasses those for both RFR and MLPR; ETR achieves the highest R2, followed by RFR, which outperforms MLPR. These findings further reveal that, in comparison to Mardani and Thrust (2020), who achieved an R2 of 0.88 using the TensorFlow platform, both our ETR and RFR methods yield superior R2 values, while our MLPR produces a slightly inferior result. Such discrepancies can arise from variations in parameters such as the number of features and from the software platforms and libraries utilized. In our study, we employed the scikit-learn 1.2.1 library for computations, whereas Mardani and Thrust (2020) utilized TensorFlow and Keras. Furthermore, we compare histograms of the prediction errors of the selected methods in Fig. 9. The prediction error of the ETR method spikes sharply at zero, unlike those of the RFR and MLPR methods, which display a broader spread of nonzero errors. This observation leads to a ranking of prediction-error quality of ETR > RFR > MLPR.
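The error measures in Table 1 come straight from sklearn.metrics. A worked toy example (the impedance values are illustrative, not the study's):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([9000.0, 9500.0, 8800.0, 9200.0])
y_pred = np.array([9100.0, 9400.0, 8900.0, 9150.0])

mae = mean_absolute_error(y_true, y_pred)   # mean |y - y_hat| -> 87.5
mse = mean_squared_error(y_true, y_pred)    # mean (y - y_hat)^2 -> 8125.0
rmse = np.sqrt(mse)                         # same units as the impedance
r2 = r2_score(y_true, y_pred)               # 1.0 would be a perfect fit

print(mae, mse, rmse, r2)
```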
The learning characteristics of the three regression methods can be discerned from Fig. 10. The training and testing scores of the RFR method gravitate toward 1, whereas those of the MLPR hover around 0.6, although they continue to rise with increasing training size. The RFR method requires only a minimal training size, as its curve plateaus after a training size of approximately 4000; given the same training size, the learning efficacy of RFR therefore surpasses that of MLPR. Notably, the ETR, our most effective method, delivers consistently strong training outcomes from a training size of 500 through to the endpoint and ultimately yields the best test scores.
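Curves like those in Fig. 10 can be produced with scikit-learn's learning_curve utility. A sketch on synthetic stand-in data (the sizes and scores here are illustrative, not the study's):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=1000, n_features=11, noise=5.0, random_state=0)

# Train/test R^2 as a function of training-set size, with 5-fold CV.
sizes, train_scores, test_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2")

for n, tr, te in zip(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"n={n:4d}  train R^2={tr:.3f}  test R^2={te:.3f}")
```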
The final estimations of the acoustic impedance are shown in Fig. 11. The band-limited AI was reproduced using the available data from Mardani and Thrust (2020) to investigate how the property changes with depth; the curves generally follow a similar trend but cannot match the localized variations (Mardani and Thrust 2020). The figure juxtaposes the acoustic impedance derived from well logs, the AI predicted using the extended inputs, and the band-limited inversion. It is evident that the AI predicted by the RFR method, represented by a red line, aligns more closely with the true AI, depicted by a blue line, than the AI predicted by the MLPR method, indicated by a yellow line. Overall, the most accurate prediction is rendered by the ETR method, highlighted by the green line. Furthermore, the predicted acoustic impedance suggests that hydrocarbons are likely to accumulate at depths of approximately 3100–3200 m and 3300–3400 m.
Conclusions
-
1.
The estimation of acoustic impedance, utilizing extended inputs of depth, two-way travel time and seismic attributes, employed regression methods, including multilayer perceptron regression, random forest regression, and extra tree regression with the scikit-learn library.
-
2.
Our RFR and ETR models exhibited improved coefficient of determination values, surpassing those reported in the literature, even though our dataset was larger owing to the additional features.
-
3.
Among the tested models, the extra tree regression model demonstrated superior performance in terms of the coefficient of determination and was particularly suitable for highly nonlinear well log scales.
-
4.
We conclude that, for hydrocarbon exploration based on the available acoustic impedance data, the ETR model with default parameters represents the optimal choice for more accurate predictions; other datasets may favor different choices.
-
5.
Future studies should consider adopting the ETR method with diverse real datasets, incorporating additional seismic attributes, and exploring alternative machine learning models. This comprehensive approach aims to identify an even more accurate machine learning model that can then be validated using real-world data.
Data availability
The research data and codes before modification are available at: https://github.com/mardani72/AI_ML_Seismic_Log/blob/master/AI_From_Seismic_Attributes_ML_final.ipynb
Abbreviations
- B: n_estimators, the number of decision trees in the forest
- f(x) = max(0, x): ReLU function
- f_b(x): predicted target variable for the input data point x by the b-th decision tree in the random forest
- w_hj: weights between the first (input) layer and the middle or hidden layer
- x_0 = +1: bias of the first layer
- X = {x_j; j = 1 → d}: features of the first layer
- y_i: output
- ŷ: predicted target variable for the input data point x
- Z = {z_j; j = 1 → d}: features of the middle layer
- η: learning rate, which is set to a value greater than zero
- v_ih: weights between the middle layer and the upper or output layer
- Δv_h: incremental weight between the hidden and output layers
- Δw_hj: incremental weight between the first and hidden layers
- AI: acoustic impedance
- ANN: artificial neural network
- MLPR: multilayer perceptron regression
- RFR: random forest regression
- ETR: extra tree regression
- MAE: mean absolute error
- MSE: mean squared error
- RMSE: root-mean-squared error
- ReLU: rectified linear unit
References
Agbadze OK, Qiang C, Jiaren Y (2022) Acoustic impedance and lithology-based reservoir porosity analysis using predictive machine learning algorithms. J Pet Sci Eng 208:109656
Ali A, Al-Shuhail AA (2018) Characterizing fluid contacts by joint inversion of seismic P-wave impedance and velocity. J Pet Explor Prod Technol 8:117–130
Barnes AE (2016) Handbook of poststack seismic attributes. Society of Exploration Geophysicists
Biswas R, Sen MK, Das V, Mukerji T (2019) Prestack and poststack inversion using a physics-guided convolutional neural network. Interpretation 7(3):SE161–SE174. https://doi.org/10.1190/INT-2018-0236.1
Chu Z, Yu J, Hamdulla A (2021) Throughput prediction based on Extra Tree for stream processing tasks. Comput Sci Inf Syst 18(1):1–22
Cracknell MJ, Reading AM (2013) The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines. Geophysics 78(3):WB113–WB126
Das V, Mukerji T (2020) Petrophysical properties prediction from prestack seismic data using convolutional neural networks. Geophysics 85(5):N41–N55. https://doi.org/10.1190/geo2019-0650.1
Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63:3–42
Hampson DP, Schuelke JS, Quirein JA (2001) Use of multiattribute transforms to predict log properties from seismic data. Geophysics 66(1):220–236
Harris JR, Grunsky EC (2015) Predictive lithological mapping of Canada’s North using random forest classification applied to geophysical and geochemical data. Comput Geosci 80:9–25
Liu J, Zhang J, Huang Z (2018) Accurate estimation of acoustic impedance based on spectral inversion. Geophys Prospect 66(1):169–181
Mardani RA, Thrust GV (2020) Estimation of acoustic impedance from seismic data in well-log resolution using machine learning, neural network, and comparison with band-limited seismic inversion
Maurya SP, Singh NP (2018) Comparing pre- and post-stack seismic inversion methods: a case study from Scotian Shelf, Canada. J Ind Geophys Union 22(6):585–597
Maurya SP, Singh NP (2019) Estimating reservoir zone from seismic reflection data using maximum-likelihood sparse spike inversion technique: a case study from the Blackfoot field (Alberta, Canada). J Pet Explor Prod Technol 9:1907–1918
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Petre I (2021) How to train a multilayer perceptron for regression. https://www.youtube.com/watch?v=Y-38j9pZ_QQ
Priezzhev II, Veeken PCH, Egorov SV, Strecker U (2019) Direct prediction of petrophysical and petroelastic reservoir properties from seismic and well-log data using nonlinear machine learning algorithms. Lead Edge 38(12):949–958
Russell BH (1988) Introduction to seismic inversion methods (Issue 2). SEG Books.
Suzuki S, Gerner P, Lirk P (2019) Local anesthetics. In: Pharmacology and physiology for anesthesia. Elsevier, pp 390–411
Zeng H, He Y, Zeng L (2021) Impact of sedimentary facies on machine learning of acoustic impedance from seismic data: lessons from a geologically realistic 3D model. Interpretation 9(3):T1009–T1024
Zhang G, Wang Z, Chen Y (2018) Deep learning for seismic lithology prediction. Geophys J Int 215(2):1368–1387
Acknowledgements
The authors thank the College of Petroleum Engineering and Geosciences at King Fahd University of Petroleum and Minerals for supporting this research.
Funding
This study was funded by the College of Petroleum Engineering and Geosciences at King Fahd University of Petroleum and Minerals.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest.
Ethical approval
The authors followed the moral standards of publications.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1. Seismic Attributes (Barnes 2016)
Amp (amplitude)
A measure of the raw amplitude of seismic trace values
D2 (Second Derivative)
When raw amplitude profiles do not present continuity clearly enough, interpreters may find the second-derivative attribute helpful.
Int (Integral)
The integration of the raw trace amplitude over time
Quadr (quadrature amplitude)
Quadrature amplitude is the imaginary component of the analytical signal that is derived from the 90° phase of the original trace, through the Hilbert transform. When combined with the real part, they create the analytical signal.
Trace gradient
The gradient along the trace. The most significant gradient occurs at the most significant change.
Gradient magnitude
The magnitude of the instantaneous gradient in 3 dimensions utilizing adjacent traces.
Instantaneous frequency
The first time derivative of the instantaneous phase scaled to Hertz units.
Phase
The average value of a signal's phase spectrum; the relative position along a sinusoid.
CosPhase
Cosine of the phase.
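Several of the attributes above derive from the analytic signal via the Hilbert transform. A hedged sketch on a toy 30 Hz sinusoid standing in for a seismic trace (the sample interval is illustrative):

```python
import numpy as np
from scipy.signal import hilbert

dt = 0.002                                  # 2 ms sample interval (illustrative)
t = np.arange(0, 1, dt)
trace = np.sin(2 * np.pi * 30 * t)          # toy 30 Hz "seismic trace"

analytic = hilbert(trace)                   # real part = trace, imaginary = quadrature
quadr = analytic.imag                       # quadrature amplitude
phase = np.unwrap(np.angle(analytic))       # instantaneous phase (radians)
cos_phase = np.cos(phase)                   # CosPhase attribute
inst_freq = np.diff(phase) / (2 * np.pi * dt)  # instantaneous frequency in Hz

# For a pure 30 Hz sinusoid, the instantaneous frequency is ~30 Hz away
# from the trace edges.
print(float(np.median(inst_freq)))
```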
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Surachman, L.M., Abdulraheem, A., Al-Shuhail, A. et al. Acoustic impedance prediction based on extended seismic attributes using multilayer perceptron, random forest, and extra tree regressor algorithms. J Petrol Explor Prod Technol 14, 1923–1931 (2024). https://doi.org/10.1007/s13202-024-01795-7