1 Introduction

1.1 Background

The variation of pore pressure with depth is critical for successful oil and gas extraction from conventional and unconventional reservoirs. Historically, the pore pressure of sedimentary basin strata has remained difficult to predict despite decades of study (Abdelghany et al. 2021; Flemings 2021; Radwan et al. 2021; Ramdhan and Goulty 2011, 2010; Zhang 2011). Pore pressure (PP), sometimes called formation pressure, is the fluid pressure within the pores of a subsurface formation arising from the hydraulic potential (Oloruntobi et al. 2020). Oil and gas well drilling is considered safe when the wellbore hydrostatic pressure lies between the pore pressure and the fracture pressure (Darvishpour et al. 2019; Richards et al. 2020). Inaccurate prediction of pore pressure during drilling can result in unforeseen hazards such as fluid loss, which could induce a kick (uncontrolled flow of formation fluid into the wellbore) and ultimately lead to a blowout, causing irreparable harm to the drilling rig and well personnel (Mahetaji et al. 2020; Zhang et al. 2022, 2020). Usually, pore pressure is determined using well logs and seismic data. However, these datasets provide information about pore pressure only in the proximity of the borehole. Furthermore, calculating pore pressure during drilling takes considerable time, effort, and money, and occasionally the data needed to estimate pore pressure may not be recorded effectively because of poor hole conditions. Thus, it is appropriate to construct a roadmap based on machine learning (ML) that can effectively predict pore pressure profiles with depth without requiring frequent measurements of the formation pressure. Additionally, it is advantageous for pore pressure estimates to be reliably extrapolated outward from a wellbore into formations that have not been drilled. Such ML roadmaps are beneficial for optimizing and improving pore pressure estimates, serve as essential inputs for calculating geomechanical variables, and help improve the quality of reservoir models (Radwan et al. 2022).

Geoscience and subsurface engineering are among the fields where machine learning techniques are widely applied. Indeed, advancing technology and machine learning algorithms are revolutionizing the energy sector. Solving industrial problems accurately in less time and at lower cost and effort will significantly impact commercial production and resource recovery in the oil and gas industry. ML techniques have been successfully applied in the earth sciences and subsurface engineering disciplines by utilizing geophysical, lithological, and petrophysical data (Poulton 2002; Silversides et al. 2015; Xie et al. 2018), and their application continues to diversify and grow across the various divisions of geoscience and petroleum engineering (Anifowose et al. 2011; Schmidhuber 2015). Recently, several studies have utilized ML techniques to predict formation pore pressure (Booncharoen et al. 2021; Paglia et al. 2019; Wei et al. 2021). These existing models are discussed briefly in the next section.

1.2 Existing models

A few studies have been published on applying multiple ML techniques for predicting pore pressure. In this section, we discuss the recent development of these techniques for pore pressure prediction (Ahmed et al. 2019; Booncharoen et al. 2021; Farsi et al. 2021; Radwan et al. 2022; Wei et al. 2021; Yu et al. 2020; Zhang et al. 2022). Yu et al. (2020) suggested an ML technique that utilizes explicit petrophysical log inputs, namely sonic velocity, shale volume, and porosity, to predict pore pressure in a subsurface formation with varying lithology. In that study, a normally pressured sequence was trained using a combination of the petrophysical properties and theoretical effective stress; Bowers' unloading equation was subsequently applied for the overpressure zones. A total of four ML techniques were studied: multilayer perceptron (MLP) neural network, random forest (RF), support vector machine (SVM), and gradient boosting. All four ML techniques revealed good agreement between measured and predicted pore pressure data, with the RF technique outperforming the others. Moreover, the results also showed the superiority of the RF model in detecting the onset of overpressure compared to the other tested ML models. Farsi et al. (2021) compared the efficiency of three hybrid ML models to predict pore pressure: multilayer perceptron (MLP) neural network, least square support vector machine, and extreme learning machine (ELM), each hybridized with powerful particle swarm optimization (PSO). According to their analysis, ELM with PSO is comparatively more accurate in predicting pore pressure than the other two hybrid ML models. To establish the reliability of the ELM-PSO model, Farsi et al. (2021) applied the model to data from three different wells and concluded that the newly developed model could predict pore pressure accurately across the entire studied area.

As part of their study, Wei et al. (2021) compared deep learning recurrent neural networks (RNN) with MLP to predict pore pressures in soil. Based on the outcome, the RNN model predicts pore pressure with higher accuracy than the MLP model; the prediction performance of the two models is evaluated using RMSE and R2 values. Booncharoen et al. (2021) studied the performance of various regressor-based algorithms (extreme gradient boosting (XGBoost), ridge, and quantile) to predict pore pressure in the Pattani basin, Thailand, using drilling parameters and reservoir characteristics as input variables. As per the root mean square error (RMSE) of the train and test datasets, the overall model performance was approximately 1.2 and 1.5 pounds per gallon, respectively, based on 12 drilling projects in the Pattani basin. Ahmed et al. (2019) developed five machine-learning models to predict pore pressure based on drilling parameters and drilling logs collected in the field: support vector machine (SVM), fuzzy logic (FL), functional network (FN), radial basis function (RBF), and artificial neural network (ANN). Based on comparative analysis, SVM showed good agreement between predicted and measured pore pressure data, with an average percentage error of 0.14%.

Radwan et al. (2022) and Zhang et al. (2022) are the two most recent studies on multiple ML techniques to predict pore pressure. Radwan et al. (2022) used nine different ML techniques, whereas Zhang et al. (2022) used four, employing field data from New Zealand (Mangahewa gas field) and the Middle East, respectively. Based on the analysis of the models developed in these two studies, the decision tree (DT) showed excellent predictive efficiency, with RMSE values in the ranges of 0.25 to 14.71 psi and 0.99 to 14.46 psi, respectively. Another study by Matinkia et al. (2022) evaluated the application of a convolutional neural network (CNN) for predicting pore pressure, conducting a comparative analysis between CNN, least square support vector machine (LSSVM) with PSO, and multilayer extreme learning machine (MELM) hybridized with PSO, the cuckoo optimization algorithm (COA), and the genetic algorithm (GA). The CNN outperformed the other hybrid deep learning models, with RMSE and R2 values of 0.1066 and 0.9806, respectively.

This research aims to find the best machine learning and deep learning (ML/DL) model for predicting pore pressure, addressing a crucial need in the energy industry, particularly in geothermal, CCUS (carbon capture, utilisation and storage), and oil and gas applications. By comparing various ML/DL models known for their accuracy and reliability in pore pressure prediction, namely decision tree (DT), XGBoost, random forest (RF), recurrent neural network (RNN), and convolutional neural network (CNN), this research significantly advances the understanding of reservoir dynamics. Importantly, it fills a gap in the literature by conducting the first comparative analysis of ML and DL models using data from the Volve oil field. This approach both improves our knowledge of pore pressure prediction methods and provides insights into using data-driven solutions to tackle complex geomechanical challenges. Moreover, while past studies often relied on limited datasets, ranging from 5064 to 7171 data points, this research uses a larger dataset of 22,539 points from five different offshore wells in the Volve oil field. This extensive data enables better transfer learning and model optimization, boosting prediction efficiency. Further, this research holds significant industrial value: by comprehensively evaluating pore pressure prediction reliability across the entire study area, it provides insights that can optimize reservoir management and ensure safe and efficient hydrocarbon extraction.

2 Study area: North Sea Volve oil field

The Volve oil field is located inside Block 15/9, 200 km west of Stavanger, at the southern end of Norway's continental shelf in shallow water (water depth of about 80 m), as shown in Fig. 1. It was discovered in 1993 and developed using a jack-up rig with processing and drilling capabilities. Drilling started in this field in 2007, and production commenced with traditional water injection pressure support in 2008, with a life expectancy of 3–5 years. The field was shut down in September 2016 after operating for eight years. During its operation, the Volve field produced fifty-six thousand (56,000) barrels per day and delivered a total of sixty-three (63) million barrels of oil over its lifetime (Sen and Ganguli 2019).

Fig. 1 North Sea Volve oil field geological location (modified from Ravasi et al. 2015)

Located at a depth of 2700–3100 m, the Hugin formation, of Middle Jurassic age, is the primary siliciclastic reservoir (Vollset and Doré 1984). A combination of structural and stratigraphic traps has contained the oil in this reservoir. The reservoir structure has a small dome-type shape that originated from the disintegration of the adjacent salt ridges during the Middle Jurassic period (Szydlik et al. 2007). Several faults occur in the western portion of the structure, most resulting from regional extension in salt tectonics, making communication across these faults difficult. As a result of these structural events, the reservoir thickness ranges from 20 m at the crest to 100 m on the flanks of the structure (Sen and Ganguli 2019).

The sediment deposition in the Hugin system of the Viking group resulted from rift collapse and continuous flooding of the Viking graben due to a major transgression during the Mid-Late Jurassic period (Sneider et al. 1995). Consequently, the Hugin formation consists largely of shallow marine to marginal marine sandstone deposits with occasional coal seams. The formation is mainly composed of clean sandstones, primarily consisting of quartz, mica, and clay minerals. The distribution of lithofacies controlled by shallow water tidal systems was responsible for the deposition of these clean sandstones (Folkestad and Satur 2008). The top of the Hugin Formation is encountered at average depths of 2750 to 3120 m below sea level. The general properties of the Hugin formation are a porosity of 0.21 p.u., a water saturation in the oil zone of 20%, a permeability of 1 Darcy, and a net-to-gross ratio of 0.93. A generalized version of the regional stratigraphy is illustrated in Fig. 2.

Fig. 2 Generalized stratigraphy of the study area, Volve Field, according to the studied wells (modified from Sen and Ganguli 2019)

3 Methodology

This paper uses five different algorithms to identify the most effective pore pressure predictor: DT, XGBoost, RF, RNN, and CNN. A short explanation of these algorithms is given in Sect. 3.3; detailed descriptions of the architecture, hyper/control parameters, and associated methodologies of these ML/DL algorithms are documented elsewhere, with most previously employed for pore pressure prediction: DT (Zhang et al. 2022); XGBoost (Booncharoen et al. 2021); RF (Yu et al. 2020); RNN (Wei et al. 2021); and CNN (Matinkia et al. 2022). Figure 3 shows the schematic workflow depicting how the intelligent algorithms for predicting pore pressure were constructed, evaluated, and compared for accuracy. As illustrated in Fig. 3, the first two steps represent the data gathering and data cleaning process, which involves removing outliers and null values. Next, an analytical model, i.e., Eaton's model, is used to calculate the pore pressure, which is then compared with the limited repeat formation tester (RFT) pore pressure data to establish the credibility of the analytical model (Fig. 4). Afterward, the maximum and minimum values of all the variables are identified, and the data are normalized to the numerical range of −1 to +1 using Eq. (1).

$$\hat{D}_{n}^{v} = 2 \times \left( {\frac{{D_{n}^{v} - D_{min}^{v} }}{{D_{max}^{v} - D_{min}^{v} }}} \right) - 1$$
(1)

where \(\hat{D}_{n}^{v}\) is the normalized value of the variable (\(v\)) for the nth data record; \({D}_{n}^{v}\) is its original value; and \({D}_{min}^{v}\) and \({D}_{max}^{v}\) are the minimum and maximum values of the variable in the entire dataset, respectively.
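As an illustration, Eq. (1) can be implemented in a few lines of Python; this is a minimal sketch, and the file and column names are hypothetical rather than the study's actual code.

```python
import pandas as pd

def normalize_minus1_to_plus1(df: pd.DataFrame) -> pd.DataFrame:
    """Apply Eq. (1): scale every column to [-1, +1] via min-max normalization."""
    d_min, d_max = df.min(), df.max()
    return 2 * (df - d_min) / (d_max - d_min) - 1

# Hypothetical usage with the petrophysical logs described in Sect. 3.1
# logs = pd.read_csv("volve_logs.csv")  # columns: DTC, DTS, GR, NPHI, PEF, RHOB, RT, PP
# logs_norm = normalize_minus1_to_plus1(logs)
```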

Fig. 3 Schematic workflow of the present study

Fig. 4 Measured pore pressure (RFT data) vs calculated pore pressure

Subsequently, the normalized dataset is split into training and testing subsets with a 70:30 ratio. The selected models are then developed, trained, and tested using these subsets, after which a statistical approach is used to quantify the accuracy of the developed models and select the best-performing one. Finally, the best-performing model is applied to a completely new dataset from a different well to assess its universality/generalizability.
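A minimal sketch of this split-train-evaluate loop, assuming scikit-learn and placeholder arrays in place of the actual well data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder arrays standing in for the normalized logs (X) and Eaton-derived PP (y)
rng = np.random.default_rng(42)
X, y = rng.random((1000, 7)), rng.random(1000)

# 70:30 train/test split, as in the workflow of Fig. 3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Tenfold cross-validation, as described in Sect. 3.3.6
cv_rmse = -cross_val_score(model, X_train, y_train, cv=10,
                           scoring="neg_root_mean_squared_error")
print(f"hold-out RMSE: {rmse:.3f}, 10-fold CV RMSE: {cv_rmse.mean():.3f}")
```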

Algorithm: DT, XGBoost, RF, RNN, CNN

3.1 Well-log data collection and description

The petrophysical data from five wells drilled in the Volve oil field (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, 15/9-F-11 T2, and 15/9-F-14) are used in this study for the ML and DL evaluations. The dataset consists of high-resolution petrophysical variables, which include compressional wave travel time (DTC), shear wave travel time (DTS), gamma ray (GR), neutron porosity (NPHI), photoelectric absorption factor (PEF), bulk density (RHOB), and resistivity log (RT). A summary of these variables' application to identify target formation characteristics is detailed in Table 1. The selected variables constitute the input data used to predict the formation pore pressure. The five wells together comprise 22,539 data points (5749 for 15/9-F-1 A, 1751 for 15/9-F-1 B, 3900 for 15/9-F-11 A, 7358 for 15/9-F-11 T2, and 3781 for 15/9-F-14). Of these five wells, four are used to train and test the models, and the remaining well's data is used to determine the generalization/universality of the best-performing model. In this study, the pore pressure (PP) is calculated using Eaton's method, which appeared reasonable based on the limited number of repeat formation tester (RFT) data (Fig. 4). The statistical details of the total dataset and the data distribution of the five wells are provided in Tables 2 and 3, where the statistical parameters mean, standard deviation, and the percentiles P25, P50, and P75 are given for each variable. The mean and standard deviation describe the average and spread of the data, while P25, P50, and P75 are the dataset's first, second, and third quartiles: P25 provides insight into the lower portion of the data distribution, P50 into the central part, and P75 into the upper part. Together, these statistics aid in evaluating data variability, identifying outliers, comparing datasets, and making informed decisions based on the data distribution.

Table 1 Application of the selected variables used in this study to determine the characteristics of the target formation
Table 2 Statistical details of the selected variables for the total dataset (22,539 data records in total)
Table 3 Statistical details of the selected variables for the datasets collected separately from 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, 15/9-F-11 T2, and 15/9-F-14

A correlation heatmap considering the data from four wells (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2) is illustrated in Fig. 5. The heatmap quantifies the degree of correlation among the selected petrophysical variables, assisting in identifying the distribution of the data and their anticipated patterns. According to Fig. 5, PEF, RHOB, and RT correlate positively with pore pressure. Meanwhile, NPHI and DTC show a good negative correlation with pore pressure, while DTS and GR show a poor negative correlation (Fig. 5).

Fig. 5 Heatmap plot showing the correlation among the selected petrophysical variables, considering the four wells' data (18,758 data points in total)

3.2 Overburden and pore pressure modeling

According to the Terzaghi and Peck (1948) theory of soil bearing capacity, the overburden stress (\({\gamma }_{s}\)) at a selected point is supported by the rock matrix and the fluid stored within that rock matrix, as represented by Eq. (2) (Eaton 1975; Liu et al. 2018).

$$\gamma_{s} = \gamma_{e} + PP$$
(2)

where \({\gamma }_{e}\) is the effective vertical stress and \(PP\) is the pore pressure. Hence, the fluid experiences high overpressure relative to hydrostatic pressure due to the overburden stress. The bulk density of the overlying formation and the target depth are the parameters used to calculate the overburden stress (\({\gamma }_{s}\)) (Eq. (3)). The bulk density is determined from the petrophysical log data.

$$\gamma_{s} = \int\limits_{0}^{z} \rho \, dz$$
(3)

where \(\rho\) refers to the bulk density, and \(z\) refers to the depth (Oloruntobi et al. 2018; Oloruntobi and Butt 2019). Subsurface formation rock properties, such as the dc-exponent, acoustic velocity, and resistivity, usually increase with increasing depth. However, trapped overpressured fluid in the rock matrix tends to disturb the trend of these measured rock properties. Hence, many pore pressure methodologies were developed, the primary foundation of which is Eq. (2). Among these methodologies, Eaton's method is widely used and accepted by the industry; hence, this model is utilized for pore pressure calculation in this study.
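As a numerical sketch of Eq. (3), the overburden stress can be obtained by cumulative trapezoidal integration of an RHOB log; gravitational acceleration is included here so the result carries stress units, and all names are illustrative:

```python
import numpy as np

def overburden_stress_mpa(depth_m: np.ndarray, rhob_kg_m3: np.ndarray) -> np.ndarray:
    """Approximate Eq. (3) by cumulative trapezoidal integration of the bulk density log."""
    g = 9.81  # gravitational acceleration, m/s^2
    dz = np.diff(depth_m)
    # stress increment per depth interval, in Pa
    increments = 0.5 * (rhob_kg_m3[1:] + rhob_kg_m3[:-1]) * g * dz
    return np.concatenate(([0.0], np.cumsum(increments))) / 1e6  # Pa -> MPa
```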

According to Eaton (1975), dc-exponent, acoustic velocity, or resistivity data can be used to estimate pore pressure, as presented in Eq. (4).

$$PP = {\gamma }_{s} - \left({\gamma }_{s} - {P}_{h}\right){\left({x}_{obs}/{x}_{norm}\right)}^{k}$$
(4)

where \({P}_{h}\) refers to the hydrostatic pressure, \({x}_{obs}\) refers to the observed attribute, and \({x}_{norm}\) refers to the normally expected attribute for the resistivity, acoustic velocity, or dc-exponent parameter. \(k\) is Eaton's exponent, which varies with the petrophysical parameter used to estimate the pore pressure: \(k\) is equal to 1.2 when resistivity or dc-exponent data are used, and equal to 3 when acoustic velocity data are used.
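Equation (4) translates directly into code; the sketch below assumes consistent pressure units and a pre-computed normal compaction trend (x_norm):

```python
def eaton_pore_pressure(sigma_v, p_hydro, x_obs, x_norm, k=3.0):
    """Eq. (4): PP = sigma_v - (sigma_v - P_h) * (x_obs / x_norm)^k.

    k = 3 for acoustic velocity data; k = 1.2 for resistivity or dc-exponent data.
    """
    return sigma_v - (sigma_v - p_hydro) * (x_obs / x_norm) ** k

# e.g., sigma_v = 60 MPa, hydrostatic = 30 MPa, velocity ratio = 0.9:
# eaton_pore_pressure(60.0, 30.0, 0.9, 1.0)  ->  60 - 30 * 0.9**3 = 38.13 MPa
```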

3.3 ML and DL approaches and development

Artificial intelligence (AI) commonly refers to machine intelligence or prediction based on machine learning algorithms. In most research literature, this tool is defined as an intelligent product with knowledge and design that resembles the natural capability of human minds (Legg and Hutter 2007; Temirchev et al. 2020). AI mimics human behavior, with a special focus on how neurons in the brain work, solve problems, and acquire knowledge and reasoning (Hassanpouryouzband et al. 2021; Poole et al. 1998). Nowadays, many engineering problems in different sectors of science and technology are solved using artificial intelligence algorithms (Choubineh et al. 2017; Farsi et al. 2021; Ghorbani et al. 2017; Hazbeh et al. 2021; Shamshirband et al. 2019). Philosophy, linguistics, mathematics, psychology, neuroscience, physiology, and game theory, among others, are the sources of the primary ideas behind the development of artificial intelligence algorithms (Lieder and Griffiths 2020).

3.3.1 DT

Decision trees are powerful techniques commonly used in several fields, such as classification and regression analysis for different engineering applications; another advantage of the decision tree is the explainability of the model (Stein et al. 2005). The general architecture of a DT is illustrated in Fig. 6. In a decision tree (DT), a series of basic tests, in each of which a numerical feature is compared to a threshold value, is effectively combined; the resulting prediction is given by Eq. (5) (Damanik et al. 2019).

$$y = f\left(x\right) = \sum\limits_{i} {y}_{i} \cdot I\left(x \in {R}_{i}\right)$$
(5)

where \(y\) is the predicted target variable (output) (pore pressure).

Fig. 6 The decision tree (DT) model architecture. A decision tree comprises three key components: a node, a condition, and a production (value)

\(x\) is the vector of input variables (features) (DTC, DTS, GR, NPHI, PEF, RHOB, RT).

\({y}_{i}\) is the predicted output value for the \(i\) th leaf node (i.e., the average value of the training samples in that leaf node).

\({R}_{i}\) is the region of feature space corresponding to the \(i\) th leaf node.

\(I\) is the indicator function, which returns \(1\) if the condition in parentheses is true and \(0\) otherwise.

Developing the conceptual rules in a DT is much more straightforward than constructing the numerical weights of the connections in neural network nodes (Anuradha and Gupta 2014; Barros et al. 2012). DT is mainly used for grouping and classification modeling in data mining (Gavankar and Sawarkar 2017). Each tree is composed of nodes and subsets: each node represents the feature that must be classified, while the subset defines the value for that node (Mahesh 2020; Swain and Hauska 1977). The decision tree has been successfully implemented in many fields due to its simple analysis and precision across various data forms (Charbuty and Abdulazeez 2021; Mrva et al. 2019). Table 4 shows the hyper/control parameters of the DT model developed to predict pore pressure.

Table 4 Hyper/control parameters used for the 5 different ML/DL models to predict pore pressure
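For reference, a minimal scikit-learn sketch of such a regressor; the parameter values shown are placeholders, not the tuned values of Table 4:

```python
from sklearn.tree import DecisionTreeRegressor

# Placeholder hyperparameters chosen from the search ranges in Sect. 3.3.6
dt = DecisionTreeRegressor(max_depth=20, splitter="best", random_state=42)
# dt.fit(X_train, y_train)
# y_pred = dt.predict(X_test)  # piecewise-constant prediction per Eq. (5)
```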

3.3.2 XGBoost

The gradient-boosting decision tree algorithm was initially proposed in 2001 (Friedman 2001). The method uses gradient descent to generate new trees based on previous trees so as to minimize an objective function. The extreme gradient boosting decision tree (XGBoost) is a decision tree-based ensemble model that uses a scalable gradient boosting framework developed by Chen and Guestrin in 2016. Due to its high performance, it has been popular for classification and regression problems (Booncharoen et al. 2021; Chen and Guestrin 2016; Liu et al. 2022; Ogunleye and Wang 2020; Pan et al. 2022; Torlay et al. 2017). Distributed, parallel, cache-aware, and out-of-core computing make this algorithm much faster than most machine learning and deep learning algorithms. Moreover, the algorithm is well optimized and scalable enough to process a large amount of data in memory-limited or distributed computing settings, and it can easily be applied to datasets with data sparsity challenges, missing values, and frequent zero values (Chen and Guestrin 2016). When applied to a regression problem, XGBoost continuously generates and adds new regression trees to the model; the residual of the previous model is fitted by the updated classification and regression trees, and the final predicted value is the sum of the results of each tree (Nguyen et al. 2019). The control parameters used to develop the XGBoost model are listed in Table 4. The architecture of the XGBoost and RF models is illustrated in Fig. 7.
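A minimal sketch of an XGBoost regressor in Python, with hypothetical values chosen from the search ranges listed in Sect. 3.3.6:

```python
from xgboost import XGBRegressor

# Boosting: each new tree is fitted to the residuals of the current ensemble
xgb = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=6,
                   subsample=0.75, colsample_bytree=0.75,
                   objective="reg:squarederror")
# xgb.fit(X_train, y_train)
# y_pred = xgb.predict(X_test)  # final value = sum of all trees' outputs
```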

Fig. 7 The simplified structure of XGBoost and RF. While both models share a similar structure, XGBoost employs boosting, sequentially refining decision trees to reduce residual errors, whereas RF utilizes bagging, training multiple decision trees simultaneously and determining the final output through a majority vote

3.3.3 RF

Random forest is an ensemble model based on decision trees, which uses bagging to overcome one of the disadvantages of decision trees, namely overfitting (Fig. 7). The random forest (RF) algorithm is a DT-based model that selects the best regression tree from tree-structured regressions built on identically and independently distributed random vectors by unit voting (Reis et al. 2018; Zebari et al. 2019). The method has been widely used for classification and regression tasks, providing higher cross-validation accuracy (Bargarai et al. 2020). It has been applied mainly in the image processing field and to various problems such as news classification, content filtering, intrusion detection, and sentiment analysis (Aljumaily et al. 2023). In RF, a random vector independent of the previous identically distributed random vectors is generated, and a tree is grown from the training set. An upper bound on the generalization error is derived to evaluate the accuracy and interdependency of the individual classifiers (Ozgode Yigin et al. 2020). RF has proved its effectiveness on large datasets as well as large numbers of input variables, and the algorithm can handle missing details and incomplete data without loss of accuracy (Aljumaily et al. 2023). The control parameters used to develop the RF model are listed in Table 4.
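A corresponding scikit-learn sketch (placeholder settings; the tuned values are in Table 4):

```python
from sklearn.ensemble import RandomForestRegressor

# Bagging: each tree sees a bootstrap sample and a random feature subset,
# and the forest's regression output is the average over all trees
rf = RandomForestRegressor(n_estimators=100, max_depth=30, random_state=42)
# rf.fit(X_train, y_train)
# rf.feature_importances_  # per-log contribution to the prediction
```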

3.3.4 RNN

An extension of the artificial neural network known as a recurrent neural network (RNN) can handle temporal patterns and contains an internal memory to process arbitrary temporal sequences. This framework has a feedback loop to store data about previous behavior (Connor et al. 1994; Giles et al. 2001). The schematic RNN architecture is shown in Fig. 8, and the hyper/control parameters used to develop this model are listed in Table 4. The RNN evaluates the current observation in the context of the network's internal (hidden) state and converts the input sequence data into vectors to produce the output (Al-Shabandar et al. 2021). Due to the vanishing gradient challenge, an RNN may not be capable of analyzing long-term sequential data. In large time-series datasets, network errors are propagated backward from the output layer to the input layer during the training phase (Karim et al. 2018; Tian and Pan 2015). Through this error propagation, the gradients may shrink and ultimately vanish in the lower layers, which can stop an RNN's further training (Hochreiter 1998).

Fig. 8 Simplified schematic of RNN architecture

Long short-term memory (LSTM), a type of RNN, mitigates the vanishing gradient problem by representing long-term dependencies. LSTM has standard and special units: information is stored and transferred over long periods in memory via the special unit, whose multiplicative gates control the constant error flow, allowing the LSTM to modify its weights over time (Weninger and Bergmann 2015). Another advantage of LSTM over the standard RNN is its reduced sensitivity to the length of the input time series (Dhamija and Boult 2017). LSTM is a popular technique for sequential text and time-series datasets; in this study, we implement an RNN on a regression dataset to compare its performance with other machine learning models.
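A minimal Keras sketch of the RNN used here, mirroring the architecture described in Sect. 3.3.6 (one 50-unit LSTM layer, MAE loss, Adam optimizer); reshaping the tabular logs to a single timestep is an assumption of this sketch:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(n_features: int, timesteps: int = 1) -> keras.Model:
    """One LSTM layer with 50 units followed by a dense output neuron."""
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        layers.LSTM(50),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

# model = build_lstm(n_features=7)
# model.fit(X_train.reshape(-1, 1, 7), y_train, epochs=50, batch_size=64)
```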

3.3.5 CNN

Convolutional neural networks (CNN) have achieved outstanding results, making them one of the most representative neural networks in deep learning. A CNN uses a convolutional architecture to extract features from data; compared to conventional feature extraction techniques, it does not require manual extraction of features (Ahonen et al. 2006). The CNN structure was inspired by visual perception (Hubel and Wiesel 1962): the artificial neuron represents the biological neuron, while the receptors responding to different features correspond to the CNN kernels. The activation function likewise replicates a function that transmits the signal to the next neuron only if it exceeds a predetermined threshold, and loss and optimization functions help the CNN learn the expected result (Li et al. 2022). In a CNN, a neuron is connected to only a few neurons rather than all neurons of the previous layer, which speeds up convergence, and weight sharing among groups of connections reduces the number of parameters. As a further advantage, a pooling layer downsamples the data while keeping useful information, reducing the number of parameters and dimensions (Li et al. 2022). CNNs are widely used in image classification problems; in this study, we implement a CNN for regression and compare its performance with traditional regression algorithms like RF and XGBoost.
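For illustration, a small 1-D CNN regressor in Keras; the layer sizes here are placeholders, while the study's tuned configuration (five hidden layers, dropout, ReLU activations, and L1 regularization on the output) is summarized in Sect. 3.3.6 and Table 4:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn(n_features: int) -> keras.Model:
    """Treat each log sample as a short 1-D signal and regress pore pressure."""
    model = keras.Sequential([
        keras.Input(shape=(n_features, 1)),
        layers.Conv1D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),   # pooling downsamples, keeping key info
        layers.Flatten(),
        layers.Dropout(0.2),                # dropout against overfitting
        layers.Dense(50, activation="relu"),
        layers.Dense(1, activation="linear",
                     kernel_regularizer=regularizers.l1(1e-4)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```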

3.3.6 Model development

Five ML and DL algorithms (DT, RF, XGBoost, RNN, and CNN) are developed using the recorded data from the wells drilled in the Volve oil field. The total dataset from these wells is divided into training and testing subsets: 70% of the data are used to train the algorithms, and the remaining 30% are used to test them. A tenfold cross-validation approach is performed on the training and testing data to ensure that all data subsets are considered during model estimation.

We used RandomizedSearchCV from the Scikit-Learn library for the ML models and a grid search approach for the DL models. RandomizedSearchCV performs a randomized search over hyperparameters, offering efficiency over exhaustive methods like GridSearchCV. During hyperparameter tuning with RandomizedSearchCV, we used fivefold cross-validation, primarily to balance computational efficiency and robust validation: fivefold cross-validation allows faster iterations through the large parameter space while still providing reliable estimates of model performance, reducing the overall computational time required for tuning. Once the best hyperparameters are identified, a more rigorous tenfold cross-validation is performed on the training and testing data using the optimized model; this provides a more robust assessment of the model's performance and helps to minimize the risk of overfitting.

We defined the following ranges for each hyperparameter and conducted 50 search trials: for the Decision Tree (DT), maximum depth ranged from 10 to 100 in increments of 10, and splitter options included ‘best’ and ‘random’; for XGBoost, the number of estimators was set to 50, 100, or 200, learning rates were 0.01, 0.1, or 0.3, maximum depths were 3, 6, or 9, and subsample and colsample_bytree values were 0.5, 0.75, or 1. For the Random Forest (RF), maximum depth ranged from 10 to 1030 in increments of 10, and random states included 0, 42, and 100.
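As a sketch, the XGBoost search space above maps onto RandomizedSearchCV as follows (50 trials and fivefold CV, per the description above; the scoring choice is an assumption):

```python
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

param_dist = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "subsample": [0.5, 0.75, 1.0],
    "colsample_bytree": [0.5, 0.75, 1.0],
}

search = RandomizedSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_distributions=param_dist,
    n_iter=50,   # 50 search trials
    cv=5,        # fivefold cross-validation during tuning
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
# search.fit(X_train, y_train)
# best_xgb = search.best_estimator_
```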

For the deep learning models, we defined a grid search for tuning hyperparameters. For the recurrent neural network (RNN), we considered LSTM units of 10, 50, or 100; loss functions 'mae' or 'mse'; optimizers 'adam' or 'rmsprop'; batch sizes of 32, 64, or 128; and epochs of 20, 50, or 100. For the CNN, dense layer units were 15, 30, or 50 with 'relu' activation; hidden layer configurations included (512, 512), (1024, 512), (1024, 1024, 512), (512, 256), or (512, 1024, 1024, 512, 256); the output layer had 1 unit with 'linear' activation; the kernel initializer was 'normal'; the optimizer was 'Adam' with learning rates of 0.001 or 0.01, beta_1 of 0.9, and beta_2 of 0.999; loss functions were 'mae' or 'mse'; batch sizes were 64 or 128; and epochs were 50 or 100.

After fitting the randomized search model with our training data, we identified the best estimators and used these optimal hyperparameters to build updated XGBoost, decision tree, and random forest regressor models. Each model was then fit on the training data and evaluated using the root mean square error (RMSE) metric.

In addition to the machine learning regressors, we implemented a grid search for the deep learning modules. The architecture of the CNN consisted of an input layer, five hidden layers with varying numbers of neurons (see Table 4), and a final output layer. Dropout was used to prevent overfitting, and L1 regularization was added to the output layer to further improve the model's generalization capabilities. Each layer in the network leverages the ReLU activation function, which helps the model learn complex patterns. Similarly, for the RNN, specifically the LSTM model, we applied the grid search method. The architecture of the RNN included an input layer, a single LSTM layer with 50 units, and a final dense output layer. The model was trained using the mean absolute error (MAE) as the loss function and the Adam optimizer for efficient convergence. Table 4 shows the optimized values of the hyper/control parameters used for developing the ML/DL models to predict pore pressure.

The performance metrics for the training, testing, and total datasets from the four wells at the Volve oil field used to predict PP with the different ML and DL algorithms are presented in Tables 5, 6, and 7 in terms of seven prediction performance measures.

Table 5 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the training subset
Table 6 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the testing subset
Table 7 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the total dataset

3.4 Performance evaluation

To compare and evaluate the performance and accuracy of the AI models in predicting formation pore pressure, the following statistical error indicators are applied in this study: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), average relative error (ARE), absolute average relative error (AARE), standard deviation (SD), and coefficient of determination (R2). The coefficient of determination (R2) ranges between 0 and 1; the closer the R2 value is to 1, the better the model prediction. For the other performance parameters, the smaller the value, the better the prediction. The specific formulas are given in Eqs. (6)–(12).

$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}{\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)}^{2}}$$
(6)
$$MAE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|{x}_{i}^{mea}-{x}_{i}^{Pred}\right|$$
(7)
$$MAPE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\right|\times 100$$
(8)
$$ARE = \frac{1}{m}\sum\limits_{i=1}^{m}\left(\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\times 100\right)$$
(9)
$$AARE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\times 100\right|$$
(10)
$$SD = \sqrt{\frac{\sum_{i=1}^{m}{\left(\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)-\frac{1}{m}\sum_{i=1}^{m}\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)\right)}^{2}}{m-1}}$$
(11)
$${R}^{2} = {\left(\frac{\sum_{i=1}^{m}\left({x}_{i}^{mea}-\overline{{x}^{mea}}\right)\left({x}_{i}^{Pred}-\overline{{x}^{Pred}}\right)}{\sqrt{\sum_{i=1}^{m}{\left({x}_{i}^{mea}-\overline{{x}^{mea}}\right)}^{2}\sum_{i=1}^{m}{\left({x}_{i}^{Pred}-\overline{{x}^{Pred}}\right)}^{2}}}\right)}^{2}$$
(12)

where \({x}_{i}^{mea}\) is the measured value, \({x}_{i}^{Pred}\) is the predicted value, \(\overline{{x}^{mea}}\) and \(\overline{{x}^{Pred}}\) are the averages of the measured and predicted values, respectively, and \(m\) is the number of sample data. Each prediction performance measure has unique attributes. Although some of these measures may be strongly correlated, assessing and contrasting multiple statistical error parameters is beneficial when evaluating the effectiveness of AI algorithms. Furthermore, the results of the comparative analysis of model performance are illustrated using combinations of RMSE/R2 and MAE/MAPE to provide visual confirmation of model performance.
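A compact Python sketch of these indicators (here R2 is computed as the squared Pearson correlation, consistent with Eq. (12)):

```python
import numpy as np

def evaluate(meas: np.ndarray, pred: np.ndarray) -> dict:
    """Statistical error indicators of Eqs. (6)-(12)."""
    err = meas - pred
    rel = err / meas * 100          # relative error, %
    r = np.corrcoef(meas, pred)[0, 1]
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),  # Eq. (6)
        "MAE": np.mean(np.abs(err)),         # Eq. (7)
        "MAPE": np.mean(np.abs(rel)),        # Eq. (8)
        "ARE": np.mean(rel),                 # Eq. (9)
        "AARE": np.mean(np.abs(rel)),        # Eq. (10)
        "SD": np.std(err, ddof=1),           # Eq. (11), sample standard deviation
        "R2": r ** 2,                        # Eq. (12)
    }
```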

4 Results and discussion

Based on the results presented in Tables 5, 6, and 7, it is apparent that the accuracy of pore pressure prediction favors the DT and RF algorithms over the other algorithms (XGBoost, RNN, CNN). During the training phase, DT and RF show R2 values of 0.9927 and 0.9928, respectively. However, these values reduce to 0.8801 and 0.9438, respectively, during the testing stage. The lower performance of DT on the testing dataset indicates overfitting or data memorization during the training stage. Hence, of the DT and RF algorithms, the latter delivers the more accurate pore pressure prediction on the testing dataset. Tables 5, 6, and 7 also show that RF delivers predictions with higher accuracy and lower error, with RMSE values of 1.57, 4.30, and 2.70 MPa for the training, testing, and total datasets, respectively. Therefore, the RF algorithm is declared the best-performing algorithm in the present study, outperforming the other four. However, it is important to consider the relative importance of each metric when choosing an algorithm for a particular task: if minimizing absolute errors is most important, then MAE or RMSE might be the best metrics to use, whereas if the data span a wide range of values, MAPE might be more appropriate. DT and RF show the lowest MAE values, with DT being the lowest during training and RF the lowest during testing, followed by XGBoost, RNN, and CNN in both stages. The MAPE values show a similar pattern to MAE, with DT and RF having the lowest errors and CNN the highest.

CNN has the highest RMSE, indicating the largest deviation from the actual values in both the training and testing stages. In contrast, DT and RF exhibit the lowest RMSE values, with RF slightly outperforming DT, suggesting they have the smallest prediction errors. RNN demonstrates a high RMSE, indicating less accurate predictions. XGBoost performs better than CNN and RNN but does not achieve the same level of accuracy as DT and RF. The prediction accuracy of the developed ML and DL algorithms is also illustrated graphically in terms of RMSE/R2 and MAE/MAPE in Fig. 9a, b.

Fig. 9 The comparative results of the five developed algorithms based on four evaluation metrics: MAE (a), MAPE (a), RMSE (b), and the coefficient of determination R2 (b)

Figure 10 provides a comparative analysis between the predicted and calculated pore pressure values for the five ML and DL algorithms using the total dataset. The maroon straight line in the figure represents the perfect fit line, where predicted values would exactly match the calculated values. The random forest (RF) algorithm shows the most accurate predictions, with the majority of its data points clustered closely around the perfect fit line, indicating a strong correlation between the predicted and actual pore pressures and demonstrating RF's high level of precision. The decision tree (DT) algorithm also performs well, with many of its points aligning closely with the perfect fit line; however, there is a slightly greater spread of points away from the line compared to RF, indicating that DT, while accurate, is marginally less precise. XGBoost shows predictions that are generally close to the perfect fit line but exhibit a wider spread than those of RF and DT, suggesting that XGBoost, while relatively accurate, has a higher degree of variability and less consistency in its predictions. RNN and CNN have the widest spread of points from the perfect fit line among the five algorithms, indicating the least accurate predictions; their points are more dispersed, showing higher deviations from the actual values.

Fig. 10 The comparative analysis between the predicted and calculated pore pressure for the five ML and DL algorithms, created using the total data subset

Overall, the figure clearly illustrates the superiority of the RF algorithm in predicting pore pressure. RF’s predictions are consistently closer to the calculated values, highlighting its effectiveness and reliability. The DT algorithm follows, showing good performance but with slightly less precision. XGBoost, while relatively accurate, has more variability in its predictions. Both RNN and CNN show less accurate predictions, with CNN performing the worst among the five algorithms. This analysis underscores the robustness and accuracy of the RF algorithm in pore pressure prediction compared to the other models.

Figure 11 displays a cross-plot of the predicted and calculated pore pressure for each ML and DL algorithm individually, allowing for further comparison of their evaluation accuracy. As per Fig. 11, the RF algorithm predicts best, as most results lie close to the perfect fit line. The coefficient of determination (R2) delivered by each algorithm is used to determine the accuracy, which can be ordered as RF > DT > XGBoost > RNN > CNN. In addition, Fig. 11 displays the range of relative residuals of the predicted pore pressure values for the five algorithms (DT, XGBoost, RF, RNN, and CNN). The findings indicate that the RF algorithm has a smaller relative residual range (from +0.5 to −1.25) than the remaining algorithms. It should be highlighted that the superiority of the random forest algorithm over the remaining algorithms, especially DT, stems from its ability to create a forest of uncorrelated decision trees. This forest is built using bagging and feature randomness, which reduces the chance of overfitting: each tree is grown on a random subset of features, ensuring low correlation between the decision trees.

Fig. 11 The cross plot of the predicted and calculated pore pressure with the range of relative residuals of the developed five ML and DL algorithms based on the total dataset: a CNN, b DT, c RF, d RNN, and e XGBoost

4.1 RF algorithm generalization test for pore pressure (PP) prediction

The outcomes presented in the preceding section, which pertained to the performance of the five ML and DL algorithms on the training, testing, and overall datasets, were obtained using data from four wells of the Volve oil field (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2). While the initial training and testing on these four wells provide a robust evaluation of model performance within that subset of data, it does not fully confirm the model’s ability to generalize to new, unseen data from the same field. Therefore, an additional set of data gathered from the well 15/9-F-14 of the Volve field, containing 3781 data points, is utilized to evaluate the capability and reliability of the RF algorithm for precise pore pressure prediction. This additional evaluation using Well 15/9-F-14 serves as an independent test set that was not used in the training or initial testing phases, thereby allowing us to assess the model’s generalization capabilities more comprehensively. Assessing the model’s performance on Well 15/9-F-14 provides additional insights into its robustness and reliability when faced with data from a different well in the same field. This extra validation step strengthens the confidence in the model’s performance and its potential for practical application in the field of study.

Table 8 provides the statistical accuracy metrics obtained by the RF algorithm when applied to the well 15/9-F-14 dataset. Comparing the results in Table 8 with those in Tables 5, 6, and 7 confirms a significant capability of the developed RF algorithm in predicting pore pressure when applied to another well (15/9-F-14) from the field of study. Figure 12a and b show the cross plot of predicted and calculated pore pressure with the trained RF algorithm's relative residuals and distribution histogram when applied to this new dataset from the same field. Figure 12a shows a satisfactory alignment of the predicted data with the perfect fit line, an R2 value of 0.909, and relative residuals in a range of 0 to −7, which corroborates the reliability of the RF algorithm for predicting pore pressure in other wells throughout the field of study. Figure 12b illustrates the cross plot and distribution histogram of predicted and calculated pore pressure; as noted above, the predicted values correspond satisfactorily to the calculated pore pressure, and the distribution of the predicted results is relatively concentrated, with few discrete points.

Table 8 Statistical accuracy measures for predicting PP via the RF algorithm trained using the wells 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2 dataset, applied to well 15/9-F-14 from the same field of study
Fig. 12 The cross plot of the predicted and calculated pore pressure with the a range of relative residuals and b distribution histogram, achieved by the RF algorithm trained using the wells 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2 dataset and applied to the well 15/9-F-14 from the same field of study

4.2 Comparison with other studies

Table 9 summarizes recent research on pore pressure prediction using machine learning and deep learning algorithms, highlighting a comparative analysis with the present study. Existing studies generally utilize smaller datasets for training and testing their models, often gathered from one or two wells, limiting the diversity of data and potentially reducing model reliability. Moreover, previous studies commonly lack generalization tests for assessing model performance beyond the training data.

Table 9 Comparison of present study results with existing studies

In contrast, the present study distinguishes itself by employing a significantly larger dataset comprising 18,758 data points for model development and evaluation gathered from four different wells. The study finds that Random Forest (RF) outperforms other machine learning and deep learning algorithms regarding prediction accuracy when provided with a large quantity of petrophysical data points. Additionally, the present research conducts generalization tests to assess the robustness and reliability of the developed models, a step often overlooked in prior studies. Moreover, the study compares various machine learning and deep learning algorithms such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), XGBoost, Decision Trees (DT), and RF to assess their performance in predicting pore pressure. This comprehensive evaluation, which has not been conducted before, offers valuable insights into identifying the most suitable algorithms for pore pressure prediction tasks.

5 Conclusion

This study evaluates five machine learning and deep learning algorithms for predicting pore pressure through supervised learning. The RF algorithm showed the best performance compared to the other algorithms, with an RMSE value of 2.70 MPa and an R2 value of 0.97, whereas the CNN algorithm was the least accurate, with an RMSE value of 8.26 MPa and an R2 of 0.82. A generalization test of the best-performing algorithm (RF) was carried out to assess its reliability by predicting the pore pressure of another well (15/9-F-14) from the same field using the trained RF algorithm. Based on the result, it is concluded that the RF algorithm can be used for pore pressure prediction in another well of the same field, as it showed satisfactory RMSE and R2 values of 6.48 MPa and 0.905, respectively. The best-performing RF model possesses advantageous attributes that enable it to offer rapid and dependable predictions of pore pressure across different conditions within the field under investigation. This research also demonstrates the use of machine learning (ML) and deep learning (DL) models to develop precise data-based approaches for forecasting subsurface pore pressure from easily accessible petrophysical well-log data.

The current study exclusively utilizes petrophysical data from wells within a single oil field to train and test the chosen ML/DL models. This limited scope decreases the likelihood of the best-performing model exhibiting strong performance when applied to petrophysical data from other reservoirs. In future research, incorporating geological datasets alongside petrophysical data will be explored to assess the influence of geological parameters on pore pressure. Additionally, combining petrophysical datasets from wells across different oil fields and regions will be considered to enhance the training of ML and DL models for pore pressure prediction. This approach will also further validate the generalizability of the best-performing ML or DL models.