1 Introduction

1.1 Background

The variation of pore pressure with depth is critical for successful oil and gas extraction from conventional and unconventional reservoirs. Historically, the pore pressure of sedimentary basin strata has remained difficult to predict despite decades of study (Abdelghany et al. 2021; Flemings 2021; Radwan et al. 2021; Ramdhan and Goulty 2011, 2010; Zhang 2011). Pore pressure (PP), sometimes called formation pressure, is the fluid pressure within the pores of a subsurface formation arising from the hydraulic potential (Oloruntobi et al. 2020). Oil and gas well drilling is considered safe when the wellbore hydrostatic pressure lies between the pore pressure and the fracture pressure (Darvishpour et al. 2019; Richards et al. 2020). Inaccurate prediction of pore pressure during drilling can result in unforeseen hazards such as fluid loss, which could induce a kick (uncontrolled flow of formation fluid into the wellbore) and ultimately lead to a blowout, causing irreparable harm to the drilling rig and well personnel (Mahetaji et al. 2020; Zhang et al. 2022, 2020). Usually, pore pressure is determined using well logs and seismic data. However, these datasets provide information about pore pressure only in the proximity of the borehole. Furthermore, calculating pore pressure during drilling takes considerable time, effort, and money, and occasionally the data needed to estimate pore pressure may not be recorded effectively because of poor hole conditions. Thus, it is appropriate to construct a roadmap based on machine learning (ML) that can effectively predict pore pressure profiles with depth without requiring frequent measurements of the formation pressure. Additionally, it is advantageous for pore pressure estimates to be reliably extrapolated outward from a wellbore into formations that have not been drilled. Such ML roadmaps are beneficial for optimizing and improving pore pressure estimates, serve as essential inputs for calculating geomechanical variables, and help improve the quality of reservoir models (Radwan et al. 2022).

Geoscience and subsurface engineering are among the fields where machine learning techniques are widely applied. Indeed, advancing technology and machine learning algorithms are revolutionizing the energy sector. Solving industrial problems accurately in less time and at lower cost and effort will significantly impact commercial production and resource recovery in the oil and gas industry. ML techniques have been successfully applied in the earth sciences and subsurface engineering disciplines by utilizing geophysical, lithological, and petrophysical data (Poulton 2002; Silversides et al. 2015; Xie et al. 2018), and their application continues to diversify and grow across the various divisions of geoscience and petroleum engineering (Anifowose et al. 2011; Schmidhuber 2015). Recently, several studies have utilized ML techniques to predict formation pore pressure (Booncharoen et al. 2021; Paglia et al. 2019; Wei et al. 2021). These existing models are discussed briefly in the next section.

1.2 Existing models

A few studies have been published on applying multiple ML techniques for predicting pore pressure. In this section, we discuss the recent development of these techniques for pore pressure prediction (Ahmed et al. 2019; Booncharoen et al. 2021; Farsi et al. 2021; Radwan et al. 2022; Wei et al. 2021; Yu et al. 2020; Zhang et al. 2022). Yu et al. (2020) suggested an ML technique that utilizes explicit petrophysical log inputs, namely sonic velocity, shale volume, and porosity, to predict pore pressure in a subsurface formation with varying lithology. In that study, a normally pressured sequence was trained using a combination of the petrophysical properties and theoretical effective stress; Bowers' unloading equation was subsequently applied for the overpressure zones. A total of four ML techniques were studied: multilayer perceptron (MLP) neural network, random forest (RF), support vector machine (SVM), and gradient boosting. All four ML techniques revealed good agreement between measured and predicted pore pressure data, with the RF technique outperforming the others. Moreover, the results also showed the superiority of the RF model in detecting the onset of overpressure compared to the other tested ML models. Farsi et al. (2021) compared the efficiency of three hybrid ML models to predict pore pressure: multilayer perceptron (MLP) neural network, least square support vector machine, and extreme learning machine (ELM), each hybridized with powerful particle swarm optimization (PSO). According to their analysis, ELM with PSO is comparatively more accurate in predicting pore pressure than the other two hybrid ML models. To establish the reliability of the ELM-PSO model, Farsi et al. (2021) applied the model to data from three different wells and concluded that the newly developed model could predict pore pressure accurately across the entire studied area.

As part of their study, Wei et al. (2021) compared deep learning recurrent neural networks (RNN) with MLP to predict pore pressures in soil. Based on the outcome, the RNN model predicts pore pressure with higher accuracy than the MLP model; the prediction performance of the two models is evaluated using RMSE and R2 values. Booncharoen et al. (2021) studied the performance of various regressor-based algorithms (extreme gradient boosting (XGBoost), ridge, and quantile) to predict pore pressure in the Pattani basin, Thailand, using drilling parameters and reservoir characteristics as input variables. As per the root mean square error (RMSE) of the train and test datasets, the overall model performance was approximately 1.2 and 1.5 pounds per gallon, respectively, based on 12 drilling projects in the Pattani basin. Ahmed et al. (2019) developed five machine-learning models to predict pore pressure based on drilling parameters and drilling logs collected in the field: support vector machine (SVM), fuzzy logic (FL), functional network (FN), radial basis function (RBF), and artificial neural network (ANN). Based on comparative analysis, SVM showed good agreement between predicted and measured pore pressure data, with an average percentage error of 0.14%.

Radwan et al. (2022) and Zhang et al. (2022) are the two most recent studies on multiple ML techniques to predict pore pressure. Radwan et al. (2022) used nine different ML techniques, whereas Zhang et al. (2022) used four, employing field data from New Zealand (Mangahewa gas field) and the Middle East, respectively. Based on the analysis of the models developed in these two studies, the decision tree (DT) showed excellent predictive efficiency, with RMSE values in the ranges of 0.25 to 14.71 psi and 0.99 to 14.46 psi, respectively. Another study by Matinkia et al. (2022) evaluated the application of a convolutional neural network (CNN) for predicting pore pressure, conducting a comparative analysis between CNN, least square support vector machine (LSSVM) with PSO, and multilayer extreme learning machine (MELM) hybridized with PSO, the cuckoo optimization algorithm (COA), and the genetic algorithm (GA). The CNN outperformed the other hybrid deep learning models, with RMSE and R2 values of 0.1066 and 0.9806, respectively.

This research aims to find the best machine learning and deep learning (ML/DL) model for predicting pore pressure, addressing a crucial need in the energy industry, particularly in geothermal, CCUS (carbon capture, utilisation and storage), and oil and gas applications. By comparing various ML/DL models known for their accuracy and reliability in pore pressure prediction, namely decision tree (DT), XGBoost, random forest (RF), recurrent neural network (RNN), and convolutional neural network (CNN), this research significantly advances the understanding of reservoir dynamics. Importantly, it fills a gap in the literature by conducting the first comparative analysis of ML and DL models using data from the Volve oil field. This approach both improves our knowledge of pore pressure prediction methods and provides insights into using data-driven solutions to tackle complex geomechanical challenges. Moreover, while past studies often relied on limited datasets, ranging from 5064 to 7171 data points, this research uses a larger dataset of 22,539 points from five different offshore wells in the Volve oil field. This extensive data enables better transfer learning and model optimization, boosting prediction efficiency. Further, this research holds significant industrial value: by comprehensively evaluating pore pressure prediction reliability across the entire study area, it provides insights that can optimize reservoir management and ensure safe and efficient hydrocarbon extraction.

2 Study area: North Sea Volve oil field

The Volve oil field is located inside Block 15/9, 200 km west of Stavanger, at the southern end of Norway's continental shelf in shallow water (water depth of about 80 m), as shown in Fig. 1. It was discovered in 1993 and developed using a jack-up rig with processing and drilling capabilities. Drilling started in this field in 2007, and production commenced with traditional water injection pressure support in 2008, with a life expectancy of 3–5 years. The field was shut down in September 2016 after operating for eight years. During its operation, the Volve field produced fifty-six thousand (56,000) barrels per day and delivered a total of sixty-three (63) million barrels of oil over its lifetime (Sen and Ganguli 2019).

Fig. 1 North Sea Volve oil field geological location (modified from Ravasi et al. 2015)

Located at a depth of 2700–3100 m, the Hugin formation, of Middle Jurassic age, is the primary siliciclastic reservoir (Vollset and Doré 1984). A combination of structural and stratigraphic traps has contained the oil in this reservoir. The reservoir structure has a small dome-type shape that originated from the disintegration of the adjacent salt ridges during the Middle Jurassic period (Szydlik et al. 2007). Several faults occur in the western portion of the structure, most resulting from regional extension in salt tectonics, making communication across these faults difficult. As a result of these structural events, the reservoir thickness ranges from 20 m at the crest to 100 m on the flanks of the structure (Sen and Ganguli 2019).

The sediment deposition in the Hugin system of the Viking group resulted from rift collapse and continuous flooding of the Viking graben due to a major transgression during the Mid-Late Jurassic period (Sneider et al. 1995). Consequently, the Hugin formation consists largely of shallow marine to marginal marine sandstone deposits with occasional coal seams. The formation is mainly composed of clean sandstones, primarily consisting of quartz, mica, and clay minerals. The distribution of lithofacies controlled by shallow water tidal systems was responsible for the deposition of these clean sandstones (Folkestad and Satur 2008). The top of the Hugin Formation is encountered at average depths of 2750 to 3120 m below sea level. The general properties of the Hugin formation are a porosity of 0.21 p.u., a water saturation in the oil zone of 20%, a permeability of 1 Darcy, and a net-to-gross ratio of 0.93. A generalized version of the regional stratigraphy is illustrated in Fig. 2.

Fig. 2 Generalized stratigraphy of the study area, Volve Field, according to the studied wells (modified from Sen and Ganguli 2019)

3 Methodology

This paper uses five different algorithms to identify the most effective pore pressure predictor: DT, XGBoost, RF, RNN, and CNN. A short explanation of these algorithms is given in Sect. 3.3; detailed descriptions of the architecture, hyper/control parameters, and associated methodologies of these ML/DL algorithms are documented elsewhere, with most previously employed for pore pressure prediction: DT (Zhang et al. 2022); XGBoost (Booncharoen et al. 2021); RF (Yu et al. 2020); RNN (Wei et al. 2021); and CNN (Matinkia et al. 2022). Figure 3 shows the schematic workflow depicting how the intelligent algorithms for predicting pore pressure were constructed, evaluated, and compared for accuracy. As illustrated in Fig. 3, the first two steps represent the data gathering and data cleaning process, which involves removing outliers and null values. Next, an analytical model, i.e., Eaton's model, is used to calculate the pore pressure, which is then compared with the limited repeat formation tester (RFT) pore pressure data to establish the credibility of the analytical model (Fig. 4). Afterward, the maximum and minimum values of all the variables are identified, and the data are normalized to the numerical range of −1 to +1 using Eq. (1).

$$\hat{D}_{n}^{v} = 2 \times \left( {\frac{{D_{n}^{v} - D_{min}^{v} }}{{D_{max}^{v} - D_{min}^{v} }}} \right) - 1$$
(1)

where \(\hat{D}_{n}^{v}\) is the normalized value of the variable (\(v\)) for the nth data record; \({D}_{n}^{v}\) is its original value; and \({D}_{min}^{v}\) and \({D}_{max}^{v}\) are the minimum and maximum values of the variable in the entire dataset, respectively.
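As an illustration, Eq. (1) can be implemented in a few lines of Python; this is a minimal sketch, and the file and column names are hypothetical rather than the study's actual code.

```python
import pandas as pd

def normalize_minus1_to_plus1(df: pd.DataFrame) -> pd.DataFrame:
    """Apply Eq. (1): scale every column to [-1, +1] via min-max normalization."""
    d_min, d_max = df.min(), df.max()
    return 2 * (df - d_min) / (d_max - d_min) - 1

# Hypothetical usage with the petrophysical logs described in Sect. 3.1
# logs = pd.read_csv("volve_logs.csv")  # columns: DTC, DTS, GR, NPHI, PEF, RHOB, RT, PP
# logs_norm = normalize_minus1_to_plus1(logs)
```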

Fig. 3 Schematic workflow of the present study

Fig. 4 Measured pore pressure (RFT data) vs calculated pore pressure

Subsequently, the normalized dataset is split into training and testing subsets with a 70:30 ratio. The selected models are then developed, trained, and tested using these subsets, after which a statistical approach is used to quantify the accuracy of the developed models and select the best-performing one. Finally, the best-performing model is applied to a completely new dataset from a different well to assess its universality/generalizability.
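A minimal sketch of this split-train-evaluate loop, assuming scikit-learn and placeholder arrays in place of the actual well data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder arrays standing in for the normalized logs (X) and Eaton-derived PP (y)
rng = np.random.default_rng(42)
X, y = rng.random((1000, 7)), rng.random(1000)

# 70:30 train/test split, as in the workflow of Fig. 3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Tenfold cross-validation, as described in Sect. 3.3.6
cv_rmse = -cross_val_score(model, X_train, y_train, cv=10,
                           scoring="neg_root_mean_squared_error")
print(f"hold-out RMSE: {rmse:.3f}, 10-fold CV RMSE: {cv_rmse.mean():.3f}")
```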

Algorithm: DT, XGBoost, RF, RNN, CNN

3.1 Well-log data collection and description

The petrophysical data from five wells drilled in the Volve oil field (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, 15/9-F-11 T2, and 15/9-F-14) are used in this study for the ML and DL evaluations. The dataset consists of high-resolution petrophysical variables, which include compressional wave travel time (DTC), shear wave travel time (DTS), gamma ray (GR), neutron porosity (NPHI), photoelectric absorption factor (PEF), bulk density (RHOB), and resistivity log (RT). A summary of these variables' application to identify target formation characteristics is detailed in Table 1. The selected variables constitute the input data used to predict the formation pore pressure. The five wells together comprise 22,539 data points (5749 for 15/9-F-1 A, 1751 for 15/9-F-1 B, 3900 for 15/9-F-11 A, 7358 for 15/9-F-11 T2, and 3781 for 15/9-F-14). Of these five wells, four are used to train and test the models, and the remaining well's data is used to determine the generalization/universality of the best-performing model. In this study, the pore pressure (PP) is calculated using Eaton's method, which appeared reasonable based on the limited number of repeat formation tester (RFT) data (Fig. 4). The statistical details of the total dataset and the data distribution of the five wells are provided in Tables 2 and 3, where the statistical parameters mean, standard deviation, and the percentiles P25, P50, and P75 are given for each variable. The mean and standard deviation describe the average and spread of the data, while P25, P50, and P75 are the dataset's first, second, and third quartiles: P25 provides insight into the lower portion of the data distribution, P50 into the central part, and P75 into the upper part. Together, these statistics aid in evaluating data variability, identifying outliers, comparing datasets, and making informed decisions based on the data distribution.

Table 1 Application of the selected variables used in this study to determine the characteristics of the target formation
Table 2 Statistical details of the selected variables for the total dataset (22,539 data records in total)
Table 3 Statistical details of the selected variables for the datasets collected separately from 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, 15/9-F-11 T2, and 15/9-F-14

A correlation heatmap considering the data from four wells (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2) is illustrated in Fig. 5. The heatmap quantifies the degree of correlation among the selected petrophysical variables, assisting in identifying the distribution of the data and their anticipated patterns. According to Fig. 5, PEF, RHOB, and RT correlate positively with pore pressure. Meanwhile, NPHI and DTC show a good negative correlation with pore pressure, while DTS and GR show a poor negative correlation (Fig. 5).

Fig. 5 Heatmap plot showing the correlation among the selected petrophysical variables, considering the four wells' data (18,758 data points in total)

3.2 Overburden and pore pressure modeling

According to the Terzaghi and Peck (1948) theory of soil bearing capacity, the overburden stress (\({\gamma }_{s}\)) at a selected point is supported by the rock matrix and the fluid stored within that rock matrix, as represented by Eq. (2) (Eaton 1975; Liu et al. 2018).

$$\gamma_{s} = \gamma_{e} + PP$$
(2)

where \({\gamma }_{e}\) is the effective vertical stress and \(PP\) is the pore pressure. Hence, the fluid experiences high overpressure relative to hydrostatic pressure due to the overburden stress. The bulk density of the overlying formation and the target depth are the parameters used to calculate the overburden stress (\({\gamma }_{s}\)) (Eq. (3)). The bulk density is determined from the petrophysical log data.

$$\gamma_{s} = \int\limits_{0}^{z} \rho \, dz$$
(3)

where \(\rho\) refers to the bulk density, and \(z\) refers to the depth (Oloruntobi et al. 2018; Oloruntobi and Butt 2019). Subsurface formation rock properties, such as the dc-exponent, acoustic velocity, and resistivity, usually increase with increasing depth. However, trapped overpressured fluid in the rock matrix tends to disturb the trend of these measured rock properties. Hence, many pore pressure methodologies were developed, the primary foundation of which is Eq. (2). Among these methodologies, Eaton's method is widely used and accepted by the industry; hence, this model is utilized for pore pressure calculation in this study.
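As a numerical sketch of Eq. (3), the overburden stress can be obtained by cumulative trapezoidal integration of an RHOB log; gravitational acceleration is included here so the result carries stress units, and all names are illustrative:

```python
import numpy as np

def overburden_stress_mpa(depth_m: np.ndarray, rhob_kg_m3: np.ndarray) -> np.ndarray:
    """Approximate Eq. (3) by cumulative trapezoidal integration of the bulk density log."""
    g = 9.81  # gravitational acceleration, m/s^2
    dz = np.diff(depth_m)
    # stress increment per depth interval, in Pa
    increments = 0.5 * (rhob_kg_m3[1:] + rhob_kg_m3[:-1]) * g * dz
    return np.concatenate(([0.0], np.cumsum(increments))) / 1e6  # Pa -> MPa
```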

According to Eaton (1975), dc-exponent, acoustic velocity, or resistivity data can be used to estimate pore pressure, as presented in Eq. (4).

$$PP = {\gamma }_{s} - \left({\gamma }_{s} - {P}_{h}\right){\left({x}_{obs}/{x}_{norm}\right)}^{k}$$
(4)

where \({P}_{h}\) refers to the hydrostatic pressure, \({x}_{obs}\) refers to the observed attribute, and \({x}_{norm}\) refers to the normally expected attribute for the resistivity, acoustic velocity, or dc-exponent parameter. \(k\) is Eaton's exponent, which varies with the petrophysical parameter used to estimate the pore pressure: \(k\) is equal to 1.2 when resistivity or dc-exponent data are used, and equal to 3 when acoustic velocity data are used.
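Equation (4) translates directly into code; the sketch below assumes consistent pressure units and a pre-computed normal compaction trend (x_norm):

```python
def eaton_pore_pressure(sigma_v, p_hydro, x_obs, x_norm, k=3.0):
    """Eq. (4): PP = sigma_v - (sigma_v - P_h) * (x_obs / x_norm)^k.

    k = 3 for acoustic velocity data; k = 1.2 for resistivity or dc-exponent data.
    """
    return sigma_v - (sigma_v - p_hydro) * (x_obs / x_norm) ** k

# e.g., sigma_v = 60 MPa, hydrostatic = 30 MPa, velocity ratio = 0.9:
# eaton_pore_pressure(60.0, 30.0, 0.9, 1.0)  ->  60 - 30 * 0.9**3 = 38.13 MPa
```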

3.3 ML and DL approaches and development

Artificial intelligence (AI) commonly refers to machine intelligence or prediction based on machine learning algorithms. In most research literature, this tool is defined as an intelligent product with knowledge and design that resembles the natural capability of human minds (Legg and Hutter 2007; Temirchev et al. 2020). AI mimics human behavior, with a special focus on how neurons in the brain work, solve problems, and acquire knowledge and reasoning (Hassanpouryouzband et al. 2021; Poole et al. 1998). Nowadays, many engineering problems in different sectors of science and technology are solved using artificial intelligence algorithms (Choubineh et al. 2017; Farsi et al. 2021; Ghorbani et al. 2017; Hazbeh et al. 2021; Shamshirband et al. 2019). Philosophy, linguistics, mathematics, psychology, neuroscience, physiology, and game theory, among others, are the sources of the primary ideas behind the development of artificial intelligence algorithms (Lieder and Griffiths 2020).

3.3.1 DT

Decision trees are powerful techniques commonly used in several fields, such as classification and regression analysis for different engineering applications; another advantage of the decision tree is the explainability of the model (Stein et al. 2005). The general architecture of a DT is illustrated in Fig. 6. In a decision tree (DT), a series of basic tests, in each of which a numerical feature is compared to a threshold value, is effectively combined; the resulting prediction is given by Eq. (5) (Damanik et al. 2019).

$$y = f\left(x\right) = \sum\limits_{i} {y}_{i} \cdot I\left(x \in {R}_{i}\right)$$
(5)

where \(y\) is the predicted target variable (output) (pore pressure).

Fig. 6 The decision tree (DT) model architecture. A decision tree comprises three key components: a node, a condition, and a production (value)

\(x\) is the vector of input variables (features) (DTC, DTS, GR, NPHI, PEF, RHOB, RT).

\({y}_{i}\) is the predicted output value for the \(i\) th leaf node (i.e., the average value of the training samples in that leaf node).

\({R}_{i}\) is the region of feature space corresponding to the \(i\) th leaf node.

\(I\) is the indicator function, which returns \(1\) if the condition in parentheses is true and \(0\) otherwise.

Developing the conceptual rules in a DT is much more straightforward than constructing the numerical weights of the connections in neural network nodes (Anuradha and Gupta 2014; Barros et al. 2012). DT is mainly used for grouping and classification modeling in data mining (Gavankar and Sawarkar 2017). Each tree is composed of nodes and subsets: each node represents the feature that must be classified, while the subset defines the value for that node (Mahesh 2020; Swain and Hauska 1977). The decision tree has been successfully implemented in many fields due to its simple analysis and precision across various data forms (Charbuty and Abdulazeez 2021; Mrva et al. 2019). Table 4 shows the hyper/control parameters of the DT model developed to predict pore pressure.

Table 4 Hyper/control parameters used for the 5 different ML/DL models to predict pore pressure
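For reference, a minimal scikit-learn sketch of such a regressor; the parameter values shown are placeholders, not the tuned values of Table 4:

```python
from sklearn.tree import DecisionTreeRegressor

# Placeholder hyperparameters chosen from the search ranges in Sect. 3.3.6
dt = DecisionTreeRegressor(max_depth=20, splitter="best", random_state=42)
# dt.fit(X_train, y_train)
# y_pred = dt.predict(X_test)  # piecewise-constant prediction per Eq. (5)
```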

3.3.2 XGBoost

The gradient-boosting decision tree algorithm was initially proposed in 2001 (Friedman 2001). The method uses gradient descent to generate new trees based on previous trees so as to minimize an objective function. The extreme gradient boosting decision tree (XGBoost) is a decision tree-based ensemble model that uses a scalable gradient boosting framework developed by Chen and Guestrin in 2016. Due to its high performance, it has been popular for classification and regression problems (Booncharoen et al. 2021; Chen and Guestrin 2016; Liu et al. 2022; Ogunleye and Wang 2020; Pan et al. 2022; Torlay et al. 2017). Distributed, parallel, cache-aware, and out-of-core computing make this algorithm much faster than most machine learning and deep learning algorithms. Moreover, the algorithm is well optimized and scalable enough to process a large amount of data in memory-limited or distributed computing settings, and it can easily be applied to datasets with data sparsity challenges, missing values, and frequent zero values (Chen and Guestrin 2016). When applied to a regression problem, XGBoost continuously generates and adds new regression trees to the model; the residual of the previous model is fitted by the updated classification and regression trees, and the final predicted value is the sum of the results of each tree (Nguyen et al. 2019). The control parameters used to develop the XGBoost model are listed in Table 4. The architecture of the XGBoost and RF models is illustrated in Fig. 7.
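A minimal sketch of an XGBoost regressor in Python, with hypothetical values chosen from the search ranges listed in Sect. 3.3.6:

```python
from xgboost import XGBRegressor

# Boosting: each new tree is fitted to the residuals of the current ensemble
xgb = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=6,
                   subsample=0.75, colsample_bytree=0.75,
                   objective="reg:squarederror")
# xgb.fit(X_train, y_train)
# y_pred = xgb.predict(X_test)  # final value = sum of all trees' outputs
```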

Fig. 7 The simplified structure of XGBoost and RF. While both models share a similar structure, XGBoost employs boosting, sequentially refining decision trees to reduce residual errors, whereas RF utilizes bagging, training multiple decision trees simultaneously and determining the final output through a majority vote

3.3.3 RF

Random forest is an ensemble model based on decision trees, which uses bagging to overcome one of the disadvantages of decision trees, namely overfitting (Fig. 7). The random forest (RF) algorithm is a DT-based model that selects the best regression tree from tree-structured regressions built on identically and independently distributed random vectors by unit voting (Reis et al. 2018; Zebari et al. 2019). The method has been widely used for classification and regression tasks, providing higher cross-validation accuracy (Bargarai et al. 2020). It has been applied mainly in the image processing field and to various problems such as news classification, content filtering, intrusion detection, and sentiment analysis (Aljumaily et al. 2023). In RF, a random vector independent of the previous identically distributed random vectors is generated, and a tree is grown from the training set. An upper bound on the generalization error is derived to evaluate the accuracy and interdependency of the individual classifiers (Ozgode Yigin et al. 2020). RF has proved its effectiveness on large datasets as well as large numbers of input variables, and the algorithm can handle missing details and incomplete data without loss of accuracy (Aljumaily et al. 2023). The control parameters used to develop the RF model are listed in Table 4.
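A corresponding scikit-learn sketch (placeholder settings; the tuned values are in Table 4):

```python
from sklearn.ensemble import RandomForestRegressor

# Bagging: each tree sees a bootstrap sample and a random feature subset,
# and the forest's regression output is the average over all trees
rf = RandomForestRegressor(n_estimators=100, max_depth=30, random_state=42)
# rf.fit(X_train, y_train)
# rf.feature_importances_  # per-log contribution to the prediction
```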

3.3.4 RNN

An extension of the artificial neural network known as a recurrent neural network (RNN) can handle temporal patterns and contains an internal memory to process arbitrary temporal sequences. This framework has a feedback loop to store data about previous behavior (Connor et al. 1994; Giles et al. 2001). The schematic RNN architecture is shown in Fig. 8, and the hyper/control parameters used to develop this model are listed in Table 4. The RNN evaluates the current observation in the context of the network's internal (hidden) state and converts the input sequence data into vectors to produce the output (Al-Shabandar et al. 2021). Due to the vanishing gradient challenge, an RNN may not be capable of analyzing long-term sequential data. In large time-series datasets, network errors are propagated backward from the output layer to the input layer during the training phase (Karim et al. 2018; Tian and Pan 2015). Through this error propagation, the gradients may shrink and ultimately vanish in the lower layers, which can stop an RNN's further training (Hochreiter 1998).

Fig. 8 Simplified schematic of RNN architecture

Long short-term memory (LSTM), a type of RNN, mitigates the vanishing gradient problem by representing long-term dependencies. LSTM has standard and special units: information is stored and transferred over long periods in memory via the special unit, whose multiplicative gates control the constant error flow, allowing the LSTM to modify its weights over time (Weninger and Bergmann 2015). Another advantage of LSTM over the standard RNN is its reduced sensitivity to the length of the input time series (Dhamija and Boult 2017). LSTM is a popular technique for sequential text and time-series datasets; in this study, we implement an RNN on a regression dataset to compare its performance with other machine learning models.
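A minimal Keras sketch of the RNN used here, mirroring the architecture described in Sect. 3.3.6 (one 50-unit LSTM layer, MAE loss, Adam optimizer); reshaping the tabular logs to a single timestep is an assumption of this sketch:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(n_features: int, timesteps: int = 1) -> keras.Model:
    """One LSTM layer with 50 units followed by a dense output neuron."""
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        layers.LSTM(50),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

# model = build_lstm(n_features=7)
# model.fit(X_train.reshape(-1, 1, 7), y_train, epochs=50, batch_size=64)
```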

3.3.5 CNN

Convolutional neural networks (CNN) have achieved outstanding results, making them one of the most representative neural networks in deep learning. A CNN uses a convolutional architecture to extract features from data; compared to conventional feature extraction techniques, it does not require manual extraction of features (Ahonen et al. 2006). The CNN structure was inspired by visual perception (Hubel and Wiesel 1962): the artificial neuron represents the biological neuron, while the receptors responding to different features correspond to the CNN kernels. The activation function likewise replicates a function that transmits the signal to the next neuron only if it exceeds a predetermined threshold, and loss and optimization functions help the CNN learn the expected result (Li et al. 2022). In a CNN, a neuron is connected to only a few neurons rather than all neurons of the previous layer, which speeds up convergence, and weight sharing among groups of connections reduces the number of parameters. As a further advantage, a pooling layer downsamples the data while keeping useful information, reducing the number of parameters and dimensions (Li et al. 2022). CNNs are widely used in image classification problems; in this study, we implement a CNN for regression and compare its performance with traditional regression algorithms like RF and XGBoost.
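For illustration, a small 1-D CNN regressor in Keras; the layer sizes here are placeholders, while the study's tuned configuration (five hidden layers, dropout, ReLU activations, and L1 regularization on the output) is summarized in Sect. 3.3.6 and Table 4:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn(n_features: int) -> keras.Model:
    """Treat each log sample as a short 1-D signal and regress pore pressure."""
    model = keras.Sequential([
        keras.Input(shape=(n_features, 1)),
        layers.Conv1D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),   # pooling downsamples, keeping key info
        layers.Flatten(),
        layers.Dropout(0.2),                # dropout against overfitting
        layers.Dense(50, activation="relu"),
        layers.Dense(1, activation="linear",
                     kernel_regularizer=regularizers.l1(1e-4)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```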

3.3.6 Model development

Five ML and DL algorithms (DT, RF, XGBoost, RNN, and CNN) are developed using the recorded data from the wells drilled in the Volve oil field. The total dataset from these wells is divided into training and testing subsets: 70% of the data are used to train the algorithms, and the remaining 30% are used to test them. A tenfold cross-validation approach is performed on the training and testing data to ensure that all data subsets are considered during model estimation.

We used RandomizedSearchCV from the Scikit-Learn library for the ML models and a grid search approach for the DL models. RandomizedSearchCV performs a randomized search over hyperparameters, offering efficiency over exhaustive methods like GridSearchCV. During hyperparameter tuning with RandomizedSearchCV, we used fivefold cross-validation, primarily to balance computational efficiency and robust validation: fivefold cross-validation allows faster iterations through the large parameter space while still providing reliable estimates of model performance, reducing the overall computational time required for tuning. Once the best hyperparameters are identified, a more rigorous tenfold cross-validation is performed on the training and testing data using the optimized model; this provides a more robust assessment of the model's performance and helps to minimize the risk of overfitting.

We defined the following ranges for each hyperparameter and conducted 50 search trials: for the Decision Tree (DT), maximum depth ranged from 10 to 100 in increments of 10, and splitter options included ‘best’ and ‘random’; for XGBoost, the number of estimators was set to 50, 100, or 200, learning rates were 0.01, 0.1, or 0.3, maximum depths were 3, 6, or 9, and subsample and colsample_bytree values were 0.5, 0.75, or 1. For the Random Forest (RF), maximum depth ranged from 10 to 1030 in increments of 10, and random states included 0, 42, and 100.
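As a sketch, the XGBoost search space above maps onto RandomizedSearchCV as follows (50 trials and fivefold CV, per the description above; the scoring choice is an assumption):

```python
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

param_dist = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "subsample": [0.5, 0.75, 1.0],
    "colsample_bytree": [0.5, 0.75, 1.0],
}

search = RandomizedSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_distributions=param_dist,
    n_iter=50,   # 50 search trials
    cv=5,        # fivefold cross-validation during tuning
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
# search.fit(X_train, y_train)
# best_xgb = search.best_estimator_
```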

For the deep learning models, we defined a grid search for tuning hyperparameters. For the recurrent neural network (RNN), we considered LSTM units of 10, 50, or 100; loss functions 'mae' or 'mse'; optimizers 'adam' or 'rmsprop'; batch sizes of 32, 64, or 128; and epochs of 20, 50, or 100. For the CNN, dense layer units were 15, 30, or 50 with 'relu' activation; hidden layer configurations included (512, 512), (1024, 512), (1024, 1024, 512), (512, 256), or (512, 1024, 1024, 512, 256); the output layer had 1 unit with 'linear' activation; the kernel initializer was 'normal'; the optimizer was 'Adam' with learning rates of 0.001 or 0.01, beta_1 of 0.9, and beta_2 of 0.999; loss functions were 'mae' or 'mse'; batch sizes were 64 or 128; and epochs were 50 or 100.

After fitting the randomized search model with our training data, we identified the best estimators and used these optimal hyperparameters to build updated XGBoost, decision tree, and random forest regressor models. Each model was then fit on the training data and evaluated using the root mean square error (RMSE) metric.

In addition to the machine learning regressors, we implemented a grid search for the deep learning modules. The architecture of the CNN consisted of an input layer, five hidden layers with varying numbers of neurons (see Table 4), and a final output layer. Dropout was used to prevent overfitting, and L1 regularization was added to the output layer to further improve the model's generalization capabilities. Each layer in the network leverages the ReLU activation function, which helps the model learn complex patterns. Similarly, for the RNN, specifically the LSTM model, we applied the grid search method. The architecture of the RNN included an input layer, a single LSTM layer with 50 units, and a final dense output layer. The model was trained using the mean absolute error (MAE) as the loss function and the Adam optimizer for efficient convergence. Table 4 shows the optimized values of the hyper/control parameters used for developing the ML/DL models to predict pore pressure.

The performance metrics for the training, testing, and total datasets from the four wells at the Volve oil field used to predict PP with the different ML and DL algorithms are presented in Tables 5, 6, and 7 in terms of seven prediction performance measures.

Table 5 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the training subset
Table 6 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the testing subset
Table 7 Statistical accuracy measures for predicting PP via the five ML and DL algorithms created using the total dataset

3.4 Performance evaluation

To compare and evaluate the performance and accuracy of the AI models in predicting formation pore pressure, the following statistical error indicators are applied in this study: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), average relative error (ARE), absolute average relative error (AARE), standard deviation (SD), and coefficient of determination (R2). The coefficient of determination (R2) ranges between 0 and 1; the closer the R2 value is to 1, the better the model prediction. For the other performance parameters, the smaller the value, the better the prediction. The specific formulas are given in Eqs. (6)–(12).

$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}{\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)}^{2}}$$
(6)
$$MAE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|{x}_{i}^{mea}-{x}_{i}^{Pred}\right|$$
(7)
$$MAPE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\right|\times 100$$
(8)
$$ARE = \frac{1}{m}\sum\limits_{i=1}^{m}\left(\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\times 100\right)$$
(9)
$$AARE = \frac{1}{m}\sum\limits_{i=1}^{m}\left|\frac{{x}_{i}^{mea}-{x}_{i}^{Pred}}{{x}_{i}^{mea}}\times 100\right|$$
(10)
$$SD = \sqrt{\frac{\sum_{i=1}^{m}{\left(\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)-\frac{1}{m}\sum_{i=1}^{m}\left({x}_{i}^{mea}-{x}_{i}^{Pred}\right)\right)}^{2}}{m-1}}$$
(11)
$${R}^{2} = {\left(\frac{\sum_{i=1}^{m}\left({x}_{i}^{mea}-\overline{{x}^{mea}}\right)\left({x}_{i}^{Pred}-\overline{{x}^{Pred}}\right)}{\sqrt{\sum_{i=1}^{m}{\left({x}_{i}^{mea}-\overline{{x}^{mea}}\right)}^{2}\sum_{i=1}^{m}{\left({x}_{i}^{Pred}-\overline{{x}^{Pred}}\right)}^{2}}}\right)}^{2}$$
(12)

where \({x}_{i}^{mea}\) is the measured value, \({x}_{i}^{Pred}\) is the predicted value, \(\overline{{x}^{mea}}\) and \(\overline{{x}^{Pred}}\) are the averages of the measured and predicted values, respectively, and \(m\) is the number of sample data. Each prediction performance measure has unique attributes. Although some of these measures may be strongly correlated, assessing and contrasting multiple statistical error parameters is beneficial when evaluating the effectiveness of AI algorithms. Furthermore, the results of the comparative analysis of model performance are illustrated using combinations of RMSE/R2 and MAE/MAPE to provide visual confirmation of model performance.
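A compact Python sketch of these indicators (here R2 is computed as the squared Pearson correlation, consistent with Eq. (12)):

```python
import numpy as np

def evaluate(meas: np.ndarray, pred: np.ndarray) -> dict:
    """Statistical error indicators of Eqs. (6)-(12)."""
    err = meas - pred
    rel = err / meas * 100          # relative error, %
    r = np.corrcoef(meas, pred)[0, 1]
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),  # Eq. (6)
        "MAE": np.mean(np.abs(err)),         # Eq. (7)
        "MAPE": np.mean(np.abs(rel)),        # Eq. (8)
        "ARE": np.mean(rel),                 # Eq. (9)
        "AARE": np.mean(np.abs(rel)),        # Eq. (10)
        "SD": np.std(err, ddof=1),           # Eq. (11), sample standard deviation
        "R2": r ** 2,                        # Eq. (12)
    }
```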

4 Results and discussion

Based on the results presented in Tables 5, 6, and 7, it is apparent that the accuracy of pore pressure prediction favors the DT and RF algorithms over the other algorithms (XGBoost, RNN, CNN). During the training phase, DT and RF show R2 values of 0.9927 and 0.9928, respectively. However, these values reduce to 0.8801 and 0.9438, respectively, during the testing stage. The lower performance of DT on the testing dataset indicates overfitting or data memorization during the training stage. Hence, of the DT and RF algorithms, the latter delivers the more accurate pore pressure prediction on the testing dataset. Tables 5, 6, and 7 also show that RF delivers predictions with higher accuracy and lower error, with RMSE values of 1.57, 4.30, and 2.70 MPa for the training, testing, and total datasets, respectively. Therefore, the RF algorithm is declared the best-performing algorithm in the present study, outperforming the other four. However, it is important to consider the relative importance of each metric when choosing an algorithm for a particular task: if minimizing absolute errors is most important, then MAE or RMSE might be the best metrics to use, whereas if the data span a wide range of values, MAPE might be more appropriate. DT and RF show the lowest MAE values, with DT being the lowest during training and RF the lowest during testing, followed by XGBoost, RNN, and CNN in both stages. The MAPE values show a similar pattern to MAE, with DT and RF having the lowest errors and CNN the highest.

CNN has the highest RMSE, indicating the largest deviation from the actual values in both the training and testing stages. In contrast, DT and RF exhibit the lowest RMSE values, with RF slightly outperforming DT, suggesting they have the smallest prediction errors. RNN demonstrates a high RMSE, indicating less accurate predictions. XGBoost performs better than CNN and RNN but does not achieve the same level of accuracy as DT and RF. The prediction accuracy of the developed ML and DL algorithms is also illustrated graphically in terms of RMSE/R2 and MAE/MAPE in Fig. 9a, b.

Fig. 9 The comparative results of the five developed algorithms based on four evaluation metrics: MAE (a), MAPE (a), RMSE (b), and the coefficient of determination R2 (b)

Figure 10 provides a comparative analysis between the predicted and calculated pore pressure values for the five ML and DL algorithms using the total dataset. The maroon straight line in the figure represents the perfect fit line, where predicted values would exactly match the calculated values. The random forest (RF) algorithm shows the most accurate predictions, with the majority of its data points clustered closely around the perfect fit line, indicating a strong correlation between the predicted and actual pore pressures and demonstrating RF's high level of precision. The decision tree (DT) algorithm also performs well, with many of its points aligning closely with the perfect fit line; however, there is a slightly greater spread of points away from the line compared to RF, indicating that DT, while accurate, is marginally less precise. XGBoost shows predictions that are generally close to the perfect fit line but exhibit a wider spread than those of RF and DT, suggesting that XGBoost, while relatively accurate, has a higher degree of variability and less consistency in its predictions. RNN and CNN have the widest spread of points from the perfect fit line among the five algorithms, indicating the least accurate predictions; their points are more dispersed, showing higher deviations from the actual values.

Fig. 10 The comparative analysis between the predicted and calculated pore pressure for the five ML and DL algorithms, created using the total data subset

Overall, the figure clearly illustrates the superiority of the RF algorithm in predicting pore pressure. RF’s predictions are consistently closer to the calculated values, highlighting its effectiveness and reliability. The DT algorithm follows, showing good performance but with slightly less precision. XGBoost, while relatively accurate, has more variability in its predictions. Both RNN and CNN show less accurate predictions, with CNN performing the worst among the five algorithms. This analysis underscores the robustness and accuracy of the RF algorithm in pore pressure prediction compared to the other models.

Figure 11 displays a cross-plot of the predicted and calculated pore pressure for each ML and DL algorithm individually, allowing for further comparison of their evaluation accuracy. As per Fig. 11, the RF algorithm predicts best, as most results lie close to the perfect fit line. The coefficient of determination (R2) delivered by each algorithm is used to determine the accuracy, which can be ordered as RF > DT > XGBoost > RNN > CNN. In addition, Fig. 11 displays the range of relative residuals of the predicted pore pressure values for the five algorithms (DT, XGBoost, RF, RNN, and CNN). The findings indicate that the RF algorithm has a smaller relative residual range (from +0.5 to −1.25) than the remaining algorithms. It should be highlighted that the superiority of the random forest algorithm over the remaining algorithms, especially DT, stems from its ability to create a forest of uncorrelated decision trees. This forest is built using bagging and feature randomness, which reduces the chance of overfitting: each tree is grown on a random subset of features, ensuring low correlation between the decision trees.

Fig. 11 The cross plot of the predicted and calculated pore pressure with the range of relative residuals of the developed five ML and DL algorithms based on the total dataset: a CNN, b DT, c RF, d RNN, and e XGBoost

4.1 RF algorithm generalization test for pore pressure (PP) prediction

The outcomes presented in the preceding section, which pertained to the performance of the five ML and DL algorithms on the training, testing, and overall datasets, were obtained using data from four wells of the Volve oil field (15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2). While the initial training and testing on these four wells provide a robust evaluation of model performance within that subset of data, it does not fully confirm the model’s ability to generalize to new, unseen data from the same field. Therefore, an additional set of data gathered from the well 15/9-F-14 of the Volve field, containing 3781 data points, is utilized to evaluate the capability and reliability of the RF algorithm for precise pore pressure prediction. This additional evaluation using Well 15/9-F-14 serves as an independent test set that was not used in the training or initial testing phases, thereby allowing us to assess the model’s generalization capabilities more comprehensively. Assessing the model’s performance on Well 15/9-F-14 provides additional insights into its robustness and reliability when faced with data from a different well in the same field. This extra validation step strengthens the confidence in the model’s performance and its potential for practical application in the field of study.

Table 8 provides the statistical accuracy metrics obtained by the RF algorithm when applied to the well 15/9-F-14 dataset. Comparing the results in Table 8 with those in Tables 5, 6, and 7 confirms a significant capability of the developed RF algorithm in predicting pore pressure when applied to another well (15/9-F-14) from the field of study. Figure 12a and b show the cross plot of predicted and calculated pore pressure with the trained RF algorithm's relative residuals and distribution histogram when applied to this new dataset from the same field. Figure 12a shows a satisfactory alignment of the predicted data with the perfect fit line, an R2 value of 0.909, and relative residuals in a range of 0 to −7, which corroborates the reliability of the RF algorithm for predicting pore pressure in other wells throughout the field of study. Figure 12b illustrates the cross plot and distribution histogram of predicted and calculated pore pressure; as noted above, the predicted values correspond satisfactorily to the calculated pore pressure, and the distribution of the predicted results is relatively concentrated, with few discrete points.

Table 8 Statistical accuracy measures for predicting PP via the RF algorithm trained using the wells 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2 dataset, applied to well 15/9-F-14 from the same field of study
Fig. 12 The cross plot of the predicted and calculated pore pressure with the a range of relative residuals and b distribution histogram, achieved by the RF algorithm trained using the wells 15/9-F-1 A, 15/9-F-1 B, 15/9-F-11 A, and 15/9-F-11 T2 dataset and applied to the well 15/9-F-14 from the same field of study

4.2 Comparison with other studies

Table 9 summarizes recent research on pore pressure prediction using machine learning and deep learning algorithms, highlighting a comparative analysis with the present study. Existing studies generally utilize smaller datasets for training and testing their models, often gathered from one or two wells, limiting the diversity of data and potentially reducing model reliability. Moreover, previous studies commonly lack generalization tests for assessing model performance beyond the training data.

Table 9 Comparison of present study results with existing studies

In contrast, the present study distinguishes itself by employing a significantly larger dataset comprising 18,758 data points for model development and evaluation gathered from four different wells. The study finds that Random Forest (RF) outperforms other machine learning and deep learning algorithms regarding prediction accuracy when provided with a large quantity of petrophysical data points. Additionally, the present research conducts generalization tests to assess the robustness and reliability of the developed models, a step often overlooked in prior studies. Moreover, the study compares various machine learning and deep learning algorithms such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), XGBoost, Decision Trees (DT), and RF to assess their performance in predicting pore pressure. This comprehensive evaluation, which has not been conducted before, offers valuable insights into identifying the most suitable algorithms for pore pressure prediction tasks.

5 Conclusion

This study evaluates five machine learning and deep learning algorithms for predicting pore pressure through supervised learning. The RF algorithm showed the best performance compared to the other algorithms, with an RMSE value of 2.70 MPa and an R2 value of 0.97, whereas the CNN algorithm was the least accurate, with an RMSE value of 8.26 MPa and an R2 of 0.82. A generalization test of the best-performing algorithm (RF) was carried out to assess its reliability by predicting the pore pressure of another well (15/9-F-14) from the same field using the trained RF algorithm. Based on the result, it is concluded that the RF algorithm can be used for pore pressure prediction in another well of the same field, as it showed satisfactory RMSE and R2 values of 6.48 MPa and 0.905, respectively. The best-performing RF model possesses advantageous attributes that enable it to offer rapid and dependable predictions of pore pressure across different conditions within the field under investigation. This research also demonstrates the use of machine learning (ML) and deep learning (DL) models to develop precise data-based approaches for forecasting subsurface pore pressure from easily accessible petrophysical well-log data.

The current study exclusively utilizes petrophysical data from wells within a single oil field to train and test the chosen ML/DL models. This limited scope decreases the likelihood of the best-performing model exhibiting strong performance when applied to petrophysical data from other reservoirs. In future research, incorporating geological datasets alongside petrophysical data will be explored to assess the influence of geological parameters on pore pressure. Additionally, combining petrophysical datasets from wells across different oil fields and regions will be considered to enhance the training of ML and DL models for pore pressure prediction. This approach will also further validate the generalizability of the best-performing ML or DL models.