Introduction

The effectiveness of electrical resistivity logs in determining reservoir properties is primarily influenced by four factors: the accuracy of true resistivity measurement, the knowledge of the relationship between resistivity and formation characteristics (Shankar and Riedel 2011), the ability to estimate oil–water contacts, and the level of geologic information on facies variations within strata and their effect on the electrical characteristics of the reservoir (Archie 1942).

The most important application of resistivity logs is to distinguish hydrocarbon-bearing zones from water-bearing zones (Shankar and Riedel 2011). Interpreting logs from horizontal oil wells is challenging because the geometry around the logging tool is no longer axially symmetric, so the three-dimensional geometry must be taken into account when interpreting the tool response. Data acquired in horizontal wells may therefore differ from those acquired in vertical wells, especially for directional properties such as electrical resistivity and fluid permeability (Hagiwara 2014).

Electrical instruments, particularly dual laterolog devices (DLL), are a crucial part of logging tools used to measure resistivity in hydrocarbon and water prospecting boreholes. In current and archived well-logging datasets, curves from the DLL are commonly used to determine horizontal conductivity (Hc) and vertical conductivity (Vc), which impact the coaxial induction tool response in horizontal wells (Hagiwara 2006). Triaxial well-log instruments can also measure both horizontal resistivity (Rh) and vertical resistivity (Rv) for more accurate pay estimation (Wang et al. 2006).

Petroleum companies around the world continue to develop well-logging methods that improve on earlier ones (Smits et al. 1998). At the same time, researchers have modeled the responses of various resistivity logs by combining physical and numerical procedures, such as finite element modeling (FEM), to incorporate earlier log corrections (Drahos-Attila and Galsa et al. 2015; Szijártó et al. 2017; Ni et al. 2017). Determining water or hydrocarbon saturation from the apparent or true resistivity recorded by electrical logs remains challenging in reservoir formations penetrated by boreholes filled with highly conductive mud, and where a highly resistive bed lies close to the reservoir.

Sufficient and complete information from recorded data may not be available due to poor hole conditions, equipment failure, or improper storage leading to data loss. However, with technological advancements, smart systems can now be used as powerful tools for modeling and forecasting parameters necessary in the petroleum industry. For instance, researchers have used intelligent systems to predict various reservoir properties from well-log responses (Rezaee et al. 2008).

Machine learning has recently been applied in many areas of the oil and gas industry to improve the processing of large volumes of petroleum data. These applications have successfully predicted rock and fluid parameters such as density (Ahmed et al. 2021), permeability (Al Khalifah et al. 2020), capillary pressure and relative permeability (Gharbi and Mahmoud 2020), rock strength (Gamal et al. 2021; Gowida et al. 2021), elastic and acoustic properties (Gamal et al. 2022; Siddig et al. 2021), and fluid rheology (Alsabaa et al. 2020; Alsabaa and Elkatatny et al. 2021), and have helped optimize the performance of drilling operations (Ahmed et al. 2021; Al Gharbi et al. 2020; Al-AbdulJabbar et al. 2020; Al-Abduljabbar et al. 2020; Alsaihati et al. 2022; Hassan et al. 2019). Artificial neural networks (ANNs), adaptive neuro-fuzzy inference systems (ANFIS), fuzzy logic (FL), functional networks (FN), and support vector machines (SVMs) are common techniques that have substantially advanced petroleum data processing (Al-Sabaa 2021).

Sbiga and Mousa (2015) inverted vertical electrical sounding (VES) data from a section of the Barmer district, Rajasthan, using an ANN approach. An adaptive backpropagation (gradient descent) ANN was applied to invert (predict) the data from the study area. They found that the ANN method is an excellent complementary tool and basis for the direct detection of layered resistivity structure from geophysical data, with a low root mean square (RMS) error.

Singh et al. (2005) applied ANNs with various combinations of wireline logs to predict true resistivity and resistivity index in two wells from which extensive core had been collected for special core analysis in the laboratory. The ANN was first trained on data from well A. For testing, they used an adjacent well, A-01, in the same oil field, as well as a well from a different oil field. Predictors trained on data with 0.5 ft depth spacing predicted the resistivity characteristics of both test wells, the nearby well and the well in the other field, better than predictors trained on data with 1.0 ft depth spacing.

Vereshagin et al. (2019) used machine learning to predict vertical resistivity so that both Rh and Rv would be available. Several ML algorithms were tested, and the best performance was obtained with a gradient boosting machine based on ensembles of decision trees, implemented with the scikit-learn library. The inputs were gamma ray (GR), deep resistivity (RDEP), sonic logs (AC and ACS), neutron (NEU), density (DEN), photoelectric effect (PE), and true vertical depth, and the output was vertical resistivity. Using four wells in the Norwegian Sea, they concluded that the ML predictor for Rv is sufficiently reliable.
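A minimal sketch of the kind of workflow described by Vereshagin et al. (2019) is shown below, using scikit-learn's `GradientBoostingRegressor`. The data are synthetic, and the target relationship, split ratio, and hyperparameters are assumptions chosen only for illustration; they are not the settings or results of that study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Input columns named after the logs listed in the text; the values are synthetic
feature_names = ["GR", "RDEP", "AC", "ACS", "NEU", "DEN", "PE", "TVD"]
X = rng.normal(size=(n, len(feature_names)))
# Assumed synthetic target standing in for (log-scaled) Rv: driven by "RDEP" and "GR"
y = 1.5 * X[:, 1] - 0.5 * X[:, 0] + 0.1 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Gradient boosting over shallow regression trees (illustrative hyperparameters)
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"test R2 = {r2:.3f}")
```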

Gharbi and Mahmoud (2020) applied ANN models to predict formation resistivity, capillary pressure, and relative permeability. The primary goal of these models was to forecast capillary pressure from resistivity and then predict relative permeability from capillary pressure. The study suggests that, with artificial intelligence, resistivity from well logs can be used to determine both capillary pressure and relative permeability.

Wu et al. (2020) showed that combining machine learning with ultra-deep-reading resistivity measurements improves reservoir delineation. The ML algorithm significantly improved the structural consistency of the reservoirs, and a hybrid inversion using ML algorithms gave enhanced imaging of geological features based on their formation properties (Wu et al. 2020).

Ważny et al. (2021) examined the effectiveness of ANNs for processing magnetotelluric data in the Lublin Basin and for predicting missing deep laterolog (LLD) logs in boreholes with similar strata sequences. To achieve this, the authors trained a set of multilayer perceptron networks on five different chronostratigraphic intervals of well data. The results showed that the multilayer perceptron was a highly reliable and time-efficient prediction tool in a one-dimensional geophysical environment. The authors therefore concluded that ANNs can be used to supplement datasets and to evaluate geophysical and geological data through synthetic LLD logs.

Formation resistivity is vital for estimating water saturation and hydrocarbons in place. Besides the high cost of running resistivity logs, complete logging records may be unavailable because of poor hole conditions, equipment failure, or data loss caused by improper storage. In this study, therefore, models were developed to calculate the formation resistivity of a well from other logging data. Two machine learning models were built using real field data from an oil well. These new models are unique and achieve acceptable accuracy, and they can be used to fill gaps and missing data in well logs, improving the accuracy and completeness of the data.

Methodology

A dataset of 3529 logging data points was collected from horizontal carbonate oil wells and preprocessed by removing duplicates, transforming the data, and removing outliers to improve the correlation coefficients, which left 3438 clean points. The data were divided into three subsets for the two machine learning models: 65% for training, 20% for testing, and 15% for validation. The model inputs were gamma ray (API), the delta time compressional log (DTC, µs/ft), the shear sonic log (µs/ft), neutron porosity (pu), and bulk density (g/cm3), and the output was deep laterolog resistivity (LLD, Ω·m). The statistical analysis in Table 1 shows that both the input and output parameters cover a wide range of values.
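The cleaning and 65/20/15 split described above can be sketched in plain Python as follows. The source does not state which outlier criterion was used, so the interquartile-range (IQR) rule here is an assumption, as are the column order (GR, DTC, shear sonic, neutron porosity, bulk density, LLD), the function names, and the synthetic values. The transformation step is not shown, and the sketch does not reproduce the 3529 to 3438 reduction reported above.

```python
import random
import statistics

def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def drop_iqr_outliers(rows, k=1.5):
    """Drop rows where any column lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    bounds = []
    for j in range(len(rows[0])):
        q1, _, q3 = statistics.quantiles([r[j] for r in rows], n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in rows if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]

def split_65_20_15(rows, seed=42):
    """Shuffle, then split into 65% training, 20% testing, and 15% validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.65 * len(shuffled))
    n_test = round(0.20 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

# Synthetic rows: GR (API), DTC (us/ft), shear sonic (us/ft), NPHI, RHOB (g/cm3), LLD (ohm.m)
rng = random.Random(0)
rows = [[rng.gauss(60, 15), rng.gauss(70, 5), rng.gauss(130, 10),
         rng.gauss(0.15, 0.03), rng.gauss(2.5, 0.05), rng.lognormvariate(2, 0.3)]
        for _ in range(200)]
rows += rows[:10]                               # inject 10 exact duplicates
rows.append([600, 70, 130, 0.15, 2.5, 10.0])    # inject one obvious gamma-ray outlier

clean = drop_iqr_outliers(drop_duplicates(rows))
train, test, valid = split_65_20_15(clean)
print(len(rows), len(clean), len(train), len(test), len(valid))
```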

Table 1 Collected data with statistical analysis

Two machine learning algorithms, random forest (RF) and decision tree (DT), were implemented to predict the resistivity of the oil well from logging data. The workflow for building the ML regression models is shown in Fig. 1.

Fig. 1

General flowchart for building DT and RF models

Random forest model

The RF algorithm (Breiman et al. 2001), which is effective for both classification and regression problems, is applied in this study to predict the resistivity of the oil well's carbonate layers as a function of several logging parameters. The approach builds on decision tree regression: several independent decision trees (DTs) are grown, and the final prediction is the mean of their outputs. Two factors strongly affect the performance of the algorithm: the number of decision trees (n) and the number of features considered at each split. Four statistical measures are used to evaluate the RF model: the coefficient of determination (R2), mean absolute percentage error (MAPE, reported as AAPE in the results), mean squared error (MSE), and root mean squared error (RMSE).
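To make the averaging step concrete, the following sketch (assuming scikit-learn, with synthetic inputs standing in for the five logs) fits a `RandomForestRegressor` and checks that its prediction equals the mean of the individual tree predictions. The values of `n_estimators` (the number of trees, n) and `max_features` (the number of features considered at each split) are illustrative, not the tuned values in Table 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 5))   # five synthetic inputs standing in for the logs
y = 10 * X[:, 0] + 5 * np.sin(3 * X[:, 1]) + rng.normal(scale=0.2, size=300)

# n_estimators = number of trees; max_features = features considered at each split
rf = RandomForestRegressor(n_estimators=50, max_features=3, random_state=0)
rf.fit(X, y)

X_new = rng.uniform(size=(5, 5))
forest_pred = rf.predict(X_new)
# Average the predictions of the individual fitted trees by hand
tree_mean = np.mean([tree.predict(X_new) for tree in rf.estimators_], axis=0)
print(np.allclose(forest_pred, tree_mean))
```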

Decision tree model

Decision trees are non-parametric machine learning models. A decision tree traverses a binary tree data structure until a decision (leaf node) is reached, and it can represent nonlinear decision boundaries. Decision forest regression models use an ensemble of decision trees that are built to avoid correlations between them; to avoid overfitting, the prediction is taken as the average over the trees. Each tree in such a model produces a prediction in the form of a Gaussian distribution, and after training, the Gaussian distributions from all trees are combined into a single distribution. Boosted decision trees instead build a series of trees in which each tree corrects the error of the previous one, and they limit overfitting by restricting the number of splits and the number of data points in each region (Almashan et al. 2019).

In this work, a decision tree regression (DTR) model is developed, validated, and tested to estimate oil well resistivity. Each tree in the DTR model depends on the previous trees: the model is based on boosting and learns by fitting the residuals of the earlier trees. The boosting technique used to train the model is the MART (multiple additive regression trees) gradient boosting algorithm. During training, a loss function measures the error, which is then corrected in the next step. The optimal tree is selected from the resulting sequence of trees on the basis of an arbitrary differentiable loss function (Almashan et al. 2019).
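The residual-fitting idea behind MART can be illustrated with a from-scratch sketch for the squared-error loss, for which the negative gradient is simply the residual. This is a teaching example built on scikit-learn's `DecisionTreeRegressor`, not the implementation used by Almashan et al. (2019) or in this study; the function names, learning rate, tree depth, number of rounds, and synthetic data are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_squared_loss(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Gradient boosting with squared loss: each new tree is fitted to the current residuals.

    For L(y, F) = 0.5 * (y - F)**2 the negative gradient is y - F, so fitting the
    residual is the same as fitting the negative gradient.
    """
    f0 = float(np.mean(y))                      # initial constant model
    pred = np.full(len(y), f0)
    trees, train_mse = [], [float(np.mean((y - pred) ** 2))]
    for _ in range(n_rounds):
        residual = y - pred                     # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # shrunken correction step
        trees.append(tree)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return f0, trees, train_mse

def boost_predict(f0, trees, X, learning_rate=0.1):
    """Sum the initial constant and the shrunken contribution of every tree."""
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)

rng = np.random.default_rng(2)
X = rng.uniform(size=(400, 5))
y = 10 * X[:, 0] + 5 * np.sin(3 * X[:, 1]) + rng.normal(scale=0.2, size=400)

f0, trees, train_mse = boost_squared_loss(X, y)
print(f"training MSE: start {train_mse[0]:.3f}, end {train_mse[-1]:.3f}")
```

Because each tree is fitted to the current residuals and its contribution is shrunk by the learning rate, the training error does not increase from one round to the next for this loss. In practice, a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead.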

Results and discussion

To assess the influence of each input on the target, several tests were conducted for each model to find the highest performance with the fewest input features. The RF model was therefore rebuilt with some of the inputs omitted, and its performance was evaluated as shown in Table 2. Among the reduced-input cases, the best result was obtained when DTC was omitted, with a validation R of 0.89; this is still lower than the validation R of almost 0.94 obtained when all inputs were used. The best parameters found by grid search are recorded in Table 2.
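A one-at-a-time version of this input-ablation test can be sketched as follows, scoring each reduced model by the validation R (the Pearson correlation between actual and predicted values). The feature names, synthetic data, and forest settings are hypothetical. The synthetic target deliberately ignores the second column so that the effect of dropping an uninformative input is visible; it says nothing about the real logs summarized in Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["GR", "DTC", "DTS", "NPHI", "RHOB"]   # hypothetical column names

def validation_r(model, X_tr, y_tr, X_va, y_va):
    """Fit the model and return the Pearson R between actual and predicted validation values."""
    model.fit(X_tr, y_tr)
    return float(np.corrcoef(y_va, model.predict(X_va))[0, 1])

def ablation(X_tr, y_tr, X_va, y_va, seed=0):
    """Validation R with all inputs, then with each input dropped in turn."""
    results = {"all inputs": validation_r(
        RandomForestRegressor(n_estimators=100, random_state=seed), X_tr, y_tr, X_va, y_va)}
    for j, name in enumerate(FEATURES):
        keep = [k for k in range(len(FEATURES)) if k != j]   # columns kept in this run
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        results[f"without {name}"] = validation_r(
            model, X_tr[:, keep], y_tr, X_va[:, keep], y_va)
    return results

rng = np.random.default_rng(3)
X = rng.uniform(size=(600, 5))
# Synthetic target: strong dependence on column 0, none at all on column 1
y = 8 * X[:, 0] + 3 * X[:, 3] - 2 * X[:, 4] + rng.normal(scale=0.3, size=600)
X_tr, X_va, y_tr, y_va = X[:450], X[450:], y[:450], y[450:]

results = ablation(X_tr, y_tr, X_va, y_va)
for case, r in results.items():
    print(f"{case:>14}: R = {r:.3f}")
```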

Table 2 Input parameters impact on the RF model performance

Different hyperparameters were tested in both RF and DT models to improve the models' performance. The optimum parameters used to construct the models and provide the best solution are listed in Table 3 for both algorithms.
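A grid search of the kind mentioned above can be sketched with scikit-learn's `GridSearchCV`. The parameter grid, three-fold cross-validation, RMSE scoring, and synthetic data are assumptions for illustration; the grid actually searched in this study is not reported in this section. The same pattern would apply to a boosted-tree model with its own parameters, such as `learning_rate`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.uniform(size=(300, 5))   # synthetic stand-ins for the five input logs
y = 8 * X[:, 0] + 3 * np.sin(4 * X[:, 3]) + rng.normal(scale=0.3, size=300)

# Hypothetical search space (2 x 2 x 2 = 8 candidate settings)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "max_features": [2, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                                    # 3-fold cross-validation per candidate
    scoring="neg_root_mean_squared_error",   # scikit-learn maximizes, so RMSE is negated
)
search.fit(X, y)
print(search.best_params_)
print(f"best CV RMSE = {-search.best_score_:.3f}")
```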

Table 3 Best hyperparameters of the RF and DT models

Both algorithms produced similar results and met the desired outcome. Figure 2 shows cross plots of actual versus predicted LLD values for each model in training and testing. Both algorithms achieved high accuracy in training and good accuracy in testing. Figure 3 confirms the correlation between the actual and predicted values, indicating that the machine learning models can predict resistivity accurately for this dataset. Figure 4 compares the models in the validation stage, which is crucial for evaluating their validity; it shows that the RF model is more accurate than the DT model and therefore better suited to the task. Several studies have shown that RF can also be used to analyze feature importance and sensitivity in the oil and gas industry (Aulia et al. 2014), which can be valuable for forecasting in the oil and gas sector by revealing petrophysical properties of the layers, such as resistivity, conductivity, porosity, and permeability.
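As an illustration of the feature-importance analysis mentioned above, the following sketch (assuming scikit-learn, with synthetic data and hypothetical log names) compares the impurity-based importances stored on a fitted forest with permutation importance computed on held-out data. Impurity-based importances can be biased toward features with many unique values, which is why permutation importance is often used as a cross-check.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

FEATURES = ["GR", "DTC", "DTS", "NPHI", "RHOB"]   # hypothetical column names

rng = np.random.default_rng(5)
X = rng.uniform(size=(500, 5))
# Synthetic target driven mainly by column 0, weakly by column 3, not by the others
y = 8 * X[:, 0] + 2 * X[:, 3] + rng.normal(scale=0.3, size=500)
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Impurity-based importances come with the fitted forest and are normalized to sum to 1
impurity = dict(zip(FEATURES, rf.feature_importances_))
# Permutation importance: mean drop in held-out R2 when one column is shuffled
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
permuted = dict(zip(FEATURES, perm.importances_mean))

for name in FEATURES:
    print(f"{name:>5}: impurity={impurity[name]:.3f}  permutation={permuted[name]:.3f}")
```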

Fig. 2

Cross plots of actual versus estimated oil well resistivity for the training and testing datasets of the RF and DT models

Fig. 3

Actual versus predicted oil well resistivity for the testing stage from (a) RF and (b) DT

Fig. 4

Actual versus predicted oil well resistivity for the validation stage from (a) RF and (b) DT

Overall, the two models perform comparably. Both are fast to build and fast at making predictions. Because they are ensembles of trees, they can improve predictive ability by reducing prediction variance. RF and DT require no rescaling, transformation, or other adjustment of the input data during preprocessing, and they are relatively robust to outliers and missing values. Finally, unlike neural networks, they are highly interpretable: predictor importance, sample proximity, and tree structure can all provide valuable information (Aulia et al. 2014; Breiman et al. 2001).

Table 4 summarizes R, R2, AAPE, MSE, and RMSE. All models show high correlation coefficients and coefficients of determination at every stage, indicating close agreement between actual and predicted values. Percentage-based error measures such as AAPE are most reliable when the data contain no outliers and no zero values, because each error is divided by the actual value; since outliers were removed during preprocessing, AAPE was used in this study. The results showed that the RF model had a slightly larger percentage error than the DT model in both training and testing. In validation, however, the RF model had a lower error than the DT model.
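The performance indicators in Table 4 can be computed with a short, dependency-free function such as the one below. It is a sketch under common textbook definitions: R is the Pearson correlation, R2 is 1 - SSE/SST, and AAPE is returned as a percentage (divide by 100 for a fraction). The function name and example values are hypothetical, and the zero check reflects the restriction on percentage errors discussed above.

```python
import math

def regression_metrics(actual, predicted):
    """Return R, R2, AAPE (%), MSE, and RMSE for paired actual/predicted values.

    AAPE is the average absolute percentage error (the same formula as MAPE);
    it is undefined if any actual value is zero.
    """
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    if any(a == 0 for a in actual):
        raise ValueError("AAPE is undefined when an actual value is zero")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)          # total sum of squares
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mse = sse / n
    return {
        "R": cov / math.sqrt(sst * spp),                  # Pearson correlation coefficient
        "R2": 1 - sse / sst,                              # coefficient of determination
        "AAPE": 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Note that R2 computed this way equals the square of R only for a least-squares linear fit, so the two indicators can differ for tree-based predictions.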

Table 4 Summary of performance indicators of RF and DT models for three stages (training, testing, and validation)

The current study presents two ML models that predict formation resistivity from other well logs. The models can be used to fill gaps and missing data in well logs: they can be trained on the available data from the same well and then used to predict the missing data points.
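A minimal sketch of this gap-filling use, assuming scikit-learn and a synthetic log: the model is trained on depths where the resistivity log was recorded and then predicts the depths where it is missing (NaN). The function name, synthetic data, and forest settings are hypothetical, and the sketch treats depth samples as independent, ignoring the depth continuity of real logs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fill_missing_log(inputs, target, n_estimators=200, seed=0):
    """Predict target values where they are NaN, using depths where the target was recorded.

    inputs : (n_depths, n_features) array of complete input logs
    target : (n_depths,) resistivity log with NaN at missing depths
    Returns a copy of target with the gaps filled, and a boolean mask of filled depths.
    """
    missing = np.isnan(target)
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(inputs[~missing], target[~missing])      # train on recorded depths only
    filled = target.copy()
    if missing.any():
        filled[missing] = model.predict(inputs[missing])
    return filled, missing

rng = np.random.default_rng(6)
depth_inputs = rng.uniform(size=(400, 5))                       # synthetic input logs
true_log = 5 + 10 * depth_inputs[:, 0] + 3 * depth_inputs[:, 2]  # synthetic "resistivity"
recorded = true_log.copy()
recorded[150:190] = np.nan          # simulate a 40-sample gap, e.g. a tool failure interval

filled, was_missing = fill_missing_log(depth_inputs, recorded)
gap_error = float(np.sqrt(np.mean((filled[was_missing] - true_log[was_missing]) ** 2)))
print(f"{was_missing.sum()} samples filled, RMSE in gap = {gap_error:.3f}")
```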

Conclusions

The estimation of water saturation is highly reliant on formation resistivity. However, obtaining complete logging data for resistivity may be challenging and expensive due to poor hole conditions, equipment failure, or data loss. The study developed novel models to predict formation resistivity based on conventional well logs. The following are the main findings of the study:

  1. The importance of the input parameters for predicting formation resistivity varied, with gamma ray (GR) showing a stronger correlation than the delta time compressional log (DTC).

  2. The random forest (RF) and decision tree (DT) models accurately predicted formation resistivity in the training data, with correlation coefficients (R) of 0.98 and 0.99 and AAPE values of 0.055 and 0.0045, respectively.

  3. On the validation data, the RF and DT models had R values of 0.94 and 0.89 and AAPE values of 0.072 and 0.034, respectively.

These models offer a cost-effective solution for calculating formation resistivity in oil wells and can serve as an alternative to lost logging data.