Application of artificial neural network models and random forest algorithm for estimation of fracture intensity from petrophysical data

Natural fractures play an essential role in the characterization and modeling of hydrocarbon reservoirs. Modeling fractured reservoirs requires an understanding of fracture characteristics. Fractured zones can be detected by using seismic data, petrophysical logs, well tests, drilling mud loss history and core description. In this study, feed-forward neural networks (FFNN), cascade feed-forward neural networks (CFFNN) and random forests (RF) were used to determine fracture density from petrophysical logs. The model performance was assessed using statistical measures including the root mean squared error (RMSE), coefficient of determination (R2), mean absolute error (MAE), Kling-Gupta efficiency (KGE) and Willmott's index (WI). Conventional well logs and full-bore micro-resistivity imaging data were available from three drilled wells of the Mozduran reservoir, Khangiran gas field. According to the findings of this research, the FFNN model showed a higher KGE and WI, and a higher coefficient of determination (R2), than the CFFNN model. The CFFNN model outperformed the FFNN model when fewer neurons were used. The models' performance was also improved by increasing the number of neurons in the hidden layers from 8 to 35. The findings of this study demonstrate that the FFNN-calculated fracture intensity is in excellent agreement with image log results, showing a correlation coefficient of 92%. The RF algorithm showed higher stability and robustness in predicting fracture intensity, with a correlation coefficient of 93%. The results of this study can be used as an aid in reservoir dynamic modeling and production data analysis.


Introduction
Natural fractures are among the most significant factors determining the hydraulic behavior of oil and gas reservoirs. Proper knowledge of fractures is essential in oil production and development plans, and fractures play a significant part in the production of fractured reservoirs (Kadkhodaie et al. 2021; Derafshi et al. 2022; Pejic et al. 2022; Hosseinzadeh et al. 2023). Natural fractures in reservoirs range from large to small scale. Large-scale fractures, such as significant faults, are visible on seismic sections. Different methods have been proposed to identify small-scale fractures around the well (e.g. Kosari et al. 2015, 2017; Pejic et al. 2022; Mazdarani et al. 2023). One of the methods is to use petrophysical logs such as neutron, density and sonic. This method is cost-effective and is currently in use; however, these logs are not accurate enough due to their low resolution. Image logs of the formation provide essential information about fractures, such as their dip and azimuth, fracture spacing, fracture density and aperture. In addition, interpreting them allows other geological features such as stratification, stylolites, faults and anhydrite nodules to be identified. Image logs identify fractures very well owing to their high resolution. However, running image logs is expensive, so they are acquired in only a few wells of a hydrocarbon field. In this study, an artificial intelligence approach was used to estimate image log-derived fracture parameters from petrophysical logs quickly and with reliable accuracy. Numerous studies on the characterization of naturally fractured reservoirs have been conducted recently. Tokhmechi et al. (2010) applied a novel technique based on petrophysical logs for estimating fracture density in fractured zones. In their study, the energy of the petrophysical logs was calculated in the fractured zones, and linear and nonlinear regressions were established between them.
Their investigation demonstrated a significant relationship between fracture density and the energy of caliper, sonic (DT), density (RHOB) and lithology (PEF) logs in each well. Zazoun (2013) implemented a model that forecasts fracture density using artificial neural networks (ANNs) and conventional well logs calibrated to core data. Ja'fari et al. (2011) presented a model that uses an adaptive neuro-fuzzy inference system to estimate fracture density from conventional well logs. Their results demonstrated that the observed and neuro-fuzzy calculated fracture densities agree well (correlation coefficient of 98%). Aghli et al. (2019) proposed using preprocessed petrophysical logs as a trustworthy and affordable tool to assess the fracture parameters of a heterogeneous carbonate reservoir, based on the Adaptive Neuro-Fuzzy Inference System (ANFIS) technique. Their results show that conventional well logs can become useful tools for evaluating fractures if they are statistically preprocessed and coupled with image logs or core data. They found a high correlation between petrophysical logs and image log or core results (R² = 0.8). Zerrouki et al. (2014) used four conventional logs, namely deep resistivity, density, neutron porosity and gamma ray, to predict fracture porosity using fuzzy ranking and an artificial neural network (ANN).
In this paper, a new method is presented for estimating fracture intensity using a combination of image logs, petrophysical logs, and artificial intelligence networks.

Geological setting
The Khangiran gas field is a NW-SE structure located in the northeastern part of Khorasan Razavi province, near the political border with Turkmenistan, around 180 km northeast of Mashhad city in Iran (Fig. 1). This field is situated in the Kopet-Dagh Basin; its neighboring fields are Gonbadli Field in Iran and Dauletabad Field in Turkmenistan. The Kopet-Dagh Basin is located on the southern margin of the Amu-Darya Basin, which is a highly productive petroleum province in Turkmenistan and Uzbekistan, extending southwestward into Iran and southeastward into Afghanistan. The Hercynian accreted terrane, made up of deformed and commonly metamorphosed Paleozoic rocks, forms the basement of the Amu-Darya Basin. The Upper Jurassic carbonates (Mozduran Formation) and Lower Cretaceous sandstones (Shurijeh Formation) are the two reservoir sequences hosting giant gas reserves in Khangiran Field. The relationship between sedimentary basins (e.g., Khavari et al. 2009; Arian and Aran 2014; Ehsani and Arian 2015; Aram and Arian 2016) and basement faulting (e.g., Arian 2012; Nouri et al. 2013a, b; Nouri and Arian 2017; Nabilou et al. 2018; Mansouri et al. 2017, 2018) indicates the role of faults in controlling the sedimentary basins of NE Iran. In terms of lithofacies, and especially the role of tectonics in the sedimentary basins, Arian (2015), Razaghian et al. (2018) and Taesiri et al. (2020) divided the sedimentary basins of Iran into several large-scale tectonic-stratigraphic zones. In the study area, a northwest-southeast rift formed in the Early Jurassic. The rift developed as a back-arc basin for the Neotethyan Ocean, and the Mozduran and Shurijeh Formations were deposited, but deformation of the basin started in the Eocene through inversion tectonics. The stratigraphic chart of the Kopet-Dagh Basin is shown in Fig. 2. To date, a total of 77 wells have been drilled in the Khangiran Field (22 wells in the Shurijeh Formation, 51 wells in the Mozduran Formation, three wells in the Kashafrud Formation, and one well that was being drilled during the preparation of this manuscript). It should be noted that the numbers of wells completed in the Shurijeh, Mozduran and Shurijeh-Mozduran formations are 31, 40 and 2, respectively. Most of the carbonate rocks of the Mozduran Formation were deposited on a carbonate platform adjacent to a deeper marine environment. The slope and basinal environments were separated from an extensive shelf lagoon and tidal flat by platform-margin ooid/bioclast grainstones forming a rimmed shelf platform. The vertical sequence of the Mozduran Formation indicates five significant episodes of deepening and shallowing (Callovian to early Kimmeridgian), with numerous shallowing-upward parasequences. Both tectonic and autocyclic mechanisms are suggested as the main causes of these cycles.
Fig. 1 Location map of Khangiran gas field, northeastern Iran
Fig. 2 Stratigraphic column of Kopet-Dagh Basin, modified from Robert et al. (2014)

Data collection
The dataset used in this study was taken from the Khangiran gas field in northeastern Iran. To date, a total of 77 wells have been drilled in the field, three of which have both image logs and petrophysical log data. The fracture densities were derived from image log interpretation and, after a resampling process, the correlation between well logs and fracture intensity was investigated. The relationship between the fracture intensity and the well log data, including depth (Depth), sonic log (DT), gamma ray log (GR), volume of dolomite (VOL_DOLOM), volume of calcite (VOL_CALCIT), total porosity (PHIT), effective porosity (PHIE), neutron porosity log (NPHI) and bulk density log (RHOB), is summarized in Table 1. A total of 70% of the data were used as the training dataset, and the remaining 30% were used as the testing and validation dataset. Identical inputs were presented to the models in both the training and testing phases so that their accuracy could be compared.
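The 70/30 partition described above can be sketched as follows. This is a minimal illustration that assumes a random shuffle of depth samples; the text does not state whether the split was random or made by well or depth interval, and the sample count, seed and function name `split_indices` are hypothetical.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into disjoint train/test sets.

    Assumption: a random split; the paper does not say how the 70/30
    partition was drawn.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


# Hypothetical number of resampled depth points.
train_idx, test_idx = split_indices(n_samples=1000)
print(len(train_idx), len(test_idx))
```

Because the two index lists never overlap, no training sample is reused when the models are tested.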
It should be noted that the training dataset was not used to test the model's performance. To estimate the fracture zones and their densities, the current study used the FFNN and CFFNN methodologies. The network architectures of FFNN and CFFNN are comparable: both consist of an input layer, one or more hidden layers and an output layer. Signal transmission between neurons differs between the two systems; FFNN only transmits signals from the input layer toward the output layer, one layer at a time, but CFFNN is not confined to such layer-by-layer transfers, and each layer is connected with both preceding and succeeding layers (Warsito et al. 2018). Cascade Forward Neural Network (CFNN) is a kind of artificial neural network (ANN) that is extensively used for

Artificial neural networks (ANNs)
In recent decades, artificial neural networks (ANNs) have become a prominent AI method. They have been used in various fields, including geosciences and engineering. An ANN is a computational tool that can link the factors influencing a complicated event. It is inspired by the human brain and comprises many simple processing components (Raikar 2004). The ANN model is first trained using a collection of input data, after which it can make predictions. In general, an ANN operates by processing data in neurons (nodes), and signals are exchanged between nodes through connections. Each connection has a weight assigned to it depending on the relevance of the nodes it connects.
To compute the output signal from the input signal, each node employs a nonlinear activation function (Raikar 2004). Most multilayer ANNs have three kinds of layers: an input layer, one or more hidden layers and an output layer, with each layer containing many neurons. ANNs have been presented and classified in a variety of ways. Feedforward (perceptron) networks, competitive networks and recurrent networks are the three main types of ANNs employed for prediction problems in earth sciences (Vaghefi et al. 2020).

Model development
FFNN and CFFNN
As previously mentioned, the FFNN and CFFNN algorithms use neurons in the input and output layers to represent the input and output variables. A network for the fracture properties was necessary for the current investigation. As illustrated in Figs. 3 and 4, a network with nine neurons in the input layer and one in the output layer was used to estimate fracture intensity based on the gathered data.
The number of hidden layers, the number of neurons in each hidden layer, the activation function and the training process all affect the models' accuracy (Vaghefi et al. 2020). There is no reliable method for determining these values, so they are estimated by trial and error (Mahmoodi et al. 2018). Earlier research has shown that the Levenberg-Marquardt algorithm is one of the most efficient and acceptable training algorithms compared to the other standard training methods (Huang et al. 2006). As a result, the Levenberg-Marquardt training method was used in this study for fracture intensity estimation. In addition, the hidden layers used a sigmoid transfer function, whereas the output layer used a linear transfer function. According to Bishop (1995), no more than two hidden layers are usually required. Using trial and error, the current work increased the number of hidden layers from 1 to 5, with 1-40 neurons in each layer. Two hidden layers were found to give the maximum accuracy. It was also discovered that, for a given number of hidden layers and neurons, the model's accuracy changed from one training run to the next. This has been reported in previous publications (Vaghefi et al. 2020). To get the best accuracy, the model was run 50 times for each variation in the number of hidden layers and neurons.
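To make the difference between the two architectures concrete, the sketch below runs a forward pass through a network with nine inputs, sigmoid hidden layers and a linear output. It assumes the common cascade-forward convention in which every layer also receives the raw inputs and the outputs of all earlier layers. The weights are random rather than trained with Levenberg-Marquardt, and the hidden-layer sizes `[8, 8]` and all function names are illustrative choices, not the authors' configuration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[j] holds the input weights of neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for a layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def build(n_inputs, hidden_sizes, cascade, rng):
    """Create hidden layers plus a single linear output neuron."""
    layers, fan_in = [], n_inputs
    for size in list(hidden_sizes) + [1]:
        layers.append(init_layer(fan_in, size, rng))
        # In the cascade variant, later layers also see everything seen so far.
        fan_in = fan_in + size if cascade else size
    return layers


def ffnn_forward(x, layers):
    """Feed-forward: each layer sees only the previous layer's output."""
    signal = list(x)
    for weights, biases in layers[:-1]:
        signal = dense(signal, weights, biases, sigmoid)
    weights, biases = layers[-1]
    return dense(signal, weights, biases)[0]


def cffnn_forward(x, layers):
    """Cascade-forward: each layer sees the inputs and all earlier layer outputs."""
    seen = list(x)
    for weights, biases in layers[:-1]:
        seen = seen + dense(seen, weights, biases, sigmoid)
    weights, biases = layers[-1]
    return dense(seen, weights, biases)[0]


rng = random.Random(0)
x = [rng.random() for _ in range(9)]  # nine normalised log inputs (made-up values)
ffnn = build(9, [8, 8], cascade=False, rng=rng)
cffnn = build(9, [8, 8], cascade=True, rng=rng)
print(ffnn_forward(x, ffnn), cffnn_forward(x, cffnn))
```

With these sizes the output neuron of the feed-forward network has 8 incoming weights, while that of the cascade network has 25 (9 inputs plus two hidden layers of 8), which shows how the cascade connections add parameters for the same number of neurons.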

Evaluation of the models
To assess the models' efficacy and accuracy, the correlation coefficient (R), Kling-Gupta efficiency (KGE), root mean square error (RMSE), mean absolute error (MAE) and Willmott's index (WI) were used. In these measures, $X_{mi}$ is the observed value, $X_{pi}$ is the predicted value, R is the correlation coefficient of observed and predicted values, $\bar{X}_m$ is the mean observed value, $\alpha$ is the standard deviation ratio (SDR) of $X_{mi}$ and $X_{pi}$, $\beta$ is the mean ratio of $X_{mi}$ and $X_{pi}$, and N is the number of data points. The optimal model is the one with the highest R, KGE and WI values and the lowest RMSE and MAE values.
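For readers who want to reproduce these measures, the sketch below computes them from paired observed and predicted values. It assumes the standard textbook definitions: Pearson's R, KGE = 1 − sqrt((R − 1)² + (α − 1)² + (β − 1)²) with α and β taken as predicted-over-observed ratios of standard deviations and means, and Willmott's original index of agreement. The exact forms used by the authors are not given in the text, and the sample values are invented.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, MAE, R, KGE and WI under standard definitions (assumed)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation coefficient between observed and predicted values.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_o * ss_p)

    # Kling-Gupta efficiency: alpha = std ratio, beta = mean ratio
    # (predicted over observed; this direction is an assumption).
    alpha = math.sqrt(ss_p / ss_o)
    beta = mean_p / mean_o
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    # Willmott's index of agreement.
    wi_den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                 for o, p in zip(observed, predicted))
    wi = 1.0 - sum(e * e for e in errors) / wi_den

    return {"RMSE": rmse, "MAE": mae, "R": r, "KGE": kge, "WI": wi}


# Invented fracture-intensity values (1/m), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = evaluate(observed, predicted)
print(metrics)
```

A perfect prediction gives RMSE = MAE = 0 and R = KGE = WI = 1, which is why higher R, KGE and WI and lower RMSE and MAE indicate a better model.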

Results and discussion
The performance evaluation of the models
The advantages of cascade neural networks are well known. First, no structure of the network is predefined; that is, the network is automatically built up from the training data. Second, the cascade network learns fast because each of its neurons is trained independently of the others. However, a disadvantage is that cascade networks can overfit in the presence of noisy features. To overcome this problem, we used the random forest algorithm. A random forest is a collection of decision trees; each tree independently makes a prediction, and the values are then averaged (regression) or put to a majority vote (classification) to arrive at the final value. The accuracy of random forests is generally high, their efficiency is particularly notable on large datasets, and they provide an estimate of variable importance. Generated forests can be saved and reused and, unlike some other models, random forests tend not to overfit as more features are added. Table 2 compares the accuracy of the ANN models in estimating fracture intensity. As can be observed, each model performed well and was accurate in estimating the fracture intensity derived from the image log data. According to the indices used in the testing step, the FFNN had the lowest estimation error, while the CFFNN had the greatest. During the testing phase, the RMSE for the estimation of the fracture intensity for FFNN-7 was 0.395 1/m. The RMSE of the FFNN-7 model was lower than those of the CFFNN models by 43.54%, 54.71% and 64.94%.
Furthermore, in fracture intensity estimation, FFNN showed a KGE of 0.975 in the testing step, which is 2.67%, 13.02% and 15.89% higher than the KGE values of the CFFNN models. This suggests that FFNN outperformed CFFNN. Figure 5 presents the scatter plots of the observed versus modeled data for the fracture intensity estimation. As is seen, the models with a correlation coefficient greater than 0.9 could explain the measurements with reasonable accuracy. FFNN shows the highest correlation, R = 0.995, in fracture intensity estimation. A comparison of measured and estimated fracture intensity using FFNN is shown in Fig. 6.

Fracture intensity model based on RF algorithm
The FFNN-predicted intensity of fracture profiles for one of the test wells (FFNN-7). As is seen, the FFNN-predicted profiles are in good agreement with the image log observations
Fig. 6 A comparison of measured and estimated fracture intensity by using FFNN
A popular tree-based machine-learning methodology is the Random Forest (RF) method (Breiman 2001; Mohana et al. 2021). RF has gained favor in recent years because of its ability to capture complicated relationships between variables. The RF algorithm uses a sampling method called bootstrap sampling to increase the diversity of sample selection. This technique creates two kinds of data: out-of-bag (OOB) data and in-bag data. OOB data refers to the roughly one-third of the original sample left out of the bag, whereas in-bag data refers to the remaining sample (Lei et al. 2018).
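The bootstrap step can be illustrated with a short standard-library sketch. It draws samples with replacement, records which samples were never drawn (the OOB data) and averages the OOB share over many draws; for large samples this share approaches 1/e ≈ 0.368, consistent with the "about one-third" quoted above. The sample count, number of trees and function names are hypothetical, and fitting of the decision trees themselves is omitted.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; never-drawn indices are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def forest_predict(tree_predictions):
    """Regression forests average the predictions of the individual trees."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(42)
n_samples, n_trees = 2000, 50  # hypothetical sizes
oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
print(round(mean_oob, 3))
```

Each tree would be trained on its own `in_bag` indices, and its `oob` indices give a validation set that the tree has never seen.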
Numerous decision trees are generated from these bootstrap datasets and combined to obtain a more accurate and stable prediction. RF does not depend on a single decision tree; instead, it collects the predictions of the individual trees and combines them, by majority vote for classification or by averaging for regression. Cross-plots of the measured versus estimated fracture density obtained with the random forest algorithm for the training and test datasets are shown in Fig. 7. Garson (1991) devised a method (Eq. 5) based on the weight matrix for estimating the relative importance of each input parameter. The relative influence of each input variable on fracture intensity is shown graphically in Fig. 8.

Sensitivity analysis (SA)
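$$I_j = \frac{\sum_{m=1}^{N_h}\left(\frac{\left|W^{ih}_{jm}\right|}{\sum_{k'=1}^{N_i}\left|W^{ih}_{k'm}\right|}\,\left|W^{ho}_{mn}\right|\right)}{\sum_{k=1}^{N_i}\left\{\sum_{m=1}^{N_h}\left(\frac{\left|W^{ih}_{km}\right|}{\sum_{k'=1}^{N_i}\left|W^{ih}_{k'm}\right|}\,\left|W^{ho}_{mn}\right|\right)\right\}} \qquad (5)$$
where $I_j$ is the relative importance of the jth input variable, $N_i$ is the number of input variables, $N_h$ is the number of hidden neurons and W is a connection weight.
The superscripts i, h and o stand for the input, hidden and output layers, respectively, whereas the subscripts k, m and n stand for input, hidden and output neurons. Figure 8 shows how the FFNN and random forest models compare in identifying the essential inputs. In the random forest model, depth may be regarded as the influential variable introduced into the model by post-processing procedures.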
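Garson's weight-partitioning idea can be sketched in a few lines. The example assumes a network with a single hidden layer and one output neuron, which is the setting of Garson's method as usually stated; how it was applied to the two-hidden-layer networks of this study is not described in the text. The weight values and the function name `garson_importance` are invented for illustration.

```python
def garson_importance(w_ih, w_ho):
    """Relative importance of each input from absolute connection weights.

    w_ih[k][m]: weight from input neuron k to hidden neuron m.
    w_ho[m]:    weight from hidden neuron m to the single output neuron.
    Assumes one hidden layer and one output (an assumption of this sketch).
    """
    n_inputs, n_hidden = len(w_ih), len(w_ho)
    contributions = [[0.0] * n_hidden for _ in range(n_inputs)]
    for m in range(n_hidden):
        # Share of hidden neuron m attributable to each input.
        column_total = sum(abs(w_ih[k][m]) for k in range(n_inputs))
        for k in range(n_inputs):
            share = abs(w_ih[k][m]) / column_total
            contributions[k][m] = share * abs(w_ho[m])
    per_input = [sum(row) for row in contributions]
    total = sum(per_input)
    return [value / total for value in per_input]


# Invented weights: two inputs, two hidden neurons, one output.
w_ih = [[0.5, -1.0],
        [1.5, 1.0]]
w_ho = [2.0, -1.0]
importance = garson_importance(w_ih, w_ho)
print(importance)
```

The returned values sum to one, so they can be read directly as fractional contributions of each input. Because only absolute weights are used, the method indicates the strength of an input's influence but not its direction.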

Conclusions
In the current study, an attempt was made to relate petrophysical data to the fracture intensity derived from image log interpretation. The following conclusions are drawn.
• The Feed-Forward Neural Network (FFNN), the Cascade Feed-Forward Neural Network (CFFNN) and the Random Forest (RF) methods were used in the current work to forecast the fracture density using petrophysical log data including depth, sonic, gamma ray, neutron, density, effective porosity, total porosity, calcite volume and dolomite volume.
• The models were developed using the results of image log interpretation. Evaluation of the models showed that FFNN and RF gave satisfactory results. There is a good agreement between the measured and estimated fracture intensity. Post-processing techniques were used to evaluate the significance of input variables.
• The results indicated that the FFNN, with the lowest error, outperformed the CFFNN models in fracture intensity estimation. The comparative study indicates the superior capacity of the random forest to forecast fracture intensity in terms of the correlation coefficient, stability and robustness.
• In the absence of expensive image logs such as the full-bore formation micro imager (FMI), the intelligent models can forecast fracture intensity from easily available well logging data. This will enhance the applicability of well logs for the extraction of further parameters in addition to their routine use in the petrophysical evaluation of hydrocarbon reservoirs.
• The well log-derived fracture intensity data can be used to construct continuous fracture network (CFN) and discrete fracture network (DFN) models and to study the impact of fracture models on production.

Funding
There is no funding.

Data availability
Data will be available on request from the corresponding author.

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.