Utilization of adaptive neuro-fuzzy interference system and functional network in prediction of total organic carbon content

This paper presents the application of two artificial intelligence (AI) approaches to the prediction of total organic carbon (TOC) content in the Devonian Duvernay shale. To develop and test the models, around 1250 data points from three wells were used. Each point comprises a TOC value with the corresponding spectral and conventional well logs. The tested AI techniques are the adaptive neuro-fuzzy inference system (ANFIS) and the functional network (FN), whose predictions are compared with existing empirical correlations. Of these two methods, ANFIS yielded the best outcomes, with correlation coefficients (R) of 0.98, 0.90, and 0.95 in training, testing, and validation, respectively, and average errors between 7 and 18%. In contrast, the empirical correlations resulted in R values below 0.85 and average errors above 20%. Of the eight inputs, gamma ray was found to have the most significant impact on TOC prediction. Compared with experimental procedures, AI-based models produce continuous TOC profiles with good prediction accuracy. The intelligent models are developed from preexisting data, which saves time and cost. In contrast to the existing empirical correlations, the AI-based models yielded more accurate TOC predictions. Of the two AI methods used in this article, ANFIS generated the best estimations in all datasets tested. The reported outcomes show the reliability of the presented models for determining TOC in Devonian shale.


Introduction
Oil and natural gas reserves are constantly depleting as a result of continued exploitation, and production from existing reservoirs is decreasing substantially [1][2][3][4][5]. Therefore, source rock and unconventional reservoirs have attracted increasing interest [6][7][8][9]. Compared with conventional reserves, unconventional reservoirs are tighter, less permeable, and more complex, which makes their exploration more difficult and costly [10]. Significant unconventional resource discoveries have been reported around the world in recent years, particularly in the Middle East, North and South America, and North Africa, contributing to the global oil reserves [11,12].
Unlike conventional reservoirs, unconventional resources both generate and store hydrocarbons in situ; therefore, it is essential to quantify their potential for hydrocarbon generation. Moreover, unconventional resource characterization, development, and production are complex and expensive processes, all of which indicate the necessity of assessing their potential precisely and cost-effectively [7,11].
Total organic carbon (TOC) is used to quantify the hydrocarbon generation potential, and it therefore reflects the quality of unconventional reservoirs [13][14][15][16]. Generally, TOC is determined experimentally by the rock pyrolysis test [17,18], and the number of tests conducted to quantify TOC is limited because of the high experimental cost. Consequently, obtaining a comprehensive TOC assessment for the formation(s) of concern is quite challenging, which has a significant impact on reservoir evaluation [19].
Several scholars established empirical TOC correlations (summarized in Table 1) based on well logs [20][21][22][23][24]. Of the models presented, the ΔlogR model is very common, and many authors presented modifications to enhance its TOC predictability [15,30,31]. Charsky and Herron [32] used data from various formations and wells to evaluate the reliability of the Schmoker and ΔlogR models and reported significant deviations from the actual TOC values.
Table 1 Empirical TOC correlations based on well logs (densities are in g/cm3)
[26]: Used data from Devonian shale and predicts TOC in volume percentage. The data were taken from seven wells located in Virginia, West Virginia, Kentucky, and Ohio. The model needs only the organic-matter-free rock density (ρB) and the bulk density of the formation (ρ).
[27]: Used data from 46 wells in the Western Appalachian basin in the United States to present a modified model that determines TOC in weight percentage. RO is the ratio between the organic matter and the organic carbon, ρo is the density of the organic matter, and ρmi is the average bulk density.
Artificial intelligence (AI) systems are capable of producing highly accurate models, and they have been applied in various sectors such as healthcare [33], mining [34], construction [35], and energy [36]. The accuracy of the empirical correlations presented in Table 1 when applied to different datasets is a major concern, which is why AI techniques have been used in numerous studies to predict TOC. Table 2 highlights the studies that applied different AI tools for TOC estimation from well logs. These well logs include bulk density (RHOB), formation resistivity (FR), neutron porosity (CNP), gamma ray (GR), spontaneous potential (SP), sonic transit time (Δt), and spectral logs of potassium (K), thorium (Th), and uranium (Ur). Different AI techniques have been utilized, namely artificial neural network (ANN), fuzzy logic (FL), adaptive neuro-fuzzy inference system (ANFIS), functional network (FN), support vector machine (SVM), and Gaussian process regression (GPR).
TOC is a significant parameter for evaluating unconventional reservoirs. To estimate TOC, experimental investigation can be employed, but it is time-consuming and costly, and it does not provide a continuous record of TOC against depth. Alternatively, TOC can be determined using empirical models; however, the accuracy and the generalization of these correlations are major concerns. The objective of this paper is to test the effectiveness of two AI approaches in estimating TOC in Devonian shale formations from logging data. These well logs consist of bulk density, resistivity, sonic transit time, bulk gamma ray, spectral GR logs of Th, Ur, and K, and neutron porosity.
The following section presents the methodology used to develop the AI models, including the description of the datasets, the data preprocessing, the AI methods utilized, the testing procedure, and the different stages of model development. This section is followed by the results and discussion, which reports the outcomes of the different methods on the different datasets, presents a comparison with existing correlations, and shows the results of a sensitivity analysis of different sets of inputs.

Methodology
In this study, two AI techniques, the adaptive neuro-fuzzy inference system (ANFIS) and the functional network (FN), were used to estimate TOC from eight well logs.

Data description
Experimental TOC data and the corresponding well logs were obtained from three wells. The AI models were trained, tested, and validated using 891, 291, and 82 data points from Well-I, Well-II, and Well-III, respectively. All of these wells are in the Devonian Duvernay shale, a source rock rich in organic liquids. This basin is located in Alberta, Canada (Fig. 1), and has oil reserves above 60 billion barrels and gas reserves above 400 trillion cubic feet [51,52]. Table 3 shows the statistical properties of the Well-I dataset.

Well logs
In well logging, the in-situ properties of the rocks around the wellbore are estimated indirectly from electrical, acoustic, and nuclear measurements. The interpretation of these measurements indicates the presence of hydrocarbons, the petrophysical properties, and the lithology of the formation [54,55]. In this study, the following well logs were used:
Formation resistivity (FR) is a measure of electrical resistivity at three increasing depths away from the well, which is mainly interpreted to determine the fluid saturations and hence the presence of hydrocarbons [56,57].
Sonic log is a measure of the time required for a sound wave to travel a predetermined distance, which depends on the matrix elasticity and porosity [58]; it is therefore used to identify lithology, fractures, and porosity.
Density log records the bulk density around the well. This measurement covers both the matrix and the fluid-filled pores, so it can be used to quantify the porosity fraction.
Neutron log relies on a neutron source to measure the hydrogen index and, consequently, the porosity of the formation.
Gamma-ray log measures the natural gamma radiation and is thus used to distinguish shales from other sedimentary rocks.
Spectral gamma-ray log is a more detailed gamma-ray measurement that uses the energy of the gamma rays to identify the elements that emitted them.

Samples testing
To determine the TOC of drill cuttings from several wells, Rock-Eval 6 was used. The experimental procedure, which followed that presented by Chen et al. [59], is shown in Fig. 2.

Data preprocessing
In the first step, outliers and unrealistic or incomplete data points were removed from the datasets that were used to build the models. Using a built-in function in MATLAB, any data point containing a value more than three standard deviations away from the mean of that variable was designated as an outlier. Figure 3 shows the criteria for detecting outliers.
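The three-standard-deviation rule described above can be illustrated with a minimal Python sketch. It is not the MATLAB routine used in the study: the function name `remove_outliers`, the two-column layout, and the sample values are assumptions made for illustration only, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def remove_outliers(rows, n_sigma=3.0):
    """Drop every row in which any column lies more than n_sigma
    sample standard deviations away from that column's mean."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    kept = []
    for row in rows:
        is_outlier = any(
            s > 0 and abs(v - c) > n_sigma * s
            for v, c, s in zip(row, centers, spreads)
        )
        if not is_outlier:
            kept.append(row)
    return kept


# Hypothetical two-column data: (gamma ray, TOC). Values are invented.
rows = [(100.0 + i % 5, 3.0 + 0.1 * (i % 4)) for i in range(30)]
rows.append((400.0, 3.1))  # an implausible gamma-ray spike
clean = remove_outliers(rows)
```

Because the mean and standard deviation are computed with the outliers still present, a very large spike inflates the threshold; this is a known property of the rule rather than of the sketch.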

AI methods
In this work, two AI approaches were used: the adaptive neuro-fuzzy inference system (ANFIS) and the functional network (FN). The functional network was introduced in the 1990s as an alternative to ANN [60,61]. FN uses both domain knowledge and data, and it employs adaptable generalized functional models that change during the learning process [62]. The functional network contains different elements, such as an input layer, an output layer, a set of intermediate layers of neurons, and directed links. Castillo et al. (2000) [63] summarized the difference between ANN and FN as follows: in FN, the neuronal functions are multi-argument and can be arbitrary. Figure 4 shows the difference between ANN and FN structures. Several successful applications of FN in the oil industry have been reported in the literature [64][65][66]. There are various feature selection techniques associated with the functional network, such as [67,68]:
Forward selection starts with a minimal set of variables and adds features.
Backward elimination starts with all features and then removes them.
Exhaustive search examines every combination, which significantly increases the computational time.
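As an illustration of the first strategy in the list above, the following Python sketch performs greedy forward selection. It is a stand-in rather than a functional network: an ordinary least-squares model with an intercept is assumed as the underlying learner, the stopping tolerance `tol` is arbitrary, and the data are synthetic.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sse(X, y, features):
    """Least-squares fit of y on the chosen columns plus an intercept;
    returns the sum of squared errors."""
    A = [[1.0] + [row[j] for j in features] for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    w = solve(ata, aty)
    return sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2 for r, t in zip(A, y))


def forward_selection(X, y, tol=1e-6):
    """Greedily add the feature that lowers the error most;
    stop when the improvement is no larger than tol."""
    selected, remaining = [], list(range(len(X[0])))
    best = fit_sse(X, y, selected)
    while remaining:
        sse, j = min((fit_sse(X, y, selected + [j]), j) for j in remaining)
        if best - sse <= tol:
            break
        selected.append(j)
        remaining.remove(j)
        best = sse
    return selected


# Synthetic data: the target depends on columns 0 and 2 only.
X = [[float(i), float((i * i) % 7), float((i * 3) % 5)] for i in range(12)]
y = [2.0 * row[0] + 0.5 * row[2] for row in X]
selected = forward_selection(X, y)
```

Backward elimination would run the same loop in reverse, starting from all columns and dropping the one whose removal hurts the error least.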
The adaptive neuro-fuzzy inference system (ANFIS) was developed in the 1990s and integrates the principles of neural networks and fuzzy logic (FL) [69,70]. In this method, ANN is used to set the fuzzy rules in FL [71]. This integration of the two methods provides improved performance [72]. ANFIS has various reported applications in the oil industry [64,73]. The ANFIS structure combines the fuzzy inference system and a neural network, as presented in Fig. 5.
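To make the fuzzy-inference part of this combination concrete, the sketch below evaluates the forward pass of a first-order Sugeno fuzzy system with Gaussian membership functions, which is the type of system commonly tuned by ANFIS. It shows inference only, not the training step, and it is not the model built in this study: the rule parameters, the two normalized inputs, and the function names are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership grade of x for a fuzzy set centred at `center`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x, rules):
    """First-order Sugeno inference: the output is the average of the
    linear rule outputs, weighted by each rule's firing strength."""
    strengths, outputs = [], []
    for centers, sigmas, coeffs, bias in rules:
        # Firing strength: product of the membership grades of all inputs
        w = math.prod(gaussian(xi, c, s) for xi, c, s in zip(x, centers, sigmas))
        strengths.append(w)
        # Rule consequent: a linear function of the inputs
        outputs.append(sum(a * xi for a, xi in zip(coeffs, x)) + bias)
    total = sum(strengths)
    return sum(w * f for w, f in zip(strengths, outputs)) / total


# Two hypothetical rules over two normalized inputs.
# Each rule: (membership centres, membership widths, linear coefficients, bias)
rules = [
    ((0.2, 0.3), (0.2, 0.2), (1.0, -0.5), 0.5),
    ((0.8, 0.7), (0.2, 0.2), (3.0, -1.0), 1.0),
]
estimate = sugeno_predict((0.5, 0.5), rules)
```

In ANFIS, the membership centres and widths and the linear coefficients would be adjusted from data; here they are fixed by hand.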

Models' development
The Well-I dataset, which contains 891 data points with wide ranges of values as displayed in Fig. 6, was used for model training and optimization. The impacts of various parameters inside the algorithms were examined to optimize the models.
To determine the best models, different runs were performed for each approach. This was accomplished by executing the AI runs inside multiple for-loops in MATLAB for each machine learning approach. In FN, five methods were used: forward-backward (FB), backward-forward (BF), backward elimination (BE), exhaustive search (ES), and forward selection (FS). Three types were used with each of these methods, one linear and two non-linear. In ANFIS, the number of epochs and the cluster radius were tested.
The correlation coefficient (R) and the average absolute percentage error (AAPE) were utilized as evaluation criteria for the developed models using Eqs. (1) and (2), respectively, where N is the size of the dataset, and X given and X Predicted are the measured and the AI-predicted TOC values, respectively.
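The two criteria can be sketched in Python as follows. The sketch assumes the standard definitions, namely the Pearson correlation coefficient for R and the mean of |X given − X Predicted| / |X given| multiplied by 100 for AAPE; the function names and the sample values are illustrative.

```python
import math


def correlation_coefficient(given, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(given)
    mean_g = sum(given) / n
    mean_p = sum(predicted) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(given, predicted))
    var_g = sum((g - mean_g) ** 2 for g in given)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_g * var_p)


def aape(given, predicted):
    """Average absolute percentage error, in percent.
    Measured values must be non-zero."""
    n = len(given)
    return 100.0 / n * sum(abs((g - p) / g) for g, p in zip(given, predicted))


# Invented values for illustration only
measured = [2.0, 4.0, 5.0, 3.0]
estimated = [1.8, 4.4, 5.0, 3.3]
r_value = correlation_coefficient(measured, estimated)
error_pct = aape(measured, estimated)
```

A high R alone does not guarantee small errors, since R is insensitive to a constant offset or scale; this is why the two criteria are reported together.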
With the datasets from Well-II (291 data points) and Well-III (82 data points), the generalization of the produced models was tested internally and externally. Like Well-I, these two wells are located in the same field. The Schmoker, Zhao et al., and ΔlogR models were used as a benchmark for the performance of the AI-based models.

Results and discussion
Data from eight conventional and spectral well logs were used to train the AI models for TOC prediction. The training dataset contained 891 Well-I data points, while the testing dataset contained 291 Well-II data points. The outcomes of each technique are presented in this section.
Different FN methods were applied, and the best model was obtained when the forward-backward method and a non-linear type were used. The model yielded R values of 0.902 and 0.879 in training and testing, respectively, with AAPE values ranging between 18.9% and 24.4%, as shown in Fig. 7. In Fig. 7, many points clearly lie far from the 45° line.
Using ANFIS, various numbers of epochs and cluster radii were tested. The estimation obtained with this method was significantly better than that of FN. The correlation coefficients were 0.983 and 0.899, while the AAPE values were 7.3% and 17.9%, in training and testing, respectively, as shown in Fig. 8. This model was achieved with a cluster radius of 0.25 and 100 iterations.
Several runs were made with each method to achieve the reported results, and each run tested different parameters or methods inside the algorithms. Multiple for-loops in MATLAB were used to test a wide range of possible combinations of the algorithms' parameters while reporting R and AAPE. Figures 9 and 10 present the results of different iterations for ANFIS and FN, respectively. They also indicate the iterations chosen as the best results achieved, based on the correlation coefficient and AAPE. The optimized parameters for each method are reported in Table 4.
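The looping procedure described above amounts to a grid search. A minimal Python sketch is given below; it is not the MATLAB code used in the study. The `evaluate` callback, the parameter names, and the scoring surface in `toy_evaluate` are hypothetical, and ranking by highest R with ties broken by lowest AAPE is an assumed selection rule.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Run `evaluate` on every parameter combination and return
    (best_params, all_results). `evaluate` must return (R, AAPE)."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        r, aape = evaluate(**params)
        results.append((params, r, aape))
    # Assumed rule: highest R first, then lowest AAPE
    best = max(results, key=lambda item: (item[1], -item[2]))
    return best[0], results


def toy_evaluate(cluster_radius, epochs):
    """Hypothetical stand-in for training a model and scoring it on
    held-out data. The scores are a made-up surface, not study results."""
    r = 0.9 - abs(cluster_radius - 0.25) - 0.0001 * abs(epochs - 100)
    aape = 100.0 * (1.0 - r)
    return r, aape


grid = {"cluster_radius": [0.1, 0.25, 0.5], "epochs": [50, 100, 200]}
best_params, results = grid_search(toy_evaluate, grid)
```

In practice `toy_evaluate` would be replaced by a function that trains the model with the given settings and returns R and AAPE on the testing set.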
As indicated by the previous results, the two techniques produced a relatively good level of accuracy in predicting the TOC of Devonian shale during training and testing. The dataset from Well-III was withheld from the AI tools throughout the model development stages as an additional check to guarantee that the new models

Sensitivity of the inputs
To examine the significance of each input parameter in TOC prediction, various sets of inputs were assessed. Seven sets were tested; the simplest one contains only the conventional well logs except GR, and the most comprehensive consists of all available well logs, as shown in Table 5.
The outcomes of the different input sets are reported in Table 5. The best performance was achieved with Set 1, where all eight parameters were included; however, it has a significantly higher computational cost. It was followed by Set 6 and Set 4, which excluded the sonic transit time and the porosity, respectively. This is consistent with the low correlation coefficients of these two parameters with TOC, as shown in Table 3. The least favorable case was Set 3, which contains only four variables. According to the results shown in Table 6, the GR and spectral GR logs have a great impact on the prediction of TOC, while the porosity and sonic transit time showed the least effect. In all cases, ANFIS outperformed FN, although ANFIS has a slightly higher computational time.

Models' limitations
The data in this study were acquired from wells in the same area, and the outcomes are restricted to Devonian shale. Therefore, the models' performance cannot be assured if they are applied to another type of formation or to data ranges different from those used in this research.

Conclusions
Using artificial intelligence techniques and over 1250 data points, two models for TOC prediction from eight well logs were established in this study. Two artificial intelligence techniques were employed: the adaptive neuro-fuzzy inference system (ANFIS) and the functional network (FN). The following is a summary of the outcomes presented in this paper:
• Of the two AI algorithms utilized in this study, ANFIS yielded the best match in the training and testing datasets, with correlation coefficients of 0.98 and 0.90 and AAPE values of 7% and 18% for training and testing, respectively.
• Data from a different well were hidden entirely from the AI tools and used to verify the built models; the ANFIS model successfully predicted TOC with a correlation coefficient of 0.95 and an AAPE of 10%.
• The presented models were compared with three empirical correlations. The empirical correlations yielded less favorable results, with correlation coefficients under 0.85 and AAPE above 20%.
The presented models accurately predicted TOC from well logs such as the CNP, RHOB, GR, Δt, FR, K, Th, and Ur logs, allowing for continuous TOC profiles with depth without the requirement for core analysis or extra well interventions. To achieve reliable results, we propose using the developed models with input parameters within the ranges of the data used in this study.