Introduction

Reservoir properties such as porosity and volume of shale play important roles in reservoir production (Cheng and Pan 2020). With precise knowledge of these petrophysical parameters, engineers can design the best plans for further reservoir development stages and optimize hydrocarbon recovery (Solanki et al. 2021). The porosity (\(\mathrm{\varnothing }\)) is greatly affected by the amount of clay minerals inside the reservoir, known as the volume of shale (\({V}_{\mathrm{sh}}\)) (Iqbal and Rezaee 2020). One of the most crucial stages in reservoir characterization is the estimation of the shale volume (Balaky et al. 2023). In many areas, it can be very challenging to determine the amount of shale accurately (Hussain et al. 2023). Overestimating the shale volume causes the effective water saturation (\({\mathrm{S}}_{{\mathrm{w}}_{\mathrm{e}}}\)) to be estimated too low, which eventually leads to a wrong assumption of productivity. Conversely, underestimating the shale volume causes the water saturation to be estimated higher than its actual value, so a productive zone may be overlooked. Moreover, either error causes a miscalculation of the effective porosity, which is used to determine net pay (Iltaf et al. 2023).

Artificial neural networks (ANNs) can efficiently solve nonlinear problems. An ANN combines connected units, namely artificial nodes (neurons) arranged in input, processing, and output layers. Each node can receive a pulse from, or transmit a pulse to, the other nodes. A weight is assigned to each neuron and updated during the learning process. Typically, an ANN model has one or several hidden (processing) layers of nodes. Each of these layers has input and output values; it receives data from the previous layer and passes data to the next layer. Eventually, the data are weighted and combined to form a new input for the following layer (Gong et al. 2019). ANNs serve two crucial functions: classification and regression. The outputs generated by regression are normally continuous values; in the oil industry, regression is applied to estimate the porosity (\(\varnothing\)), permeability (k), volume of shale (\({V}_{\mathrm{sh}}\)), and water saturation (\({S}_{\mathrm{w}}\)) (Gong et al. 2019). ANNs have played a significant role in the accurate estimation of reservoir parameters including porosity, permeability, water saturation (Okon et al. 2021), and volume of shale (Taheri et al. 2021; Hong and Tien 2022). Intelligent models enable reservoir engineers to tackle challenging and time-consuming tasks more successfully. ANNs also help to fuse different data and acquire complete and accurate information (Saikia et al. 2020).

Over the last decade, shale reservoirs have been the major focus of extensive discussion and research on hydrocarbon exploration and exploitation globally. The development of shale formations has been a turning point, particularly in the USA, leading to notable improvements in the industry (Alessa et al. 2022). Simultaneously, machine learning (ML) and artificial intelligence (AI) have been instrumental in driving rapid development across all industries by automating routine operations (Syed et al. 2022). Shale gas reservoirs have been explored at various depths, ranging from as shallow as 1000 ft to as deep as 12,000 ft, with total organic content (TOC) varying from 1 to 12%. The quality of a shale formation is characterized by a variety of factors, including petrophysical properties such as TOC, thermal maturity, and saturation, and geo-mechanical properties such as the percentage of quartz or carbonate in the mineralogy, differential stress, and friability (Sondergeld et al. 2010). Since the early 1980s, petroleum engineers have been using computer-aided petrophysical and geo-mechanical studies such as log analysis, interpretation, and integration (Doveton 1986). However, ML- and AI-based geo-mechanical and petrophysical analyses have become more prominent over the past decade, resulting in faster and more successful development than ever before in the history of the oil industry. Because shale formations are unconventional reservoirs, their geo-mechanical properties are affected by diagenetic changes resulting from the depositional environment, temperature, and pressure. These changes cause mineralogical alterations, leading to changes in rock composition that directly impact sediment compaction and lithification. This makes it challenging to predict the geo-mechanical properties of shale (Syed et al. 2022).

In a study focused on the gas permeability of shale, Sakhaee-Pour & Bryant (2012) demonstrated that pore throats in shale are predominantly narrower than 10 nm and are situated within organic material where \({\mathrm{CH}}_{4}\) is adsorbed. Consequently, the gas permeability of these rock formations is notably influenced by the adsorbed gas, along with the slippage of gas along the pore walls. This effect is particularly pronounced at higher pressures (such as the initial pressures) found in typical shale gas reservoirs. In such conditions, the impact of the adsorbed gas layer takes precedence over the influence of gas slippage. Besides, it was projected that the permeability of the reservoir matrix can increase substantially over the life of the well, potentially growing by a factor of 4.5 as production continues and pressure decreases. According to Sakhaee-Pour and Steven (2015), non-random spatial distributions of throat sizes in acyclic void models offer more accurate portrayals of the void space within samples. These models are particularly suited for cases where drainage experiments demonstrate a capillary pressure versus saturation trend that deviates from the plateau-like pattern. Furthermore, they made successful permeability predictions that aligned well with laboratory measurements. The models they developed may find utility in other porous media where drainage data do not display a plateau-like variation.

The examination carried out by Sakhaee-Pour and Li (2016), with potential far-reaching effects on comprehending hydrocarbon movement in shale formations, scrutinized drainage experiments conducted on core samples to elucidate the interconnected pathway topology within the pore space on a core-scale level. Their investigation across various shale varieties revealed that the path traversed within the pore space by the nonwetting phase, measured as the length of the pore space, adheres to a fractal pattern. This is in contrast to the pore volume, which does not inherently exhibit fractal characteristics. While the assessment of matrix permeability through mercury injection capillary pressure measurements is a customary procedure in the petrophysical analysis of rock formations, it remains unachievable for shale formations due to the absence of a practical and reliable model. Tran et al. (2018) introduced a straightforward correlation, rooted in the acyclic pore model, to approximate shale permeability. This relation was tested on seven samples drawn from three distinct formations.

Tran and Sakhaee-Pour (2018a, b) asserted that their research holds significant implications for the characterization of reservoirs using conventional petrophysical measurements. The results of numerical simulations revealed that the gas flow’s effective pore-throat size is influenced by pore pressure. Furthermore, the measured permeability in the presence of liquid surpassed the nominal permeability, commonly known as the Hagen–Poiseuille model, without accounting for slippage effects. Tran and Sakhaee-Pour (2018a, b) utilized the acyclic pore model to incorporate the effective interconnections among shale samples on the core scale. Their research, focused on exploring the core-scale critical properties (\({\mathrm{T}}_{\mathrm{c}}, {\mathrm{P}}_{\mathrm{c}}\)) of shale gas, holds significant potential for advancing a practical reservoir model tailored to shale formations. The findings indicated substantial alterations in displacement-critical properties, while modifications were unnecessary for storage-critical properties.

Yu et al. (2018) studied the pore size of shale based on the acyclic pore model. Their investigation into diverse shale types revealed that the average size of pore bodies typically exceeds 20 nm. As a result, there is no necessity to consider pore proximity or confinement. In contrast, the pore-throat size distributions across different shales generally lie below 20 nm, necessitating adjustments to a transport property that pertains to the formation’s resistance against fluid flow. In another study, Alessa et al. (2021) investigated the comprehensive characterization of pore sizes within the Midra shale. The research established that pore-throat and pore-body sizes exhibit both narrow and wide distributions, with average measurements of approximately 22 nm and 18 nm, respectively. As a result, modifications are needed for transport properties influenced by pore-throat sizes in order to accurately represent subsurface conditions. Notably, properties like density, which are tied to the volume of pores in the matrix, can be reasonably estimated based on gas composition within broader channels. These findings hold relevance for advancing unconventional gas development, which is regarded as one of the cleaner fossil fuel alternatives.

Alipour et al. (2022) introduced an empirical correlation designed to account for the nonplateau-like pattern and the estimated capillary pressure observed in shale formations. Based on a dataset of mercury capillary pressure measurements from 30 samples extracted from various US shale formations, their proposed model holds potential for analyzing two-phase displacement phenomena within shale environments. Alessa et al. (2022) introduced a simple formula to precisely ascertain the entry pressure. This relationship, offering a novel method of determination, was applied to real measurements from seven shale samples. Its effectiveness was enhanced through the integration of k-nearest neighbors (KNN), locally selective combination in parallel outlier ensembles (LSCP), and Savitzky–Golay (SG) filters. The optimal outcome emerged from the sequential combination of the basic formula with unsupervised machine learning and noise-filtering techniques.

The conventional well-logging data used to estimate the volume of shale are the gamma ray (GR) log and its spectral components, the SP log, the density (RHOB) log, the resistivity logs (LLD, LLS, ILD, MSFL), the neutron (NPHI) log, and the sonic (DT) log. Moreover, combinations of gamma ray-density, neutron-density, and sonic-density logs can be used in formulas to estimate the \({V}_{\mathrm{sh}}\) (Ehsan et al. 2019; Tali and Farman, 2021; Mohavvel and Jozanikohan 2022). The decrease in porosity due to the presence of clay minerals leads to poor reservoir quality (Zhou et al. 2022). The presence of shale affects the petrophysical properties and logging tool responses, and thereby significantly reduces the effective porosity of the reservoir (Radwan et al. 2020; El-Gendy, 2022; Ismail et al., 2023; Saleh et al. 2023). Using the factor index, a linear relationship between a special factor and shale content can be obtained with the natural gamma ray index (Szabó 2011). To best predict hydrocarbon accumulations, one needs to know the reservoir quality distribution factors such as (\(\varnothing\)) and \({V}_{\mathrm{sh}}\) (Mohammed 2020). Gamal and Elkatatny (2021) implemented a new machine learning (ANN) approach to predict the porosity of the reservoir rock. Their approach overcomes the conventional problems of porosity estimation from empirical correlations, core-sample measurements, and logging tools.

Taheri et al. (2021) used seismic data from the Hendijan oil field to establish a correlation between seismic properties and shale volume values. The researchers employed three distinct methods, namely, the sparse spike inversion, model-based inversion, and band-limited inversion methods, to select the seismic line between the wells. The results indicated that the model-based method yielded the most favorable outcomes. Besides, they utilized ANNs in conjunction with seismic properties to estimate the shale volume. In another study, Ali (2021) employed traditional petrophysical techniques, such as linear gamma ray, nonlinear gamma ray, and spontaneous potential, to create a dataset for training ML algorithms, including random forest (RF), extreme gradient boost (XGBoost), and k-nearest neighbor (KNN). Ultimately, the nonlinear gamma ray method was identified as the most effective among the classical approaches, while the XGBoost algorithm demonstrated superior performance, achieving a mean squared error (MSE) of 0.078 (Ali 2021).

The findings of Jozanikohan and Abarghooei (2022) offer valuable advantages for geoscientists in the upstream petroleum sector. They showed that, with their method, samples can be assessed before resorting to intricate and time-consuming chemical and mineralogical analyses, as the Fourier transform infrared spectroscopy (FTIR) method efficiently accomplishes both tasks with greater ease and reduced expense. This technique proves especially beneficial for evaluating clastic reservoirs, shale oil, and shale gas targets, enabling a rapid evaluation of their potential. The study demonstrates the practicality of the approach using a set of Shurijeh core samples as an illustrative example. In recent years, there has been a noticeable trend toward machine learning (ML) algorithms for shale volume estimation, marking a novel area of interest in the petrophysical evaluation stage. This development is evident from studies conducted in the past decade, such as that of Syed et al. (2022), who observed an increasing application of ML in various shale-related investigations. In a separate study, Mohammadinia et al. (2023) aimed to propose simplified techniques for shale volume estimation in a reservoir located in southern Iran. Furthermore, they sought to compare the performance of various ML methods in estimating shale volume. The conventional methods employed for comparison included gamma ray (GR), density-neutron (DN), and density-sonic (DS), while the ML methods consisted of ANN, support vector machine (SVM), and RF. The authors concluded that the ANN, SVM, and RF methods estimated the shale volume with much better performance.

Since no detailed studies of the Kashafrud reservoir have been published, the current research was performed to investigate and evaluate its reservoir parameters, including the shale content (\({V}_{\mathrm{sh}}\)) and porosity (\(\varnothing\)), by laboratory and petrophysical methods as well as intelligent methods (such as ANN). The aim of this paper was to shed light on the possible role of machine learning in estimating two critical parameters of reservoir quality assessment, porosity and volume of shale, in the Kashafrud Formation. During this process, the accuracy of the parameters estimated by artificial intelligence was carefully evaluated. The performance of the artificial neural network was measured by comparing the results obtained from the conventional petrophysical methods with those obtained from the artificial neural network methods. The availability of valuable core data in this study significantly improved the process of comparison and conclusion.

Geological setting

The Kashafrud Formation lies in northeastern Iran, in the Kopet-Dagh sedimentary basin (Fig. 1). The Kashafrud Formation was characterized as a reservoir by sedimentological and geochemical studies (Ershadinia et al. 2023). This sandstone formation, of Aalenian-Bathonian (Middle Jurassic) age, mostly consists of sedimentary rocks such as shale, sandstone, and conglomerate. The Kashafrud Formation is widespread over a large area of the Kopet-Dagh basin across northeastern Iran (Poursoltani & Gibling 2011).

Fig. 1 The geographic location map of the studied area and well (revised from Miri et al. 2018)

The Khangiran anticline is located approximately 180 km northeast of Mashhad and 25 km west of Sarakhs city. Based on geophysical information, the structure trends northwest-southeast and is asymmetric; the northern edge has a steeper slope than the southern edge. In this anticline, the existence of three separate gas reservoirs, two sweet ones in the Shurijeh Formation and one sour gas reservoir in the Mozdoran Formation, has been confirmed (Mashayekhi et al. 2022). The Khangiran Formation, of Lower–Middle Eocene age and with a thickness of 500 m, consists mainly of a succession of olive green, silty, calcareous, and clay shales and of gray, green-gray, silty, sticky, and calcareous rocks (Ghorbanpour et al. 2023). The stratigraphic column of the studied well (Well A) in the Khangiran gas field was drawn with Strater software ver. 5 (Fig. 2).

Fig. 2 The stratigraphic column of the studied well in the Khangiran gas field, plotted by Strater software ver. 5

The sandstones of the Kashafrud Formation are mostly of the arkosic and lithic arenite types, rich in fragments from volcanic and sedimentary sources (Poursoltani and Gibling 2011). The drilled thickness of the Kashafrud Formation is 433 m. The upper drilled part includes a succession of light gray, light brown, gray, medium to coarse sandstones, hard to slightly porous bituminous, calcareous and gray, green-gray, silty and slightly pyrite. The lower drilled part mainly consists of light gray, brownish gray, light brown, silty, sandy, calcareous, soft and thin layers of light gray sandstone, medium grain, semi-hard to hard (Ghorbanpour 2023).

Materials and methods

Core data

In the present research, the dataset was collected from one well, i.e., well A (Fig. 1) in the Khangiran gas field, NE Iran. This well was drilled to investigate the hydrocarbon status of the formations below the Mozdoran (especially the Kashafrud Formation) as well as the hydrocarbon production from the Mozdoran and Kashafrud Formations. The well was drilled into the Kashafrud Formation (with a drilled thickness of 433 m).

Additionally, nine intervals were cored between depths of 3080.5 and 4397.5 m. During the drilling operation, nine core boxes of 0.9 m length were obtained. Ten core samples were then carefully selected and cut from core #9 of well A. The laboratory measurements of the porosity (mercury porosimetry) and volume of shale (XRD test and densitometry) were performed on these 10 core samples.

Wireline logging data

In this study, 2263 petrophysical data points were provided at a depth interval of 0.061 m. One set of well-logging data from a gas-producing well in an eastern Kopet-Dagh field, including natural gamma ray (GR), sonic (DT), photoelectric (PEF), density (RHOB), neutron (NPHI), caliper (CALI), spontaneous potential (SP), and shallow and deep laterolog (LLS and LLD), was available from the wireline logging process. Since there was a discrepancy between the depths of the core samples and the well logs, depth matching was conducted by averaging the readings at the upper and lower depths for each depth at which well-logging data were absent.
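The depth-matching step can be sketched in a few lines. This is a minimal illustration, assuming a log sampled at sorted depths and assuming that a missing reading at a core depth is taken as the mean of the nearest readings above and below; the function name `log_value_at` and the sample readings are hypothetical and are not taken from the study.

```python
from bisect import bisect_left


def log_value_at(depth, depths, values):
    """Return the log reading at `depth`.

    If `depth` coincides with a sampled depth, that reading is returned.
    Otherwise the mean of the nearest readings above and below is used
    (assumed averaging scheme). `depths` must be sorted in increasing order.
    """
    i = bisect_left(depths, depth)
    if i < len(depths) and depths[i] == depth:
        return values[i]
    if i == 0 or i == len(depths):
        raise ValueError("depth lies outside the logged interval")
    return 0.5 * (values[i - 1] + values[i])


# Hypothetical gamma-ray readings (API units) at a 0.061 m sampling step
depths = [4390.000, 4390.061, 4390.122]
gr = [60.0, 70.0, 90.0]

print(log_value_at(4390.061, depths, gr))  # depth present in the log
print(log_value_at(4390.100, depths, gr))  # depth absent: neighbours averaged
```

A distance-weighted (linear) interpolation would be an equally plausible choice; the plain average is used here only because it mirrors the wording of the text.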

Methods

Core analyses

X-ray diffraction (XRD) was used on the 10 core samples to determine the clay content of the Kashafrud Formation. The analysis was performed to fully identify the types of clay minerals and to calculate the laboratory shale weight percent. The results indicated that the constituent minerals of the studied samples were, in order of abundance, quartz, clay minerals, alkali feldspars, plagioclase, ankerite, and pyrite. In Fig. 3, the average weight percentage of each mineral in all samples is plotted separately.

Fig. 3 The average weight percentage of samples’ minerals, obtained from the XRD analysis

The results of XRD experiments are generally expressed in weight percent. Since a comparison with petrophysical data expressed in volume percent is required, the density of each sample is needed to convert the weight percent of clay minerals to volume percent. The densitometry of the samples was performed with a 25-cc standard pycnometer using an organic fluid such as acetone. Densitometry tests were therefore performed on the samples, and the total weight percent of clay minerals in each sample was converted to volume percent. The relevant information is listed in Table 1.

Table 1 The density, volume percentage, and total volume percentage of the clay minerals
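The weight-to-volume conversion can be illustrated with a short calculation. The sketch below assumes the common mass-balance relation, volume % = weight % × (sample density / mineral density); the text does not state the exact relation or the clay density that was used, so the function name `weight_to_volume_percent` and all numerical values are hypothetical.

```python
def weight_to_volume_percent(weight_percent, sample_density, mineral_density):
    """Convert a mineral's weight percent to volume percent of the solid.

    Assumed relation: volume % = weight % * (sample density / mineral density).
    Both densities must be given in the same units (e.g. g/cm^3).
    """
    if sample_density <= 0 or mineral_density <= 0:
        raise ValueError("densities must be positive")
    return weight_percent * sample_density / mineral_density


# Hypothetical values, not taken from Table 1
clay_vol = weight_to_volume_percent(
    12.0, sample_density=2.70, mineral_density=2.60
)
print(round(clay_vol, 2))
```

When the mineral is lighter than the bulk sample, its volume percent comes out larger than its weight percent, which is the direction of the correction in this example.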

Mercury porosimetry can detect pores ranging from nanopores to macropores of up to 400 \(\mathrm{\mu m}\) in size, and it remains the preferred method for analyzing microporous materials (Schlumberger & Thommes 2021). Using the mercury porosimetry method, the porosity of the 10 samples was precisely measured in the laboratory.

The conventional petrophysical methods for porosity and volume of shale estimation

Most well logs respond to clay minerals, so the shale volume can be estimated from many of them. In the literature, the volume of shale is defined as the ratio of the volume of fine-grained particles such as silt and clay to the total volume of the rock (Shah et al. 2021). The gamma ray log and its spectral components (potassium, thorium, and uranium) have been shown to be the best logs for estimating the volume of shale (Al Al-Azazi and Albaroot 2022; Khamees et al. 2022).

To determine the quantity of the clay minerals, the estimated volume of shale needs to be corrected. Below, Eqs. (1)–(6) illustrate the conventional petrophysical relationships used to estimate the shale volume from the natural gamma ray log, including those of Bhuyan and Passey (1994), Stieber (1973), Clavier (1971), and Larionov-1 (1969) (chosen according to the age of the Kashafrud Formation), and the combination of gamma ray and density logs. The symbols, values, and parameters used in the formulas are listed in the List of symbols section.

$$I_{{{\text{GR}}}} = \frac{{{\text{GR}}_{{{\text{log}}}} - {\text{GR}}_{{{\text{min}}}} }}{{{\text{GR}}_{{{\text{max}}}} - {\text{GR}}_{{{\text{min}}}} }}$$
(1)
$$V_{{{\text{cl}}}} = 0.6I_{{{\text{GR}}}}$$
(2)
$$V_{{{\text{cl}}}} = \frac{{0.5 \times I_{{{\text{GR}}}} }}{{1.5 - I_{{{\text{GR}}}} }}$$
(3)
$$V_{{{\text{cl}}}} = 1.7 - \sqrt {3.38 - \left( {I_{{{\text{GR}}}} + 0.7} \right)^{2} }$$
(4)
$$V_{{{\text{cl}}}} = 0.33\left[ {2^{{\left( {2 \times I_{{{\text{GR}}}} } \right)}} - 1.0} \right]$$
(5)
$$V_{{{\text{cl}}}} = I_{{{\text{GR}}}} \times \left( {\frac{{\rho_{{\text{b}}} }}{{\rho_{{{\text{b}}_{{{\text{sh}}}} }} }}} \right)^{3}$$
(6)
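Equations (1)–(6) can be evaluated directly from log readings. The sketch below is illustrative only: the function names, the clipping of \(I_{\text{GR}}\) to the range 0 to 1, and the sample readings are assumptions and are not part of the original workflow.

```python
import math


def gamma_ray_index(gr_log, gr_min, gr_max):
    """Eq. (1): gamma-ray index. Clipping to [0, 1] is an added assumption."""
    i_gr = (gr_log - gr_min) / (gr_max - gr_min)
    return min(max(i_gr, 0.0), 1.0)


def v_cl_eq2(i_gr):
    """Eq. (2): linear scaling of the gamma-ray index."""
    return 0.6 * i_gr


def v_cl_stieber(i_gr):
    """Eq. (3): Stieber correction."""
    return 0.5 * i_gr / (1.5 - i_gr)


def v_cl_clavier(i_gr):
    """Eq. (4): Clavier correction."""
    return 1.7 - math.sqrt(3.38 - (i_gr + 0.7) ** 2)


def v_cl_larionov(i_gr):
    """Eq. (5): Larionov correction in the form given in the text."""
    return 0.33 * (2.0 ** (2.0 * i_gr) - 1.0)


def v_cl_gamma_density(i_gr, rho_b, rho_b_sh):
    """Eq. (6): gamma-ray index scaled by the cubed bulk-density ratio."""
    return i_gr * (rho_b / rho_b_sh) ** 3


# Hypothetical readings: GR in API units, densities in g/cm^3
i_gr = gamma_ray_index(gr_log=75.0, gr_min=30.0, gr_max=120.0)
for name, value in [
    ("Eq. 2", v_cl_eq2(i_gr)),
    ("Stieber", v_cl_stieber(i_gr)),
    ("Clavier", v_cl_clavier(i_gr)),
    ("Larionov", v_cl_larionov(i_gr)),
    ("Gamma-density", v_cl_gamma_density(i_gr, 2.45, 2.55)),
]:
    print(f"{name}: {value:.3f}")
```

For the same gamma-ray index, the nonlinear corrections of Eqs. (3)–(5) return a lower clay volume than the index itself over most of the range, which is why the choice of relationship matters when the result is compared with laboratory values.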

After the volume of shale had been calculated using the petrophysical relationships and the laboratory measurements, the values obtained from these two methods were compared to assess the accuracy of the data. The error of each petrophysical method relative to the laboratory values was calculated. In this way, the volume percentages obtained in the laboratory can be matched against, and used to validate, the values obtained through the empirical relationships. According to the curve of the average percentage errors (Fig. 4), the natural gamma ray (GR) was chosen as the criterion for further petrophysical studies and analytical methods such as the neural network.

Fig. 4 The curve of average error percentages for volume of shale. (1) Stieber/\(I_{{{\text{GR}}}}\), (2) Stieber/\(I_{{{\text{SGR}}}}\), (3) Stieber/\(I_{{{\text{CGR}}}}\), (4) Clavier/\(I_{{{\text{GR}}}}\), (5) Clavier/\(I_{{{\text{SGR}}}}\), (6) Clavier/\(I_{{{\text{CGR}}}}\), (7) Larionov/\(I_{{{\text{GR}}}}\), (8) Larionov/\(I_{{{\text{SGR}}}}\), (9) Larionov/\(I_{{{\text{CGR}}}}\), (10) Gamma-Density logs/\(I_{{{\text{GR}}}}\), (11) Gamma-Density logs/\(I_{{{\text{SGR}}}}\), (12) Gamma-Density logs/\(I_{{{\text{CGR}}}}\)

After the calculations, it was observed that the average error relative to the laboratory validation values was considerably high (89.46%). Thus, the intelligent methods (ANN) became the basis of the next step.
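An average percentage error of this kind can be computed as follows. The text does not state the exact error formula, so the sketch assumes a mean absolute percentage error relative to the laboratory value; the function name and the numbers are hypothetical.

```python
def mean_abs_percentage_error(measured, estimated):
    """Mean absolute percentage error of `estimated` relative to `measured`.

    Assumed definition: 100 * mean(|estimated - measured| / |measured|).
    """
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - m) / abs(m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)


# Hypothetical laboratory and log-derived shale volumes (volume percent)
lab_vsh = [10.0, 8.0]
log_vsh = [15.0, 6.0]
print(mean_abs_percentage_error(lab_vsh, log_vsh))
```

Because the laboratory value sits in the denominator, samples with a small measured shale volume can dominate the average, which is one way such an error can become very large.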

For the porosity calculation, the average percentage errors between the conventional petrophysical results and the laboratory results were calculated to compare the performance of the two methods (Table 2). The average error was high, standing at 58.3%. To reduce the error, the ANN was therefore chosen to calculate the porosity accurately at different depths.

Table 2 The corrective relationships and average error for porosity

The estimation of the \(\varnothing\) and \({V}_{\mathrm{sh}}\), using the multilayer perceptron (MLP) artificial neural network (ANN)

A neural network is a simulation of the human brain in the form of an artificial system that consists of a myriad of processing units, known as neurons, arranged in a particular order similar to that of the human mind. A neural network consists of an input layer that receives the features of the problem, a hidden layer that processes them, and an output layer that provides the answer(s). All of the training algorithms aim to minimize the mean squared error (MSE) between the outputs of the predicted model and the observed outputs with respect to the training dataset (Adegbite et al. 2021). Methods based on artificial intelligence have proved their effectiveness and their ability to provide robust modeling, as shown by the high correlation coefficient between the actual and estimated volume of shale.

The application of MLP network for the porosity estimation

In the well under study, the core laboratory porosity measurements comprise 10 data points. Since artificial intelligence-based methods need a large number of data to train the network well, the number of data was increased to 686 using geostatistical algorithms. The input data were selected in two ways. In the first approach, all the available logs were entered into the MATLAB software ver. 2021 (Rajabi et al. 2021, 2023; Radwan et al. 2022; Abdelghany et al. 2023). In the other approach, selected well logs chosen through the sensitivity analysis were entered into the same software as input data. In both approaches, the input data were standardized to prevent any one variable from dominating the model.

In general, 70%, 15%, and 15% of the data were assigned to training, validation, and testing, respectively. The Levenberg–Marquardt algorithm was used to train the MLP neural network. Over several training runs, the mean squared error (MSE) and the correlation coefficient (R) were chosen as the main criteria for selecting the most appropriate network. The network structure consists of nine or six neurons in the input layer (for the first and second approaches, respectively), one or two hidden layers, and one output layer. The input, output, parameters of the network, and their symbols are summarized in Table 3.

Table 3 The input and output data of the MLP network for estimating porosity
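The 70/15/15 partition can be sketched as a random split of sample indices. This is a hedged illustration, not the MATLAB routine used in the study: the function name `split_indices`, the rounding rule, and the fixed seed are assumptions. The sample count of 412 is used only because the subset sizes quoted for the porosity network (288, 62, and 62) sum to that number.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly partition sample indices into training, validation, and test sets.

    The test set receives whatever remains after the training and validation
    sets have been filled, so the three subsets always cover every sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(412)
print(len(train), len(val), len(test))
```

Fixing the seed makes the partition reproducible, which helps when different topologies are to be compared on the same subsets.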

An outline of the optimal MLP network model for estimating porosity in the Kashafrud Formation can be seen in Fig. 5. All the candidate mathematical functions for generating the output were tested, and the results are listed in Table 4. Several characteristics distinguish each neuron in the network, including its input weights and its activation function. Compared to the rest of the functions, the Tan-Sigmoid transfer function showed a better performance.

Fig. 5 The outline of the optimal MLP network model for estimating porosity in the Kashafrud Formation

Table 4 The MLP network with a fixed topology (Tansig) replied to the different transfer functions

Since the input data ranged between 0 and 1, a function whose output also lies between zero and one was a logical choice, and this motivated the selection of the Tan-Sigmoid function. The Tan-Sigmoid (tansig) is also one of the most commonly used activation functions in MLP networks. Equation (7) describes the Tan-Sigmoid function, in which \(\upbeta\) indicates the slope parameter:

$$f\left( x \right) = \frac{1}{{1 + e^{ - \beta x} }}$$
(7)
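The behaviour of the transfer function can be checked numerically. The sketch below assumes that the exponent in Eq. (7) is \(-\beta x\), which gives the logistic sigmoid with outputs between 0 and 1; for comparison it also evaluates the hyperbolic tangent sigmoid, which is what MATLAB's `tansig` computes and whose outputs lie between −1 and 1. The function names are illustrative.

```python
import math


def logistic_sigmoid(x, beta=1.0):
    """Eq. (7) with the exponent taken as -beta * x; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def tan_sigmoid(x):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2x)) - 1; output lies in (-1, 1)."""
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(logistic_sigmoid(x), 4), round(tan_sigmoid(x), 4))
```

A larger slope parameter makes the logistic curve steeper around zero without changing its output range.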

First, all available well-log data were entered into the ANN (Table 5). As the proposed MLP network had a low R-value and a high MSE, it did not capture the laboratory data successfully. To improve the results, it was decided to limit the input data to the most relevant parameters. To find the parameters most strongly correlated with the porosity, a sensitivity analysis in the form of a Pearson matrix (Table 6) was performed.

Table 5 Mean squared error values and correlation coefficient for all logs for porosity estimation
Table 6 The Pearson correlation matrix to determine the selected well logs in the ANN to estimate porosity
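The log-selection step based on the Pearson coefficient can be sketched as below. This is a minimal illustration under stated assumptions: the helper names `pearson_r` and `rank_logs` are hypothetical, the log and porosity values are invented for the example, and ranking by absolute correlation is one plausible selection rule rather than the one documented in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_logs(logs, target):
    """Return (log name, r) pairs sorted by decreasing absolute correlation."""
    scores = {name: pearson_r(values, target) for name, values in logs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical readings; not data from the studied well
logs = {
    "RHOB": [2.65, 2.60, 2.55, 2.50],
    "CALI": [8.5, 8.4, 8.6, 8.5],
}
porosity = [0.02, 0.05, 0.08, 0.11]

ranked = rank_logs(logs, porosity)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value keeps strongly negative correlations, such as that between density and porosity in this toy example, which would be lost if only positive coefficients were considered.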

To calculate the correlation coefficient (R) and the MSE (Eq. (8)), the resulting estimates were compared to the actual measurements of the porosity in the laboratory.

$${\text{MSE}} = \frac{{\sum \left( {x_{{i_{{{\text{meas}}}} }} - x_{{i_{{{\text{est}}}} }} } \right)^{2} }}{N}$$
(8)

Here, \(x_{{i_{{{\text{meas}}}} }}\) are the measured values, \(x_{{i_{{{\text{est}}}} }}\) are the estimated values, and N is the total number of observations. The input and output data were normalized as follows:

$${\text{Normalized}}\;{\text{value = }}\frac{{{\text{Original}}\;{\text{Value}} - {\text{Minimum}}\;{\text{Value}}}}{{{\text{Maximum}}\;{\text{Value}} - {\text{Minimum}}\;{\text{Value}}}}$$
(9)
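Equations (8) and (9) translate directly into code. The following sketch uses illustrative function names and invented sample values; the guard against a constant input in the normalization is an added assumption.

```python
def mean_squared_error(measured, estimated):
    """Eq. (8): mean of the squared differences between measured and estimated values."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured)


def min_max_normalize(values):
    """Eq. (9): rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical values for illustration only
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(min_max_normalize([2.0, 4.0, 6.0]))
```

Note that an MSE computed on normalized outputs is expressed in normalized units, so it should be compared only with other MSE values computed on the same scale.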

The porosity estimation results showed that, for both the all-logs and the selected-logs inputs, 288, 62, and 62 data were assigned to training, validation, and testing, respectively. It was also observed that the estimation with all logs as model input had a high MSE and low R-values (Table 5). Among all tested topologies, 6-7-1 was the chosen architecture, having the highest R-value and the lowest MSE (Table 7). The best MSE and R values were reached at epoch 103 (Fig. 6), where the optimal MLP model for estimating the porosity in the Kashafrud Formation was obtained. Table 8 compares the results of network training with one and two hidden layers. With the optimal MLP model for porosity estimation in the Kashafrud Formation, MSE and R values of \(2.78731\times {10}^{-4}\) (Fig. 6) and 0.9999 (Fig. 7), respectively, were found.

Table 7 Mean squared error and correlation coefficient values for the selected well logs
Fig. 6 Training, validation, and testing curves of the samples with the chosen topology 6-7-1

Table 8 The Pearson correlation matrix to determine the selected well logs in the ANN to estimate \({\mathrm{V}}_{\mathrm{sh}}\)
Fig. 7 The estimation error curves for the samples with the chosen topology 6-7-1
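To make the 6-7-1 topology concrete, the sketch below runs one forward pass through a network with six inputs, seven hidden neurons, and one output. It is only a structural illustration: the weights are random rather than trained, the hyperbolic tangent hidden activation and the linear output neuron are assumptions, and the Levenberg–Marquardt training used in the study is not reproduced. All names are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create random weights (n_out rows of n_in values) and n_out biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(x, hidden, output):
    """One forward pass: tanh hidden layer followed by a linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


def count_parameters(*layers):
    """Total number of weights and biases across the given layers."""
    return sum(sum(len(row) for row in w) + len(b) for w, b in layers)


rng = random.Random(0)
hidden = init_layer(6, 7, rng)   # six normalized well-log inputs
output = init_layer(7, 1, rng)   # one output, e.g. normalized porosity

y = forward([0.5] * 6, hidden, output)
print(y, count_parameters(hidden, output))
```

A 6-7-1 network of this form has 57 adjustable parameters (49 in the hidden layer and 8 in the output layer), which gives a sense of the model size relative to the few hundred training samples.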

Finally, as a major step, the optimal MLP network was employed to estimate the porosity over the whole interval of the Kashafrud Formation, using the core data (10 samples) that had not been used up to this step. The average output, which represents the estimated porosity, was equal to 0.13%.

The application of MLP network for the volume of shale estimation

The core laboratory \({\mathrm{V}}_{\mathrm{sh}}\) measurements comprise 10 data points. Since artificial intelligence-based methods need a large number of data to train the network well, the number of data was increased to 702 using geostatistical algorithms. It was observed that the \({\mathrm{V}}_{\mathrm{sh}}\) estimation with all logs as model input had a high MSE and low R-values (Table 11). The Pearson correlation coefficient (Table 8) was used to select the proper inputs for the ANN model. An outline of the optimal MLP network model for estimating the volume of shale is shown in Fig. 8. The input, output, parameters of the network, and their symbols are summarized in Table 9. Compared to the other transfer functions, the Tan-Sigmoid function showed a better performance (Table 10).

Fig. 8 The outline of the optimal MLP network model for estimating \(V_{{{\text{sh}}}}\) in the Kashafrud Formation

Table 9 The input and output data used in the MLP network for estimating the volume of shale
Table 10 Response of the MLP network with a fixed topology (Tansig) to the different transfer functions

As in the porosity estimation, the R and MSE values were poor before the results of the Pearson correlation matrix were applied (Table 11). After this sensitivity analysis, the training results improved significantly (Table 12). Conventional methods for estimating \({\mathrm{V}}_{\mathrm{sh}}\) produce inconsistent results. In contrast, among all the tested networks, 6-8-1 had the lowest mean squared error and the highest correlation coefficient (Table 12). With this topology, the optimal MLP model for \({V}_{\mathrm{sh}}\) estimation in the Kashafrud Formation reached an MSE of \(1.28701\times {10}^{-9}\) (Fig. 9) and an R-value of 0.9999 (Fig. 10) at epoch 1000.
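To make the 6-8-1 notation concrete, the following sketch runs one forward pass through a network with six inputs, eight Tan-Sigmoid hidden neurons, and one output. The weights are random and untrained, the linear output neuron is an assumption (the text does not specify the output-layer function), and the input vector is hypothetical.

```python
import math
import random

def tansig(x):
    """Tan-sigmoid transfer function: 2 / (1 + exp(-2x)) - 1, which equals tanh(x)."""
    return math.tanh(x)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tansig hidden layer followed by a linear output neuron."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

rng = random.Random(0)
n_in, n_hidden = 6, 8  # the 6-8-1 topology: 6 inputs, 8 hidden neurons, 1 output

# Random placeholder weights; a trained network would learn these values.
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

x = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]  # six normalized log readings (hypothetical)
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Training then amounts to adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` so that the MSE between `y` and the measured values decreases from epoch to epoch.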

Table 11 Mean squared error and correlation coefficient values for all well logs
Table 12 Mean squared error and correlation coefficient values for the selected well logs
Fig. 9 Training, validation, and testing curves of the samples with the chosen topology 6-8-1

Fig. 10 The estimation error curves for the samples with the chosen topology 6-8-1

Finally, as a major step, the optimal MLP network was employed to estimate \({V}_{\mathrm{sh}}\) over the whole interval of the Kashafrud Formation, using the core data (10 samples) that had not been used up to this step. The average output, i.e., the estimated \({\mathrm{V}}_{\mathrm{sh}}\), was 8.34%.

Results and discussion

In this study, laboratory analyses including powder X-ray diffraction (PXRD) and densitometry were performed. Based on the XRD results, the minerals in order of abundance are quartz, clay minerals, alkali feldspars, plagioclase, ankerite, and pyrite. Because XRD reports the clay-mineral content as a weight percentage, whereas petrophysical data are expressed as volume percentages, the XRD result must be converted before the two can be compared. To calculate the volume percentage, a densitometry test was performed; its results showed that the highest and lowest amounts of clay minerals were 12.8 and 6.7 volume percent, respectively (Table 1). Then, using the density, the laboratory \({\mathrm{V}}_{\mathrm{sh}}\) of all samples was obtained. The results are presented in Table 13.
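One common way to convert weight percentages to volume percentages is to divide each mineral's weight fraction by its density and renormalize. The sketch below assumes this relation; the exact conversion used with the densitometry data is not given in the text, and the two-mineral sample and grain densities are hypothetical.

```python
def weight_to_volume_percent(weight_pct, grain_density):
    """Convert mineral weight percentages to volume percentages.

    Each mineral's solid volume is taken as proportional to weight / density;
    the volumes are then renormalized so that they sum to 100.
    """
    volumes = {m: w / grain_density[m] for m, w in weight_pct.items()}
    total = sum(volumes.values())
    return {m: 100.0 * v / total for m, v in volumes.items()}

# Hypothetical two-mineral sample (weight percent) and grain densities (g/cm3).
weight_pct = {"quartz": 80.0, "clay": 20.0}
grain_density = {"quartz": 2.65, "clay": 2.50}

volume_pct = weight_to_volume_percent(weight_pct, grain_density)
print({m: round(v, 2) for m, v in volume_pct.items()})
```

With these assumed densities the lighter mineral occupies a larger share by volume than by weight, which is why weight-based XRD results cannot be compared directly with volume-based log estimates.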

Table 13 The values of density and laboratory volume of shale for each sample

Figure 11 cross-plots the density log against the natural gamma ray (GR) log. Based on the natural GR data of the studied field, the lowest GR value in the studied well was taken as the clean sand baseline and the highest as the clay baseline. All samples plot toward the clay baseline; most core samples have a density between 2.68 and 2.72 g/cm³, and their radioactivity is relatively high. This indicates a high clay-mineral content, which was confirmed by the laboratory studies.

Fig. 11 Density vs. natural gamma-ray

Furthermore, SEM studies were performed to determine the distribution pattern of the clay minerals. These studies showed that the clay minerals are mainly pore-filling, with pore-coating clay identified in a limited number of cases. The size of the clay minerals varied from 1.4 μm to 40 μm. The EDAX analysis confirmed the minerals identified by the XRD method. The most important achievements of this research include the following findings:

The results of the volume of shale estimation showed that, owing to its lower error rate, the natural gamma ray log can be used as a criterion for further petrophysical studies and mathematical analytical methods (such as neural networks). The Stieber calibration and the combination of gamma-ray and density logs also had the lowest mean error. The MLP neural network performed acceptably in estimating \(\varnothing\) and \({V}_{\mathrm{sh}}\) from the selected logs. However, when all the logs were used as inputs, the error values were too high for a suitable topology to be selected.
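The gamma-ray approach referred to above can be sketched with the linear gamma-ray index between the clean sand and clay baselines, followed by the widely used Stieber relation \(V_{\mathrm{sh}} = I_{\mathrm{GR}}/(3 - 2I_{\mathrm{GR}})\). The baseline and log values below are hypothetical, and clipping the index to [0, 1] is an added assumption.

```python
def gamma_ray_index(gr, gr_clean, gr_clay):
    """Linear gamma-ray index between the clean sand and clay baselines."""
    igr = (gr - gr_clean) / (gr_clay - gr_clean)
    return min(1.0, max(0.0, igr))  # clip readings outside the baselines (assumption)

def vsh_stieber(igr):
    """Stieber correction: Vsh = IGR / (3 - 2 * IGR)."""
    return igr / (3.0 - 2.0 * igr)

# Hypothetical baselines and reading, in API units.
igr = gamma_ray_index(60.0, gr_clean=20.0, gr_clay=100.0)
print(igr, vsh_stieber(igr))  # 0.5 0.25
```

The correction leaves the end points unchanged (0 stays 0 and 1 stays 1) but lowers intermediate values, so it yields a smaller shale volume than the linear index alone.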

The challenge with the utilized method is the effort required to carefully select the appropriate training data, which is a common requirement for all models that use real well-logging data. However, the ANN helps to fuse different data and acquire complete and accurate information. Furthermore, this approach minimizes computing time, saving both time and money that would have been spent on core sampling without any prior knowledge of the matrix material or pore fluid.

The most important results of this study are as follows:

  1. The average laboratory value of \({V}_{\mathrm{sh}}\) for all the core samples was 8.88%.

  2. The average value of \({V}_{\mathrm{sh}}\) based on the petrophysical relationships over the entire studied well was 0.88%.

  3. The average value of \({V}_{\mathrm{sh}}\) based on the MLP-ANN over the entire studied well was 8.34%.

  4. The average value of \(\varnothing\) based on the MLP-ANN over the entire studied well was 0.13%.

The aim of this study was to estimate the porosity and volume of shale in the Kashafrud gas reservoir in the Khangiran field. The validation conducted in this research was valuable because it compared the results obtained from two distinct approaches, enabling the authors to calculate accurately and reliably the percentage improvement in the estimation of the two parameters. The advantage of the present investigation over similar published studies in the field of oil exploration is the integration of conventional petrophysical methods (parameter calculation using traditional relationships) with intelligent ML methods (MLP-ANN) to improve the accuracy of reservoir parameter estimation.

The ANN developed in this study can be applied and tested to estimate porosity and volume of shale in other wells within the gas field with reliable accuracy when real well log or core sample data are not available. Given the high cost of exploration operations in the oil industry, the method presented in this article offers an opportunity to save time and money by using modern intelligent techniques such as ANNs to estimate reservoir parameters quickly and accurately from initial data from the target field. This approach can greatly assist petroleum engineers in tackling time-consuming and challenging problems in petroleum engineering.

Conclusions

Traditionally, the porosity and volume of shale are estimated with a very high error rate, which was the main reason for applying the multilayer perceptron (MLP) artificial neural network (ANN) to reduce the error. The application of the MLP significantly decreased the error in the estimation of \(\mathrm{\varnothing }\) and \({\mathrm{V}}_{\mathrm{sh}}\), from 58.3% and 89.46% with the traditional petrophysical method to MSE values of \(2.78731\times {10}^{-4}\) and \(1.28701\times {10}^{-9}\), respectively. Validation of the MLP results against the core analysis data showed that the porosity and volume of shale in the studied field were estimated with high accuracy.

The correlation coefficient (R-value) and mean squared error (MSE) of the estimation improved considerably in comparison with conventional methods. The correlation coefficient for both estimations was 0.9999 using the MLP method. The MLP-ANN improved the estimation by 99.95% for \(\varnothing\) and 99.99% for \({V}_{\mathrm{sh}}\), which has greatly impacted the estimation of in-place hydrocarbon.
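The formula behind the improvement percentages is not stated. One reading that reproduces the reported 99.95% for porosity is a relative reduction, with the MSE first expressed as a percentage (multiplied by 100) and then compared with the traditional error percentage. The sketch below illustrates that assumed reading only; `improvement_percent` is a hypothetical helper.

```python
def improvement_percent(baseline_error, new_error):
    """Relative reduction of an error measure, in percent."""
    return (baseline_error - new_error) / baseline_error * 100.0

# Assumption: the MSE values are multiplied by 100 before being compared
# with the traditional-method error percentages (58.3% and 89.46%).
phi_improvement = improvement_percent(58.3, 2.78731e-4 * 100.0)
vsh_improvement = improvement_percent(89.46, 1.28701e-9 * 100.0)

print(f"{phi_improvement:.2f}")  # 99.95
print(vsh_improvement > 99.99)   # True
```

Under this reading the volume of shale figure exceeds 99.99%, which is consistent with the reported value if it was truncated to two decimals rather than rounded.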