Introduction

Well log analysis is one of the most widely used methods for assessing hydrocarbon reservoir producibility (Jassam et al. 2023; Wood 2023; Mahmood and Sadeq 2023). This assessment is often more complex in carbonate reservoirs than in clastic formations. Broad analog investigations and regional and local geological modeling lay the basis for reservoir calculations. Nevertheless, complex diagenetic processes tend to control carbonate pore systems and, consequently, the mobility of fluids within the porous media (Herrick and Kennedy 1994, 1995; Asquith 1997; Kassem et al. 2022; Radwan et al. 2022a; Abdullah et al. 2023; Ullah et al. 2023; Bigdeli and Delshad 2023). The computation of reliable water saturation values in potentially hydrocarbon-bearing carbonate reservoirs from borehole log analysis should therefore be based on a detailed understanding of the rock formations. The cementation exponent (m) is one of the most variable parameters in carbonate rocks; it is needed to determine the formation resistivity factor (FRF) and, subsequently, to model water saturation (Soleymanzadeh et al. 2018; Kolah-kaj et al. 2022; Najafi-Silab et al. 2023).

The formation resistivity factor is a vital petrophysical characteristic that has been used extensively for reservoir characterization. The cementation exponent, m, in Archie’s law is a critical coefficient in estimating the water saturation of reservoir rocks (Archie 1942). Numerous variables, including the reservoir temperature and pressure, secondary porosity, pore throat size, mineralogy, lithology, electrical conductivity of water, tortuosity factor, specific surface area, degree of cementation, and packing index, have a significant impact on the magnitude of the cementation exponent. Hence, reducing the uncertainty in cementation exponent calculation by quantifying the influence of these parameters has become an area of active interest (Najafi-Silab et al. 2023).

A large number of investigations have sought standard formulae for calculating the cementation exponent as a function of pore and rock types. Moreover, some researchers developed formulations tied to petrophysical logging data. These methods were frequently well- or field-specific (Ragland 2002; Syofyan et al. 2019; Penna and Lupinacci 2020; Mishra et al. 2022). In general, the results obtained from these formulae were not applicable to other carbonate formations or locations (Soleymanzadeh et al. 2021a, c; Najafi-Silab et al. 2023). Therefore, it is vital to develop accurate cementation exponent models, especially for deep and tight carbonate pore systems located in West Asia.

Machine learning (ML) is one of the most robust methods for modeling in petroleum engineering and geosciences. A large number of studies have been conducted in petroleum engineering (Johnson et al. 2023; Al-Musawi et al. 2023; Al-Janabi et al. 2021). These studies confirm the great strength of ML algorithms for modeling different parameters. Some of these parameters are permeability (Rostami et al. 2022), aqueous nano-fluid viscosity (Mahdavi-Ara et al. 2022), viscosity of viscoelastic surfactant (VES) (Mahdaviara et al. 2021), CO2 solubility in brine (Sayahi et al. 2021), trapping efficiency of carbon dioxide (Safaei-Farouji et al. 2022), pore pressure (Radwan et al. 2022b), geomechanical characteristics (Abdelghany et al. 2023), and seismic (Manzoor et al. 2023).

Anifowose et al. (2017) applied artificial neural network (ANN), multivariate linear regression (MLR), and support vector machine (SVM) methods to estimate the cementation exponent in a Saudi Arabian carbonate reservoir from wireline logs. They found that MLR can give reasonable accuracy and closely match the core measurements. Using the ANN algorithm, Kadhim et al. (2017) established a correlation to calculate cementation exponent from porosity, permeability, and formation resistivity factor. The authors applied their model to the Yammama carbonate formation in the Nasiriya oilfield and obtained results in satisfactory agreement with experimental core data. In a recent study, genetic programming (GP) and a hybrid of artificial neural network and particle swarm optimization (ANN-PSO) were utilized by Mahmoodpour et al. (2021) to characterize cementation exponent in terms of porosity, permeability, and rock density. Their investigation showed that ANN-PSO outperformed the proposed GP-derived correlation. Among this diverse family of algorithms, radial basis function neural network (RBFNN) and gene expression programming (GEP) have demonstrated considerable capacity for modeling complex phenomena (Rostami et al. 2019).

The main objective of this study is to develop cementation exponent models for carbonate reservoirs, especially tight and deep formations. The largest deviations of literature models from actual data occur in samples with low total porosity. In the current work, two types of models are generated: white-box correlations and a black-box method. For the white-box approach, gene expression programming, an advanced technique, is used to derive two new correlations. For the black-box approach, a radial basis function neural network optimized with the ant colony optimization algorithm (RBFNN-ACO) is utilized for the first time. Moreover, this is the first time that total porosity and diverse pore types and descriptions are applied as the input variables for predicting cementation exponent for both regular and tight and deep carbonate samples (Ragland 2002). The proposed models are also integrated with a comprehensive knowledge of the porous media in the potential reservoirs; because the data were obtained from cores from a variety of geographical locations and are related to specific pore descriptions, the models should be more widely applicable to various regions and carbonate reservoirs. The main limitation of this study is the size of the databank used for modeling. With future experimental research and the collection of a more extensive database, the models developed in this study could be updated.

Data preparation and literature correlations

Data gathering

Constructing a comprehensive model requires a comprehensive database. According to the relevant literature, porosity, pore and rock types, and petrophysical data have a considerable impact on cementation exponent estimation (Borai 1987; Focke and Munn 1987; Watfa and Nurmi 1987; Herrick and Kennedy 1994, 1995; Asquith 1997; Ragland 2002; Syofyan et al. 2019; Penna and Lupinacci 2020; Mahmoodpour et al. 2021; Mishra et al. 2022). Total and effective porosity are the main outputs of petrophysical logging analysis. Among the petrophysical logging parameters, it is adequate to consider the total porosity for cementation exponent estimation. Therefore, total porosity and pore types can be considered the main variables affecting the estimated cementation exponent (Ragland 2002; Syofyan et al. 2019; Penna and Lupinacci 2020).

In the present investigation, a comprehensive database of reservoir rock characteristics of carbonate samples has been collected from the open-source literature (Ragland 2002). This databank includes cementation exponent (m) as the output, and various pore types, including moldic (MO), interparticle (BW), fracture (FR), intracrystalline (WI), intercrystalline (IX), fenestral (FN), non-fabric-selective dissolution (connected, NFS/CN), total porosity (PHIT), and non-fabric-selective dissolution (isolated, ISVUG) as the input variables for modeling. Overall, 112 data points are acquired from the work of Ragland (Ragland 2002), of which 98 were taken from ten wells in the USA and two overseas locations. The remaining 14 data points are collected from Reservoirs Inc.’s Carbonate Rock Catalog (Ragland 2002). Before modeling, the database underwent two preprocessing steps: normalization and splitting into train and test subsets.

Pore characteristics are crucial in dictating fluid properties and storage mechanisms within permeable formations. Thus, the configuration of pores significantly influences the petrophysical properties and the dynamics of multiphase flow in reservoir rocks (Al-Dujaili et al. 2021). The key pore descriptions in the database are non-fabric-selective dissolution, interparticle, and moldic porosity. Intercrystalline, intraparticle, fracture, and fenestral porosity make up negligible proportions of the pore space in most of these core plugs. Large portions of the moldic porosity are seemingly created by dissolution of fossils or fossil fragments; the rest of the moldic pores appear to be after peloids, ooids, and other coated grains, and to a lesser degree, intraclasts and pisoids. Moldic porosity is the prevailing pore type in 37 core plugs. Pores located between grains are called interparticle pores; they form the dominant porosity in 19 core plugs. Pore spaces created by dissolution of rock material irrespective of the initial texture are called non-fabric-selective dissolution (NFS) porosity. Caverns, channels, and vugs are classed as NFS porosity; in them, so much rock material has been dissolved that the original texture is no longer recognizable (Choquette and Pray 1970). In the current work, to separate probable impacts on the cementation exponent and, ultimately, on the water saturation computation, NFS pores that are connected (NFS/CN) were distinguished from isolated vugs (ISVUG) as far as they could be differentiated in thin section. Only three core plugs contain substantial quantities of isolated vugs. Intraparticle, fracture, fenestral, and intercrystalline pores make up small proportions of the pore space in numerous core plugs, but only a small number of plugs contain large amounts of these pore types (Ragland 2002).

Data normalization

To improve model estimation and regression, normalizing parameters that vary over a wide range is recommended. In this step, the input variables and the output parameter are normalized to between 0 and 1 using the following approach (Bakyani et al. 2016):

$${x}_{N}=\frac{x-{x}_{{\text{min}}}}{{x}_{{\text{max}}}-{x}_{{\text{min}}}}$$
(1)

in which x denotes the considered input/output parameter, and the subscripts N, min, and max denote its normalized, minimum, and maximum values, respectively. Eq. (1) is adapted from a previous study (Bakyani et al. 2016) and scales the data to between 0 and 1. Other types of normalization with different formulas can also be used, including logarithmic normalization, normalization between −1 and +1, exponential or power normalization, minimum–average–maximum normalization, and minimum–maximum normalization. In this study, Eq. (1) is a suitable approach for normalizing the input and output parameters.
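Eq. (1) can be sketched in a few lines of Python. The function names and the sample values below are hypothetical and serve only as an illustration; the handling of a constant column is an assumption, since Eq. (1) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers to [0, 1] following Eq. (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column is mapped to 0, because Eq. (1)
        # would otherwise divide by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


def denormalize(x_n, x_min, x_max):
    """Invert Eq. (1) to recover a value in its original units."""
    return x_n * (x_max - x_min) + x_min


# Hypothetical cementation exponent values, for illustration only.
m_values = [1.8, 2.0, 2.4, 3.0]
m_normalized = min_max_normalize(m_values)
```

The inverse mapping is needed whenever a model is trained on normalized targets, because its predictions must be converted back to physical units using the same minimum and maximum.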

For RBFNN-ACO, no data normalization is conducted and only data splitting is carried out, because normalization brought no improvement in model estimation and regression.

Data splitting

The data are split into test and train subsets by a random computer program in order to obtain a homogeneous distribution of the data for constructing the model. A snapshot of the variation of the cementation exponent with some of the input variables is shown in the 2D heatmap diagrams of Fig. 1.

Fig. 1

2D heatmap diagrams showing the variation of cementation exponent with various input variables: (a) variation of cementation exponent versus moldic (MO) and interparticle (BW) porosities, (b) variation of cementation exponent versus moldic (MO) and non-fabric-selective dissolution (connected, NFS/CN) porosities, (c) variation of cementation exponent versus moldic (MO) and total porosity (PHIT)

Train and test subsets include 70% and 30% of the whole databank used for modeling, respectively.
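A random 70/30 split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions made for illustration; the text does not specify how the random program was implemented.

```python
import random


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into train and test subsets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # assumed seed, used only for reproducibility
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)  # assumed rounding rule
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 112 data points, as in the databank described above.
train_idx, test_idx = split_train_test(112)
```

With 112 data points and this rounding rule, the split yields 78 training and 34 test samples.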

Literature published correlations

A number of commonly used models in literature are listed in Table 1.

Table 1 List of common correlations from existing literature for estimation of cementation exponent

Developing intelligent model

Radial basis function neural network

Background

Artificial neural networks (ANNs) are a subcategory of artificial intelligence methods inspired by biological nervous systems such as the human brain. ANNs can learn the principal behaviors and trends in the data and create a relationship between input and output parameters (Anifowose et al. 2017; Rostami et al. 2019; Mahmoodpour et al. 2021). These networks contain many highly interconnected processing elements, termed neurons or nodes, that are organized in layers to solve specific problems, for instance, classification, function approximation, pattern identification, and clustering. The radial basis function neural network (RBFNN) and the multilayer perceptron neural network (MLPNN) are among the most common ANN algorithms. The key difference between RBFNN and MLPNN is the way in which the neurons process the data (Tatar et al. 2013).

The RBFNN is a type of feed-forward neural network (FFNN) designed around iterative function approximation and local basis functions. The learning process of RBFNN is easier than that of MLPNN owing to its simple, fixed three-layer architecture. Additionally, RBFNN can respond well to patterns that were not present in the learning stage (Tatar et al. 2013).

Theory

RBFNN has a three-layer structure, including input, intermediate or hidden, and output layers (Tatar et al. 2013). The schematic architecture of the RBFNN with m hidden nodes and n input variables is illustrated in Fig. 2. Each node in the hidden layer holds a nonlinear activation function, known as the RBF, whose output decreases with the distance from the node center. The form of the radial function, the node centers, and the distance scale are the settings of the model. Because the output is linear in the adjustable weights, minimizing the mean squared error (MSE) by linear optimization gives a globally optimal solution for the weights. The following equation gives the output of the RBFNN for an arbitrary input pattern x (Tatar et al. 2013):

$$y_{i} (x) = \sum\limits_{i = 1}^{m} {w_{i} \phi_{i} (\left\| {x - x_{i} } \right\|} )$$
(2)
Fig. 2

Structure of RBFNN model with m-hidden nodes and n-input parameters (Rostami et al. 2022)

In Eq. (2), \(\left\| {x - x_{i} } \right\|\), \(\phi_{i}\) and \(w_{i}\) denote, respectively, the Euclidean distance between the center of the radial function and the input pattern, the radial basis function (RBF), and the connection weight. According to Eq. (2), each input pattern is converted into an output pattern. The weight coefficients, the Euclidean distance, and the RBF are therefore the main elements that RBFNN modeling uses to calculate the optimized outputs. Numerous types of RBF kernel are available in kernelized learning algorithms. The most commonly used RBF is the Gaussian type, which is applied in the current work through the following equation (Tatar et al. 2013):

$$\phi \left(\left\| {x_{i} - x_{j} } \right\|\right) = \exp \left( - \frac{{\left\| {x_{i} - x_{j} } \right\|^{2} }}{{2\sigma^{2} }}\right)$$
(3)

where the spread and center of Gaussian function are indicated by \(\sigma\) and \(x_{j}\), respectively. Using a proper evolutionary optimization algorithm, which will be delineated in the following section, the parameters of Gaussian function are adjusted.

Tuning \(\sigma\) and \(x_{j}\), the parameters of the Gaussian function, makes the RBF kernel effective for use in Eq. (2). This leads to better prediction of the output pattern with a minimized cost function (minimum error).
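The forward pass defined by Eqs. (2) and (3) can be sketched as below. This is a minimal illustration assuming a single spread shared by all hidden nodes and no bias term, as in Eq. (2); the centers, weights, and spread in the example are arbitrary numbers, not the fitted values of this study.

```python
import math


def gaussian_rbf(x, center, sigma):
    """Eq. (3): Gaussian response to the Euclidean distance from the center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbfnn_output(x, centers, weights, sigma):
    """Eq. (2): weighted sum of the hidden-node responses."""
    return sum(w * gaussian_rbf(x, c, sigma) for w, c in zip(weights, centers))


# Arbitrary two-node network, for illustration only.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]
y = rbfnn_output([0.0, 0.0], centers, weights, sigma=1.0)
```

Here the input coincides with the first center, so the first node responds with 1 and the second with exp(−1), giving y = 2 − exp(−1).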

Gene Expression Programming (GEP)

Background

Gene expression programming (GEP) (Ferreira 2001), genetic programming (GP) (Cramer 1985; Koza and Koza 1992), and genetic algorithms (GA) (Holland 1975; Goldberg and Holland 1988) are the three main groups of evolutionary algorithms (EAs). GAs, the first member of the EAs, use fixed-length chromosomes in the form of linear strings as the individuals of the population (Ferreira 2001). GPs extend GAs by using nonlinear parse trees of variable length and shape as the individuals of the population (Ferreira 2001).

Additional enhancement of the GPs and GAs leads to the introduction of the GEP algorithm, which employs both expression trees (nonlinear objects of different shapes and lengths) and chromosomes (linear strings of fixed length). In other words, phenotypes (expression trees) and Karva expressions or genotypes (chromosomes) are the main entities for GEP implementation. Chromosomes are composed of one or more genes with equal size. Each gene consists of two parts named as tail and head. Head part includes both terminals (constants and parameters) and mathematical operators (× , √, ± , log, /, Σ, etc.), and tail part is composed of only terminal symbols.

The information is decoded and translated from the genotypes to the phenotypes. In this decoding procedure, the genes in the chromosomes are expressed as sub-trees, which are joined together by a linking function that is established in advance (e.g., ×, ±) (Ferreira 2006; Alkroosh and Nikraz 2011; Hong et al. 2018). An example of the expression tree and the chromosome of a typical GEP architecture is shown in Fig. 3.

Fig. 3

Illustration of a typical architecture for GEP algorithm

Generally, the GEP strategy is identified as a symbolic regression technique since it provides an empirical equation for a problem via symbolic expression trees. Moreover, the GEP strategy differs from conventional regression and curve-fitting methods (Mahdaviara et al. 2021). In classical approaches, an empirical equation is first proposed for a problem, and the constants of this equation are then adjusted by regression and fitting to actual data, whereas in the GEP algorithm the form of the empirical equation and its adjustable constants are tuned at the same time (Rostami et al. 2019; Mahdavi-Ara et al. 2022). The GEP strategy first combines the known functions randomly and then modifies the constants of the resulting empirical equation by reducing an objective function such as the root mean square error (RMSE) between actual data and predictions. Subsequently, the mutation and crossover operators are used to establish a better population of solutions. This procedure is repeated until the stopping condition on the objective function is met, and the final mathematical formula is delivered in the form of symbolic expression trees. Successful application of GEP in modeling complex phenomena is demonstrated in several research studies, such as aqueous nano-fluid viscosity (Mahdavi-Ara et al. 2022), viscoelastic surfactant viscosity (Mahdaviara et al. 2021), and absolute permeability of carbonate reservoirs (Rostami et al. 2019).

GEP procedure

A GEP run consists of the following steps: initialization, translation, execution, fitness evaluation, and replication.

  1. Initialization: A preliminary population is constructed by randomly generating chromosomes.

  2. Translation: The generated genotypes (chromosomes) are converted into phenotypes (expression trees).

  3. Execution: The programs encoded by the genotypes are executed.

  4. Fitness Evaluation: A fitness function is defined and used to check the fitness of every chromosome. The modeling ends successfully once a chromosome meets the termination criterion. Otherwise, the fittest genotypes are chosen from the population to generate the new population; tournament and roulette-wheel selection approaches can be utilized for this purpose (Zhong et al. 2005).

  5. Replication: To generate improved groups of genotypes, operations such as mutation, transposition, and recombination are applied to carry out genetic modification and replication. The process is repeated until the termination criterion is met.

Interested readers are referred to the relevant articles for further explanation of the theoretical and applied aspects of the GEP strategy (Ferreira 2006; Zhong et al. 2017; Hong et al. 2018).
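The translation and execution steps can be illustrated with a small sketch that decodes a Karva expression into an expression tree and evaluates it. The four-symbol function set, the gene strings, and the terminal values are hypothetical choices made for this example; they are not the settings used to derive the correlations in this study.

```python
import math

# Assumed function set for this sketch: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "Q": (1, math.sqrt),  # "Q" stands for the square root
}


def decode(gene):
    """Translate a Karva expression (genotype) into a nested tree (phenotype).

    Symbols are read left to right and fill the tree level by level.
    Symbols left over after the tree is complete are non-coding and ignored.
    """
    nodes = [[symbol] for symbol in gene]  # node layout: [symbol, child, ...]
    next_child = 1  # position of the next symbol not yet attached to a parent
    i = 0
    while i < next_child:
        node = nodes[i]
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            node.append(nodes[next_child])
            next_child += 1
        i += 1
    return nodes[0]


def evaluate(tree, terminals):
    """Execute the expression tree for a given set of terminal values."""
    symbol, children = tree[0], tree[1:]
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, terminals) for child in children))
    return terminals[symbol]


# "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
tree = decode("Q*+-abcd")
value = evaluate(tree, {"a": 3.0, "b": 1.0, "c": 5.0, "d": 1.0})
```

The fixed-length linear string is what the genetic operators act on, while the decoded tree is what the fitness function scores. In the gene "+abQ*cd", only "+ab" is expressed, which shows how a fixed-length chromosome can encode trees of different sizes.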

Ant Colony Optimization (ACO)

Dorigo (1992) originally introduced ant colony optimization (ACO) as a population-based method. The main idea of the ACO algorithm is to mimic an intrinsic behavior of ants, which seek the shortest route between the nest and the food (Dorigo et al. 1996; Dorigo and Gambardella 1997; Stützle and Hoos 2000). By depositing pheromone, a chemical marker, the population of ants is able to reveal a satisfactory and shortest path (i.e., solution). For archiving feasible solutions, a Gaussian form of the composite probabilistic method should be applied, which is appropriate for discrete path modeling. For continuous path modeling, the pheromone approach could be used, which leads to the best solution being stored in the above-mentioned archive (Heris and Khaloozadeh 2014). The ACO strategy is alternatively known as an estimation of distribution algorithm (EDA) because of the evolutionary integration between the Gaussian probabilistic method and the solution archive in a continuous map (Larrañaga and Lozano 2001; Lozano 2006; Socha and Dorigo 2008).

To find the solution vector x, \(x \in X \subseteq R^{{n_{x} }}\), the cost function (CF) must be minimized. The ACO procedure can be summarized in the following steps (Heris and Khaloozadeh 2014; Socha and Dorigo 2008):

  1. Initially, the CF is computed for \(N\) randomly selected candidate answers from \(X\).

  2. The archive of answers is sorted so that \(x_{1}\) and \(x_{N}\) denote the best and the worst preliminary answers, respectively.

  3. According to Eq. (4), every answer is assigned a weight:

    $$u_{i} \propto \frac{1}{{\sqrt {2\pi } \alpha N}}\exp \left[ { - \frac{1}{2}\left( {\frac{i - 1}{{\alpha N}}} \right)^{2} } \right]$$
    (4)

where the weights satisfy the following equation:

$$\sum\limits_{i = 1}^{N} {u_{i} } = 1$$
(5)
  4. A Gaussian probabilistic model is then created, as expressed by Eqs. (6) and (7):

    $$G^{j} (x[j]) = \sum\limits_{i = 1}^{N} {u_{i} N(x[j];\mu_{i} [j],\sigma_{i} [j])}$$
    (6)
    $$N(x;\mu ,\sigma ) = \frac{1}{{\sqrt {2\pi } \sigma }}\exp \left[ { - \frac{1}{2}\left( {\frac{x - \mu }{\sigma }} \right)^{2} } \right]$$
    (7)

In Eq. (6), \(x[j]\) denotes the \(j^{th}\) component of the answer \(x\), and \(j\) indexes the decision parameters. Eq. (8) gives the mean of each Gaussian component, and Eq. (9) gives its standard deviation:

$$\mu_{i} [j] = x_{i} [j]$$
(8)
$$\sigma_{i} [j] = \frac{\xi }{N - 1}\sum\limits_{{i^{\prime} = 1}}^{N} {\left| {x_{i} [j] - x_{{i^{\prime}}} [j]} \right|}$$
(9)

where \(\xi\) is a positive real coefficient that sets the balance between exploitation and exploration.

  5. To generate \(M\) new answers to the problem, the multidimensional model \(g = (G^{1} ,G^{2} ,...,G^{{n_{x} }} )\) is sampled, and the value of the CF is evaluated for every offspring.

  6. The \(M\) offspring and the n best solutions are selected to form the new solution archive. The final answer to the problem is taken from the solution archive.

  7. The procedure stops when the termination criterion is met; otherwise, the above steps are repeated.
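The steps above can be sketched as a compact continuous-domain ACO routine. This is an illustrative implementation under several assumptions that the text does not specify: the archive and offspring sizes, the values of \(\alpha\) and \(\xi\), clamping of samples to the search bounds, a fixed number of iterations as the termination criterion, and an archive that keeps the best \(N\) answers after each merge. All function and parameter names are hypothetical.

```python
import math
import random


def rank_weights(n_archive, alpha):
    """Eqs. (4)-(5): Gaussian rank weights, normalized to sum to one."""
    raw = [
        math.exp(-0.5 * ((i - 1) / (alpha * n_archive)) ** 2)
        / (math.sqrt(2.0 * math.pi) * alpha * n_archive)
        for i in range(1, n_archive + 1)
    ]
    total = sum(raw)
    return [u / total for u in raw]


def aco_minimize(cost, bounds, n_archive=10, n_offspring=20, alpha=0.5,
                 xi=1.0, n_iterations=100, seed=0):
    """Minimize cost(x) over box bounds with an archive-based ACO sketch."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Steps 1-2: random initial answers, sorted from best to worst.
    archive = [[rng.uniform(lo, hi) for lo, hi in bounds]
               for _ in range(n_archive)]
    archive.sort(key=cost)
    weights = rank_weights(n_archive, alpha)  # Step 3
    for _ in range(n_iterations):  # Step 7 (assumed: fixed iteration budget)
        offspring = []
        for _ in range(n_offspring):
            # Step 4: choose one Gaussian component of Eq. (6) by its weight.
            i = rng.choices(range(n_archive), weights=weights)[0]
            child = []
            for j in range(dim):
                mu = archive[i][j]  # Eq. (8)
                spread = sum(abs(archive[i][j] - other[j]) for other in archive)
                sigma = xi * spread / (n_archive - 1)  # Eq. (9)
                lo, hi = bounds[j]
                # Step 5: sample the component; clamping is an assumption.
                child.append(min(max(rng.gauss(mu, sigma), lo), hi))
            offspring.append(child)
        # Step 6: merge, re-rank, and keep the best n_archive answers.
        archive = sorted(archive + offspring, key=cost)[:n_archive]
    return archive[0]


def sphere(x):
    """Simple test cost function with its minimum at the origin."""
    return sum(v * v for v in x)


best = aco_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Because the merged archive always retains its best member, the cost of the returned answer can never be worse than that of the best initial candidate. In an RBFNN-ACO setting, the cost function would be the training error of the network as a function of its adjustable parameters.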

Results and discussion

Development of the suggested models

This investigation primarily studies the cementation exponent of carbonate pore systems, given that it varies with numerous rock characteristics. As specified in Sects. "Introduction" and "Data preparation and literature correlations," a number of existing publications have clarified the features influencing the cementation exponent of carbonate rocks. A common trend in these investigations is the strong dependency of the cementation exponent on total porosity and on various pore types/descriptions such as moldic, interparticle, and non-fabric-selective dissolution (connected) (Borai 1987; Focke and Munn 1987; Watfa and Nurmi 1987; Herrick and Kennedy 1994; Asquith 1997; Ragland 2002; Soleymanzadeh et al. 2018). Accordingly, these parameters were considered as the model inputs in this study, and a broad databank was collected from the open literature (Chilingarian et al. 1990) for correlating the above-mentioned variables.

To appraise the dependency of the cementation exponent on the input parameters, a sensitivity analysis was carried out using the following equation (Chok 2010):

$$r=\frac{\sum_{i=1}^{n}\left(\overline{x_{k}}-x_{k,i}\right)\left(\overline{y}-y_{i}\right)}{\sqrt{\sum_{i=1}^{n}\left(\overline{x_{k}}-x_{k,i}\right)^{2}\sum_{i=1}^{n}\left(\overline{y}-y_{i}\right)^{2}}}$$
(10)

in which r is the impact value. This parameter varies from −1 to +1 and measures the degree to which the cementation exponent changes with total porosity and with the diverse pore types/descriptions, including interparticle, moldic, intracrystalline, intercrystalline, fenestral, fracture, non-fabric-selective dissolution (connected), and non-fabric-selective dissolution (isolated). Values of r equal to −1, 0, and +1 indicate, respectively, a perfectly inverse relationship, no relationship, and a fully direct relationship. The n and k values denote the size of the databank and the input type, respectively. The symbols \(\overline{y }\) and \(\overline{x }\) are the average target and average input values, respectively. Figure 4 shows the outcomes of this analysis for the RBFNN-ACO, GEP Model-I, and GEP Model-II. As shown, the impact of the considered input parameters on the target estimation ranges from low to high, which confirms the findings reported in the experimental literature.
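Eq. (10) has the form of the Pearson correlation coefficient and can be sketched as follows. The function name and the sample vectors are hypothetical and are used only to show the limiting values of r.

```python
import math


def relevancy_factor(x, y):
    """Eq. (10): impact value r between one input x and the target y."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    cov = sum((x_mean - xi) * (y_mean - yi) for xi, yi in zip(x, y))
    ss_x = sum((x_mean - xi) ** 2 for xi in x)
    ss_y = sum((y_mean - yi) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical input and target vectors, for illustration only.
r_direct = relevancy_factor([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_inverse = relevancy_factor([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

A perfectly linear increasing relationship gives r = +1 and a perfectly linear decreasing one gives r = −1. An input that is constant across the databank makes the denominator zero, so such a column would have to be excluded before applying the formula.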

Fig. 4

Outcomes of sensitivity analysis for the developed methods in this study concerning estimation of cementation exponent

As shown, three principal pore types, namely moldic, interparticle, and non-fabric-selective dissolution (connected), are the main variables affecting cementation exponent estimation. Intracrystalline and intercrystalline pore types have the lowest relevancy; therefore, these variables are omitted in both GEP Model-I and II. However, owing to the stochastic nature of carbonate rocks, some data may not follow this rule of thumb. Consequently, accurate estimation of the cementation exponent becomes more challenging.

To achieve the aim of the current investigation, two robust self-organizing heuristic strategies, gene expression programming (GEP) and the hybrid algorithm of radial basis function neural network and ant colony optimization (RBFNN-ACO), were employed to characterize the cementation exponent of carbonate reservoirs. For this, approximately 30% and 70% of the databank were selected randomly to create the test and train subsets, respectively. Total porosity (PHIT) and the diverse pore types/descriptions, namely interparticle (BW), moldic (MO), intracrystalline (WI), intercrystalline (IX), fenestral (FN), fracture (FR), non-fabric-selective dissolution (connected, NFS/CN), and non-fabric-selective dissolution (isolated, ISVUG), were considered as the independent input parameters:

$$m=f\left({\text{PHIT}},\mathrm{ MO},{\text{BW}},{\text{WI}},{\text{IX}},{\text{FN}},{\text{FR}},{\text{NFS}}/{\text{CN}},{\text{ISVUG}}\right)$$
(11)

In Eq. (11), the cementation exponent is denoted by m. Based on the sensitivity analysis results shown in Fig. 4, the variables WI, IX, and FN in GEP Model-I and WI, IX, and FR in GEP Model-II are neglected because of their very low impact on the output estimation.

Another vital issue is identifying actual data that lie at an anomalous distance from the bulk of the data points. Such abnormal data may arise for numerous reasons, including measurement errors. Hence, recognizing these outliers is indispensable to avoid unreliability and inaccuracy in the proposed methods (Rousseeuw and Leroy 1987; Goodall 1993; Gramatica 2007; Hemmati-Sarapardeh et al. 2016). Several procedures exist for identifying and removing outliers, for instance, the Williams’ technique. Interested readers are referred to the scientific articles for additional explanation of the principles of outlier analysis (Rousseeuw and Leroy 1987; Goodall 1993; Gramatica 2007). In the present investigation, the leverage technique combined with the Williams’ plot is used to reveal the outliers. Figure 5 shows the leverage plots of the proposed RBFNN-ACO, GEP Model-I, and GEP Model-II, in which the standardized residual (R) of each method is plotted against the leverage indices (Hat values).

Fig. 5

Williams’ plots showing standardized residual versus Hat value for the proposed models in this study: (a) RBFNN-ACO, (b) GEP Model-I, and (c) GEP Model-II

Points falling in the different regions of the plot are diagnosed as follows:

  • Applicability domain: \(H<\widehat{H}\) and \(-3\le R\le 3\) (the rectangular domain)

  • High and good leverage: \(H>\widehat{H}\) and \(-3\le R\le 3\)

  • High and bad leverage (outliers): \(R<-3\) or \(R>3\)

The parameter \(\widehat{H}\) shows the leverage limit which can be formulated in the following way:

$$\widehat{H}=\frac{3f+1}{p}$$
(12)

In Eq. (12), p and f denote the size of the databank and the number of input variables, respectively.
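The diagnosis rules and the leverage limit can be sketched as follows. The sketch takes Eq. (12) exactly as written above and assumes that the Hat value and standardized residual of each point have already been computed; the function names, the category labels, and the treatment of a Hat value exactly equal to the limit are assumptions.

```python
def leverage_limit(n_inputs, n_samples):
    """Eq. (12) as written in the text: (3f + 1) / p."""
    return (3 * n_inputs + 1) / n_samples


def classify_point(hat_value, std_residual, h_limit, r_limit=3.0):
    """Assign a point of the Williams' plot to one of the three regions."""
    if abs(std_residual) > r_limit:
        return "outlier"  # high and bad leverage
    if hat_value > h_limit:
        return "good high leverage"
    # Assumption: a Hat value exactly on the limit is counted as valid.
    return "valid"


# Nine input variables and 112 data points, as for the RBFNN-ACO model.
h_limit = leverage_limit(n_inputs=9, n_samples=112)
```

For nine inputs and 112 data points, Eq. (12) gives a leverage limit of 28/112 = 0.25. The fraction of points labeled "valid" corresponds to the share of the databank inside the applicability domain.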

As shown in Fig. 5, about 95.54% of the databank for RBFNN-ACO, 94.64% for GEP Model-I, and 94.64% for GEP Model-II lie inside the valid region. This further verifies the reliability and validity of the established RBFNN-ACO technique over the databank employed here.

In the current work, the RBFNN-ACO and GEP strategies have been applied to establish three models (i.e., one RBFNN and two GEP models) for estimating cementation exponent. Because the number of data is low (i.e., 112 data points), K-fold cross-validation with k equal to 6 is applied during the training step to ensure the models’ generalization. That is, the data are divided into six folds; in each round, five folds are used for training and the remaining fold is used for validation.
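The fold construction can be sketched as below. The sketch assumes that cross-validation is run on the 78-point training subset (70% of 112) and that the samples have already been shuffled; neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=6):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Assumed: 78 training samples, split into six folds of 13.
folds = list(k_fold_indices(78, k=6))
```

Each sample serves as validation data exactly once and as training data in the other five rounds, so the averaged validation error uses the whole subset.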

Figure 6 shows the flowchart of the RBFNN model optimized by the evolutionary algorithm (EA; here, the ACO algorithm). The GEP modeling follows the methodology described in the previous section and leads to GEP Model-I and GEP Model-II, which are given by the following mathematical expressions:

Fig. 6

A flow diagram for RBFNN model augmented with evolutionary algorithm framework

GEP Model-I:

$${m}_{N}={A}_{1}+{A}_{2}+{A}_{3}$$
(13)

in which,

$${A}_{1}=\left[{a}_{1}+{\left({\text{NFS/CN}}_{N}-{\text{BW}}_{N}\right)}^{5}\right]\times {a}_{2}$$
(14)
$${A}_{2}=\arctan\left[\tanh\left({a}_{3}\times {\text{ISVUG}}_{N}\right)+\arctan\left({\text{BW}}_{N}\right)\right]\times \left[{\left({a}_{4}+{\text{PHIT}}_{N}\right)}^{3}+{\text{MO}}_{N}\right]$$
(15)
$${A}_{3}=\arctan\left\{\tanh\left[{\text{PHIT}}_{N}^{3/5}-{\left({\text{FR}}_{N}\times {a}_{5}\right)}^{5}\right]\times {\text{MO}}_{N}^{5}\right\}$$
(16)

GEP Model-II:

$${m}_{N}={B}_{1}+{B}_{2}+{B}_{3}$$
(17)

in which,

$${B}_{1}={\left({\text{MO}}_{N}+\frac{{b}_{1}}{{\text{FN}}_{N}\times {\text{MO}}_{N}-{b}_{2}^{{\text{BW}}_{N}}}\right)}^{\frac{1}{2}}-{\text{MO}}_{N}$$
(18)
$${B}_{2}=\frac{{\text{MO}}_{N}\times \left({\text{BW}}_{N}+{\text{MO}}_{N}^{2}\times {\text{PHIT}}_{N}\right)}{{10}^{{\text{NFS/CN}}_{N}}}$$
(19)
$${B}_{3}=\left({\text{ISVUG}}_{N}-{b}_{3}^{{\text{MO}}_{N}}\times {\text{ISVUG}}_{N}+{\text{MO}}_{N}\times {\text{BW}}_{N}\right)\times \left({\text{MO}}_{N}-{\text{BW}}_{N}\right)$$
(20)

where the constants of GEP Model-I and II are as follows:

a1 = 3.02957381, a2 = 0.08507157, a3 = −1.27518213, a4 = −0.65192152, a5 = 5.31283985,

and b1 = −0.06641905, b2 = 3.30365390, b3 = 3.54100203.

In Eqs. (13)–(20), the subscript N denotes a parameter normalized between 0 and +1. For example, the normalized cementation exponent \({m}_{N}\) is defined as:

$${m}_{N}=\frac{m-{m}_{{\text{min}}}}{{m}_{{\text{max}}}-{m}_{{\text{min}}}}$$
(21)

in which the subscripts N, min, and max denote the normalized, minimum, and maximum values of the parameter considered, respectively.
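For readers who wish to evaluate the correlations numerically, Eqs. (13)–(21) can be transcribed as in the sketch below. The function and argument names are hypothetical, all inputs are assumed to be already normalized to the range 0 to 1, and the input written FR in Eq. (16) and FN in Eq. (18) is kept as two separate arguments because the text does not state whether they are the same quantity.

```python
import math

# Constants a1..a5 and b1..b3 as listed in the text.
A = (3.02957381, 0.08507157, -1.27518213, -0.65192152, 5.31283985)
B = (-0.06641905, 3.30365390, 3.54100203)

def normalize(x, x_min, x_max):
    """Min-max normalization, Eq. (21)."""
    return (x - x_min) / (x_max - x_min)

def denormalize(x_n, x_min, x_max):
    """Inverse of Eq. (21)."""
    return x_n * (x_max - x_min) + x_min

def gep_model_1(nfs_cn, bw, isvug, phit, mo, fr):
    """Normalized cementation exponent from Eqs. (13)-(16)."""
    a1, a2, a3, a4, a5 = A
    t1 = (a1 + (nfs_cn - bw) ** 5) * a2
    t2 = math.atan(math.tanh(a3 * isvug) + math.atan(bw)) * ((a4 + phit) ** 3 + mo)
    t3 = math.atan(math.tanh(phit ** 0.6 - (fr * a5) ** 5) * mo ** 5)
    return t1 + t2 + t3

def gep_model_2(nfs_cn, bw, isvug, phit, mo, fn):
    """Normalized cementation exponent from Eqs. (17)-(20)."""
    b1, b2, b3 = B
    # The denominator vanishes only when fn * mo equals b2 ** bw.
    t1 = math.sqrt(mo + b1 / (fn * mo - b2 ** bw)) - mo
    t2 = mo * (bw + mo ** 2 * phit) / 10 ** nfs_cn
    t3 = (isvug - b3 ** mo * isvug + mo * bw) * (mo - bw)
    return t1 + t2 + t3
```

Both functions return the normalized exponent. Converting it back to m with `denormalize` requires the minimum and maximum cementation exponent of the databank, which are not restated in this section.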

In addition, the adjustable coefficients of the GEP mathematical strategy are summarized in Table 2. The developed GEP strategy uses three genes and 30 chromosomes, with the genes linked by an addition (+) operator. The genes express a variety of operators, for instance ±, Exp, ×, /, Ln, X², Pow 10, √, and X³.

Table 2  A brief overview of the adjustable coefficients applied in the GEP mathematical strategy

Validity analysis of the established methods

Statistical quality measures

The validity of the suggested RBFNN-ACO model and the correlations derived empirically by the GEP strategy has been examined via the following standard statistical measures:

$${\text{MAD}}=\frac{\sum_{i=1}^{N}\left|{m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}\right|}{N}$$
(22)
$${\text{SD}}=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}{\left(\frac{{m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}}{{m}_{i}^{{\text{exp}}}}\right)}^{2}}$$
(23)
$${\text{APRD}}\%=\frac{100}{N}\sum_{i=1}^{N}\left(\frac{{m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}}{{m}_{i}^{{\text{exp}}}}\right)$$
(24)
$${\text{AAPRD}}\%=\frac{100}{N}\sum_{i=1}^{N}\left|\frac{{m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}}{{m}_{i}^{{\text{exp}}}}\right|$$
(25)
$${\text{MSE}}=\frac{1}{N}\sum_{i=1}^{N}{\left({m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}\right)}^{2}$$
(26)
$${\text{RMSE}}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({m}_{i}^{{\text{exp}}}-{m}_{i}^{{\text{pred}}}\right)}^{2}}$$
(27)
$${R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({m}_{i}^{{\text{pred}}}-{m}_{i}^{{\text{exp}}}\right)}^{2}}{\sum_{i=1}^{N}{\left({m}_{i}^{{\text{pred}}}-\overline{{m }_{i}^{{\text{exp}}}}\right)}^{2}}$$
(28)

in which MAD, SD, APRD, AAPRD, MSE, RMSE, and R² stand for the mean absolute deviation, standard deviation, average percentage relative deviation, average absolute percentage relative deviation, mean square error, root mean square error, and determination coefficient, respectively. The superscripts exp and pred denote the experimentally measured and predicted values of the cementation exponent, respectively. Table 3 lists the statistical values for RBFNN-ACO, GEP Model-I, and GEP Model-II. According to this table, despite the difficulties associated with estimating the cementation exponent in carbonate pore systems, the proposed RBFNN-ACO, GEP Model-I (i.e., Eq. (13)), and GEP Model-II (i.e., Eq. (17)) predict the cementation exponent with reasonable accuracy.
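The measures in Eqs. (22)–(28) can be computed together as in the following sketch. The function name and dictionary keys are hypothetical. R² is coded exactly as printed in Eq. (28), whose denominator uses the predicted values minus the mean measured value; this differs from the more common definition based on the measured values.

```python
import numpy as np

def quality_metrics(m_exp, m_pred):
    """Statistical quality measures of Eqs. (22)-(28) for measured/predicted m."""
    m_exp = np.asarray(m_exp, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    n = m_exp.size
    err = m_exp - m_pred      # absolute deviation per sample
    rel = err / m_exp         # relative deviation per sample
    mse = np.mean(err ** 2)
    return {
        "MAD": np.mean(np.abs(err)),                 # Eq. (22)
        "SD": np.sqrt(np.sum(rel ** 2) / (n - 1)),   # Eq. (23)
        "APRD%": 100 * np.mean(rel),                 # Eq. (24)
        "AAPRD%": 100 * np.mean(np.abs(rel)),        # Eq. (25)
        "MSE": mse,                                  # Eq. (26)
        "RMSE": np.sqrt(mse),                        # Eq. (27)
        # Eq. (28) as printed: denominator uses predictions minus mean(m_exp).
        "R2": 1 - np.sum(err ** 2) / np.sum((m_pred - m_exp.mean()) ** 2),
    }
```

Because APRD keeps the sign of each deviation, positive and negative errors can cancel, so it indicates bias, whereas AAPRD measures the average error magnitude.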

Table 3  Assessed statistical quality metrics for the constructed GEP and RBFNN-ACO methods in this study 

Moreover, the RBFNN-ACO model exhibits the most accurate and consistent statistical values. For example, the values of R², AAPRD%, SD, and RMSE are 0.80, 6.28%, 0.09, and 0.17 for the RBFNN-ACO model; 0.79, 6.39%, 0.09, and 0.17 for GEP Model-I; and 0.71, 7.45%, 0.11, and 0.21 for GEP Model-II, respectively.

Visual analysis

In addition to the preceding parametric analysis, the performance of the suggested methodologies is assessed graphically with several visualization tools, including the cross plot, the relative error distribution, and the index plot comparing measurements with model estimates. The cross plots of the established RBFNN-ACO, GEP Model-I, and GEP Model-II for the test and train subsets are shown in Fig. 7. The test and train points are scattered close to the 45° line (Y = X), which indicates close agreement between the experimental data and the values predicted by RBFNN-ACO and GEP (as heuristic approaches).

Fig. 7

The cross plot comparison showing the predicted cementation exponent versus the measured cementation exponent for the developed models: (a) RBFNN-ACO, (b) GEP Model-I, and (c) GEP Model-II

Fig. 8 also shows the relative deviation of the computed data from the corresponding target values, which further confirms the accuracy of the established RBFNN-ACO, GEP Model-I, and GEP Model-II. Points close to the zero-horizontal line match the corresponding actual data more closely. In Fig. 8, the bulk of the data points is scattered along the zero-horizontal line, which indicates that the proposed methodologies are robust for estimating the cementation exponent in heterogeneous carbonate pore systems. The RBFNN-ACO model gives the most accurate cementation exponent estimates, because its estimated values cluster most tightly around the Y = X (45°) line in Fig. 7 and the zero-horizontal line in Fig. 8. Furthermore, the errors for RBFNN-ACO, GEP Model-I, and GEP Model-II mainly range from about −20 to +20%, −30 to +20%, and −40 to +25%, respectively.

Fig. 8

The error distribution diagram showing the percentage of relative deviation versus the measured cementation exponent considering training and test subsets for the developed models: (a) RBFNN-ACO, (b) GEP Model-I, and (c) GEP Model-II

Figure 9 shows the index plot, another standard tool for examining the accuracy of the RBFNN-ACO and GEP methods. For the proposed models, the test and train subsets match the actual dataset acceptably. This further verifies the capability of the developed methods for forecasting the cementation exponent in heterogeneous carbonate pore systems. As observed, the RBFNN-ACO model gives more consistent forecasts of the actual data points than GEP Model-I and GEP Model-II.

Fig. 9

Index plot assessment showing the predicted and measured cementation exponent versus data index for the developed models: (a) RBFNN-ACO, (b) GEP Model-I, and (c) GEP Model-II

The benefits of the RBFNN-ACO and GEP Model-I are not restricted to greater precision and the lack of overfitting problems. RBFNN-ACO could be applied straightforwardly to other geographical locations, and it could be easily updated as new data become available. For a larger databank, it is sufficient to tune the coefficients of GEP Model-I.

Comparing with existing methods

The performance of the models proposed in this study is evaluated through a comprehensive comparison with models available in the literature.

The reliability of the proposed models against the literature correlations is assessed via several statistical error parameters, such as MAD, SD, APRD%, AAPRD%, and RMSE, which are listed in Table 4. Based on Table 4, the following order of accuracy (in terms of AAPRD) is established among the proposed models and the published correlations:

Table 4 Comparing statistical parameter values for newly suggested approaches versus conventional correlations over the entire dataset

RBFNN-ACO > GEP Model-I > GEP Model-II > Borai’s Eqn. (Borai 1987) > Nugent’s Eqn. (Ragland 2002) > Shell Eqn. (Neustaedter 1968; Watfa and Nurmi 1987) > Asquith’s Eqn. (Asquith 1997) > Focke’s and Munn’s Eqn. (Focke and Munn 1987).

Thus, RBFNN-ACO gives the highest accuracy for estimating the cementation exponent in carbonate pore systems. Among the published correlations, Borai’s Eqn. (Borai 1987) performs better than the others.

Using the cross plot for comparison, Fig. 10 displays the deviation of the cementation exponent values computed by the different methodologies from the actual points. The clustering of the RBFNN-ACO, GEP Model-I, and GEP Model-II data close to the Y = X (45°) line indicates the superiority of these approaches over the other published correlations. In contrast, the literature models show systematic deviations in the calculated cementation exponent.

Fig. 10

Cross plot assessment showing the predicted cementation exponent versus the measured cementation exponent in order to compare the proposed RBFNN-ACO and GEP models with the current correlations in open literature

The same conclusions follow from Fig. 11, which shows the distribution of the relative error of each model with respect to the actual data. The data points of the RBFNN-ACO, GEP Model-I, and GEP Model-II methods are scattered close to the zero-horizontal line, which reflects the better results of the models suggested here. The main reasons for the inaccuracy of Asquith’s Eqn. (Asquith 1997) and Focke’s and Munn’s Eqn. (Focke and Munn 1987) are their development from a restricted database, their limited range of parameters, the low impact value of total porosity on the cementation exponent, and their neglect of the main pore types.

Fig. 11

Error distribution diagram showing the distribution of the relative deviation versus the measured cementation exponent in order to compare the proposed RBFNN-ACO and GEP models with the current correlations in open literature

The cumulative frequency diagram in Fig. 12 provides one more graphical technique for checking the models suggested in this study against the commonly used literature correlations over the whole databank. As shown, 96.5% of the RBFNN-ACO estimates, 95.5% of the GEP Model-I estimates, 92.0% of the GEP Model-II estimates, 78.6% of the Borai’s Eqn. (Borai 1987) estimates, 70.5% of the Nugent’s Eqn. (Ragland 2002) estimates, 64.3% of the Shell Eqn. (Neustaedter 1968; Watfa and Nurmi 1987) estimates, 42.9% of the Asquith’s Eqn. (Asquith 1997) estimates, and 41.9% of the Focke’s and Munn’s Eqn. (Focke and Munn 1987) estimates have absolute percentage relative deviations of 20% or less.

Fig. 12

The cumulative frequency analysis showing cumulative frequency versus absolute percentage relative deviation for the established RBFNN-ACO and GEP models as compared with the available models in literature

For a given value of the absolute percentage relative deviation, a higher cumulative frequency indicates a more robust model. Consequently, the models suggested in the current work give the best results in comparison with the literature correlations.
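The cumulative frequency used in this comparison can be sketched as follows, assuming it is simply the percentage of estimates whose absolute percentage relative deviation does not exceed a given threshold. The function name is hypothetical, and the numbers in the usage line are made up for illustration; they are not data from the study.

```python
import numpy as np

def cumulative_frequency(m_exp, m_pred, thresholds):
    """Percentage of estimates with absolute relative deviation <= each threshold."""
    m_exp = np.asarray(m_exp, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    ard = 100 * np.abs((m_exp - m_pred) / m_exp)  # absolute % relative deviation
    return np.array([100 * np.mean(ard <= t) for t in thresholds])

# Made-up example values, evaluated at 10% and 20% deviation thresholds.
print(cumulative_frequency([2.0, 2.1, 1.9, 2.5], [2.1, 2.0, 2.4, 2.5], [10, 20]))
```

Evaluating the function on a fine grid of thresholds produces one curve per model; a curve that rises faster corresponds to a model with more estimates concentrated at low error.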

Figure 13 compares the measured cementation exponent, plotted against total porosity, with the estimates of the proposed models and the traditional literature correlations for tight carbonate samples (total porosity less than 10%) from the deep formations. RBFNN-ACO clearly gives the closest match to the measured m values.

Fig. 13

Comparison analysis showing the variation of the cementation exponent versus the total porosity to compare the proposed RBFNN-ACO and GEP models with the literature correlations for tight carbonate samples (total porosity less than 10%) through the deep formation

Figure 14 shows the variation of the measured m values with moldic porosity in comparison with the proposed and traditional models for tight and deep carbonate samples. As can be seen, the proposed RBFNN-ACO, GEP Model-I, and GEP Model-II are highly suitable for estimating the cementation exponent in tight and deep carbonate pore systems.

Fig. 14

Comparison analysis showing the variation of the cementation exponent versus the moldic porosity to compare the proposed RBFNN-ACO and GEP models with the literature correlations for tight carbonate samples (total porosity less than 10%) through the deep formations

This demonstrates another advantage of the models proposed in this study over the traditional literature correlations. The main challenge for the existing literature models is predicting the cementation exponent in tight and deep carbonates, an issue that the RBFNN-ACO algorithm and the GEP mathematical strategy handle properly.

Finally, the variation of the absolute relative deviation (ARD) percentage over wide ranges of moldic porosity and non-fabric-selective dissolution (connected) porosity is assessed for all models studied here. The outcomes of the analysis are shown in the 2D heatmap diagrams of Fig. 15. The dark red and dark blue colors show ARD values smaller than 5% and greater than 50%, respectively; the other colors correspond to ARD values between 5 and 50%. As shown, the main regions of the heatmaps for RBFNN-ACO, GEP Model-I, and GEP Model-II are uniformly covered by dark red shading (Fig. 15a–c).

Fig. 15

2D heatmap diagrams showing the variation of absolute relative deviation (ARD) percentage with moldic porosity and non-fabric-selective dissolution (connected) porosity for the different models including (a) RBFNN-ACO, (b) GEP Model-I, (c) GEP Model-II, (d) Borai’s Eqn. (Borai 1987), (e) Nugent’s Eqn. (Ragland 2002), (f) Shell Eqn. (Neustaedter 1968; Watfa and Nurmi 1987), (g) Asquith’s Eqn. (Asquith 1997), and (h) Focke’s and Munn’s Eqn. (Focke and Munn 1987)

In contrast, most of the heatmap for Focke’s and Munn’s Eqn. (Focke and Munn 1987) falls in the yellow, light blue, and dark blue ranges, which indicates that this correlation performs worst.

Conclusions

The current investigation characterizes the cementation exponent of heterogeneous carbonate pore systems by applying, for the first time in the open literature, two self-organizing and heuristic approaches: a radial basis function neural network optimized with ant colony optimization (RBFNN-ACO) and gene expression programming (GEP). The key outcomes of the present study are as follows:

  1. The RBFNN-ACO model gives the lowest error for estimating the cementation exponent in heterogeneous carbonate pore systems, with R² and AAPRD% values of 0.80 and 6.28%, respectively.

  2. Among the examined literature correlations, Borai’s Eqn. with AAPRD = 12.3% and Focke and Munn’s Eqn. with AAPRD = 31.3% show the best and the worst performance, respectively.

  3. Therefore, the superiority of the RBFNN-ACO and GEP models over the literature correlations is supported by the various statistical quality measures presented in this study.

  4. In tight (i.e., total porosity less than 10%) and deep carbonates, the RBFNN-ACO matches the actual data more satisfactorily than Borai’s Eqn.

  5. Moldic porosity with +70% impact value, non-fabric-selective dissolution (connected) porosity with −30%, and interparticle porosity with −23% have the highest effect on the cementation exponent.

  6. Outlier detection with the Williams plot indicates that about 95% of the databank and the RBFNN-ACO estimates are reliable and lie within the applicability domain.

  7. In conclusion, the robust approaches established in the current work are vital for examining porous media flow in carbonate reservoir simulations.