Prediction of permeability of highly heterogeneous hydrocarbon reservoir from conventional petrophysical logs using optimized data-driven algorithms

Permeability is an important parameter in the petrophysical study of a reservoir and serves as a key tool in the development of an oilfield. This is while its prediction, especially in carbonate reservoirs with their relatively lower levels of permeability compared to sandstone reservoirs, is a complicated task as it has larger contributions from heterogeneously distributed vugs and fractures. In this respect, the present research uses the data from two wells (well A for modeling and well B for assessing the generalizability of the developed models) drilled into a carbonate reservoir to estimate the permeability using composite formulations based on least square support vector machine (LSSVM) and multilayer extreme learning machine (MELM) coupled with the so-called cuckoo optimization algorithm (COA), particle swarm optimization (PSO), and genetic algorithm (GA). We further used simple forms of convolutional neural network (CNN) and LSSVM for the sake of comparison. To this end, firstly, the Tukey method was applied to identify and remove the outliers from modeling data. In the next step, the second version of the nondominated sorting genetic algorithm (NSGA-II) was applied to the training data (70% of the entire dataset, selected randomly) to select an optimal group of features that most affect the permeability. The results indicated that although including more input parameters in the modeling added to the resultant coefficient of determination (R2) while reducing the error successively, yet the slope of the latter reduction got much slow as the number of input parameters exceeded 4. In this respect, petrophysical logs of P-wave travel time, bulk density, neutron porosity, and formation resistivity were identified as the most effective parameters for estimating the permeability. Evaluation of the results of permeability modeling based on root-mean-square error (RMSE) and R2 shed light on the MELM-COA as the best-performing model in the training and testing stages, as indicated by (RMSE = 0.5600 mD, R2 = 0.9931) and (RMSE = 0.6019 mD, R2 = 0.9919), respectively. The generalizability assessment conducted on the prediction of permeability in well B confirmed the MELM-COA can provide reliable permeability predictions by achieving an RMSE of 0.9219 mD. Consequently, the mentioned methodology is strongly recommended for predicting the permeability with high accuracy in similar depth intervals at other wells in the same field should the required dataset be available.


Introduction
As a property of a porous medium, permeability indicates the capability of the medium to host a fluid flow. For a reservoir, a higher permeability means a higher production rate of the hydrocarbon flow from the reservoir. Therefore, a knowledge of the variations of this parameter across the reservoir zone is key to determine the optimal location of production wells and select the optimum path for the wells. Permeability is a function of porosity and pore geometry and structure, and since the pore geometry is determined by the effective confining stress, the permeability may change, especially in the near-well zone, as the production from the reservoir continues. Indeed, as the reservoir produces, the effective stress increases, thereby making the reservoir rock compacter and less porous, ending up with lower levels of permeability. When it comes to injection process, the effective stress decreases and the opposite occurs to the permeability. The complexity of studying the hydraulic behavior of the reservoir doubles for the carbonate reservoirs where the heterogeneity is relatively high due to the presence of vuggy pores developed upon dissolution of the constituent minerals (Bust et al. 2011), and this adds to the difficulty of accurate prediction of permeability and presentation of a highly generalizable model for such reservoirs. There are various methods for evaluating the permeability of a hydrocarbon reservoir. These can be classified under either of three broad classes, namely core studies, image analysis at different rock scales (X-ray imagery, focused ion beam imagery, and optical microscopy), and laboratory experiments (routine core analysis (RCAL)/ special core analysis (SCAL)) (Al-Bazzaz and Al-Mehanna 2007; Ganat 2020; Alyafei 2021). These techniques are time intensive and costly as they need imaging efforts and laboratory experiments. Nevertheless, their results represent only a limited number of points along the well. Well-testing data have been used for obtaining average reservoir permeability (Mohaghegh et al. 1995). Petrophysical assessment of acquired well logs like nuclear magnetic resonance (NMR) and dipole shear sonic imager (DSI) has been tried to obtain reservoir permeability (Babadagli and Al-Salmi 2004). Although this latter method produces continuous logs of permeability along the studied interval, it needs time and money to perform the required analyses. Therefore, researchers have tried to relate the permeability to other directly measured parameters. Two classes of methods have been developed on this basis. These include physical-basis and machine learning methods.
In general, the main physics-based models used for predicting the permeability include index model, Kozeny-Carman model, Timur model, and Herron model (Male et al. 2020;Huang et al. 2021). In these models, the permeability is primarily seen as a function of effective porosity, with the function developed using the results of experimental works on a set of cores. These methods suffer from serious limitations. Some of the main limitations include their inability to capture the effect of complex nonlinear stress-strain relations, failure to consider elastoplastic behavior of the rock under injection/withdrawal conditions, inadequacy for unconventional reservoirs, and difficulty in determining particular parameters in these reservoirs, and limited capability for predicting the permeability in carbonate reservoirs due to their heterogeneous particle size distribution (Huang et al. 2021).
Depending on the nature of the method used to assess rock permeability, different parameters can be used as input variables to a machine learning model (MLM). In laboratory core studies, attempts have been made to use porosity, specific surface area of the pore, and residual water saturation for estimating the absolute permeability (Gholinezhad and Masihi 2012;Mahdaviara et al. 2020b;Kamali et al. 2022). This has been while field studies have used petrophysical logs for modeling the permeability. In their research, Mulashani et al. (Mulashani et al. 2022) used the gamma ray (GR), density (RHOB), effective porosity, shale volume, and neutron porosity (NPHI) logs for estimating the permeability. In another piece of work, P-wave travel time (DTCO), NPHI, formation resistivity (RT), GR, and RHOB logs were utilized to predict the permeability (Farouk et al. 2021). Otchere et al. (2021) based their work on the logs of caliper, DTCO, GR, NPHI, photoelectric (PEF), RHOB, resistivity log (RT) as well as rate of penetration (ROP) to model the permeability. Zhange et al.(2021) combine the petrophysical method of rok typing (flow zone indicator (FZI) or FZI*) for estimating the permeability from DTCO, RHOB, and GR logs.
A review of the studies performed on the permeability modeling shows that the attempt for formulating models for predicting the permeability with higher accuracy and generalizability is still an active field of research since no single universal model has been achieved yet. Accordingly, application of new modeling techniques, especially in the field of artificial intelligence, with its proven superiority to empirical models, has been seen as a priority. In this respect, we used multilayer ELM (MELM) for modeling the permeability based on petrophysical logs. The MELM not only offers the benefits of the ELM, including its high convergence rate (Huang 2015) thanks to the absence of an iteration phase for optimizing the weights and biases (Huang et al. , 2011Ebtehaj et al. 2016), simple learning process (Zhang et al. 1 3 2016), improved efficiency Ding et al. 2014), good accuracy and high performance (Huang 2015;Huang et al. 2015), and limited need to optimization, as compared to the LSSVM (Zhang et al. 2016), but also covers the weakness of the ELM in handling big data by devising multiple hidden layers Zhang et al. 2016). On the other hand, due to the random initialization of weights and biases in the ELM, it is susceptible to getting stuck in local optima (Cao et al. 2012). Since the same limitation is suffered by the MELM, the present research hybridizes the MELM with a couple of optimization algorithms (cuckoo optimization algorithm (COA), PSO, and GA) to come up with universal optimum. Based on the literature review performed (see Table 1), four major differences between the present study and previously published studies can be distinguished: (1) as it can be seen from Table 1, CNN and MELM algorithms have not so far been applied in previous works for the prediction of permeability, and the improved version of the SVM, LSSVM, as a predictor algorithm was used and its prediction performance was compared with those of the MELM and CNN models; (2) the simple forms of learning algorithms have often been used to predict permeability while it has been shown in previous studies that the hybrid of optimization and predictor algorithms has led to improved results; (3) the input parameters used in the present study, unlike some previous studies, are among the most common logs which practically are recorded in any well. Therefore, the proposed models of the present study can be readily used; 4) most of the models developed in previous studies were based on the results of laboratory tests of permeability measurements on core samples. The core samples taken after being released from this stress will expand;so the permeability measured in the laboratory is not fully representative of rock permeability under reservoir conditions. In addition, the core samples used in the laboratory are basically free of natural fractures and vugs. Therefore, the models developed based on core permeability measurements typically are associated with some degree of uncertainty, especially in carbonate reservoirs that have high heterogeneity.

Research methodology
Workflow of permeability prediction methodology Figure 1 diagrammatically illustrates the methodology applied, in the present research for modeling the permeability. This methodology contains eight steps which are briefly explained in the following: Step 1: The methodology starts by compiling two banks of data, one for the modeling and testing of the permeability prediction (well A data) and one for evaluation and  Matinkia et al. (2022b) verification of the prediction performance of the best-performing model developed in prediction permeability. The compiled databanks for this study contained eight petrophysical parameters recorded versus depth, namely depth, caliper (HCAL), spectral gamma ray (CGR), photoelectric (PEF), neutron porosity (NPHI), density (RHOB), formation resistivity (RT), and compressive wave travel time (DTCO).
Step 2: In the second step of the methodology, a data cleaning is performed to identify and eliminate the outliers in the well A dataset using the so-called Tukey method. In the following, the procedure applied and the necessity of data preprocessing (cleaning) are explained.
A high-accuracy high-generalizability model cannot be achieved unless a powerful modeling algorithm coupled with high-quality data is available. This implies that a preprocessing step must be taken on the input data before the actual modeling of permeability. This begins by evaluating the data quality and attenuating the effect of noise on the data. Noise attenuation techniques can be broadly categorized into two classes, namely noise suppression methods and outlier elimination techniques. In the former techniques, the count of data points remains the same before and after the process. Proper implementation of such a technique requires identification of the share of noise in the measurements, which is a function of measurement environment conditions, measurement device quality, and the operator's experience. In the latter approaches, the data points that significantly diverge from the rest of the data points are identified as outliers and hence eliminated from the dataset. This means that the number of data points becomes smaller upon applying such a technique. Since the application of the first approach required so more information that was unavailable in this study, we used the so-called Tukey method to identify and eliminate the outliers in this study. To this end, one must calculate the first and third quartiles (Q 1 and Q 3 , respectively) for the studied parameters and proceed to evaluate the inter-quartile range (IQR) via Eq. (1). In this equation, the value of k is determined according to the data collection conditions and the importance of the problem in question. The smaller the value of k is, the more conservative the collected data is, and the larger amount of data is considered as outlier data. The value of this coefficient is considered equal to 1.3 based on the evaluation of the data and also the sensitivity analysis on it (refer to the appendix). Next, LIF 3 and UIF 4 can be obtained by adding and subtracting the IQR to and from Q 1 and Q 3 , respectively, as expressed in Eqs. (2, 3), respectively. Finally, the data points corresponding to values below LIF or above UIF are flagged as outliers and hence omitted from the rest of the analysis. Step 3: In the third step, the cleaned databank of well A was randomly divided into two sets of data, namely a training set (i.e., these data are for training the permeability prediction models considered) that included 70% of data records of well A databank and a testing set (i.e., the data records used as unseen data for testing the trained permeability prediction models) that contained 30% of data records of well A databank. Since the hyperparameters are initialized on a random basis and the division of available data into training and testing subsets are performed randomly, this algorithm leads to a different solution at each iteration. Accordingly, seeking to reduce the effect of these random phenomena on the final outcome, we used the so-called tenfold cross-validation (10FCV). This algorithm divides the original dataset into 10 clusters containing an identical number of data points. It then runs the estimator algorithm 10 times, with a different cluster taken as testing data and the remaining 9 clusters taken as training data in each run. As a final output, the algorithm reports an average error over the 10 runs.
Step 4: In the fourth step resolving the effect of the variable scale of the data, which ultimately can affect negatively the weights and biases as well as the feature selection results, all the data are normalized before conducting the feature selection. For this purpose, minimum and maximum values (Zmin and Zmax, respectively) of each feature (Z) are calculated, and then, normalization is practiced by Eq. (4) to come up with normalized values in the range of [-1, 1].
Step 5: As a second step in the preprocessing of the input data, the selection of a subset of the most effective input parameters for the modeling can significantly reduce the processing time compared to a case where all features are taken as input to an estimator model. In addition, the use of lessnoisy features translates into a higher accuracy in the modeling process. Accordingly, the redundant features can be eliminated from the rest of the analysis. Therefore, following the data normalization, a feature selection technique is used, in the fifth stage, to identify and choose the most effective set of features to be passed to the permeability modeling.
Numerous methods have been presented for this purpose, which can broadly be classified under three classes filters, wrapper methods, and embedded techniques. With a filter, feature selection is based on a statistical index, like the correlation coefficient. This is while a wrapper method undertakes the modeling with different sets of features and then compares the results to choose the best subset of features. This makes the wrapper methods generally slower than the filters. An embedded approach uses a mechanism that reflects a combination of filters with wrapper techniques. Osman et al. (2018) showed that the wrapper techniques are generally more accurate than the filters. Therefore, in the present research, we opted for a combination of MLP with the nondominated sorting genetic algorithm II (NSGA-II) to come up with a wrapper procedure for feature selection.
In this hybrid algorithm, minimization of the number of selected features and the root-mean-square error (RMSE) of the permeability estimator model were as set as objectives to the NSGA-II. This algorithm started by randomly selecting a set of features and taking them as input to the MLP. The MLP algorithm then triggered a training and testing process to feedback to the NSGA-II on the error of modeling with the selected set of features. Adopting the crossover and mutation operators in successive iterations of the NSGA-II, this algorithm introduces a new set of features into the MLP that leads to a lower error than that in the previous iteration. This iterative process continues until a predefined stopping criterion is met. Finally, hybrid MLP-NSGA-II provides the user with a set of solutions from which the user can select the most appropriate one considering the problem conditions and constraints. In order to avoid repeating the relevant explanations on these algorithms, readers are recommended to refer to the previously published research studies (Deb et al. 2002;Yegnanarayana 2009;Yusoff et al. 2011;Anemangely et al. 2017Anemangely et al. , 2019Matinkia et al. 2022a).
Step 6: In this step, as predictor models, MELM and LSSVM hybridized with metaheuristic algorithms of COA, PSO, and GA were applied to the training data. In addition to the mentioned hybrid methods, we adopted straightforward LSSVM and convolutional neural network (CNN) methods based on the training data. To this end, the most appropriate structure of MELM and the most efficient kernel function were first determined. In total, eight models were trained to accurately predict absolute permeability based on conventional well-logging data. The hybridization process of the predictor and optimizer algorithms is described in details in Sect. 2.2. In addition, a brief introduction to these algorithms' theoretical description is given in Appendices 2, 3, 4.
Step 7. In this step, the trained models were evaluated on the testing data of well A in terms of the accuracy and generalizability in permeability prediction. To this end, five statistical error metrics were applied (Sect. 2.3). Comparing the error achieved by the eight permeability prediction models developed, the best model (i.e., the model which provided the lowest and highest values of RMSE and R2, respectively) was identified.
Step 8. Since the generalizability of intelligent predictive models determines their potential practical applicability, the accuracy and generalizability of the best-performing permeability prediction model identified were precisely assessed on the new unseen databank from another well, well B, drilled in the same field as well A, applying the error metrics presented in Sect. 2.3.

Hybridization of predictor and optimizer algorithms
Not only the proper selection of an architecture for MELM and a kernel function for LSSVM, but also the optimal setting of hyperparameters of these algorithms play a crucial role in the accuracy and generalizability of the developed model. Application of metaheuristic optimization models instead of classic competitors has proved the better accuracy of the metaheuristics than the classic algorithms (ÖZKA-RACA 2018). Therefore, once finished with determining a proper structure for MELM and a suitable kernel function for LSSVM, we herein used COA, PSO, and GA for determining optimal values of hyperparameters. For this purpose, hyperparameters of the predictor algorithms were defined as decision variables to the optimizer algorithms. On this basis, at each iteration, the optimizer algorithm introduces a set of values of hyperparameters for the MELM and LSSVM. The error of modeling with these values on the training data was then feedbacked to the optimizer algorithm, so that the optimizer could determine the hyperparameters in such a way to reduce the modeling error at the next iteration. This iterative process continued until the stopping criterion (usually in the form of achieving a maximum number of iterations) was satisfied. The MELM and LSSVM models that were obtained upon the final iteration of the optimizer algorithm were applied to the testing data. This process is schematically shown in Fig. 2.

Validation of models
The trained models were validated by applying them to unseen data (30% of the entire data) and analyzing the modeling error and R 2 . To this end, percent deviation (PD) at each data point (i) was calculated from the measured ( Perm measured ) and predicted ( Perm Predicted ) values of permeability via Eq. (20). The value of PD was then used to calculate average percent deviation (APD) and average absolute percent deviation (AAPD) for each model using Eqs. (21, 22), respectively. Using the error (Er) at each data point and mean error (Er mean ) over all data points (n), standard deviation (SD) was obtained using Eq. (23). The so-called RMSE and R 2 were also obtained from Eqs. (5, 10), respectively. It is worth noting that these modeling validation criteria were further used in the training stage.

Studied data
In this study, we used full-set log data including depth, caliper (HCAL), spectral gamma ray (CGR), photoelectric (PEF), neutron porosity (NPHI), density (RHOB), formation resistivity (RT), and compressive wave travel time (DTCO). The logs were acquired against the Fahlian Formation at two wells in a hydrocarbon field in southwestern Iran. Due to the confidentiality of the information, one of the wells was named A, and the other well was named B. Over the studied depth intervals, a total of 1530 and 560 data points were acquired at a spacing of 0.1524 m for wells A and B, respectively. Given the high level of Fahlian Formation, which is composed of limestone, formation microimager (FMI) data were further used for estimating the permeability. The obtained permeability was then corrected for water saturation and calibrated against laboratory data from core analyses.
Figures 3, 4 demonstrate variations of the petrophysical logs and permeability along the studied depth range in the Fahlian Formation for wells A and B, respectively. Figure 3 demonstrates variations of the petrophysical logs and permeability along the studied well in the Fahlian Formation. As seen in this figure, in the depth intervals where washouts are evident (indicated by a sudden increase in the caliper readings, e.g., consider the 4400-4420 m depth interval), the petrophysical log readings are strongly affected. This indicates the necessity of a preprocessing step for noise attenuation of the data. Tables 2, 3 report five statistical indicators for the data collected from the Fahlian Formation at the studied wells A and B. Since the number of data in well A is more than that of well B, and also the range of changes of parameters' values in well A is wider than that of well B and covers the range of values of well B, well A is used as modeling data (training and testing) and well B was selected to validate the generalizability of the trained models.

Results and discussion
In this section, first, the development and evaluation of the permeability prediction models are discussed and followed by the evaluation of the generalization of the models proposed in the validation well (well B). Finally, to assess how the detection and elimination of the outlier with the Tukey method enhances the permeability prediction performance  1 3 of the proposed model, the best-performing algorithm will be trained using uncleaned primary data of well A (data containing outliers), and then obtained model will be applied to the data of well B to predict permeability.

Developing and evaluating permeability prediction models
Firstly, the data were investigated by the Tukey method to address the outliers. Accordingly, we ended up eliminating 261 data points that were recognized as outliers. Following the outlier elimination stage, the relationship between the input parameters and the target parameter was conducted based on the heat map of the correlation matrix (Fig. 5). As seen in Fig. 5, the DTCO and NPHI were found to be most directly correlated to permeability. Although an increase in PEF and depth led to higher permeability values, the correlation coefficient was smaller than the DTCO and NPHI. At the other end of the spectrum, RHOB exhibited the strongest inverse correlation to the permeability, as compared to other petrophysical logs over the studied depth interval. Although higher values of RT, SGR, and HCAL were also associated with lower permeability, the correlation was much stronger for the RHOB.
In order to find the best set of features using MLP-NSGA-II, one should begin by finding the proper architecture (numbers of layers and neurons per layer) for the MLP. This was practiced by trial and error. That is, several models were built with different numbers of layers (from 1 to 3) and an equal number of neurons per layer considering the entire set of available petrophysical logs as input. The results of validation with this method indicated that an MLP with two layers provides higher accuracy than those with either one or three layers. Therefore, a sensitivity analysis was conducted on the two-layer MLP to find the optimal number of neurons in each layer. For this purpose, taking all petrophysical logs as input, permeability was modeled with the help of two-layer MLPs with different numbers of neurons (from 1 to 8). Noteworthily, 10FCV was devised to achieve reliable results. Figure 6 shows the modeling error for different models in the form of a contour map. The figure suggests that the value of RMSE decreases with increasing the number of neurons per layer. From some point on, however, any further increase in the number of neurons per layer imposes an opposite effect on the RMSE, making the RMSE increasing. The lowest error was obtained for the model with 5 and 4 neurons in the first and second layers, respectively. Accordingly, the MLP with this structure was combined with NSGA-II to come up with a hybrid model for feature selection.
After optimizing the MLP structure, one should further optimize the values of the control parameters of the NSGA-II in such a way to enable achieving a high accuracy level. For this, the MLP-NSGA-II was devised with different values of mutation and crossover rates. Here again, the 10FCV was used to split the entire dataset into training and testing subsets in such a way to ensure reliable results. Based on the results of these analyses, the best values for the mutation and crossover rates were found to be 0.04 and 0.52, respectively. Moreover, the population size and the maximum number of iterations for the NSGA-II were set to 90 and 70, respectively. Figure 7 depicts variations of RMSE and R 2 when applying the MLP-NSGA-II algorithm to the training data for modeling the permeability. As the figure suggests, the R 2 increased abruptly with increasing the number of input features from 1 to 3, while the increase in R 2 was subtle as the number of input features increased further from 3 to 6. For models with 6 or more input features, the increase in the R 2 was pretty much negligible. It is also observed that the rate of reduction in error was relatively high as the number of input features increased from 1 to 4, while the reduction in error was insignificant as the number of input features exceeded 4. Knowing that the R 2 is susceptible to be affected by outliers, we herein opted for RMSE as the final criterion for selecting the optimum number of inputs. In this way, 4 input features were used for modeling the permeability. Table 4 presents the best combinations of features for various counts of input variables to the MLP-NSGA-II. Given that the previous step showed that the optimum number of input features was 4, the rest of the modeling process was done with DTCO, NPHI, RHOB, and RT as input features.
In order to apply the hybridized forms of MELM on the data of the selected features, one should begin by determining the optimal structure (i.e., counts of hidden layers and neurons in each layer) for the MELM to ensure achieving a model with the lowest prediction error. This was herein done by a trial-and-error procedure. Application of this procedure  showed that a structure with three hidden layers containing 6, 5, and 7 neurons in the first to third layers, respectively, could end up with the lowest errors. In this way, the biases and weights of the MELM with this structure were optimized by means of the metaheuristic optimization algorithms. Prior to applying the hybrid LSSVM-based algorithms on the data of the selected features, one had to determine the most appropriate kernel for the LSSVM. To this end, the LSSVM algorithm was applied to the training data with different kernels (i.e., linear (Lin), polynomial (Poly), MLP, and RBF kernels). Figure 8 shows a bar plot demonstrating the obtained values of RMSE upon applying the LSSVM algorithms with different kernel functions. This figure suggests that the LSSVM with RBF kernel produced the smallest levels of error. Therefore, we used this kernel function for modeling the permeability with the LSSVM algorithm. Now, not only the parameter but also 2 (variance of RBF kernel) should be optimized using the metaheuristic optimizers considered. These adjustable parameters control the generalizability, learning power, and predictive power of the formulated LSSVM.
Once the optimal structure of MELM and proper kernel function of LSSVM were determined, one should further optimize the adjustable parameters of COA, PSO, and GA considering the problem at hand before hybridizing them with the MELM and LSSVM. Once again, a trial-and-error procedure was adopted to set the mentioned parameters. The optimal values of the parameters of the mentioned algorithms are reported in Table 5. In the relevant analyses, the optimization algorithms converged to a solution in less than 160 iterations on average. Given this finding, we limited the number of iterations to 200. Figure 9 shows the variations of RMSE at different iterations of the hybrid MELM and LSSVM algorithms for modeling the permeability using the training data. As the figure suggests, except for the LSSVM-GA, variations of error are very limited at iterations beyond the 100 th iteration. On average, the convergence rate was higher with COA rather than the other two optimization algorithms. Focusing on the MELM-based hybrid model, MELM-COA produced the most accurate results. Among the LSSVM-based hybrid models, however, the highest accuracy levels were obtained with the LSSVM-COA. As a general conclusion, it can be  stipulated that the MELM-based hybrid models provided for smaller errors than the LSSVM-based hybrids as far as permeability prediction from the training data was concerned, with the MELM-COA generating the highest accuracy. Shown in Fig. 10 are the cross-plots of measured and predicted permeabilities based on the training data using the hybrid and straightforward schemes of LSSVM and CNN algorithms. As this figure suggests, for all algorithms, the best-fit line to the data points closely coincides with the Y = T line, indicating that the developed models are, on average, free of overestimation or underestimation problems. Despite the presence of small deviations between the two lines at low or high values of permeability, especially for the CNN model, these deviations are pretty subtle and negligible. Nevertheless, the two lines coincide well for the MELM-COA model. This latter model exhibits generally less dispersion in the data points, i.e., the data points are well concentrated around the Y = T line. Dispersion of the data points was higher with the simple and hybrid forms of LSSVM rather than the MELM-based hybrid models. This suggests that the LSSVM-based models were generally less accurate than the MELM-based models. Table 6 shows quantitative values of error and R 2 for comparing the hybrid models against the simple LSSVM and CNN. This table indicates that the MELM-COA model is of higher accuracy than the other models. Moreover, hybrid forms of LSSVM showed smaller RMSE values than the simple LSSVM, reflecting the high power of metaheuristics for finding and optimizing the hyperparameters. Figure 11 shows the cross-plots of measured and predicted permeabilities based on the testing data using the hybrid LSSVM and MELM as well as straightforward forms of CNN and LSSVM. As this figure suggests, the dispersion of the points around the Y = T line is less with the MELM-COA and CNN models, as compared to other models. This indicates the superior accuracy of these two models compared to other models. A more accurate comparison between these two models shows that the underestimation of lower values of permeability is more pronounced with the CNN model rather than the MELM-COA model. This highlights that, even in the testing stage, the MELM-COA model tends to produce smaller errors than the other models. Quantitative evaluation of error and R 2 in the testing stage is presented in Table 7, further confirming the higher accuracy of the MELM-COA rather than the other models. This indicates the wider generalizability of this model compared to the rest of the studied models. Figure 12 shows the histogram of modeling errors achieved for various models upon their application to the testing data. The figure further depicts a fitted normal distribution function. In this figure, it is clear that MELM-COA produces the lowest average error ( ), as compared to other models. The lower standard deviation of this model compared to the other models indicates more limited dispersion of data points around the mean value. Knowing that the closer the best-fit normal distribution function to error to a standard normal distribution, the better the performance of the corresponding model, it can be stipulated that the MELM-COA model can overperform the other models for similar depth intervals along another well in the same field.
Generally, the hybrid forms of the MELM algorithm have a lower error value than the hybrid forms of the LSSVM, separately for each of the metaheuristic optimization algorithms. Considering the sensitivity of LSSVM to outliers and that the MELM possesses superior characteristics compared to LSSVM (see the advantages/disadvantages of the predictor algorithms in Appendix 4), such a better prediction outcome was expected to be presented by the hybrid forms of MELM. Comparing the results of the models based on the optimization algorithms shows that the error of the COA is generally lower than the other two algorithms. This result is consistent with the poor performance of GA and PSO in local search (Zhuo and Yu 2019;El-Mihoub et al. 2021) and maintaining a balance between the local and global search of the COA (Mareli and Twala 2018). Figure 13 shows the comparison of the measured and predicted values of permeability using the MELM-COA model on all the data of well A (in the presence of outliers). As can be seen in Fig. 13, the MELM-COA model precisely predicted the permeability changes trend along the depth interval studied in well A. However, the model underestimated the eliminated high permeability values and overestimated the low permeability values detected and eliminated as outliers in preprocessing stage. The RMSE and coefficient of determination values for permeability prediction by applying the MELM model developed with clean data of well A (without outliers) were obtained to be equal to 1.9525 and 0.9921, respectively.

Validation of the trained models
To evaluate the generalizability of the models, the trained models were applied to the data of well B to estimate the permeability. In Table 8, the comparison of the error values and the coefficient of determination for the results of applying permeability predictive models on the validation data is presented. As can be seen in this table, the MELM-COA algorithm was able to predict the permeability in the studied depth interval of well B with the least error. However, other models showed satisfactory prediction results. These findings confirmed the high accuracy and generalizability of the proposed models in predicting permeability on unseen data of well B.
The cross-plot of the measured and predicted permeability values obtained by the MELM-COA model is shown in Fig. 14. As can be seen in this figure, the MELM-COA model presented outstanding capability in predicting the permeability values across the studied interval of the validation well, witnessed by the absence of underestimating or overestimating in predictions made. Figure 15 depicts the comparison of the measured and predicted values by the MELM-COA model in the studied depth interval of well B. These findings further confirmed the substantial prediction performance accuracy of the proposed MELM-COA, distinguished by the closely matched predicted and measured values of permeability.

Conclusions
In this study, absolute permeability was modeled by integrating MELM and LSSVM, as estimator algorithms, with COA, PSO, and GA, as optimizer algorithms. Modeling was further performed by simple LSSVM and CNN for the sake of comparison. For this purpose, we began by evaluating a dataset including petrophysical logs (HCAL, SGR, PEF, NPHI, RHOB, RT, and DTCO) and permeability measurements from well A penetrating a carbonate reservoir to identify outliers using the Tukey technique and then remove them from the input data. Next, NSGA-II was applied to the training data (70% of the entire data set, i.e., 888 data points) to select the most appropriate features. Once finished with applying the algorithms on the training data, the developed models were applied to the testing data (i.e., 381 data points). To assess the generalizability of the developed models, these models were used to predict permeability in another well (well B) from this field and in the same formation. The following conclusions were drawn: • The Tukey technique recognized 261 data points as outliers. Accordingly, these points were omitted for the rest of the analysis. • Applying NSGA-II algorithm for the selection of the features with high influence degree on the targeted parameter, DTCO, NPHI, RHOB, and RT were selected as the most influential input features. • Testing the hybrid methods as well as the conventional LSSVM and CNN on the training data, it was figured out that the MELM-COA algorithm generates the smallest RMSE (0.5600 mD) and the highest R 2 (0.9931), as compared to other models. • The MELM-based hybrid models presented smaller errors than those of the LSSVM-based hybrid models. • The hybrid LSSVM-based models were less erroneous than the simple LSSVM, indicating the better perfor-

Outlier detection and elimination analysis
To determine the value of coefficient k in Eq. (1), in addition to the evaluation of data acquisition reports, a sensitivity analysis was performed to determine the appropriate value of this coefficient by modeling the permeability using the MELM-COA algorithm. For this purpose, the value of coefficient k between 1.2 and 1.7 was considered variable and the results of the output models were evaluated on modeling data (training and testing) and validation well data (well B). Table 9 shows the results of this evaluation. As can be seen in this table, as the value of this coefficient decreases, the RMSE of the training phase decreases. However, the RMSE values achieved for the test and validation data show that by reducing the value of this coefficient to 1.3, it decreases, but for the k value equal to 1.2, the error value increases. Since the model adjusts its parameters with training data, it is not far-fetched to expect a reduction in RMSE by reducing the value of k and, consequently, an increase in the volume of data identified as outliers obtained due to the reduction in the complexity of the problem. The trend of error reduction in the test and validation data shows that the outliers are correctly identified and the removal of the identified data does not affect the generalizability of the obtained model. The increase of the RMSE of the MELM-COA model in the value of k equal to 1.2 is indicative of the decrease in the diversity of data values due to the identification and removal of data that were not outliers, and in a way, it has caused the trend of permeability changes in the training data to be too smooth. Therefore, it can be said that the value of 1.3 is the most appropriate value for K. The results of this analysis are in good agreement with the wells' conditions and the report of petrophysical logging.

Convolutional neural network
The IAs are known to be sensitive to the presence of noise in the data. This indicates that they need noise attenuation and feature selection steps (Taye et al. 2020). This is while a CNN includes these two processes within itself. With the emergence of deep learning, CNN became a popular instrument for feature selection out of time series signals as well as identification, prediction, and classification purposes (LeCun et al. 2015). A CNN contains not only dense layers of neurons, very like an MLP NN, but also convolutional and pooling layers for feature selection. The term convolutional (or what is referred to as shifting calculation process) refers to the kernel movement on the input signal. In general, the pooling layer follows the convolutional layer for reducing the dimensions of the convolutional output, helping reduce overfitting problems. Output of a convolutional layer can have a depth of more than 1. Accordingly, flattening is applied to the output of the convolutional layers to form a flat structure that can be introduced into the dense layer. Figure 16 demonstrates the workflow of CNN. A major drawback of the CNN comes in the face of its need for adjusting many parameters. In addition to the optimization of the weights and biases, which is herein practiced by a backpropagation algorithm, one needs to set the number of neurons in the dense layer depending on the problem conditions. In this study, based on a sensitivity analysis of the size of the kernel and the count of convolutional filters, their optimal values were found to be 9 and 200, respectively. Figure 17 shows the results of the sensitivity analysis conducted for selecting the appropriate size of the kernel and the count of convolutional filters.

Multilayer extreme learning machine
The ELM is inspired by a feedforward network with a single hidden layer that was first introduced by Liang et al. (2006) for solving complex problems, demonstrating its superior accuracy and convergence rate compared to the conventional NN. In the ELM, the use of a uniform random distribution for setting the weights for the intermediate layer and the   generalized Moore-Penrose inverse analytic derivative for setting the weights for the output layer replace the timely iterative process in the conventional NNs for determining the weights and biases of neurons (Yeom and Kwak 2017). Acknowledging the numerous advantages of ELM, Xiao et al. (2017) presented a multilayer version of it for addressing the modeling problems with large datasets. Figure 18 shows the procedure for determining the weights of layers and biases of neurons with a MELM algorithm with k hidden layers in the training phase. As demonstrated by this flowchart, the algorithm begins by randomly setting the weights between the input layer and the first hidden layer as well as the biases of the neurons in the first hidden layer. Based on these random values and predefined activation functions, outputs of the first hidden layer (H) are set. At the next step, all other hidden layers (from the second to the kth) are then seen as a single hidden layer (i.e., multiple hidden layers of ELM are seen as an ELM with two hidden layers) and the expected output of this hidden layer (H 1 ) is determined based on the weights set for the output layer ( ). Following with the algorithm, the combined layers are decomposed and the weights and biases of the second hidden layer are evaluated based on the H 1 and H (which serve as input to the second hidden layer). Considering the weights and biases for the second hidden layer, one can obtain outputs of the second hidden layer. Next, the third to kth hidden layers are combined and the same process repeats until the weights and biases of the kth hidden layer are calculated. It is worth noting that if the maximum and minimum values in the Matrix H go beyond 1 and -1, respectively, one must normalize all values to [− 0.9, 0.9].

Least squares support vector machine
Introduced by Suykens et al. (2002), LSSVM was originally developed to reduce the computational complexity of the SVM, where inequality constraints are replaced by equation constraints obtained by solving a quadratic programming problem. This accelerates the LSSVM over the SVM. Let x i ,y i k i=1 be a given training dataset where x i R N are inputs with N dimensions and y i R denotes the corresponding dependent variable values. Function approximation seeks to find a fundamental relationship between the input  (Mehrad et al. 2022) and output values. In the feature space, an LSSVM model can be expressed as Eq. (11).
(11) y(x) = w T (x) + b where (.) maps the input data into a feature space of larger dimensions, w R n is an adjustable weight vector, and b R is a numerical threshold. The error between the estimated values ( ŷ ) and the actual values (y) is calculated through Eq. (12).
Setting the weights in such a way to achieve the smallest quadratic error on the training data for the LSSVM defines an optimization problem. For k training datasets, this problem can be expressed as Eq. (13).
In this equation, is referred to as the regularization parameter and is there to establish a balance between the model complexity and the estimation accuracy. The value of this parameter coupled with the kernel coefficients play a critical (12)  Accordingly, the weight vector can be obtained using Eq. (19).
The weights are herein declared as linear combinations of the Lagrangian multipliers for the input training data. Substituting Eq. (19) in Eq. (11) leads to Eq. (20) where K x i ,x is the kernel. The performance of LSSVM model is largely related to the choice of the kernel and the values of the hyperparameters (Duan et al. 2003). The most popular kernel functions include linear, polynomial, MLP, and RBF kernels.

COA
Inspired by the life of a family of birds called cuckoos, COA was developed by Rajabioun (Rajabioun 2011) for addressing nonlinear continuous optimization algorithms. In this algorithm, the values of the variables are expressed in an array called habitat. Herein a habitat denotes a sample containing possible solutions to the considered optimization problem. For an optimization problem with N var decision variables, a habitat is denoted by a 1 × N var array and can be expressed as habitat = [x 1 ,x 2 ,x 3 , … ,x N var ] . The profitability of a habitat is evaluated by the cost function shown in Eq. (21).
Knowing that this algorithm is an inherently maximization algorithm, a cost function of the following form is used for the problems where the actual objective is minimization: In this algorithm, in proportion to the selected population size (N pop ), N pop × N var matrix is built. Then, a random number of cuckoo eggs (between 5 and 20) is assigned to each habitat. The cuckoos are supposed to lay these eggs in the nests belonging to hosting birds within an egg-laying radius (ELR). The ELR is defined using the lower and upper limits of the variables (var low and var high , respectively), ELR control coefficient (α), number of eggs per cuckoo, and the total number of eggs, using Eq. (23). Figure 19 shows the flowchart of this algorithm. Like any other evolutionary algorithm, the COA begins by generating an initial population of cuckoos. This initial population lay eggs in the nest of hosting birds. The eggs with maximum similarity to those of the hosting bird will have the opportunity to grow and make mature cuckoos, with the other eggs identified by the hosting bird and hence destroyed. The cuckoos evaluate the habitats for chances of survival. Indeed, the habitats where the largest number of cuckoo eggs can grow into mature cuckoos represent the maximum profitability (this is an inherent maximization algorithm). In other words, the cuckoos tend to lay eggs in a habitat with the maximum survival rate. The eggs that survive and grow into mature cuckoos comprise communities of cuckoos in their habitats. Each community of cuckoos lives in a particular habitat. The best habitat is the one that offers maximum food availability and survival chance to the cuckoos; this is then selected as migration destination to other communities. The immigrant communities try to reside in the nearest possible habitat to the best habitat. An ELR is determined for each cuckoo based on the number of her eggs and her distance to the best habitat. Then the cuckoos lay eggs in the nests within this ELR on a random basis. This process continues until the best point (corresponding to the highest profit in terms of food availability and survival rate) is identified. A majority of cuckoos with converge around this point.

Particle swarm optimization
Inspired by group behavior of animals like birds and fishes, PSO is a well-known metaheuristic algorithm that was first introduced by Kennedy and Eberthart (1995). Thanks to benefits like simplicity, easy implementation, robustness to control parameters, and computational efficiency, as compared to other mathematical and metaheuristic algorithms (Lee and Park 2006), the PSO has been found a broad set of applications. In this algorithm, each person is defined to have five properties: position (x), velocity (V), best personal position (Pb), best global position (G best ), and cost. Figure 20 shows the flowchart of the PSO. As seen on this figure, firstly, the velocity and position of each person (i) are initialized and the best personal position is set to be the present person's position. The person's position is then evaluated using a predefined cost function. Comparing the costs among different persons in the swarm, one can identify the best position in the swarm (associated with the cheapest cost). This position is then set as the best global position for all persons (information sharing). The    Simple learning (Su et al. 2020) Nonlinear transformation during training  Not getting stuck in local optimal points and overfitting ) Handle huge amounts of data, Due to the use of more layers Being able to get answers quickly and at a low cost ) Accelerating the development of deep learning  Randomly assigning weights and biases of nodes in the first layer LSSVM Computationally efficient (Shabri and Suhartono 2012) Relatively fast (Thissen et al. 2004) Mathematically tractability (Kadkhodazadeh and Farzin 2021) High accuracy (Kadkhodazadeh and Farzin 2021) Direct geometric commentary (Kadkhodazadeh and Farzin 2021) Sensitivity to outliers (Liu et al. 2016) Being memory consuming especially in large numbers of training data, due to the loss of sparsness of the SVM algorithm (Liu et al. 2016) CNN Reduced the number of learning parameters using weight sharing and local connectivity (Zhang et al. 2017) Good at learning local features (Zhang and Xiang 2018) Resistant to overfitting, due to less number of learning parameters (Zhang et al. 2017) More data should be provided to achieve an efficient model (El-Sawy et al. 2016) Due to the use of operators such as maxpool, it has a low speed For multiple layers, training is time extensive Computationally expensive (Yamashita et al. 2018) next iteration (t + 1) begins by updating the persons' positions based on their velocity, previous position, best personal position, best global position, personal (c 1 ) and global (c 2 ) learning coefficients, and random numbers (r 1 and r 2 ) (Eq. 24). The new position of each person is determined using his/her previous position and the new velocity (Eq. 25). The new position is then evaluated using the cost function and should it correspond to a lower cost than that of the person's previous best personal position, the best personal position is updated. The equivalent cost of the best global position is also updated by comparing the costs of the best personal positions. This iterative process goes on till the stopping criterion (e.g., a maximum number of iterations) is achieved, with the final best global position reported as the solution.

Genetic algorithm
Introduced by Johan Holland and his colleagues in the 1960s, the GA is based on Charles Darwin's biological evolution model and what we know as the natural selection process (Yang 2020). Despite the ability of this algorithm to solve a variety of complex problems, it needs proper setting of the mutation and crossover rates and the population size. If these control parameters are not set properly, the algorithm ends up with a pretty low convergence rate or even leads to a meaningless solution (Yang 2020). Resembling many other metaheuristic algorithms, this methodology starts with the random initialization of a population of chromosomes. Calculating a fitness value for each chromosome, chromosome pairs are then selected out of the population based on their fitness values. Offsprings are created by crossover operation and then subjected to mutation operation. By replacing the newly built population for the previous population, the fitness value of the chromosomes is reevaluated and this iterative process continues until the stopping criterion is satisfied. To improve the convergence rate of the GA, elitism is deployed, where the best chromosomes of each generation are selected for the next generation without subjecting them to neither crossover nor mutation operations. Figure 21 shows the flowchart of GA for an optimization problem. Table 10 shows the advantages and disadvantages of the predictive algorithms considered in the present study. As can be seen in this table, the MELM algorithm theoretically (24) V i (t + 1) = wV i (t) + c 1 r 1 Pb i (t) − x i (t) + c 2 r 2 G best (t) − x i (t) (25) x i (t + 1) = x i (t) + V i (t + 1) seems to be superior to the other two algorithms, LSSVM and CNN, in terms of characteristics. Moreover, the drawbacks of the MELM model can be addressed by combining it with optimization algorithms.