Predicting shear wave velocity from conventional well logs with deep and hybrid machine learning algorithms

Shear wave velocity (VS) data from sedimentary rock sequences is a prerequisite for implementing most mathematical models of petroleum engineering geomechanics. Extracting such data by analyzing the finite number of reservoir rock cores available is very costly and limited in coverage. The high cost of the advanced dipole sonic wellbore logging service, and its implementation in only a few wells of a field, has placed many limitations on geomechanical modeling. Moreover, VS tends to be nonlinearly related to many of its influencing variables, making empirical correlations unreliable for its prediction. Hybrid machine learning (HML) algorithms are well suited to improving predictions of such variables. Recent advances in deep learning (DL) algorithms suggest that they too should be useful for predicting VS for large gas and oil field datasets, but this has yet to be verified. In this study, 6622 data records from two wells in the giant Iranian Marun oil field (MN#163 and MN#225) are used to train HML and DL algorithms, and 2072 independent data records from another well (MN#179) are used to verify the VS prediction performance based on eight well-log-derived influencing variables. The input variables are standard parameters recorded by conventional oil and gas well logging suites available in most older wells. The DL model predicts VS for the supervised validation subset with a root mean squared error (RMSE) of 0.055 km/s and coefficient of determination (R2) of 0.9729, and achieves similar prediction accuracy when applied to an unseen dataset. Comparison of the VS prediction performance results indicates that the DL convolutional neural network model slightly outperforms the HML algorithms tested. Both the DL and HML models substantially outperform five commonly used empirical VP-based relationships for calculating VS when applied to the Marun Field dataset.
Concerns regarding the model's integrity and reproducibility were also addressed by evaluating it on data from another well in the field. The findings of this study can improve understanding of production patterns and the sustainability of oil reservoirs, and help prevent substantial geomechanics-related damage through a better understanding of wellbore instability and casing collapse problems.

Introduction

Petroleum geomechanics forms a critical part of reservoir engineering and wellbore construction models (Rhett 1998; Bazyrov et al. 2017; Akbarpour and Abdideh 2020; Mohamadian et al. 2021). Interactions of stress fields with subsurface lithologies and the structures they form require a comprehensive understanding of the mechanical behavior of the lithologies associated with gas and oil fields. Such an understanding helps to overcome many problematic drilling and field development challenges and reduce operational costs (Hudson et al. 2005; Rajabi et al. 2022a).
The development of geomechanical models depends on the availability of reliable data from laboratory analysis. This involves mechanical tests on wellbore core samples recovered from the subsurface sedimentary columns penetrated during gas and oil field exploration and development (Khoshouei and Bagherpour 2021). However, due to the high cost and time associated with wellbore coring operations, few oil or gas field wells are actually sampled by coring. This means that the availability of geomechanical measurements from cores is severely restricted. Consequently, estimates and extrapolations for these parameters have to be used. Many empirical relationships have been developed to compensate for this shortcoming based on the use of petrophysical well-log data (Eberhart-Phillips et al. 1989; Jørstad et al. 1999; Sohail et al. 2020). The basic input requirement for many geomechanical empirical relationships is shear wave velocity (V S ) (Ghorbani et al. 2021). Moreover, for cost reasons and the limited geomechanical considerations associated with many historical wells, most wellbore logging suites do not record V S using the advanced and expensive dipole sonic log.
Due to subsurface heterogeneities, geomechanical variables commonly vary across gas and oil reservoir formations and along the wellbore profiles (especially in directional/ horizontal wells). Consequently, V S prediction is often required based on a few core measurements combined with well-log variables recorded continuously along the wellbore profiles. Machine learning (ML) methods provide an alternative method to make more reliable V S predictions than those provided by empirical relationships (Ashraf et al. 2020;Vo Thanh et al. 2020;Ali et al. 2021;Thanh et al. 2022;Vo-Thanh et al. 2022).
The compaction of the reservoir and consequent subsidence associated with the Ekofisk field (North Sea) caused a great deal of additional cost to the field owners, which could have been avoided by evaluating the potential response of subsurface formations to engineering operations through appropriate geomechanical studies (Dusseault 2011). That field case highlights the necessity of conducting careful geomechanical studies for effective field development, thereby preventing extra operational costs (Fourie and Vawda 1992). However, providing appropriate geomechanical studies requires geomechanical data from the sedimentary sections of interest. Such data can be obtained in two ways. The first method is to measure the required data through time-intensive and costly geomechanical laboratory experiments on the available core plugs. This method provides non-continuous geomechanical data, limited to some specific points distributed across the sedimentary section (Stark et al. 2014). The second method provides geomechanical data indirectly from petrophysical data, from which valuable rock properties, including porosity, density, and shear/compressional velocity, can be usefully determined (Medetbekova et al. 2021). The latter method is cost-effective since it does not require time-consuming experiments and provides a continuous geomechanical dataset across the logged section of a wellbore (Tokeshi et al. 2013). Among the petrophysical logs required for this method, V S tends not to be routinely recorded in every well drilled in oil and gas fields, due to the additional operational cost associated with the specific logging tool required to record it (Wang et al. 2020). As a result, establishing predictive models for indirect evaluation of V S can be very useful for conducting geomechanical studies.
Additionally, V S data is valuable for assisting decision-making in the selection of drilling locations and wellbore trajectories to ensure they achieve maximum well stability, preventing sand production, and the selection of appropriate zones for hydraulic fracturing (Fourie and Vawda 1992;Stark et al. 2014).
There are two conventional ways commonly used to estimate V S : (i) predictive models based on rock physics, and (ii) empirical correlation-based relationships (Wang et al. 2019). Rock physics modeling methods use the physical properties of rocks to develop petrophysical models to predict V S . Indeed, in rock physics modeling, V S is obtained by studying different rock physics models to calculate the effective elastic parameters of rocks. The factors typically considered in rock physics modeling are porosity, pore shape, fluid inclusion properties, and matrix mineralogy (Wang et al. 2020). Several physics-based models have been developed so far for V S estimation (Xu and White 1995; Sun et al. 2008; Zhang et al. 2012; Guo and Li 2015; Darvishpour et al. 2019; Zhang et al. 2020; Ali et al. 2021). Theoretically, rock physics model-based methods are not limited in application to specific geographic areas or petroleum basins, because they adequately address many of the drawbacks of empirical equations. Nevertheless, most modeling methods based on rock physics involve very complicated estimation processes due to their need to make assumptions about the shape of pores. Such assumptions tend to reduce, to some degree, the validity of the estimation results. Besides, in such models the matrix elastic parameters, compositions, and the mixing mode must be taken into account, together with the effects of pore shapes and the fluid constituents, to achieve accurate V S predictions. As a result of these difficulties, models based on rock physics are of low efficiency, and their complexity limits their appeal for real-world drilling and field development applications. The empirical correlation methods have been widely used to estimate V S from compressional wave velocity (V P ) since they are quick and simple to apply, and relatively reliable (Wang et al. 2020; Bailey and Dutton 2012; Lee 2013; Ojha and Sain 2014; Oloruntobi et al. 2019; Oloruntobi and Butt 2020). The reliability of empirical correlation equations originates from the fact that most of the factors affecting V P also influence V S in a similar manner but to different degrees (Xu and White 1995; Oloruntobi and Butt 2020). Table 1 lists some of the most commonly used empirical equations developed for V S prediction involving various relationships with V P . Recorded V S signals can also be influenced by earthquake effects (Güllü and Pala 2014; Güllü and Jaf 2016; Güllü and Karabekmez 2017). The fact that most empirical correlations for V S prediction only involve V P (Table 1) limits their accuracy and tends to make them field or basin specific. The results of these empirical equations are considerably influenced by lithology type, which may lead to inadequate prediction accuracy (Akhundi et al. 2014; Güllü and Jaf 2016). Besides, the lack of generalizability to other fields and their poor fit with real data across an entire sedimentary section limit the confidence with which such relationships can be applied (Güllü and Pala 2014; Güllü and Jaf 2016; Gholami et al. 2020; Oloruntobi and Butt 2020; Rajabi et al. 2021; Rajabi et al. 2022a). In recent years, the much-improved computational efficiency and prediction accuracy achieved by ML methods have resulted in various ML methods being applied to predict V S from well-log input data (Eskandari et al. 2004; Rezaee et al. 2007; Rajabi et al. 2010; Bagheripour 2013, 2014; Gholami et al. 2014; Maleki et al. 2014; Oloruntobi et al. 2019; Gholami et al. 2020; Wang et al. 2020; Zhang et al. 2020). The datasets used in those models are typically verified with just a few core measurements and, in some cases, include seismic data, with details listed in Table 2 (Al-Dousari et al. 2016). However, as ML and deep learning (DL) methods improve and more extensive datasets become available from around the globe, much scope remains to improve V S prediction accuracy (Wang et al. 2020; Wood 2020).
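To make the empirical-correlation approach concrete, a widely quoted example is the Castagna et al. mudrock line for water-saturated clastic rocks. The sketch below is illustrative only: the coefficients are the published mudrock-line values from the general literature, not necessarily the exact forms listed in Table 1, and the example V P value is hypothetical.

```python
def vs_castagna_mudrock(vp):
    """Castagna et al. mudrock line for water-saturated clastics:
    VS = 0.8621 * VP - 1.1724, with both velocities in km/s."""
    return 0.8621 * vp - 1.1724

# Hypothetical compressional velocity reading (km/s)
vs_est = vs_castagna_mudrock(4.0)  # ~2.28 km/s
```

Such single-input relationships are easy to apply but, as noted above, cannot account for lithology or the other influencing variables that multi-log ML models exploit.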
Moreover, the possibility exists to make the methodologies more robust and generalizable within hydrocarbon fields and across sedimentary basins.
In this paper, three recently developed techniques are applied and evaluated to predict V S for several wells drilled in a giant oil field with both carbonate and sandstone reservoirs, using data from standard well logs (Fig. 1). These include two HML techniques: a multi-hidden-layer extreme learning machine hybridized with a particle swarm optimizer (MELM-PSO), and an MELM hybridized with a genetic algorithm (MELM-GA). The third technique is a DL model, the convolutional neural network (CNN). Recent research has applied machine learning and deep learning algorithms as robust computational tools across many engineering fields to solve a wide range of problems. The main novelty of this study is to develop, apply, and compare V S predictions from these three techniques applied to a large multiple-well dataset from a giant oil field. The V S prediction performance of the DL and HML algorithms is also compared, for the same dataset, with commonly used empirical V S prediction models. This full-scale comparison between the hybrid machine learning models and a deep learning model identifies the most effective and accurate model for predicting shear wave velocity. As a verification measure, we also address possible concerns about the integrity and repeatability of the proposed machine learning models by applying them to data from another well in the field. As a fast and very low-cost solution compared to other available methods, the technique involves only minor disadvantages. Execution constraints (computer system processing power) limit the number of data records and log variables that these models can process. Additionally, the quality of the recorded standard logs is important, and poor-quality log data will result in higher V S prediction errors.
The methods' advantages outweigh their disadvantages, and the HML and DL models developed can serve as reference classes or libraries for general use.

Work flow
A work flow diagram (Fig. 2) summarizes the sequence of construction and evaluation steps involved in applying the DL and HML algorithms to predict V S and establish the prediction accuracy achieved. The process sequence begins with compiling a dataset and statistically assessing the value distribution of each of the component data variables. The maximum and minimum values for each data variable (attribute) are used to normalize the variable values so that they fall within the range of −1 to +1. Normalization is achieved using Eq. (1) and is important because it avoids scaling biases in the learning processes adopted by the DL and HML algorithms (Kamali et al. 2022):

x l i (normalized) = 2 (x l i − x l min ) / (x l max − x l min ) − 1   (1)

where x l i = the value of attribute l for data record i, and x l min and x l max = the minimum and maximum values of attribute l in the dataset.

The normalized data records are then assigned to either a training subset or a testing subset. Trial-and-error tests indicate that an approximate 70%:30% split of data records between training and testing subsets works well for most reasonably sized datasets. The testing subset of data records is held independently of the training subset and is not involved in the algorithms' training processes. A K-fold method is used to sample the training subset for validation purposes. Statistical measures of accuracy are then used to assess the V S prediction performance of each DL and HML algorithm evaluated to establish their relative V S prediction capabilities.
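The min-max scaling of Eq. (1) can be sketched in a few lines. This is an illustrative snippet, not code from the study; the example V P values are hypothetical.

```python
import numpy as np

def normalize(x):
    """Min-max scale a feature column to the range [-1, +1] (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

# Hypothetical V_P column (km/s)
vp = np.array([3.2, 4.1, 5.0, 4.4])
vp_norm = normalize(vp)
print(vp_norm.min(), vp_norm.max())  # -1.0 1.0
```

In practice the minimum and maximum would be taken from the training subset only, so that the testing subset is scaled with the same constants.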

Machine-learning (ML) algorithms
ML algorithms are now usefully applied to solve many oil and gas operational and prediction challenges, including drilling, reservoir performance, and geomechanical characterization (Gullu 2017; Ashraf et al. 2020; Ashraf et al. 2021; Ranaee et al. 2021). ML algorithms are well suited to evaluating problems involving multiple variables with nonlinear relationships and complex value distributions (Gullu 2017; Hazbeh et al. 2021b). Artificial neural networks (ANN), extreme learning machines (ELM), support vector machines (SVM) and other algorithms based mainly on regression/correlation relationships have been successfully applied to progressively improve the prediction performance of variables relevant to the petroleum industry (Farsi et al. 2021b). Several recent studies have specifically addressed shear wave velocity prediction with ML methods (Weijun et al. 2017; Azadpour et al. 2020; Zhang et al. 2020; Zhang et al. 2021; Olayiwola et al. 2021; Zhong et al. 2021; Ebrahimi et al. 2022).

Single machine-learning (SML) algorithms
Extreme learning machine (ELM) ELM is a rapidly executed feed-forward neural network (Huang et al. 2006). It can be usefully applied to reduce learning time, improve accuracy, and increase generalizability (Huang et al. 2006; Huang et al. 2011; Huang 2014; Wang et al. 2014; Cheng and Xiong 2017; Naveshki et al. 2021; Zhang et al. 2022). ELM differs from an ANN utilizing back-propagation or other optimization algorithms in that all of the ELM's internal learning parameters are randomly determined. This saves computational time because, during ELM training, the parameters associated with the hidden layer (weights and biases) do not need to be adjusted. The output weights are determined by the Moore-Penrose inverse function applied to the hidden-layer output matrix (Yeom and Kwak 2017). The structure of a simple ELM (with a single hidden layer) is shown in Fig. 3. (Fig. 1 caption: Schematic diagram outlining the technique to predict V S from a standard suite of well logs by applying a deep learning prediction model.)
ELM performance for complex problems can be improved by introducing more than one hidden layer. The multi-layer ELM algorithm is configured as follows: Step 1: Determine the number of hidden layers (l) and neurons in each layer.
Step 2: Assume (X, Y) = (x i , y i ), i = 1, 2, 3, …, Q, as the training data, where X is the matrix of input variable values for each data record and Y is the output variable vector including all data records.
Step 3: Each hidden layer has n neurons and an activation function g(x). Weights between layers i and (i − 1) and the biases applied to layer i are randomly generated.
Step 4: Set i = 1 (start with the first hidden layer).
Step 5: Calculate the hidden-layer output matrix H with Eq. (2):

H 1 = g(XW 1 + b 1 )   (2)

Step 6: If i is less than l, increment i, calculate the output of the next hidden layer with Eq. (3), and return to Step 3; otherwise go to the next step:

H i = g(H i−1 W i + b i )   (3)
Step 7: The output weights are calculated based on the Moore-Penrose inverse by applying Eq. (4) (Abad et al. 2021a):

β = H l † Y   (4)

Step 8: The output prediction is calculated with Eq. (5):

Ŷ = H l β   (5)

Genetic algorithm (GA) GA is an evolutionary algorithm developed in the 1960s and inspired by the principles of genetics, involving functions that mimic inheritance, mutation, selection, and combination. It establishes an initial population of randomly generated artificial "chromosomes" (Mohamadian et al. 2021). Each chromosome is evaluated through several evolutionary iterations with a cost function, which is progressively minimized. To determine the attributes of the next generation of "chromosomes", the members of the current generation are ranked (elitism) and only the best performing ones are selected to participate in reproduction. Crossover and mutation operations, with an assigned degree of randomness, are then involved in producing the next generation. The degree of randomness helps prevent the GA from becoming trapped at local minima, enabling it to thoroughly explore the feasible solution space. Figure 4 illustrates the GA process in the form of a flowchart.

Particle swarm optimization (PSO) algorithm

PSO searches the feasible solution space using a population (swarm) of particles, the adjusted movements of which are inspired by those of flocks of birds or shoals of fish. Figure 5 illustrates the PSO algorithm in the form of a flowchart (modified with permission from Rashidi et al. 2021). The positions of the initial population are set randomly in the search space, which is defined by the minimum and maximum values of the decision variables. Each particle moves in different directions and at speeds between a lower limit (V min ) and an upper limit (V max ) from one iteration to the next. The positions visited by each particle are recorded, and its best historical position is stored as a "personal best" (P b ) and used in partially determining its subsequent movements.

The positions of all particles are evaluated by the objective function (cost function), and the particle with the lowest cost function value is identified in each iteration as the best global position (G b ). In each iteration, a new velocity (V i (t + 1)) for each particle (i) is calculated with Eq. (6), based on its previous velocity (V i (t)) and the distances of its current position (x i (t)) in the solution space from its best historical personal position and the best global position achieved by the swarm so far:

V i (t + 1) = wV i (t) + c 1 r 1 (P b − x i (t)) + c 2 r 2 (G b − x i (t))   (6)

where w is the inertia weight, c 1 and c 2 are acceleration coefficients, and r 1 and r 2 are random numbers in the range [0, 1]. Subsequently, the new position of each particle (x i (t + 1)) is calculated from its prevailing position and the newly calculated velocity with Eq. (7):

x i (t + 1) = x i (t) + V i (t + 1)   (7)

The new position of each particle is then re-evaluated with the cost function. The PSO algorithm is well suited to efficiently exploring continuous solution spaces without becoming easily trapped at local minima.
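The velocity and position updates of Eqs. (6) and (7) can be condensed into a minimal PSO loop. This is a generic sketch, not the study's tuned implementation: the inertia weight, acceleration coefficients, swarm size, iteration count, and the sphere test function are all illustrative assumptions.

```python
import numpy as np

def pso(cost, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer implementing Eqs. (6) and (7)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))            # positions
    v = np.zeros((n_particles, dim))                       # velocities
    pb = x.copy()                                          # personal bests (P_b)
    pb_cost = np.array([cost(p) for p in x])
    gb = pb[pb_cost.argmin()].copy()                       # global best (G_b)
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)  # Eq. (6)
        x = np.clip(x + v, lo, hi)                           # Eq. (7)
        c = np.array([cost(p) for p in x])
        improved = c < pb_cost
        pb[improved], pb_cost[improved] = x[improved], c[improved]
        gb = pb[pb_cost.argmin()].copy()
    return gb, float(pb_cost.min())

# Minimize a simple sphere function as a stand-in cost function
best, best_cost = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```

In the hybrid models described below, the cost function would instead be the V S prediction RMSE of an MELM configured by the particle's decision variables.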

HML algorithm configurations
Multi-layer extreme learning machine (MELM) hybridized with optimizers MELM performance depends on the number of hidden layers included and the number of neurons in each of those layers. The MELM structure varies according to the complexities of the dataset (Rashidi et al. 2021). The more complex the problem, the greater the number of hidden layers and neurons required. On the other hand, the more layers and neurons involved, the longer the computational time. Therefore, optimizing the MELM structure can lead to a high-precision model with an efficient learning process and relatively short computational requirements. A trial-and-error method can be used to determine the appropriate structures of multi-layer ANN and MELM models, but this can be very time consuming. Therefore, in this study the PSO algorithm is used to determine the number of MELM hidden layers and the number of neurons in each layer. Moreover, due to the random selection of hyperparameters for MELM, different answers may be obtained each time the algorithm is implemented. To solve this problem, the MELM algorithm is combined with an optimizer (GA or PSO) to first identify the optimum hyperparameter values (Fig. 6). The GA and PSO optimization algorithms have adjustable hyperparameters (control values) that influence the efficiency of their performance. Trial-and-error methods were used to determine these control values (Tables 3 and 4). A total of 50 iterations of the optimizers were used to identify the optimum number of layers and neurons in the MELM, whereas 200 iterations (Tables 3 and 4) were used to optimize the weights and biases of the MELM-GA and MELM-PSO hybrid models (Abad et al. 2022).
The K-fold cross-validation technique was applied, with a ten-fold setup, to achieve more stable and reliable V S prediction results when determining the number of MELM layers and neurons. This divides the entire dataset into ten equal portions. The model is then evaluated ten times, with each execution using nine portions of the data records as the training subset and one portion as the validation subset (Fig. 7). Each of the ten portions is therefore used once as the validation subset. Table 5 shows the provisional V S prediction results for different MELM structures, established by trial and error, using the ten-fold cross-validation technique. They indicate that MELM models with between 2 and 6 hidden layers and between 6 and 10 neurons per layer achieve the lowest RMSE for V S predictions. In order to save computational time, the optimizers were therefore constrained to vary the number of MELM layers between 2 and 6 and the number of neurons per layer between 6 and 10.
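The ten-fold scheme described above can be sketched as follows. This is an illustrative snippet, not the study's code: the synthetic records and the ordinary least-squares model standing in for the MELM are hypothetical placeholders.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n record indices into k roughly equal, shuffled folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

# Synthetic stand-in for the normalized well-log records (500 records, 6 features)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=500)

folds = kfold_indices(len(y), k=10)
fold_rmse = []
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Least-squares fit stands in for training the candidate MELM structure
    coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    pred = X[val_idx] @ coef
    fold_rmse.append(float(np.sqrt(np.mean((y[val_idx] - pred) ** 2))))
mean_rmse = float(np.mean(fold_rmse))
```

Averaging the RMSE over the ten folds gives the stability that a single random train/validation split lacks, which is why it is used here to compare candidate layer/neuron configurations.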

Convolutional neural network (CNN)
CNNs have demonstrated their capabilities in diverse applications in recent years, including prediction and learning applications related to image recognition (Krizhevsky et al. 2017), reading comprehension (Yu et al. 2018), and reinforcement learning in game strategy (Silver et al. 2016; Abad et al. 2021a). A CNN uses convolutional (weight-sharing) layers instead of the traditional fully connected layers of neural networks such as ANN and ELM (Abad et al. 2021b). This compresses the layers and neurons of a CNN compared to fully connected networks and often enables it to generate higher resolution predictions with fewer training data records for specific problems. Figure 8 shows a generic CNN structure. It has several parallel filters acting on the input data records that can be configured to extract different features. The input vector is filtered by each of the CNN filter layers, with each layer producing its own output vector; therefore, the dimensions of the network increase with the number of filter layers selected. A pooling layer is then used to reduce the dimensions and normalize the selected variables, feeding that data into the concatenate layer. This information is then fed into the dense layer(s) to generate the final output. Each dense layer (like a multi-layer perceptron neural network) is made up of a number of neurons, determined by the user (trial and error) or an optimizer. The model is executed to establish the weights and biases for the neurons in the dense layers that achieve the highest dependent-variable prediction accuracy.
There are a number of hyperparameters that need to be set when developing a CNN model. For the CNN constructed in this study to predict V S , based on trial and error, the number of filters was set to 200, a kernel size (convolutional window length) of 3 was selected, the "relu" activation function was applied, and the number of neurons in the dense layer was set to 100.
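The convolution and pooling mechanics described above can be illustrated with a minimal NumPy sketch. This is not the study's CNN: the input length, random kernel values, and pool size are hypothetical, although the filter count of 200 and kernel size of 3 follow the settings quoted above.

```python
import numpy as np

def conv1d(x, kernels):
    """Valid-mode 1D convolution of one input vector with several filters,
    followed by a ReLU activation."""
    k = kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return np.maximum(windows @ kernels.T, 0)  # one output column per filter

def max_pool(feat, size=2):
    """Downsample each feature map by taking the max over adjacent pairs."""
    n = (feat.shape[0] // size) * size
    return feat[:n].reshape(-1, size, feat.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=32)                  # one record's (hypothetical) input vector
kernels = rng.normal(size=(200, 3))      # 200 filters, kernel size 3
features = max_pool(conv1d(x, kernels))  # pooled feature maps
print(features.shape)                    # (15, 200)
```

In a full CNN these pooled features would be flattened and passed to the dense layer(s); the point here is only how weight sharing and pooling shrink the representation relative to a fully connected layer.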

Statistical measures of prediction accuracy
V S prediction performance comparisons between the HML, DL and empirical models evaluated are conducted by calculating widely used statistical measures of prediction accuracy, as expressed in Eqs. (8) to (15). These include:

Percentage deviation (PD), or relative error (RE), for data record i:

PD i = ((V S measured i − V S predicted i ) / V S measured i ) × 100

Average percentage deviation (APD):

APD = (Σ i PD i ) / n

Absolute average percentage deviation (AAPD):

AAPD = (Σ i |PD i |) / n

Standard deviation (SD) of the prediction errors.

Root Mean Square Error (RMSE):

RMSE = √[(1/n) Σ i (V S measured i − V S predicted i ) 2 ]
Coefficient of Determination (R 2 ):

R 2 = 1 − [Σ i (V S measured i − V S predicted i ) 2 ] / [Σ i (V S measured i − V̄ S measured ) 2 ]

These indicators of prediction accuracy are best considered together rather than individually, as they each reveal complementary information and insight into the prediction performance of the algorithms considered. RMSE is used as the objective function for the HML and DL models, making it the single most important measure, as those algorithms are configured to minimize RMSE.
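Several of these measures, as commonly defined, can be computed together in one helper. This is an illustrative sketch; the example measured/predicted values are hypothetical.

```python
import numpy as np

def accuracy_measures(measured, predicted):
    """Compute PD-based and error-based accuracy measures for VS predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pd_i = (measured - predicted) / measured * 100.0   # percentage deviation
    rmse = np.sqrt(np.mean((measured - predicted) ** 2))
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return {
        "APD": float(pd_i.mean()),            # average percentage deviation
        "AAPD": float(np.abs(pd_i).mean()),   # absolute average percentage deviation
        "RMSE": float(rmse),
        "R2": float(1.0 - ss_res / ss_tot),
    }

# Hypothetical measured vs predicted VS values (km/s)
m = accuracy_measures([2.0, 2.5, 3.0], [2.1, 2.4, 3.0])
```

Reporting the measures together, as the text recommends, guards against a model that scores well on one metric (e.g., APD, where positive and negative deviations cancel) while performing poorly on another.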

Marun field description
To predict V S , well-log data from three wellbores drilled in the Marun oil field (MN#163, MN#225 and MN#179) are evaluated. This giant oil field is located onshore in southwest Iran (Fig. 9). It was discovered in 1963 and is one of the largest oil fields in the Zagros Basin, with two producing oil reservoirs: the Asmari (Oligocene to Early Miocene) and Bangestan (Upper Cretaceous) formations. Collectively, these reservoirs contain in-place oil resources of some 46 billion barrels. In addition, the Khami (Lower Cretaceous) formation forms an underlying natural gas reservoir with some 462 trillion cubic feet of gas-in-place.

Data collection and data distribution
Well-log datasets compiled for wells MN#163, MN#225 and MN#179 sample the Asmari carbonate reservoir. Data records from two of the wells (MN#163 and MN#225) were used for supervised training and validation of the DL and HML algorithms in terms of V S prediction accuracy. Data from well MN#179 was then used as an independent testing subset to test the models for V S prediction accuracy with data previously unseen by the trained and validated model.
The well-log variables used as input features for the V S prediction models are gamma ray (GR); compressional-wave velocity (V P ); bulk density (RHOB); neutron porosity (NPHI); shallow resistivity (RES-SHT); medium resistivity (RES-MED); deep resistivity (RES-DEP); and caliper (CP). Table 6 statistically summarizes the distributions of the nine variables involved (8 inputs plus V S as the dependent variable) sampled from the Asmari reservoir sections penetrated by the three wells: MN#163 (3793 data records), MN#225 (2829 data records) and MN#179 (2072 data records), constituting 8694 data records in total. The ranges of the data variables covered by the Asmari reservoir well-log samples are substantial (Table 6). For instance, the V S range evaluated extends from 1.40 km/s to 3.15 km/s across the three wells considered. This highlights the lithological variety within the Asmari reservoir, including limestone, dolomite, shale, siltstone, sandstone and evaporite layers.
The best subset of input variables was selected based on evaluation of the correlation coefficients between each input variable and the measured V S values. The input variables displaying the highest correlation coefficients were selected for V S modeling. Figure 13 shows that four input variables, V P , GR, RHOB, and NPHI, have the highest correlation coefficients with V S . The HML and DL models were initially built using these four selected features. The impact of the other potential input variables was then evaluated by adding them, one at a time, to the selected feature subset to predict V S . That analysis revealed that adding the variables RES-DEP and RES-SHT to the four originally selected features generated more accurate V S predictions. Therefore, these six features were used to build the HML and DL models finally evaluated.
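The correlation-based screening step can be sketched as follows. This is illustrative only: the synthetic log columns and the three variable names used are hypothetical stand-ins for the real well-log features.

```python
import numpy as np

def rank_features(X, y, names):
    """Rank input variables by |correlation coefficient| with the target."""
    corrs = {n: float(np.corrcoef(X[:, j], y)[0, 1]) for j, n in enumerate(names)}
    return sorted(corrs.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Synthetic stand-ins: one strongly related, one moderately related,
# one unrelated log column
rng = np.random.default_rng(0)
y = rng.normal(size=300)                              # target (e.g., VS)
X = np.column_stack([
    y + 0.2 * rng.normal(size=300),                   # strongly related (e.g., VP)
    0.5 * y + rng.normal(size=300),                   # moderately related
    rng.normal(size=300),                             # unrelated (e.g., CP)
])
ranked = rank_features(X, y, ["VP", "GR", "CP"])
```

As the text notes, a pure correlation ranking can miss variables (here, the resistivity logs) that improve predictions in combination, which is why the study also tested adding the remaining variables one at a time.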

Identifying the best performing algorithm for VS prediction
Tables 7 and 8 display the V S prediction accuracies based on the training (70%) and validation (30%) subsets, respectively, selected from the 6622 data records available for wells MN#163 and MN#225. This represents the supervised training and learning performance of the HML and DL algorithms. The performance of five empirical relationships used for predicting V S from V P (Table 1) is also shown for each of these data subsets. Table 9 displays the V S prediction accuracies for the supervised and trained HML and DL algorithms applied to all 6622 data records for wells MN#163 and MN#225. The performance of the five empirical relationships (Table 1) is also shown for comparison.
Close inspection of the models' V S prediction results (Tables 7, 8 and 9) reveals that the DL CNN model achieves exceptionally high V S prediction accuracy when applied to the two subsets and all data records for the two wells involved in supervised learning (e.g., from Table 9, CNN: RMSE = 0.0456 km/s; AAPD = 1.477%; R 2 = 0.9808). The HML models also achieve high V S prediction accuracy for the two subsets and the full supervised learning dataset, but they do not match that of the CNN model. The MELM-PSO model performs slightly better than the MELM-GA model. The recorded V S prediction performance (RMSE) therefore ranks the DL and HML models and empirical equations as follows: CNN > MELM-PSO > MELM-GA > Castagna et al. > Eskandari et al. > Pickett > Brocher > Carroll. It is very clear from Tables 7, 8 and 9 that the DL and HML models substantially outperform all five of the empirical models used to predict V S using relationships with V P . This outcome highlights the value of using information from a suite of well logs rather than relying on V P data alone to predict V S . Figure 10 displays the predicted versus measured V S values for the data records in each subset and the full supervised learning dataset evaluated by the HML and DL models. The superior prediction performance of the DL CNN model is apparent, as it involves no substantial outlier predictions. On the other hand, the MELM-PSO and MELM-GA models do involve a few substantial outliers (only about 5 data records out of the 6622 total). Figure 11 reveals that the most commonly used empirical models (Table 1) provide workable V S prediction accuracy (R 2 ~ 0.86) for this dataset but are substantially less reliable than the DL and HML models; the RMSE values for the empirical equations are substantially greater than those for the CNN and HML models. The Castagna et al.
(1993) relationship performs better than the other empirical models evaluated for the Asmari reservoir (Tables 7, 8 and 9). Figure 12 displays the relative percentage error (PD%) for V S predictions for each of the 6622 data records (wells MN#163 and MN#225) constituting the training and validation subsets. These are displayed sequentially for the high-performing DL and HML models. The PD% range for the DL model (~ −20% < PD i < ~ +15%) is substantially better than for the HML models (~ −70% < PD i < ~ +45%), and for most data records PD% is within ±5%. The PD% range for the empirical relationships is much greater, and for most data records PD% exceeds ±15%. The Castagna et al. (1993) model performs better (~ −20% < PD i < ~ +25%) than the other empirical models, and the Carroll (1969) relationship performs worst on the PD accuracy measure (~ −120% < PD i < ~ −20%).
A plot of VS RMSE versus iteration number (Fig. 13) for the DL and HML algorithms shows that all three algorithms converge rapidly to highly accurate solutions. The MELM-PSO and MELM-GA models converge at similar rates and after fewer iterations than the CNN algorithm. Although it requires more iterations, the CNN achieves the lowest-RMSE solutions, outperforming the HML algorithms after 100 iterations.

Relative influences of the input variables on VS
Spearman's correlation coefficient (ρ), expressed on a scale of −1 to +1 (Gauthier 2001), is calculated (Eq. 16) to establish the nonparametric relationships between the input variables and VS:

ρ = Σᵢ (Tᵢ − T̄)(Qᵢ − Q̄) / √[ Σᵢ (Tᵢ − T̄)² · Σᵢ (Qᵢ − Q̄)² ]   (16)

where Tᵢ is the input-variable (T) value of data record i; T̄ is the mean value of variable T; Qᵢ is the dependent-variable (VS) value of data record i; and Q̄ is the mean value of dependent variable Q. Figure 14 identifies, using the ρ values calculated for all 6622 supervised learning data records, that VP has, as should be expected, the greatest influence on VS, whereas CP has the least influence. The input variables NPHI, GR and RHOB also show substantial influences on VS, whereas the resistivity variables show negligible influences.
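Spearman's ρ is simply Pearson's correlation applied to rank-transformed values, so the influence ranking of Fig. 14 can be reproduced with a few lines of NumPy. This is an illustrative sketch (ties are ignored for simplicity):

```python
import numpy as np

def spearman_rho(t, q):
    """Spearman's rho: Pearson correlation of the rank-transformed
    values of input variable t and dependent variable q (Eq. 16 applied
    to ranks; ties are ignored in this simple sketch)."""
    t = np.asarray(t, dtype=float)
    q = np.asarray(q, dtype=float)
    rt = np.argsort(np.argsort(t)).astype(float)  # ranks of t
    rq = np.argsort(np.argsort(q)).astype(float)  # ranks of q
    rt -= rt.mean()                               # center the ranks
    rq -= rq.mean()
    return float(np.sum(rt * rq) / np.sqrt(np.sum(rt**2) * np.sum(rq**2)))
```

Evaluating `spearman_rho` for each of the eight well-log inputs against the measured VS column yields the ρ values that rank the variables' relative influences (scipy.stats.spearmanr provides an equivalent, tie-aware implementation).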

Development and generalization of CNN model applied to other Marun field wells
The best VS prediction model (the DL CNN), established for the Asmari reservoir and trained by supervised learning on the dataset compiled from wells MN#163 and MN#225, is applied to data previously unseen by the trained and validated model: the dataset compiled for Marun oil field well MN#179 (2072 data records; Tables 7, 8 and 9). The statistical measures of accuracy achieved for these MN#179 data records, using the same eight well-log input variables, are listed in Table 8. These results confirm high VS prediction accuracy using the model trained and validated with data from the other two wells. Figure 15 plots measured versus predicted VS values for the CNN model trained with MN#163 and MN#225 data records and applied to all 2072 data records from well MN#179. The prediction performance is very good, confirming the model's reliability and making it suitable for application to other wells drilled into the Asmari reservoir in the Marun oil field for which VS well-log data have not been recorded. Applying the trained model to other wells requires only a standard suite of well logs for the wells of interest; fortunately, such a suite is available for most of the existing wells in the field (Table 10). Figure 16 shows the VS prediction performance of the DL CNN model applied to the well MN#179 dataset in terms of percentage error (PD%) for each data record, arranged in order of sample depth through the Asmari reservoir. While most PD errors for these data records are < ±5%, in the lower 500 samples (equivalent to the lower 100 m of the Asmari section) several PD errors of between 5 and 15% are recorded. These outlying values in the lower part of the Asmari section drilled in well MN#179 are worthy of further analysis, but their prediction accuracy remains within reasonable error limits.
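The train-on-two-wells, test-on-an-unseen-well workflow described above can be sketched as follows. This is a minimal illustration only: the well data are synthetic, the made-up VS trend and all numbers are hypothetical, and a simple least-squares regressor stands in for the trained CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_well(n):
    # Hypothetical logs: VP (km/s) and a made-up linear VS trend plus noise.
    vp = rng.uniform(3.0, 5.5, n)
    vs = 0.55 * vp - 0.35 + rng.normal(0.0, 0.03, n)  # synthetic "measured" VS
    return vp, vs

# Supervised wells (stand-ins for MN#163 and MN#225), pooled for training.
vp_a, vs_a = synthetic_well(400)
vp_b, vs_b = synthetic_well(300)
vp_train = np.concatenate([vp_a, vp_b])
vs_train = np.concatenate([vs_a, vs_b])

# Fit a simple linear regressor (stand-in for the trained CNN, which would
# instead take the full eight-variable log suite as input).
A = np.column_stack([vp_train, np.ones_like(vp_train)])
coef, *_ = np.linalg.lstsq(A, vs_train, rcond=None)

# "Unseen" well (stand-in for MN#179): predict VS and score the result.
vp_c, vs_c = synthetic_well(200)
vs_pred = coef[0] * vp_c + coef[1]
rmse = np.sqrt(np.mean((vs_pred - vs_c) ** 2))
print(f"unseen-well RMSE: {rmse:.3f} km/s")
```

The key point of the design is that the evaluation well contributes no records to the fit, so its RMSE estimates true out-of-well generalization rather than in-sample accuracy.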
The DL CNN model described and evaluated here could be used in a similar way to predict VS in other fields, but it would, of course, first need to be recalibrated with some direct VS measurements from at least one well in each of the fields/reservoirs to which it is applied.

Recommendations for future research
Evaluating the effect of including other drilling parameters, such as standpipe pressure and mud flow rate, as inputs alongside well logs for predicting VS warrants further investigation. Based on the current findings, adding more related input parameters could yield models with higher prediction efficiency. Involving other optimizers, such as genetic algorithms and firefly algorithms, in developing high-performance hybrid predictive models for VS can also be considered in future research (Choubin et al. 2019; Ghorbani et al. 2020b; Kalbasi et al. 2021; Mohamadian et al. 2022; Rajabi et al. 2022b). The application of the proposed method should also be investigated across a wide range of fields, e.g., various energy, ecological and natural-resource research applications (Ghorbani et al. 2017; Ghorbani et al. 2019; Taherei Ghazvinei et al. 2018; Ahmadi et al. 2020; Band et al. 2020a; Band et al. 2020b; Emadi et al. 2020; Lei et al. 2020; Shamshirband et al. 2020; Barjouei et al. 2021; Hazbeh et al. 2021a), from computational fluid, pressure and hydrological modeling to environmental simulation (Ghalandari et al. 2019b; Rezakazemi et al. 2019; Seifi et al. 2020; Farsi et al. 2021a; Mahmoudi et al. 2021). For future research, comparative analysis with other machine learning methods (e.g., Asadi et al. 2019; Ghalandari et al. 2019a; Ghorbani et al. 2020c; Joloudari et al. 2020; Mosavi et al. 2020; Sadeghzadeh et al. 2020; Shabani et al. 2020; Abdali et al. 2021; Mosavi and Safaei-Farouji 2021) would be essential to provide insight into the true potential of the proposed method. To further improve its accuracy and performance, deep learning, ensemble and hybrid methods, for instance those suggested in Band et al. 2020b; Dehghani et al. 2020; Ghorbani et al. 2020a; Mosavi et al. 2020; Nabipour et al. 2020; Mousavi et al. 2021; and Shamsirband and Mehri Khansari 2021, could also be considered.

Summary and conclusions
A large dataset of well-log data records compiled for the Asmari reservoir section penetrated by three Marun oil field wells (MN#163, MN#225 and MN#179) onshore Iran is used to predict shear wave velocity (VS). The performances of two hybrid machine learning prediction models (MELM-PSO and MELM-GA), one deep learning model (CNN), and commonly used empirical models for predicting VS are compared using the compiled dataset. For supervised training of the MELM-PSO, MELM-GA and CNN models, data from two wells (MN#163 and MN#225; 6622 data records split 70%:30% between training and validation subsets) were initially evaluated. To independently test the best-performing trained model (CNN), 2072 data records from well MN#179, previously unseen by the trained and validated model, were also evaluated.
• The DL CNN model achieved the highest VS prediction accuracy based on supervised learning using data records from wells MN#163 and MN#225 (RMSE = 0.0456 km/s; R² = 0.9808 when applied to all 6622 data records).
• The hybrid machine learning algorithms MELM-PSO and MELM-GA also provided highly credible VS predictions (RMSE = 0.05 to 0.06 km/s; R² ~ 0.96 when applied to all 6622 data records), whereas the empirical models achieved VS prediction accuracy of RMSE > 0.11 km/s and R² < 0.87.
• Applying the trained and validated CNN model to the previously unseen 2072 data records from the Asmari reservoir penetrated by well MN#179 achieved VS prediction accuracy of RMSE = 0.068 km/s and R² = 0.97.
• This impressive prediction performance confirms that the CNN model trained with supervised data from two wells can be applied to accurately predict VS in other Asmari reservoir sections in the Marun oil field from basic well-log variables where VS logs have not been recorded.
• Properly trained deep learning and hybrid machine learning models, such as those evaluated, offer a better method of predicting VS from multiple well-log variables, in a supervised context and with data previously unseen by the trained and validated models, than the commonly used empirical models based solely on VP data.