Hybrid connectionist model determines CO2–oil swelling factor

A thorough understanding of the interactions between crude oil and CO2 informs the design and simulation of CO2-based enhanced oil recovery (EOR) processes. When CO2 contacts crude oil, it dissolves in the oil. This dissolution makes the oil swell, and the degree of swelling depends on the temperature, pressure, and composition of the oil. The residual oil saturation in a CO2-based EOR process is inversely proportional to the oil swelling factor, so this influential parameter needs to be estimated with high precision. The current study proposes a predictive model based on the least-squares support vector machine (LS-SVM) to calculate the CO2–oil swelling factor. A genetic algorithm is used to optimize the hyperparameters (γ and σ²) of the LS-SVM model. Against the available experimental data, the model estimated the CO2–oil swelling factor with a high coefficient of determination (R² = 0.9953) and a low mean-squared error (MSE = 0.0003). LS-SVM was found to be a straightforward and accurate method for determining the CO2–oil swelling factor with negligible uncertainty. The method can be incorporated in commercial reservoir simulators to account for the CO2–oil swelling factor when adequate experimental data are not available.


Introduction
Growing concern about global warming and the ongoing demand for energy resources have drawn both scientific and industrial interest to CO2-based enhanced oil recovery (EOR) methods. When CO2 is injected into depleted oil reservoirs, different mechanisms contribute to oil production (Farajzadeh et al. 2009; Godec et al. 2013; Kuznetsova and Kvamme 2002; Ma et al. 2016). These mechanisms depend on the operational conditions and the oil composition. The most common oil production mechanisms in CO2-based EOR methods are oil viscosity reduction, oil swelling, condensation, vaporization, and interfacial tension (IFT) reduction (Ahmadi et al. 2015; Bachu 2016; Czarnota et al. 2017; Farajzadeh et al. 2009; Li et al. 2013b; Shelton et al. 2016; Yang et al. 2012). Geological CO2 storage in depleted oil reservoirs reduces CO2 emissions to the atmosphere. Together with the role of CO2 in oil recovery, this highlights the need for further studies of CO2 injection operations and the corresponding PVT behavior (Ahmadi et al. 2016a, b; Bachu 2016; Davis et al. 2010; Jamali and Ettehadtavakkol 2017; Kim and Santamarina 2014; Li and Fan 2015; Liu and Wilcox 2012; Luo et al. 2013; Orr et al. 1982; Sell et al. 2012; Shelton et al. 2016; Yang et al. 2012; Yu et al. 2015; zeinali Hasanvand et al. 2013).
According to Rojas and Ali (1986) and Tunio et al. (2011), four mechanisms contribute to oil production in CO2-enhanced oil recovery: (1) oil viscosity reduction, (2) oil swelling, (3) oil and water density reduction, and (4) vaporization and extraction of a portion of the oil. When CO2 dissolves in the oil phase, the oil swells and its viscosity decreases. CO2 dissolution can therefore expand the oil substantially, which eventually improves oil displacement and recovery (Perera et al. 2016). The immiscible CO2-EOR technique is dominated by oil swelling and oil viscosity reduction. The degree of oil swelling and the change in oil viscosity depend on several parameters, including CO2 solubility in oil, pressure, temperature, and the API gravity of the oil. CO2 solubility is generally considered the most significant factor influencing the efficiency of CO2-based EOR techniques, particularly under low-pressure conditions. For instance, pilot-scale tests in Turkey confirmed this mechanism (Bagci 2007; Issever and Topkaya 1998; Perera et al. 2016). Experimental investigations and numerical reservoir simulations of binary systems of hydrocarbons and CO2 have been conducted to improve hydrocarbon recovery (Bachu 2016; Bessières et al. 2001; Diep et al. 1998; Do and Pinczewski 1991; Fukai et al. 2016; Jamali and Ettehadtavakkol 2017; Kim and Santamarina 2014; Kiran et al. 1996; Kwak and Kim 2017; Li et al. 2013a; Li and Fan 2015; Luo et al. 2007, 2013; Lv et al. 2015; Mulliken and Sandler 1980; Shelton et al. 2016; Yang and Gu 2005). Most of these studies investigated oil swelling primarily as a result of CO2 dissolution in the light fractions of the oil. Bessières et al. (2001) and Kiran et al. (1996) examined the volume change of several CO2–alkane systems. They concluded that the excess volume varies sigmoidally with the CO2 concentration. The oil swelling effect was quantified by the volume swelling coefficient defined by Yang and Gu (2005) and Yang et al. (2012). These investigations revealed that the volume swelling coefficient of the oil phase increases with pressure and, consequently, with the solubility of CO2 in oil. Yang et al. (2012) studied oil swelling through a qualitative analysis of the dispersion of CO2 in oil.
Experiments at reservoir conditions (high temperature, high pressure, and live oil composition) are, however, challenging. A swelling/extraction experiment is a well-known technique for recording the composition and volume changes of a reservoir fluid caused by CO2 dissolution in the reservoir oil at a given temperature. Swelling experiments are typically carried out in a constant-volume, high-pressure-resistant visual PVT cell, which is first filled with a specific volume of dead or stock-tank oil (Tsau et al. 2010). CO2 is then injected gradually, in as many steps as are needed to reach the desired pressure. The main assumption of the swelling experiment is that vaporization of the intermediate oil components into the CO2 phase is negligible until the minimum miscibility pressure is reached. At each pressure step, the oil volume change due to swelling is recorded and the amount of CO2 dissolved in the oil is measured. A further increase in pressure vaporizes part of the oil components, and the oil-rich phase shrinks. The phase behavior of the CO2–oil system can also be observed visually during a swelling test.
Edited by Yan-Hua Sun
Mohammad Ali Ahmadi, m.a.ahmadi@mun.ca
Parameters such as the bubble point pressure, CO2 solubility, and swelling factor are usually employed to tune the equation of state (EOS) for phase behavior modeling (Tsau et al. 2010). Visual PVT cells of different sizes can be used for swelling experiments, including 140 mL (Hand and Pinczewski 1990), 170 mL (Harmon and Grigg 1988), 190 mL (Orr et al. 1981), and 450 mL cells. Holm and Josendal (1982) recommended a sample size of 30% of the cell volume for the swelling test. Therefore, the proper volume of crude oil sample for swelling tests in these PVT cells is 40–100 mL. The most important issue related to the sample size is the time needed to reach equilibrium after each pressure change. Mixing large volumes of gas and oil at a given pressure is another major concern in such a swelling test. Thomas and Monger-McClure (1991) studied the effect of the CO2–oil swelling factor on oil recovery from light oil reservoirs using cyclic CO2 injection. They correlated the incremental oil recovery with the CO2–oil swelling factor and found that an increase in the CO2–oil swelling factor led to an increase in the amount of produced oil (Thomas and Monger-McClure 1991). Dong et al. (2001) determined the CO2–oil swelling factor by comparing the measured densities of the dead oil sample, the reservoir live oil, and the mixture of CO2 and reservoir oil. Ghedan (2009) stated that at high CO2 concentrations the CO2–oil swelling factor ranges from 1.25 to 1.6; in most cases, the CO2 content should be greater than 50%. Ning et al. (2011) carried out several multiple contact experiments (MCEs) to quantify how oil swelling and oil viscosity reduction contribute to oil production from Alaska North Slope viscous oil.
Heidaryan and Moghadasi (2012) investigated the influence of swelling and viscosity reduction on oil production using both experimental and theoretical methods. They concluded that the optimum CO2–oil swelling factor for maximum oil production from the reservoir is 1.7 (Heidaryan and Moghadasi 2012).
In a systematic study, Sugai et al. (2014) experimentally determined oil swelling factors in porous media using two types of micromodels (fine beads and coarse beads). They investigated the effect of interfacial area on oil swelling and on the CO2–oil swelling factor. They used a digital camera to image the oil trapped in the micromodels at different times and obtained the swelling factor once a constant saturation in the porous system had been confirmed. In addition, they used a simple oil–CO2 contact model in a visual cell to determine CO2–oil swelling factors at different pressures with a digital camera and image processing. They compared the CO2–oil swelling factors from both types of experiments to decide which additional parameters should be taken into account to improve the accuracy and reliability of the existing approach. From the experimental results, they concluded that a larger interfacial area increases oil swelling. The swelling factor in the fine-bead micromodel was larger than that in the coarse-bead micromodel because of its larger interfacial area (Sugai et al. 2014). Or et al. (2016) experimentally investigated how CO2–oil swelling and viscosity reduction contribute to oil recovery by CO2 gas foaming in heavy oil reservoirs. They concluded that CO2 foam swelling increases with increasing pressure drawdown in a well. They also found that further swelling of foamy oil can mobilize the residual oil towards the producer well, especially in the immobilized zone. Habibi et al. (2017) carried out experiments on CO2–oil systems to determine the interaction between CO2 and oil in tight rock samples.
They conducted constant composition expansion (CCE) experiments to determine the CO2–oil swelling factor and other measurable fluid and thermodynamic properties. They also performed cyclic CO2 injection experiments to determine the oil recovery. In their study, the CO2–oil swelling factor was defined as "the volume of the oil after CO2 injection divided by the volume of the oil before CO2 injection into the cell." In their experiments, increasing the CO2 concentration from 48.4% to 71.1% increased the CO2–oil swelling factor from 1.21 to 1.39. According to their experimental data, oil swelling and expansion, CO2 dissolution into the oil, and CO2 diffusion into the core samples are the main mechanisms contributing to oil production (Habibi et al. 2017).
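The Habibi et al. (2017) definition reduces to a ratio of two recorded volumes. The sketch below applies it to a stepwise swelling test. The function name and the PVT-cell readings are hypothetical illustration values, not data from the cited study.

```python
def swelling_factor(v_before: float, v_after: float) -> float:
    """CO2-oil swelling factor: oil volume after CO2 injection divided by
    the oil volume before CO2 injection (both read at test temperature)."""
    if v_before <= 0:
        raise ValueError("initial oil volume must be positive")
    return v_after / v_before

# Hypothetical readings (mL): initial dead-oil volume in the cell, then the
# volume of the swollen oil-rich phase after each CO2 injection step.
v0 = 50.0
swollen = [52.5, 55.0, 58.0, 60.5]
factors = [swelling_factor(v0, v) for v in swollen]
print([round(f, 3) for f in factors])  # → [1.05, 1.1, 1.16, 1.21]
```

Each value above 1 is the relative expansion of the oil at that pressure step. The same function applies when the volumes come from density measurements rather than direct visual readings.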
Only a few studies have developed a reliable correlation or deterministic model for predicting CO2–oil swelling factors. Welker (1963) proposed a very simple correlation to estimate the CO2–oil swelling factor. This correlation has limited applicability, particularly for light and intermediate crude oils. Simon and Graue (1965) developed a graphical method to determine the oil swelling factor from a limited number of data samples from heavy crudes. Chung et al. (1988) proposed a simple correlation to estimate the oil swelling factor for CO2/heavy crude oil systems. Emera and Sarma (2006) developed a correlation to predict the oil swelling factor for both light and heavy crude oils. However, they used only a limited number of data points to develop it. Table 1 summarizes the correlations and models for calculating the CO2–oil swelling factor.
Vapnik (1998) proposed the support vector machine (SVM) as an extension of conventional artificial intelligence tools. SVM is a practical method that has been widely used for classification, regression, and pattern recognition (Cortes and Vapnik 1995). The principal idea of SVM is to map the input space, through a nonlinear mapping, into a higher-dimensional feature space in which a hyperplane is found (Baylar et al. 2009; Cortes and Vapnik 1995). SVM is based on statistical learning theory (SLT) and the structural risk minimization (SRM) principle (Mehdizadeh and Movagharnejad 2011). SVM obtains its solution by solving a quadratic programming (QP) problem. Because this QP problem is convex, SVM always reaches a global optimum, unlike other regression techniques such as neural networks (Vong et al. 2006). However, solving the QP problem imposes a considerable computational burden.
To the best of our knowledge, the least-squares support vector machine (LS-SVM) has not previously been used to model the CO2–oil swelling factor. This study employs LS-SVM, a least-squares reformulation of the original SVM method, to calculate the CO2–oil swelling factor. A genetic algorithm (GA) is used to optimize the hyperparameters of the LS-SVM model. Extensive experimental data, collected through a comprehensive literature review, are used for model development and validation.

Theory
2.1 Least-squares support vector machine (LS-SVM)
Suykens and Vandewalle (1999) proposed the least-squares support vector machine (LS-SVM) as an alternative formulation of SVM regression. LS-SVM shares the advantages of SVM. It also requires solving only a set of linear equations instead of a quadratic programming (QP) problem, which is computationally less demanding. Consider a training set {x_k, y_k}, k = 1, 2, …, N, where x_k ∈ R^n is the kth input vector, y_k ∈ R is the corresponding output, and N is the number of training samples. A nonlinear function φ(·) maps the input space to a high- (and possibly infinite-) dimensional feature space. With it, the following regression model is constructed:

$$y(x) = \omega^{T}\varphi(x) + b \qquad (1)$$

in which ω denotes the weight vector and b is a bias term. The superscript n refers to the dimension of the data space and n_h to the dimension of the higher-dimensional feature space, so that φ: R^n → R^{n_h} (Vong et al. 2006). LS-SVM leads to the following optimization problem:

$$\min_{\omega,\, b,\, e}\; J(\omega, e) = \frac{1}{2}\,\omega^{T}\omega + \frac{1}{2}\,\gamma \sum_{k=1}^{N} e_k^{2} \qquad (2)$$

subject to the equality constraints

$$y_k = \omega^{T}\varphi(x_k) + b + e_k, \qquad k = 1, 2, \ldots, N \qquad (3)$$

where γ is the regularization parameter, which balances model complexity against training error (Mehdizadeh and Movagharnejad 2011), and e_k is the regression error. The Lagrangian below converts this constrained problem into an unconstrained one:

$$L(\omega, b, e; \alpha) = J(\omega, e) - \sum_{k=1}^{N} \alpha_k \left[\omega^{T}\varphi(x_k) + b + e_k - y_k\right] \qquad (4)$$

where α_k is the Lagrange multiplier (support value). Setting the derivatives of Eq. (4) with respect to ω, b, e_k, and α_k to zero gives:

$$\frac{\partial L}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{k=1}^{N} \alpha_k \varphi(x_k) \qquad (5)$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} \alpha_k = 0 \qquad (6)$$

$$\frac{\partial L}{\partial e_k} = 0 \;\Rightarrow\; \alpha_k = \gamma\, e_k, \qquad k = 1, \ldots, N \qquad (7)$$

$$\frac{\partial L}{\partial \alpha_k} = 0 \;\Rightarrow\; \omega^{T}\varphi(x_k) + b + e_k - y_k = 0, \qquad k = 1, \ldots, N \qquad (8)$$

Eliminating ω and e yields the Karush–Kuhn–Tucker (KKT) system:

$$\begin{bmatrix} 0 & \mathbf{1}_N^{T} \\ \mathbf{1}_N & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \qquad \Omega_{kl} = \varphi(x_k)^{T}\varphi(x_l) = K(x_k, x_l) \qquad (9)$$

where y = [y_1, …, y_N]^T, α = [α_1, …, α_N]^T, and 1_N is an N-dimensional vector of ones. In Eq. (9), K(·,·) is the kernel function, which must satisfy Mercer's condition (Li et al. 2008). Three typical choices for the kernel function are:

$$K(x, x_k) = x_k^{T} x \;\;\text{(linear)}, \qquad K(x, x_k) = \left(s\, x_k^{T} x + 1\right)^{d} \;\;\text{(polynomial)}, \qquad K(x, x_k) = \exp\!\left(-\frac{\lVert x - x_k \rVert^{2}}{\sigma^{2}}\right) \;\;\text{(RBF)} \qquad (10)$$

The resulting LS-SVM model for function estimation becomes:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (11)$$

where s is the slope, d is the polynomial degree, σ² is the kernel sample variance, and (b, α) is the solution of the linear system in Eq. (9).

Table 1 Summary of correlations and models for calculating the CO2–oil swelling factor
Welker (1963): developed for oils at T = 80°F and 20°API < oil gravity < 40°API.
Simon and Graue (1965): graphical correlation; a function of CO2 solubility, oil molecular weight (MW), and oil density at 60°F; not recommended for high-pressure ranges; 12°API < oil gravity < 33°API.
Chung et al. (1988): SF = ρ_l/(ρ − S), where S = CO2 solubility (g/cm³), ρ = oil density without CO2 at the same temperature and 1 atm pressure (g/cm³), and ρ_l = solution density (g/cm³); 16.89°API oil gravity.
Emera and Sarma (2006): for MW > 300; 12°API < oil gravity < 37°API.
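Training an LS-SVM therefore amounts to one dense linear solve. The NumPy sketch below assembles the Eq. (9) system with an RBF kernel and predicts with the kernel expansion of the resulting model. It assumes the RBF convention K(x, x_k) = exp(−‖x − x_k‖²/σ²). The function names and the synthetic data are illustrative, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / sigma2)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma2)  # clip round-off negatives

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM KKT system for the bias b and multipliers alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # first row: sum of alpha_k = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    """y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# Synthetic smooth target standing in for swelling-factor data.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = 1.0 + 0.3 * X[:, 0] - 0.1 * X[:, 1] ** 2
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma2=0.5)
pred = lssvm_predict(X, b, alpha, 0.5, X)
print("max training error:", np.max(np.abs(pred - y)))
```

Two identities from the derivation can be checked on the fitted model. The first row of Eq. (9) enforces Σα_k = 0, and the optimality condition α_k = γe_k means the training residuals equal α/γ.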
Comprehensive descriptions of SVM are available in the literature (Burges 1998; Suykens and Vandewalle 1999; Vapnik 1998). Several researchers have systematically explained the theory of LS-SVM (Suykens and Vandewalle 1999; Suykens et al. 2002), and Liu et al. (2005a, b, 2007) provide a detailed comparison of the SVM and LS-SVM methods.

Genetic algorithm
The genetic algorithm (GA) is a stochastic optimization method that mimics Darwinian evolution by natural selection. A population of candidate solutions is evaluated against a fitness criterion and the fittest individuals survive. Genetic operators, including crossover and mutation, then generate new candidates until a predefined fitness level is satisfied (Niazi et al. 2008). A significant feature of GAs and similar evolutionary algorithms is that they are derivative-free. The stochastic nature of the algorithm, combined with dynamic evaluation of the fitness function, yields a powerful, systematic random search engine. This approach is an alternative to derivative-based methods for problems in which the fitness function is nondifferentiable, discontinuous, highly nonlinear, multimodal, or stochastic (Reihanian et al. 2011).
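The loop described above (evaluate fitness, select, cross over, mutate) can be sketched as a minimal real-coded GA. The operators are binary tournament selection, blend crossover, Gaussian mutation, and elitism. These operators, the parameter values, and the Rastrigin test function (a standard multimodal benchmark standing in for a fitness landscape with many local optima) are illustrative choices, not the settings used in this study.

```python
import numpy as np

def genetic_minimize(f, bounds, pop_size=40, generations=150,
                     crossover_rate=0.9, mutation_rate=0.1, seed=0):
    """Minimal real-coded GA: tournament selection, blend crossover,
    Gaussian mutation, and elitism (the best individual always survives)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(ind) for ind in pop])
    for _ in range(generations):
        children = [pop[np.argmin(fit)].copy()]          # elitism
        while len(children) < pop_size:
            # Binary tournament: keep the fitter of two random individuals.
            parents = []
            for _ in range(2):
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] < fit[j] else pop[j])
            p1, p2 = parents
            if rng.random() < crossover_rate:
                w = rng.random(dim)                      # blend crossover
                child = w * p1 + (1.0 - w) * p2
            else:
                child = p1.copy()
            # Gaussian mutation, gene by gene, scaled to the search range.
            mask = rng.random(dim) < mutation_rate
            if mask.any():
                child[mask] += rng.normal(0.0, 0.1 * (hi - lo)[mask])
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([f(ind) for ind in pop])
    best = np.argmin(fit)
    return pop[best], fit[best]

def rastrigin(x):
    """Multimodal test function with global minimum 0 at the origin."""
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

x_best, f_best = genetic_minimize(rastrigin, bounds=[(-5.12, 5.12)] * 2)
print(x_best, f_best)
```

No derivative of the fitness function is ever computed. Only function values drive selection, which is what makes the method usable on nondifferentiable or noisy objectives.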

Data gathering
Extensive data on the CO2–oil swelling factor were extracted from the literature (Chung et al. 1988; Mosavat et al. 2014; Tsau et al. 2010; Wei et al. 2017). The statistical parameters of these data samples are reported in Table 2. As Table 2 shows, the data samples cover a broad range of crude oils, from heavy oils to extra-light oils. The collected data also cover wide ranges of temperature, pressure, and CO2 solubility.

Methodology
In this paper, four parameters are used as input variables to the LS-SVM model: (1) the CO2 concentration in oil (mole fraction of CO2), (2) pressure, (3) temperature, and (4) the API gravity of the oil. The output variable of the LS-SVM model is the CO2–oil swelling factor.
A total of 225 data samples were extracted from the literature to develop the LS-SVM model for estimating the CO2–oil swelling factor. The data samples were divided into two sets. The first set (the training set) contained 80% of the data points and was used to construct the LS-SVM model. The second set contained the remaining 20% and was used to validate the model.
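An 80/20 partition of the 225 samples gives 180 training points and 45 testing points. The paper does not state how the partition was drawn. The sketch below assumes a seeded random permutation, and the helper name is hypothetical. Each index would select one row of the four inputs (CO2 mole fraction, pressure, temperature, API gravity) together with its measured swelling factor.

```python
import numpy as np

def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = train_test_split_indices(225)
print(len(train_idx), len(test_idx))  # → 180 45
```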
The radial basis function (RBF) kernel was selected because of its good performance and simplicity. It contains only one adjustable parameter (σ²) and has been applied successfully in previous studies (Ahmadi 2015; Keerthi and Lin 2003; Reihanian et al. 2011). When an LS-SVM model is developed with the RBF kernel, optimizing γ and σ² is a crucial task according to Eqs. (9) and (10). Optimal values of these two parameters are required for an LS-SVM model with high precision and good generalization (Vong et al. 2006).
According to Ahmadi and Ebadi (2014), Ahmadi et al. (2014a, b), and Fazeli et al. (2013), non-population-based optimization methods such as simulated annealing and Levenberg-Marquardt (LM) are not recommended because they have difficulty handling the nonlinearity of SVM methods. In this study, GA was applied to optimize the LS-SVM parameters (γ and σ²), with the average absolute relative deviation (AARD) as the objective. The flowchart of the hyperparameter optimization using GA is depicted in Fig. 1. The optimization procedure was repeated several times to obtain the most plausible solution, corresponding to the global optimum of the fitness function. The resulting optimal values are σ² = 0.268829 and γ = 33.4091.
Table 2 Statistical parameters of the data points (Chung et al. 1988; Mosavat et al. 2014; Tsau et al. 2010; Wei et al. 2017)
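The tuning step can be sketched by wrapping an RBF LS-SVM in a GA whose fitness is the validation AARD. The paper reports only the flowchart (Fig. 1) and the optimal values, not its GA settings. Everything below is therefore an assumption: the search in log10 space (log10 γ ∈ [−1, 4], log10 σ² ∈ [−2, 1]), truncation selection, blend crossover with Gaussian perturbation, elitism, the population size, the RBF convention exp(−‖x − x_k‖²/σ²), and the synthetic data. The output will not reproduce γ = 33.4091 and σ² = 0.268829.

```python
import numpy as np

def rbf(A, B, sigma2):
    """RBF kernel matrix exp(-||a - b||^2 / sigma2) for all row pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def lssvm_fit_predict(Xtr, ytr, Xte, gamma, sigma2):
    """Train an RBF LS-SVM via its KKT linear system, then predict on Xte."""
    n = len(ytr)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf(Xtr, Xtr, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], ytr)))
    b, alpha = sol[0], sol[1:]
    return rbf(Xte, Xtr, sigma2) @ alpha + b

def aard(y_true, y_pred):
    """Average absolute relative deviation, in percent."""
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

def ga_tune(Xtr, ytr, Xval, yval, pop_size=20, generations=30, seed=0):
    """GA over genes (log10 gamma, log10 sigma2) minimizing validation AARD."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-1.0, -2.0]), np.array([4.0, 1.0])

    def fitness(genes):
        gamma, sigma2 = 10.0 ** genes
        return aard(yval, lssvm_fit_predict(Xtr, ytr, Xval, gamma, sigma2))

    pop = rng.uniform(lo, hi, size=(pop_size, 2))
    fit = np.array([fitness(g) for g in pop])
    for _ in range(generations):
        order = np.argsort(fit)
        parents = pop[order[: pop_size // 2]]     # truncation selection
        children = [pop[order[0]]]                # elitism
        while len(children) < pop_size:
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            w = rng.random(2)                     # blend crossover + mutation
            child = w * p1 + (1 - w) * p2 + rng.normal(0.0, 0.1, size=2)
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([fitness(g) for g in pop])
    best = np.argmin(fit)
    gamma, sigma2 = 10.0 ** pop[best]
    return gamma, sigma2, fit[best]

# Synthetic stand-in for four scaled inputs and a smooth swelling-like target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 4))
y = 1.0 + 0.4 * X[:, 0] - 0.2 * X[:, 2] + 0.1 * X[:, 3] ** 2
gamma, sigma2, best_aard = ga_tune(X[:48], y[:48], X[48:], y[48:])
print(gamma, sigma2, best_aard)
```

Searching exponents rather than raw values lets the GA explore several orders of magnitude of γ and σ² evenly. This is a common practical choice for kernel hyperparameters.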

Results and discussion
This study presents a new deterministic approach for obtaining the swelling factor with higher accuracy. The oil swelling factor for the CO2–light oil system versus pressure at different temperatures is shown in Fig. 2.
The mean-squared error (MSE) and the coefficient of determination (R²) are used as the performance criteria for the LS-SVM model in estimating the CO2–oil swelling factor. They are calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\text{actual}} - y_i^{\text{predicted}}\right)^{2}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i^{\text{actual}} - y_i^{\text{predicted}}\right)^{2}}{\sum_{i=1}^{N}\left(y_i^{\text{actual}} - \bar{y}^{\text{actual}}\right)^{2}}$$

where N is the number of data points, y_i^actual is the ith observation (experimental value), y_i^predicted is the ith model output, and ȳ^actual is the mean of the observations. The MSE and R² values for the training, testing, and overall data are listed in Table 3. The GA-LS-SVM predictions are satisfactory if R² is close to 1 and MSE is close to 0. As Table 3 shows, both criteria are fulfilled.
Figure 5a compares the estimated and experimental data in the training phase. Figure 5b compares the actual and predicted CO2–oil swelling factors against the data index in the testing phase. As illustrated in Fig. 5, the oil swelling factors estimated by the LS-SVM method match the experimental values very closely. Figure 6 shows the regression plots of the CO2–oil swelling factor determined by the LS-SVM model against the experimental data points. Figure 6a depicts the scatter plot for the training phase. As shown in Fig. 6a, the linear fit y = 0.9892x + 0.0103 has a high coefficient of determination (R² = 0.9944), which means that the LS-SVM model was trained very well. Figure 6b shows the scatter plot of the results obtained in the testing (validation) phase. As depicted in Fig. 6b, the high coefficient of determination (R² = 0.9931) between the predicted and experimental oil swelling factors demonstrates the strong performance of the LS-SVM model. Figure 6c shows the regression plot for the whole data set. The predicted swelling factors are scattered around the y = x line, indicating that the GA-optimized LS-SVM model predicts the swelling factor very well. Figure 7 compares the CO2–oil swelling factor determined by the LS-SVM model with the experimental data versus pressure at different temperatures. As shown in Fig. 7, the LS-SVM model follows the trend of the experimental data for an intermediate oil of 29.4°API gravity. The experimental data show that, at constant pressure, the swelling factor decreases with increasing temperature, and the LS-SVM model reproduces this behavior. This implies that the proposed LS-SVM model for the CO2–oil swelling factor is valid from both technical and physical standpoints. Figure 8 shows the relative error distribution for the training and testing phases of the LS-SVM model. According to Fig. 8, the relative error between the LS-SVM outputs and the experimental CO2–oil swelling factors is within ±5% for the training phase and within ±15% for the testing phase. Simon and Graue (1965) proposed a graphical method for determining the CO2–oil swelling factor, in which the swelling factor ranges from a minimum of 1 to a maximum of 1.38. Moreover, the Simon and Graue technique gives acceptable swelling factors only within limited ranges of API gravity, temperature, and CO2 solubility (Table 1).
Hence, this graphical method cannot provide reliable outputs over wide ranges of the input parameters. Figure 9 shows the scatter plot of the results of the Simon and Graue (1965) graphical method against the experimental CO2–oil swelling factors. As Fig. 9 shows, the linear fit has a low coefficient of determination (R²). The linear fit also has a negative slope, which indicates that the method overestimates the oil swelling factor at the lower boundary. Figure 10 compares the swelling factors calculated by the Emera and Sarma (2006) correlation with the experimental CO2–oil swelling factors. Figs. 9 and 10 show that the linear fit of the Emera and Sarma (2006) results has a higher coefficient of determination than that of the Simon and Graue (1965) method. This is because the Emera and Sarma (2006) correlation was developed from a wider range of data points. However, this correlation still shares a drawback common to most empirical correlations: it gives reliable outputs only within limited ranges of the input parameters (Table 1). As illustrated in Fig. 10, the Emera and Sarma (2006) correlation underestimates the swelling factor in the middle range of the data. Table 4 reports the maximum absolute error (MAE) and the average absolute relative deviation (AARD) of the three models with respect to the available experimental CO2–oil swelling factors. The MAE of the LS-SVM model is lower than those of the Emera and Sarma (2006) and Simon and Graue (1965) methods. This superior performance is attributed to the high predictive capability of the developed tool, a proper training procedure, and careful selection of data samples. Using a broader range of data samples enabled us to develop a more precise and reliable technique for calculating the CO2–oil swelling factor.
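The statistical criteria used in this section (MSE and R² for Table 3; maximum absolute error and AARD for Table 4; relative error for Fig. 8) can be computed together. The paper does not print its AARD formula. The sketch assumes the common definition (100/N)·Σ|(y_pred − y_actual)/y_actual|, and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_actual, y_pred):
    """Return MSE, R^2, maximum absolute error, and AARD (%)."""
    ya = np.asarray(y_actual, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    err = ya - yp
    mse = np.mean(err ** 2)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((ya - ya.mean()) ** 2)
    max_abs = np.max(np.abs(err))
    aard = 100.0 * np.mean(np.abs(err / ya))
    return {"MSE": mse, "R2": r2, "MaxAE": max_abs, "AARD_%": aard}

def relative_error_percent(y_actual, y_pred):
    """Point-wise relative error (%), as plotted in an error distribution."""
    ya = np.asarray(y_actual, dtype=float)
    return 100.0 * (np.asarray(y_pred, dtype=float) - ya) / ya

# Hypothetical experimental vs. model swelling factors.
y_exp = [1.05, 1.12, 1.20, 1.31]
y_mod = [1.06, 1.11, 1.22, 1.30]
print(regression_metrics(y_exp, y_mod))
print(relative_error_percent(y_exp, y_mod))
```

Note that R² here is computed directly from the model residuals. It is not the R² of a linear fit through a predicted-versus-experimental scatter plot, so the two quantities can differ slightly.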
It should be noted that the Emera and Sarma (2006) correlation is currently used in the Computer Modelling Group (CMG) reservoir simulator package. This suggests that the LS-SVM strategy introduced in this work could also be included in commercial reservoir simulators for applications such as the simulation of gas injection processes in the petroleum industry.
Outlier detection requires appropriate statistical methods for identifying the applicability domain of the model. Outlier recognition determines which data points may differ from the bulk of the data in the data bank under study (Gramatica 2007; Rousseeuw and Leroy 2005). To examine the capability of the LS-SVM model, the leverage value statistics approach was applied (Goodall 1993; Gramatica 2007). This work uses a graphical method, the Williams plot, for outlier detection. The Williams plot shows the standardized residuals of the outputs against the corresponding hat (H) values. Further details on the mathematical background and computational procedure of the Williams plot can be found in the literature (Goodall 1993; Gramatica 2007; Rousseeuw and Leroy 2005). Figure 11 shows the Williams plot for the LS-SVM estimates of the CO2–oil swelling factor. Most data points lie within 0 ≤ H ≤ 0.055 and −3 ≤ R ≤ 3, where R is the standardized residual, which shows that the LS-SVM model is statistically reliable. In addition, the entire data set lies within the acceptable domain, again confirming that the LS-SVM model gives accurate and satisfactory predictions.
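The two coordinates of a Williams plot can be computed as in the sketch below. Because the paper defers the formulas to the cited references, several conventions here are assumptions: the hat matrix H = X(XᵀX)⁻¹Xᵀ built from the raw input matrix, internally studentized residuals e_i/√(MSE·(1 − h_i)), and the frequently used warning leverage h* = 3(p + 1)/n. The inputs and residuals are synthetic.

```python
import numpy as np

def williams_plot_data(X, y_actual, y_pred):
    """Leverages, standardized residuals, and warning leverage h*."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(y_actual, dtype=float) - np.asarray(y_pred, dtype=float)
    n, p = X.shape
    # Hat matrix H = X (X^T X)^-1 X^T; its diagonal holds the leverages.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    # Internally studentized residuals: e_i / sqrt(MSE * (1 - h_i)).
    mse = np.mean(e ** 2)
    r_std = e / np.sqrt(mse * (1.0 - h))
    h_star = 3.0 * (p + 1) / n            # assumed warning-leverage rule
    return h, r_std, h_star

# Synthetic stand-in: 225 samples with 4 inputs and small model errors.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(225, 4))
y_true = 1.0 + 0.3 * X[:, 0] - 0.1 * X[:, 1]
y_model = y_true + rng.normal(0.0, 0.01, size=225)
h, r_std, h_star = williams_plot_data(X, y_true, y_model)
inside = np.mean((h <= h_star) & (np.abs(r_std) <= 3.0))
print(f"h* = {h_star:.4f}, fraction inside the domain = {inside:.3f}")
```

Points with |R| > 3 are candidate outliers in the response. Points with h > h* lie far from the bulk of the input space, so predictions there are extrapolations.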
Analysis of variance (ANOVA) was used to determine the relative importance of the input parameters of the connectionist model for estimating the CO2–oil swelling factor. Figure 12 shows the relative significance of the independent variables (API oil gravity, temperature, pressure, and CO2 concentration as mole fraction) for the swelling factor. The results show that the API gravity of the oil is the most significant independent parameter and temperature ranks second. The CO2 concentration has the least impact on the target parameter.
To demonstrate the effectiveness of the developed model for a real case, we consider sample AC, whose composition is reported in Table 5. A swelling test was performed on this sample with different mole fractions of CO2. As mentioned previously, one way to determine the swelling factor is to use an EOS. The Peng-Robinson EOS, a well-known and robust EOS, was therefore used to calculate the CO2–oil swelling factor of sample AC. Figure 13 compares the outputs of the LS-SVM model and the Peng-Robinson EOS with the experimental data from the swelling test on sample AC. As illustrated in Fig. 13, both the LS-SVM model and the Peng-Robinson EOS predict the CO2–oil swelling factor with reasonable accuracy. In this case, the LS-SVM model underestimates the swelling factor, whereas the Peng-Robinson EOS overestimates it. The residual oil saturation, which directly corresponds to the oil recovery factor, is inversely proportional to the swelling factor in CO2-based EOR processes. An accurate CO2–oil swelling factor therefore increases the precision and reliability of modeling and simulation studies. Such studies aim to capture the main recovery mechanisms and to determine the production performance of CO2-EOR strategies for both heavy oil and conventional oil reserves. The present study introduces an accurate and simple-to-use approach for calculating the CO2–oil swelling factor, an influential parameter in CO2 injection operations. A precise value of this parameter helps engineers and researchers obtain the residual oil saturation and the oil and water relative permeability curves with greater reliability at various stages of reservoir development (e.g., optimization of operational conditions and economic analysis).

Conclusions
We used the least-squares support vector machine (LS-SVM), developed on extensive experimental data, to estimate the CO2–oil swelling factor, and employed the genetic algorithm (GA) to tune the model parameters. The main conclusions are as follows:
• The feasibility and performance of the LS-SVM technique with an RBF kernel function were evaluated using the available experimental data on CO2–oil swelling factors.
• GA was used to determine the optimal values of the model parameters, namely the regularization factor and the variance of the kernel function: γ = 33.4091 and σ² = 0.268829, respectively.
• The hybrid GA-LS-SVM provided excellent predictions of the CO2–oil swelling factor. Its performance (R² = 0.9953 and MSE = 0.0003) indicates high accuracy and reliability of the developed model.
• The relative importance of the input variables (API gravity of the oil, temperature, pressure, and CO2 concentration as mole fraction) for the CO2–oil swelling factor was investigated with a common statistical approach, analysis of variance (ANOVA). API gravity, temperature, pressure, and CO2 content have, in that order, the highest to the lowest impact on the target parameter.
• LS-SVM offers high efficiency, excellent generalization, and a routine computational procedure, which makes it suitable for classification and identification of nonlinear systems such as CO2–oil systems.
Fig. 12 Relative importance of the independent variables affecting the swelling factor
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.