Introduction

The world's energy consumption and demand, largely driven by oil and gas, significantly overshadow the utilization of renewable energy sources. Forecasts indicate a pressing need for global oil production to surge to approximately 20 million barrels per day (MMBL/day) by 2050 to meet escalating energy requirements (Energy Information Administration, 2022). Waterflooding stands as a widely practiced secondary oil recovery technique, entailing the injection of highly saline water into reservoirs to displace oil towards production wells. While conventional waterflooding offers benefits such as increased oil recovery factor (RF), improved sweep efficiency, and pressure maintenance, its efficacy can be compromised in reservoirs with elevated salinity levels. To address this challenge, a hybrid enhanced oil recovery (EOR) method known as low salinity water alternating immiscible gas CO2 (Immiscible CO2-LSWAG) injection has emerged (Al-Saedi & Flori 2019; Dang et al. 2016).

Immiscible CO2-LSWAG injection presents a promising solution by integrating the advantages of low salinity waterflooding and immiscible CO2 flooding. This approach modifies the macroscopic and microscopic displacement properties of the reservoir, resulting in heightened sweep efficiency and increased oil production (Dang et al. 2014). As the petroleum industry grapples with the dual challenge of meeting rising energy demands while optimizing oil production, immiscible CO2-LSWAG injection, coupled with machine learning techniques, offers potential solutions (Carvalhal et al. 2019; Gorucu et al. 2019).

Previous studies have extensively explored enhanced oil recovery methods, emphasizing the efficacy of waterflooding in displacing oil and enhancing oil RF. However, conventional waterflooding's performance can be constrained by factors such as reservoir heterogeneity and residual oil presence (Al-Saedi et al. 2019). Immiscible CO2-LSWAG injection, with its ability to modify reservoir wettability and enhance displacement efficiency, has garnered attention for its potential to boost oil production (AlQuraishi et al. 2019; Kumar et al. 2016).

Research investigating the impact of immiscible CO2-LSWAG injection on oil recovery factor has employed various techniques, including experimental studies, numerical simulations, and machine learning approaches (AlQuraishi et al. 2019; Dang et al. 2016; Zolfaghari et al. 2013). Coreflood experiments have demonstrated promising results, with immiscible CO2-LSWAG injection yielding higher oil RF compared to conventional methods (Zolfaghari et al. 2013; Kumar et al. 2016). Additionally, recent studies have explored the application of immiscible CO2-LSWAG injection in different reservoir types, such as carbonate cores, highlighting its potential for incremental oil recovery (Bastos et al. 2023).

Numerical reservoir simulation studies have delved into the intricacies of modeling three-dimensional (3D) immiscible CO2-LSWAG injection in oil-wet sandstone. Factors such as reservoir formation, injection brine composition, WAG ratio, reservoir mineralogy, temperature, and pressure all play crucial roles in affecting oil recovery factor (RF) (Dang et al. 2016, 2014). This hybrid EOR approach has exhibited superior oil RF compared to both low salinity waterflooding (LSWF) and conventional CO2 injection methods (Dang et al. 2016, 2014). Further insights have been gained through simulation studies exploring LSWF in oil-wet sandstone and the role of salinity levels in optimizing oil RF (Dang et al. 2015). Mineral dissolution, facilitated by multiple ion exchange (MIE), has been identified as a mechanism that enhances wettability alteration and subsequently increases oil RF (Naderi & Simjoo 2019). Notably, immiscible CO2-LSWAG injection has demonstrated remarkable potential, achieving a 90% increase in oil RF compared to conventional methods and LSWF (Naderi & Simjoo 2019). Studies by Carvalhal et al. (2019) and Gorucu et al. (2019) have underscored the significant influence of CO2 content and fines migration on oil RF during immiscible CO2-LSWAG injection. Moreover, investigations into simultaneous miscible CO2-WAG injection have revealed promising results in boosting oil RF compared to conventional WAG processes (Nasser et al. 2023).

To expedite reservoir performance predictions and mitigate computational challenges, researchers have turned to machine learning proxy models. These models, exemplified in studies by Cheraghi et al. (2021), Dang et al. (2018), and Liang and Zhao (2019), offer faster results and are instrumental in sensitivity analysis, history matching, and production optimization (Saberi et al. 2021). Van and Chon (2017, 2018) have successfully utilized artificial neural networks (ANN) to predict conventional CO2-WAG performance, showcasing high prediction accuracies. Similar success has been achieved in predicting CO2-WAG development using random forest algorithms, as demonstrated by Li et al. (2022) and Gao et al. (2023).

Machine learning models have revolutionized reservoir management by enhancing prediction accuracy, reducing uncertainty, and facilitating better decision-making (Merdeka et al. 2022). Their application in Enhanced Oil Recovery (EOR) operations offers quicker and more precise results compared to traditional simulations, leading to improved production optimization and increased oil recovery rates (Gao et al. 2023). However, to fully harness the potential benefits of immiscible CO2-LSWAG injection and machine learning, sensitivity analysis is crucial for identifying the key parameters influencing oil Recovery Factor (RF). Gradient boosting, a widely-used machine learning technique, has demonstrated effectiveness in solving diverse challenges such as handling heterogeneous features and noisy data (Dorogush et al. 2018). Catboost, a novel gradient boosting library, excels in managing categorical features and exhibits faster performance compared to other methods (Dorogush et al. 2018). Similarly, LightGBM implements advanced techniques like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to enhance accuracy and reduce computational complexity (Ke et al. 2017).

This study evaluates immiscible CO2-LSWAG injection in oil-wet sandstone reservoirs, using the Catboost and LightGBM algorithms to identify the key parameters affecting oil RF. These algorithms were chosen for their efficiency and their ability to limit overfitting. Hyperparameter tuning was conducted to enhance model performance. Various injection and reservoir parameters were examined across one thousand simulated data samples, with the aim of optimizing oil production. Insights from this research can guide decision-making during immiscible CO2-LSWAG field trials, enhancing efficiency and cost-effectiveness in oil recovery operations to meet global demand by 2050.

Screening criteria for CO2-LSWAG

Prior studies on miscible and immiscible CO2-LSWAG injection concluded that certain conditions must be satisfied for the mechanisms of this hybrid EOR to have a positive effect. Therefore, prescreening must be carried out before a field trial is implemented to maximize the oil production from CO2-LSWAG injection (Dang et al. 2016). The prescreening criteria for this hybrid EOR injection are summarized in Table 1. These crucial parameters are considered in this study, in which the reservoir is a sandstone formation. The reservoir minerals in this project include quartz, orthoclase, magnetite, calcite, etc. The clay content, the reservoir water composition and concentration, and the injection water composition and concentration, which contain monovalent and divalent cations, are also used in this study.

Table 1 Prescreening criteria for CO2-LSWAG injection (Dang et al. 2016)

Jiang et al. (2010) reported an interesting finding: miscible CO2-WAG produced higher oil RF than miscible CO2-LSWAG because of the degradation in CO2 solubility. However, all the core samples used in these experiments were in extremely water-wet conditions with very low clay content. These are unfavorable conditions for immiscible CO2-LSWAG. Afterward, AlQuraishi et al. (2019) conducted another experiment on this hybrid EOR using miscible CO2 injection. The study obtained a similar result, where CO2-WAG provided higher oil RF on a Bentheimer core with an extremely low clay content of 0.5%. These outcomes arose because the experiments met only part of the prescreening criteria shown in Table 1, since one or more important parameters were absent. A polar component in the oil is necessary for improved oil recovery (IOR), since oil without polar components did not show any response to LSWF. Reservoir minerals are also known as cation exchange materials. The presence of divalent cations is important since it can induce a high ion exchange capability, as is the presence of connate water. Mixed-wet and oil-wet conditions are preferred since LSWF and CO2-induced precipitation and dissolution could generate wettability alteration. The ability to achieve the MMP could further help to reduce oil viscosity and increase oil relative permeability. A salinity lower than that of the reservoir brine is also responsible for the ion exchange. For further detail on each prescreening criterion, please refer to Dang et al. (2016) and Katende and Sagala (2019). Therefore, these parameters must be considered in the model to secure the success of immiscible CO2-LSWAG injection at the reservoir scale.

Methodology

This project used two commercial software packages to build the numerical reservoir simulations: Petrel and CMG. The modeling then proceeded with machine learning applications. The project's framework is shown in Fig. 1. The study combined numerical simulation and machine learning to conduct a sensitivity analysis of the oil recovery factor for immiscible CO2-LSWAG injection in an oil-wet sandstone reservoir. The modeling process commenced with Petrel, which was used to construct the clay volume, permeability, and porosity distributions. Subsequently, CMG WinProp was used to create the fluid properties in a compositional model. CMG GEM was employed to establish the rock properties, reservoir conditions, and immiscible CO2-LSWAG injection plan for predicting oil RF. To generate the required dataset, CMG CMOST produced 1000 data samples for the initially oil-wet sand. The input features comprised the parameters influencing immiscible CO2-LSWAG injection, while the output feature was the oil RF. Finally, machine learning techniques were applied to the dataset. The data were split into 80% for training (training + validation) and 20% for testing. Of the 1000 data samples, 80% (800 samples) were split further into 80% for training (640 samples) and 20% for validation (160 samples). The remaining 200 samples were used for testing. This splitting ratio was also used by Moosavi et al. (2019), although they did not use a validation set to validate the coefficient of determination (R2). The splitting ratio was selected for three reasons. First, a larger portion of the data should be allocated to training to ensure that the model learns patterns and features adequately. Second, a separate validation set is critical for assessing the model's generalization performance on unseen data. Lastly, the test set provides an unbiased evaluation of the model's true performance, as noted by Asante et al. (2023), who also used 80% training and 20% testing. Catboost, LightGBM, KNN, random forest, ANN, extreme gradient boosting (XGBoost), and group method of data handling (GMDH) were run and compared in terms of model evaluation. Afterwards, the two best-performing algorithms were used in the sensitivity analysis to find the significant parameters affecting oil RF in this hybrid EOR injection.
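The two-stage split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (1000 samples into 640/160/200); the function name, the random seed, and the use of plain index shuffling are assumptions and do not reproduce the study's actual workflow.

```python
import random

def split_indices(n_samples, test_frac=0.20, val_frac=0.20, seed=42):
    """Shuffle sample indices and split them into train/validation/test lists.

    test_frac is taken from the full data set; val_frac is taken from what
    remains after the test set is removed (the two-stage 80/20 split).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary assumption

    n_test = round(n_samples * test_frac)
    n_val = round((n_samples - n_test) * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 640 160 200
```

Keeping the test indices aside before the validation split is what makes the final evaluation independent of any tuning decisions.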

Fig. 1 The project framework

Data set up for the CO2-LSWAG injection project

The numerical simulation data were used to obtain the oil RF on a reservoir (field) scale. The oil RF is defined as the ratio of cumulative oil production to the oil reserve in the reservoir, multiplied by one hundred percent. Table 2 shows the modeling parameters. A grid size of 100 ft was used because the investigation focused on the fluid flow from the injection well to the production wells. Therefore, a smaller grid size, which is usually used to observe fluid flow in near-wellbore conditions, was unnecessary. Additionally, this grid size was also applied by Dang et al. (2016) for miscible CO2-LSWAG injection in a sector model at reservoir scale (3D model). This study investigated the performance of this hybrid EOR injection in a sector reservoir model.
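As a minimal illustration of the RF definition above, the ratio can be written as a small Python function. The function name and the volumes are hypothetical, and the "oil reserve in the reservoir" is assumed here to mean the oil initially in place.

```python
def oil_recovery_factor(cumulative_oil, oil_in_place):
    """Return the oil recovery factor in percent."""
    if oil_in_place <= 0:
        raise ValueError("oil_in_place must be positive")
    return 100.0 * cumulative_oil / oil_in_place

# Hypothetical volumes in stock-tank barrels, for illustration only.
rf = oil_recovery_factor(cumulative_oil=2.5e6, oil_in_place=10.0e6)
print(f"RF = {rf:.1f}%")  # RF = 25.0%
```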

Table 2 Modeling parameters

Rock properties and modeling parameters

This project used commercial geological software (Petrel) to capture geological uncertainty and to create all the important reservoir parameters, including the distribution and quantities of clay volume, permeability, and porosity. The initial reservoir property modelling created the permeability distribution. Afterwards, the co-kriging technique was used to generate the porosity and clay volume distribution models, with permeability treated as a variable that has a statistical relationship with porosity. This technique is typically used for estimating properties at locations that do not have measured data. The rock properties of this study are shown in Table 3, and Fig. 2 shows the distribution of rock properties, with a standard deviation of 0.05, and the distribution of the wells. These approaches are a powerful way to create reservoir properties, especially in reservoir (large-scale) modeling. Dulang crude oil and rock properties were used for immiscible CO2-LSWAG because of the previous field application of CO2-WAG injection in this field, which makes it a suitable candidate for this hybrid EOR. Reservoir minerals and clay content are among the main considerations when choosing immiscible CO2-LSWAG injection to improve oil RF. The Dulang field contains both, which is another reason for using its rock properties.

Table 3 Rock properties of Dulang field (Ali et al. 2005; Hussain et al. 1992)
Fig. 2 Distribution of rock properties and well distribution in 3D sector reservoir model

Dang et al. (2016) modelled and compared the performance of miscible CO2-LSWAG with miscible CO2-WAG, LSWF, and miscible CO2 flooding in a sandstone reservoir. Their work found that miscible CO2-LSWAG injection in an inverted five-spot pattern could produce the highest oil production compared to the aforementioned conventional EOR methods. Therefore, a similar well pattern was used in the current work, with a well spacing of 18.6 acres.

Further modeling requires relative permeability data to observe the wettability alteration after the application of immiscible CO2-LSWAG injection. Figure 3 shows the relative permeability data for the initially oil-wet condition before and after injecting the hybrid EOR method. The wettability alteration occurs due to the mechanisms of this hybrid EOR, as shown in Fig. 5 and described by Eqs. 8–15.

Fig. 3 Relative permeability data in oil-wet sand conditions

Fluid properties

This study used brine composition and crude oil data from the Dulang field. Table 4 shows the Dulang crude oil composition. Table 5 shows the brine composition, the seawater composition, and the range of low salinity concentrations. These salinity concentrations were designed in the laboratory pore-scale study by Rahman and Dzulkarnain (2021) on CO2-LSWAG injection. The low salinity ranges were determined by considering the work of Dang et al. (2015), which notes that low salinity concentrations within these ranges could still reflect the positive performance of LSWF.

Table 4 Dulang crude oil compositions (Zain. Md et al. 2001)
Table 5 Brine composition from Dulang and modified salinity concentration (Rahman and Dzulkarnain 2021)

In this study, the Peng-Robinson (1978) equation of state (EOS) was employed to predict the behavior of naturally occurring hydrocarbon systems. The fluid modeling commenced with the constant composition expansion (CCE) test, which shows how the relative total volume responds to pressure changes both above and below saturation conditions. Following the CCE test, the differential liberation calculation was used to model the response of the oil formation volume factor and solution gas-oil ratio (GOR) to changes in pressure. Additionally, the swelling test of Dulang crude oil was modelled, which provided information on the fluid behavior during gas injection. When a gas is injected into a reservoir, it can go into solution and swell the oil, increasing the oil volume. Lastly, the fluid model was matched to the Dulang crude oil properties by tuning the critical pressure and critical temperature of the C7-C11+ components to imitate the real fluid properties of this field. The minimum miscibility pressure (MMP) of Dulang crude oil is 2875 psi (Rosman et al. 2011). Because CO2-LSWAG injection is performed under immiscible conditions, the injection pressure is kept lower than the MMP of the Dulang crude oil.
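To make clear why tuning the critical pressure and critical temperature changes the fluid model, the sketch below shows how these two properties enter the Peng-Robinson parameters for a single component. It uses the commonly quoted Peng-Robinson coefficients in SI units; the pseudo-component values are hypothetical and are not the Dulang data, and the actual regression was done in CMG WinProp, not with this code.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)

def pr_parameters(tc, pc, omega, temperature):
    """Return (a_alpha, b) of the Peng-Robinson EOS for one component.

    tc [K], pc [Pa], and omega [-] are the critical temperature, critical
    pressure, and acentric factor; temperature is in K.
    """
    a = 0.45724 * R**2 * tc**2 / pc
    b = 0.07780 * R * tc / pc
    if omega <= 0.49:
        kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    else:  # 1978 form for heavier components
        kappa = (0.379642 + 1.48503 * omega
                 - 0.164423 * omega**2 + 0.016666 * omega**3)
    alpha = (1.0 + kappa * (1.0 - math.sqrt(temperature / tc)))**2
    return a * alpha, b

def pr_pressure(temperature, molar_volume, a_alpha, b):
    """Pressure [Pa] from the Peng-Robinson EOS for a pure component."""
    v = molar_volume
    return R * temperature / (v - b) - a_alpha / (v**2 + 2.0 * b * v - b**2)

# Hypothetical heavy pseudo-component, before and after a +5% change in Tc.
base = pr_parameters(tc=600.0, pc=2.5e6, omega=0.4, temperature=350.0)
tuned = pr_parameters(tc=630.0, pc=2.5e6, omega=0.4, temperature=350.0)
print(base, tuned)
```

Because both the attraction term and the co-volume depend on the critical properties, adjusting them for the heavy fraction shifts the predicted saturation pressure and volumetric behavior, which is what a match to CCE, differential liberation, and swelling data exploits.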

Figure 4a displays the modeling results of the CCE test, with an error of 0.023%. Reducing the pressure below the saturation pressure increases the relative volume, as shown in this figure. ROV stands for relative oil volume, EXP ROV is the experimental relative oil volume, and Psat is the saturation pressure. The saturation pressure in this model is 1600 psia. Figure 4b shows the differential liberation test, with an error of 1.65%. Increasing the pressure increases the ROV and gas-oil ratio (GOR) until Psat is reached. Exp likewise denotes experimental values. Lastly, Fig. 4c exhibits the result of the swelling test, which had an error of 0.22%. A higher gas composition increases the swelling factor and the saturation pressure. The experimental points (dots) represent the real Dulang crude oil properties. Therefore, the closer the experimental values are to the simulation results (line), the more accurately the model duplicates the behavior of Dulang crude oil under changing pressure and injected gas composition.

Fig. 4 a Pressure vs relative volume, b oil formation volume factor and solution GOR, c effect of gas injection composition on swelling factor

A compositional model of oil, water, and gas is crucial when applying CO2-LSWAG injection in order to capture the effect of each salt's composition and concentration on the reservoir rock properties, because the fluids are not treated as single components (i.e., water or oil only) without regard to their composition. In addition, this model can reflect the effects of CO2 on the oil properties, namely oil swelling and viscosity reduction due to gas dispersion and diffusivity (Teklu et al. 2016).

Design of experiment (DOE) CO2-LSWAG parameters

In this study, the Design of Experiments (DOE) was used to establish the acceptable range of each parameter applicable to immiscible CO2-LSWAG injection. DOE is used to evaluate the reservoir and operating parameters and to determine their optimum values, which are then used to predict the performance of the reservoir (Evans et al. 2021). The ranges were selected from previously published work on miscible and immiscible CO2-LSWAG and LSWF injection. These ranges make it possible to identify, through sensitivity analysis, the parameters that most affect this hybrid EOR. Subsequently, the minimum and maximum values of each parameter or feature were set in CMG CMOST to generate 1000 data samples for the initial oil-wet conditions, encompassing 18 parameters or features. These generated data were then employed in the sensitivity analysis of oil recovery factor (RF) using machine learning applications, specifically the Catboost and LightGBM algorithms. Many other researchers have also employed this technique in their studies (Dang et al. 2018; Hidayat and Astsauri 2022; Sierra et al. 2020; Thanh et al. 2020).

Table 6 shows the design of experiment for this hybrid EOR. The minimum and maximum values of the water composition in the reservoir (aqueous) and in the injection well were obtained from Rahman and Dzulkarnain (2021), who experimented on the performance of CO2-LSWAG injection in a sandstone reservoir. The clay content value was initially obtained from the work of Dang et al. (2016). To observe the effect of clay distribution in the reservoir, this study varied the clay content from 5 to 13%. The effect of the gas injection rate (Qginj) was also considered, so the Qginj values were varied as shown in Table 6. A similar approach was applied to the water injection rate to study its effect on the oil RF. All parameters were treated as continuous real variables, except the aqueous SO4, which was treated as a discrete real variable.

Table 6 Design of experiment (DOE) immiscible CO2-LSWAG parameter (Dang et al. 2016; Rahman and Dzulkarnain 2021)
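The idea of generating simulation cases from per-parameter minimum and maximum values can be sketched as follows. The text does not state which sampling scheme CMG CMOST applied, so Latin hypercube sampling is assumed here purely for illustration; only the clay content range comes from the text, and the other parameter names and bounds are hypothetical placeholders.

```python
import random

def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples points, stratifying each parameter's [min, max] range.

    ranges maps a parameter name to a (minimum, maximum) tuple.
    Returns a list of dicts, one per simulation case.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One random point inside each of n_samples equal-width strata.
        points = [low + (high - low) * (k + rng.random()) / n_samples
                  for k in range(n_samples)]
        rng.shuffle(points)  # decouple the strata order between parameters
        columns[name] = points
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_samples)]

ranges = {
    "clay_content_pct": (5.0, 13.0),    # range stated in the text
    "gas_inj_rate": (1.0e5, 1.0e6),     # hypothetical bounds
    "water_inj_rate": (500.0, 5000.0),  # hypothetical bounds
}
cases = latin_hypercube(ranges, n_samples=1000)
print(len(cases), cases[0])
```

Each dictionary in `cases` would correspond to one simulation run whose resulting oil RF becomes the target value for the machine learning models.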

In detail, this project simulated three EOR types, namely CO2-LSWAG, CO2-WAG, and LSWF, and one baseline scenario, immiscible CO2 flooding. The simulation period spanned from 1 January 2022 to 1 January 2026, as shown in Table 7. Continuous LSWF injection was performed throughout the production time, and the baseline scenario consisted of four years of continuous CO2 flooding. CO2-WAG injection alternated between high salinity water and CO2 injection in six cycles, starting from the beginning of the production time. CO2-LSWAG followed the same WAG scheme but used low salinity water for injection. One cycle of CO2-WAG and CO2-LSWAG took eight months (4 months of water injection and 4 months of gas injection). The study aimed to evaluate the performance of these EOR techniques to identify the most effective approach for enhancing oil recovery.

Table 7 Injection scheme of three different types of EOR and baseline scenario immiscible CO2 flooding
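The WAG timing above can be checked with a short schedule generator: six cycles of eight months each fill exactly the four-year window from 1 January 2022 to 1 January 2026. The function names are hypothetical, and starting each cycle with the water slug is an assumption, since the injection order is not stated explicitly in the text.

```python
from datetime import date

def add_months(start, months):
    """Return the first-of-month date that lies `months` after `start`."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def wag_schedule(start, n_cycles=6, water_months=4, gas_months=4):
    """List (cycle, fluid, begin, end) slugs for a water-first WAG scheme."""
    slugs, current = [], start
    for cycle in range(1, n_cycles + 1):
        for fluid, length in (("water", water_months), ("CO2", gas_months)):
            end = add_months(current, length)
            slugs.append((cycle, fluid, current, end))
            current = end
    return slugs

schedule = wag_schedule(date(2022, 1, 1))
for cycle, fluid, begin, end in schedule:
    print(cycle, fluid, begin, end)
```

The last slug ends on 2026-01-01, which matches the end of the simulation period.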

Low salinity water alternating immiscible and miscible gas CO2 mechanisms

CO2-LSWAG injection combines LSWF and CO2 flooding. Hence, an integrated model must be created to capture the mechanisms of this hybrid EOR. As mentioned earlier, the mechanisms of LSWF include ion exchange, pH modification, fines migration, geochemical reactions, etc. These phenomena increase water wetness, leading to higher oil mobilization. The CO2 flooding, in turn, can swell the oil and reduce the oil density and viscosity, mobilizing the oil further. Moreover, combining these EOR methods overcomes the poor mobility control of CO2 flooding: the WAG process is carried out by alternating CO2 flooding with LSWF, and these steps are repeated for several cycles. Additionally, this hybrid EOR leads to more carbonated water (more CO2 dissolved in the water), which alters the wettability and further reduces the IFT and viscosity (Dang et al. 2016).

On the basis of the foregoing, this study considered the mechanisms of CO2-LSWAG injection. Figure 5 shows the key elements of the modelling process.

Fig. 5 Key elements for modeling CO2-LSWAG process (Dang et al. 2016)

Catboost and LightGBM hyperparameter tuning and sensitivity analysis

The machine learning models were run in the Python programming language using the Catboost and LightGBM algorithms to perform the sensitivity analysis of oil RF. Hyperparameter tuning was then carried out to obtain a higher quality of model evaluation (Hancock and Khoshgoftaar 2021). Hyperparameter tuning is an essential procedure in machine learning model development since it directly impacts the performance of the model: it can improve the model, prevent overfitting, and increase computational efficiency. Table 8 lists the hyperparameters of Catboost, and Table 9 lists the hyperparameters of LightGBM.

Table 8 Catboost hyperparameters tuning values (Bentéjac et al. 2021)
Table 9 LightGBM hyperparameter tuning values (Bentéjac et al. 2021)

CatBoost is utilized to handle categorical features efficiently and is particularly well-suited for tabular data (Hancock and Khoshgoftaar 2020). In the case of CatBoost, sensitivity analysis could refer to understanding how changes in input features impact the model's predictions. This can include examining the importance of different features or assessing how model performance changes with variations in specific input values.

CatBoost employs a technique known as "Permutation Feature Importance" for assessing feature importance or sensitivity analysis (Szczepanek 2022). First, the CatBoost model is trained on the original dataset to obtain a baseline performance. Second, for each feature, the values of that feature are randomly shuffled across all samples in the dataset, leaving the target variable unchanged, and the model is used to make predictions on the modified dataset; this is known as permutation importance. Third, the model's performance on the modified dataset is compared with the baseline performance on the original dataset; this step is called performance comparison. The decrease in performance indicates the importance of the feature: if shuffling a particular feature leads to a significant drop in performance, the model relies heavily on that feature for making predictions. Lastly, the process is repeated for each feature to assess their relative importance.
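The shuffle-and-compare procedure described above can be sketched in a model-agnostic way. The example below is not CatBoost's internal implementation: a fixed toy function stands in for a trained model, the data are synthetic, and the function names, number of repeats, and use of MSE as the performance measure are all assumptions made for illustration.

```python
import random

def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(predict, rows, target, n_repeats=5, seed=0):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(target, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mse(target, [predict(r) for r in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy stand-in for a trained model: strong, weak, and no dependence.
def toy_model(row):
    return 3.0 * row[0] + 0.3 * row[1]

data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
target = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, target)
print(scores)
```

In this toy case the first feature receives the largest score, the second a much smaller one, and the third a score of zero because the stand-in model never uses it.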

LightGBM, a gradient boosting framework developed by Microsoft, uses a feature called "feature importance" to perform sensitivity analysis. Sensitivity analysis in the context of LightGBM refers to understanding the impact of individual features on the model's predictions (Ke et al. 2017).

The feature importance in LightGBM is calculated based on split value gain: the algorithm calculates the improvement in accuracy (or reduction in loss) brought by a feature when a tree node is split on that feature. Features with a higher split value gain are considered more important (Tang et al. 2020).
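The notion of gain from a split can be illustrated with a simplified squared-error view: the gain is the reduction in the sum of squared errors when a node is divided at a threshold. This is a sketch of the concept only, not LightGBM's internal formula (which works on gradients and includes regularization terms), and the small data set is invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def split_gain(feature, target, threshold):
    """Loss reduction from splitting a node at feature <= threshold."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    return sse(target) - sse(left) - sse(right)

def best_gain(feature, target):
    """Largest gain over all candidate thresholds of one feature."""
    thresholds = sorted(set(feature))[:-1]
    return max((split_gain(feature, target, t) for t in thresholds),
               default=0.0)

# Invented example: six target values and two candidate features.
target = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0]
informative = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]  # separates low from high values
noisy = [5.0, 1.0, 9.0, 2.0, 8.0, 4.0]        # unrelated to the target

print(best_gain(informative, target), best_gain(noisy, target))
```

A feature that separates low from high target values yields a much larger gain than an unrelated one; summing such gains over every split that uses a feature gives a gain-based importance ranking.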

Model evaluation

The quality of the estimations in this study is evaluated quantitatively using four metrics: the coefficient of determination \(\left( {{\text{R}}^{2} } \right)\), the mean absolute error (MAE), the mean square error (MSE), and the root mean square error (RMSE). \({\text{R}}^{2}\), which originates from linear regression, measures the proportion of the variance in the data that is explained by the model. MAE and MSE evaluate the quality of fit in terms of the distance between the regressor and the actual data points, and they differ in how this distance is measured. RMSE is the square root of MSE, which expresses the error in the same units as the target. Depending on the data structure, the different types of regularization imposed by each metric determine its relative effectiveness. MSE is more sensitive to outliers than MAE; beyond this general note, further considerations for choosing the most suitable metric for a regression model, given the available data and the target task, are discussed by Chicco et al. (2021). The formulas are:

$${\text{R}}^{2} = 1 - \frac{\sum\nolimits_{{\text{i}} = 1}^{{\text{n}}} \left( {\text{X}}_{{\text{i}}} - {\text{Y}}_{{\text{i}}} \right)^{2}}{\sum\nolimits_{{\text{i}} = 1}^{{\text{n}}} \left( {\text{X}}_{{\text{i}}} - \overline{{\text{X}}} \right)^{2}}$$
(1)
$${\text{MAE}} = \frac{1}{{\text{n}}}\sum\limits_{{\text{i}} = 1}^{{\text{n}}} \left| {\text{X}}_{{\text{i}}} - {\text{Y}}_{{\text{i}}} \right|$$
(2)
$${\text{MSE}} = \frac{1}{{\text{n}}}\sum\limits_{{\text{i}} = 1}^{{\text{n}}} \left( {\text{X}}_{{\text{i}}} - {\text{Y}}_{{\text{i}}} \right)^{2}$$
(3)
$${\text{RMSE}} = \sqrt{\frac{1}{{\text{n}}}\sum\limits_{{\text{i}} = 1}^{{\text{n}}} \left( {\text{X}}_{{\text{i}}} - {\text{Y}}_{{\text{i}}} \right)^{2}}$$
(4)

where \({\text{X}}_{{\text{i}}}\), \({\text{Y}}_{{\text{i}}}\), and \(\overline{{\text{X}}}\) represent the data points from the sample, the estimation by the network, and the average value of the sampling data, respectively, and n is the number of data points.
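Equations 1 to 4 can be evaluated with a few lines of Python. The sketch below reads X as the sampled (simulated) values and Y as the model estimates, which is an interpretation of the notation, and uses the standard form of the coefficient of determination, with the spread of the sampled values about their mean in the denominator. The function name and the sample numbers are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MAE, MSE, and RMSE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((x - mean_actual) ** 2 for x in actual)
    mae = sum(abs(x - y) for x, y in zip(actual, predicted)) / n
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

# Hypothetical oil RF values in percent, for illustration only.
actual = [40.0, 45.0, 50.0, 55.0]
predicted = [41.0, 44.0, 52.0, 53.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For these numbers the errors are 1, 1, 2, and 2, so MAE is 1.5, MSE is 2.5, RMSE is about 1.58, and R2 is about 0.92. The larger MSE relative to MAE shows how squaring weights the bigger errors more heavily.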

Results and discussion

Mechanism of immiscible CO2-LSWAG injection after four years of implementation

Figure 6 illustrates the mechanisms involved in CO2-LSWAG injection after four years of implementation. Figure 6a–c show the ion exchanges occurring within the reservoir, where higher values indicate a greater ion exchange capability. The results reveal that Ca2+ exhibits a higher exchange capability than Mg2+. Moreover, the lower Na+ values indicate that, after injecting the hybrid EOR, most of the divalent cations have been replaced by monovalent cations. This outcome suggests that the mechanism functions effectively, as Na+ attaches to the rock surface, displacing divalent cations (Liu et al. 2023). As reported by Katende and Sagala (2019) and Mwakipunda et al. (2023), these occurrences enhance the electrostatic repulsion between clay particles and oil, ultimately breaking the tether between oil and clay particles. This leads to the desorption of oil from the clay surface, contributing to improved oil recovery. These processes also change the wetting state by reducing the oil coating on the rock surface, allowing the oil to be swept out of the reservoir. These phenomena lead to the expansion of the electrical double layer and to MIE, the two main mechanisms that increase oil RF in LSWF. Contradictory observations have also been reported by other investigators, in which divalent ions attached to the surface and replaced Na+ (Fjelde et al. 2012).

Fig. 6 Areal 2D view of the immiscible CO2-LSWAG mechanisms after four years of production

The mineral dissolution and precipitation of calcite and dolomite are shown in Fig. 6d–k. Mineral dissolution is dominant near the injection well because the injected fluid flows easily around this area. Dang et al. (2016) mentioned that mineral dissolution could increase oil RF as a result of higher ion exchange capabilities. Nevertheless, mineral precipitation could deteriorate the ion exchange and wettability alteration. Figure 6l shows that the interaction between CO2 and brine restrained the increase in pH. Nonetheless, it did not reduce the positive effect of LSWF injection on the oil recovery factor (Zolfaghari et al. 2013).

Lastly, Fig. 6m and n show that the higher carbonated water content (higher CO2 solubility in water) in this hybrid EOR injection induces more diffusion and dissolution of CO2, which can reduce the interfacial tension and oil viscosity, resulting in higher oil mobilization (Tabrizy and Vahid 2014; Teklu et al. 2016). Teklu et al. (2016) found similar results in their experimental study in terms of IFT and oil viscosity reduction. The solution with higher divalent ion concentrations had a higher degree of change in IFT values, which became lower than those of LSWF (Kumar et al. 2016; Teklu et al. 2016).

Comparison of immiscible CO2-LSWAG with conventional immiscible WAG, low salinity, and immiscible CO2 flooding

This study compared immiscible CO2-LSWAG with immiscible CO2-WAG, low salinity waterflooding, and immiscible CO2 flooding, all applied to initially oil-wet sand. As shown in Fig. 7, immiscible CO2-LSWAG (before sensitivity analysis) yielded a higher oil RF than the other conventional EOR and secondary oil recovery methods, such as immiscible CO2 flooding, owing to wettability alteration and improved macroscopic and microscopic displacement (Carvalhal et al. 2019; Dang et al. 2016, 2014; Kumar et al. 2016; Naderi and Simjoo 2019; Teklu et al. 2016). The lowest oil RF was obtained with immiscible CO2 flooding because gas breakthrough occurred readily; the resulting unstable displacement reduced oil production.

Fig. 7

Oil recovery factor of each EOR type in initial oil-wet condition

Figure 8 illustrates the remaining oil saturation after injecting various types of EOR techniques and immiscible CO2 flooding. Notably, immiscible CO2-LSWAG injection shows the lowest remaining oil saturation compared to immiscible CO2-WAG, LSWF, and immiscible CO2 flooding. This enhanced performance of the hybrid EOR can be attributed to the synergistic effect achieved through the combination of LSWF and immiscible CO2 flooding injections. The incorporation of both methods ensures effective sweeping of oil in both higher and lower zones of the reservoir. As water tends to flow in the lower part, and gas sweeps the upper zone, a favorable mobility condition is established, resulting in an increase in overall oil recovery (Nygard and Andersen 2020).

Fig. 8

Remaining oil saturation a immiscible CO2 flooding, b LSWF, c immiscible CO2-WAG, and d immiscible CO2-LSWAG injection after four years of production

Sensitivity analysis in initial oil-wet condition

Further simulations of immiscible CO2-LSWAG injection were run in CMG CMOST to observe the impact of various parameters on oil RF. The input parameters were formation water composition, injection water composition, injected CO2 content, gas injection rate, water injection rate, and clay content. As shown in Fig. 9a and b, every scenario affected the oil RF, confirming that these parameters influence the performance of immiscible CO2-LSWAG injection. The base case was taken from the earlier immiscible CO2-LSWAG result shown in Fig. 7, and the general solutions were obtained by setting lower and upper limits for each parameter. Among these scenarios, 1 Star denotes the lowest oil RF and 5 Star the highest, for both wettability conditions.

Fig. 9

Immiscible CO2-LSWAG injection using 1000 scenarios in initial oil-wet condition

The immiscible CO2-LSWAG performance in oil-wet conditions shows accelerated oil production in the early stage, which indicates that the hybrid process works as intended. This agrees with Dang et al. (2016), who stated that the main advantage of CO2-LSWAG is its ability to overcome the delayed oil production of conventional CO2-WAG, which increases oil production and is also economically favorable. Furthermore, their study showed that the incremental oil RF of this hybrid EOR was 10% over CO2 flooding, 9% over CO2-WAG, and 24% over LSWF. This study obtained a similar trend, with 59.15% oil RF in initial oil-wet conditions: the simulations gave an incremental oil RF of approximately 4.89% over LSWF, 6.16% over immiscible CO2-WAG, and 11.48% over immiscible CO2 flooding (see Fig. 9a and b and Fig. 7).
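As a quick arithmetic check on this comparison, the sketch below back-calculates the oil RF implied for each reference method. It assumes the reported increments are absolute percentage-point differences from the 59.15% hybrid result, which the text does not state explicitly; the variable names are illustrative only.

```python
# Oil RF reported for immiscible CO2-LSWAG in the initial oil-wet case (%).
hybrid_rf = 59.15

# Incremental oil RF of the hybrid over each reference method, assumed here
# to be absolute percentage points (an assumption, not stated in the text).
incremental_rf = {
    "LSWF": 4.89,
    "immiscible CO2-WAG": 6.16,
    "immiscible CO2 flooding": 11.48,
}

# Implied oil RF of each reference method under that assumption.
implied_rf = {
    method: round(hybrid_rf - gain, 2) for method, gain in incremental_rf.items()
}

for method, rf in implied_rf.items():
    print(f"{method}: {rf:.2f}% oil RF")
```

Under this reading, the ranking matches the qualitative statement that immiscible CO2 flooding gave the lowest oil RF of the four methods.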

In addition, Fig. 9c shows a histogram of the oil recovery factor in the oil-wet condition. The most frequent oil RF is around 57%, with a frequency of 20.7%, whereas the highest oil RF, around 59%, occurs with a frequency of 2.2% across the 1000 scenarios in the initial oil-wet condition.

The 1000 developed scenarios show that higher oil production is associated with water salinities of 1000 ppm to 5000 ppm; this range was also found and recommended by Gorucu et al. (2019).

Comparison of machine learning algorithms to predict RF in initial oil-wet condition

The predictive models were created using immiscible CO2-LSWAG parameters as the input features and oil RF as the output feature. Figure 10 shows the input data distribution in the initial oil-wet condition from 1000 scenarios. The SO4 reservoir brine composition was held constant using a discrete real variable distribution, whereas the other inputs used continuous real variable distributions. Fig. 11 presents the correlation matrix of all numeric variables for oil-wet conditions. No multicollinearity was found in the data sample, because the features show low to no correlation with each other. At the same time, some features are strongly correlated with the target (oil RF): Injector Stw (0.66), Inj_Na (0.51), and Aqu_Ca (0.43) had high positive correlations. Statistically, multicollinearity can cause inaccuracy, increase errors, and make certain variables appear insignificant (Hidayat and Astsauri 2022; Liang and Zhao 2019).
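The multicollinearity screening described here can be sketched as follows. This is a minimal, dependency-free illustration rather than the study's workflow: the feature names, the synthetic data, and the 0.8 flagging threshold are all assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def collinear_pairs(features, threshold=0.8):
    """Return feature pairs whose absolute correlation exceeds the threshold."""
    names = list(features)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged

# Synthetic, independently sampled inputs standing in for the simulation table
# (hypothetical names and values, not the study's data).
rng = random.Random(0)
n_samples = 1000
features = {
    "water_rate": [rng.uniform(0, 1) for _ in range(n_samples)],
    "inj_na": [rng.uniform(0, 1) for _ in range(n_samples)],
    "gas_rate": [rng.uniform(0, 1) for _ in range(n_samples)],
}
# Synthetic target that depends on two of the inputs plus noise.
target = [
    2.0 * w + 1.0 * na + rng.gauss(0, 0.1)
    for w, na in zip(features["water_rate"], features["inj_na"])
]

print("collinear pairs:", collinear_pairs(features))
for name, values in features.items():
    print(name, "vs target:", round(pearson(values, target), 2))
```

An empty list of flagged pairs together with clear feature-to-target correlations is the pattern the correlation matrix is read for: inputs that inform the target without duplicating each other.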

Fig. 10

Data distribution in the initial oil-wet condition for immiscible CO2-LSWAG injection

Fig. 11

Features correlation for initial oil-wet sand

This study compared different algorithms on the same dataset (initial oil-wet sand), as shown in Fig. 12. Grid search cross-validation was used for hyperparameter tuning: the hyperparameter domain is divided into a discrete grid, every combination of values in the grid is tried, and each combination is scored with cross-validation. In this research, R2, MAE, MSE, and RMSE were used as the performance metrics. This technique was applied to Catboost, LightGBM, Random Forest, ANN, and KNN. Randomized search cross-validation, an alternative to grid search for hyperparameter optimization (Hidayat and Astsauri 2022), was used on the XGBoost algorithm to reduce the computation time. The number of iterations in the initial oil-wet dataset was 20 for Catboost, 100 for LightGBM, 81 for Random Forest, 84 for ANN, 90 for KNN, 50 for XGBoost, and 12 for GMDH.
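The difference between the two search strategies can be illustrated with a small, dependency-free sketch. It is not the study's pipeline: the one-dimensional KNN regressor, the synthetic data, the grid values, and the choice of MSE as the scoring metric are all assumptions chosen to keep the example short.

```python
import itertools
import math
import random

def knn_predict(train_x, train_y, query, k, weighted):
    """Predict one value as the (optionally distance-weighted) mean of k neighbors."""
    neighbors = sorted((abs(x - query), y) for x, y in zip(train_x, train_y))[:k]
    if not weighted:
        return sum(y for _, y in neighbors) / k
    weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

def cv_mse(xs, ys, params, n_folds=5):
    """Mean squared error averaged over n_folds cross-validation folds."""
    fold_errors = []
    for fold in range(n_folds):
        test_idx = range(fold, len(xs), n_folds)  # every n_folds-th sample
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        errors = [
            (knn_predict(train_x, train_y, xs[i], **params) - ys[i]) ** 2
            for i in test_idx
        ]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

def search(xs, ys, candidates):
    """Score every candidate with cross-validation; return (best_params, best_score)."""
    scored = [(cv_mse(xs, ys, params), params) for params in candidates]
    best_score, best_params = min(scored, key=lambda item: item[0])
    return best_params, best_score

# Synthetic regression data (illustrative only).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]

# Grid search: evaluate every combination in the discrete grid.
grid = {"k": [1, 3, 5, 9, 15], "weighted": [False, True]}
full_grid = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
grid_best, grid_score = search(xs, ys, full_grid)

# Randomized search: evaluate only a random subset of the same grid.
random_subset = rng.sample(full_grid, 4)
rand_best, rand_score = search(xs, ys, random_subset)

print("grid search:      ", grid_best, round(grid_score, 4))
print("randomized search:", rand_best, round(rand_score, 4))
```

Because the randomized search scores only four of the ten combinations, it costs less but can never beat the exhaustive search on the same grid; that is the trade-off behind using it for the more expensive model.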

Fig. 12

a R2, b MAE, c MSE, and d RMSE scores using different types of algorithms in initial oil-wet condition

According to the R2 evaluation, the top two algorithms were Catboost and LightGBM, followed by XGBoost, GMDH, ANN, Random Forest, and KNN. These top two algorithms have the advantages of reduced overfitting and faster prediction time, as mentioned by Dorogush et al. (2018), Hancock and Khoshgoftaar (2021), and Prokhorenkova et al. (2018). Specifically, the MAE, MSE, and RMSE of these two algorithms were lower than 5%. Interestingly, KNN overfitted this dataset. KNN could not predict the trend accurately because it gives equal importance to all predictors and is sensitive to noisy data, missing values, and outliers. KNN also does not work well with large datasets: its accuracy depends on data quality, and prediction can be slow when the dataset is extensive.
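For reference, the four evaluation metrics can be computed as in the sketch below. The function uses the standard definitions of R2, MAE, MSE, and RMSE; the sample oil RF values are hypothetical and serve only to exercise the function.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MAE, MSE, and RMSE for paired actual and predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}

# Hypothetical oil RF values (%), for illustration only.
actual = [50.0, 52.0, 54.0, 56.0, 58.0]
predicted = [50.5, 51.5, 54.0, 56.5, 57.5]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE share the units of the target (here, percentage points of oil RF), whereas MSE is in squared units and R2 is dimensionless, which is worth keeping in mind when the error metrics are compared against a single threshold.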

The Catboost algorithm was used to estimate the importance of each independent variable for oil RF in initial oil-wet sand. According to Prokhorenkova et al. (2018), this algorithm can be used even when there are no categorical features, and several features distinguish it from other open-source gradient boosting methods. First, it reduces the time spent on parameter tuning, because it gives good results with default parameters. Second, it accepts non-numeric factors directly, so the user does not have to pre-process the data or convert it into numbers. Third, it trains faster than LightGBM and XGBoost and reduces overfitting through a novel gradient boosting scheme. Finally, the trained model can be applied quickly and efficiently for fast prediction. These advantages were the main motivation for using the Catboost algorithm in this study.

The Catboost model used four hyperparameters, as listed in Table 8. As stated earlier, grid search cross-validation was used to find the optimized model. Based on Fig. 13a and b, the best model used a max_depth of 3, leaf_estimation_iterations of 10, l2_leaf_reg of 1, and learning_rate of 0.1. Figure 13a plots predicted against actual values with the corresponding R2, and this model had the highest evaluation scores in the initial oil-wet condition: the training R2 was 0.999, the validation R2 was 0.982, and the test R2 was 0.983. The residual plot also indicated good model quality. Hence, the sensitivity analysis to find the significant parameters affecting oil RF in immiscible CO2-LSWAG was carried out with this algorithm. The LightGBM algorithm was applied to the same wettability data with the same approach; Fig. 14a shows the evaluation of this model. Grid search cross-validation gave the best hyperparameters as a learning rate of 0.2, feature_fraction_bynode of 1, num_leaves of 3, top_rate of 0.2, and other_rate of 0.05. This model obtained 0.976 for the training R2, 0.965 for the validation R2, and 0.970 for the test R2. Fig. 14b shows the residual plot, which indicates good model quality because there are very few outliers. Based on this quality and evaluation, the sensitivity analysis also used LightGBM to find the key parameters of immiscible CO2-LSWAG that affect oil RF.
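One simple way to read the reported scores is to compare the training and test R2 of each model, since a large gap would hint at overfitting. The sketch below uses only the R2 values quoted above; the gap measure itself is an illustrative choice, not a diagnostic used in the study.

```python
# R2 scores reported in the text for the initial oil-wet dataset.
reported_r2 = {
    "Catboost": {"train": 0.999, "validation": 0.982, "test": 0.983},
    "LightGBM": {"train": 0.976, "validation": 0.965, "test": 0.970},
}

def generalization_gap(scores):
    """Difference between training and test R2; larger values hint at overfitting."""
    return round(scores["train"] - scores["test"], 3)

gaps = {model: generalization_gap(scores) for model, scores in reported_r2.items()}

for model, gap in gaps.items():
    print(f"{model}: train-test R2 gap = {gap:.3f}")
```

Both gaps are below 0.02, which is consistent with the residual plots being described as showing good model quality.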

Fig. 13

a The R2 score plot and b residual plot of the initial oil-wet condition using Catboost model

Fig. 14

a The R2 score plot and b residual plot of the initial oil-wet condition using LightGBM model

Evaluation of features’ importance in immiscible CO2-LSWAG

The study proceeded to find the feature importances of this hybrid EOR. Using the Catboost algorithm, Fig. 15a shows that the significant parameters affecting oil RF were water injection rate, Na+ concentration in the injected water, Ca2+ concentration in the reservoir brine, Mg2+ concentration in the reservoir brine, and gas injection rate. This result is consistent with previous investigations by Carvalhal et al. (2019), Dang et al. (2015, 2016), and Gorucu et al. (2019), who reported that oil RF is higher when both Ca2+ and Mg2+ are present in the formation water: when water with a reduced Na+ concentration is injected, it can trigger multiple ion exchange and geochemical reactions that promote wettability alteration and result in higher oil RF. Likewise, the low salinity water injection rate has a significant effect on oil RF because it sets the amount of water injected into the reservoir; a larger injected volume has a positive impact on oil RF, since more ion exchange and double layer expansion (DLE) are induced in the reservoir.

Fig. 15

Feature importance of immiscible CO2-LSWAG in initial oil-wet condition using a Catboost algorithm and b LightGBM algorithm

The gas injection rate has a similar effect: different injection rates change the amount of CO2 injected into the reservoir, leading to different levels of water carbonation and therefore different degrees of IFT and oil viscosity reduction, wettability alteration, and residual oil mobilization through gas diffusion and dissolution. Regarding CO2, a similar result was found by Teklu et al. (2016), who performed a micromodel experiment in sandstone using miscible and immiscible CO2-LSWAG injection. The injected CO2 content (“Inj_CO2”) also had a higher effect on oil RF in the Catboost model than in the LightGBM model; this result is supported by Thanh et al. (2020), who carried out a simulation study of the effect of CO2 content on oil RF.

This study also found that the presence of K+ had a higher impact on oil RF in the Catboost model. Shabib-Asl et al. (2014) likewise found that a dominant K+ composition produced stronger wettability alteration during LSWF, as a result of higher MIE due to its good reactivity, leading to lower residual oil saturation and increased oil RF. Another interesting finding was that the clay volume did not significantly impact oil RF in the Catboost model. Fig. 15b shows that clay volume had a higher effect on oil RF in the LightGBM model, but even there the clay content was not among the top five parameters that significantly affect oil production. The prescreening criteria in Table 1 indicate that sufficient clay content has a positive effect on the application of immiscible CO2-LSWAG injection to increase oil RF. This agrees with Jiang et al. (2010) and AlQuraishi et al. (2019), who found that clay content is one of the parameters affecting oil RF in LSWF injection, because ion exchange takes place on the clay surface and generates double layer expansion. Comparing wettability conditions, the clay volume had a higher effect on oil RF in the initial oil-wet condition than in the initial water-wet condition. This is expected: when the rock is initially water-wet, the effect of clay content on oil RF is hard to detect.

In agreement with the Catboost algorithm, the top five significant parameters in the LightGBM algorithm were the water injection rate, followed by Na+ injection composition, Ca2+ formation composition, Mg2+ formation composition, and gas injection rate. The LightGBM algorithm also found that the injected SO42− composition had a higher impact on oil RF. This finding is supported by Hidayat and Astsauri (2022), who found that SO42− was the potential determining ion (PDI) with a major influence on wettability alteration using LSWF in carbonate formations. However, this study applied the hybrid EOR to a sandstone reservoir, so this parameter was not among the top five significant parameters affecting oil RF. Lastly, the LightGBM model indicated that Sr2+ in the reservoir brine had a higher impact on oil RF than K+ in the reservoir brine. This is because the presence of divalent cations in the formation water can trigger further multiple ion exchange.

HCO3− in the injected brine also affected the oil RF, as shown by both algorithms. Dang et al. (2016) mentioned that HCO3− has a detrimental effect on the performance of this hybrid EOR, as it may lead to calcite precipitation and therefore to less ion exchange and wettability alteration. This component had a higher effect in the LightGBM algorithm than in the Catboost algorithm, as shown in Fig. 15. Notably, the SO42− composition of the reservoir brine did not affect the oil RF because its value was held constant. Hence, the contribution of this feature cannot be detected in either wettability condition.
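The point about the constant SO42− input can be illustrated with permutation importance, a model-agnostic technique that is swapped in here for clarity and is not the built-in importance measure of Catboost or LightGBM. The stand-in model, its coefficients, the feature names, and the data are all hypothetical.

```python
import random

def model(row):
    """Hypothetical stand-in for a trained regressor (not the study's model)."""
    water_rate, inj_na, aqu_so4 = row
    return 3.0 * water_rate + 1.0 * inj_na + 2.0 * aqu_so4

def mse(rows, targets):
    """Mean squared error of the stand-in model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, names, rng):
    """Increase in MSE after shuffling each feature column in turn."""
    baseline = mse(rows, targets)
    importance = {}
    for col, name in enumerate(names):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:col] + (value,) + row[col + 1:]
            for row, value in zip(rows, shuffled)
        ]
        importance[name] = mse(permuted, targets) - baseline
    return importance

rng = random.Random(1)
names = ["water_rate", "inj_na", "aqu_so4"]
# The third input is held constant in every scenario, as SO42- was in the study.
rows = [(rng.uniform(0, 1), rng.uniform(0, 1), 0.5) for _ in range(500)]
targets = [model(r) + rng.gauss(0, 0.05) for r in rows]

importance = permutation_importance(rows, targets, names, rng)
for name, value in importance.items():
    print(f"{name}: {value:.4f}")
```

Even though the stand-in model gives the constant input a nonzero coefficient, shuffling a column of identical values changes nothing, so its measured importance is exactly zero. A feature must vary across the scenarios before any importance measure can attribute an effect to it.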

This sensitivity analysis of immiscible CO2-LSWAG performance in a sandstone reservoir using machine learning has significant implications for the petroleum industry. Given the current challenge of low oil recovery factors (RF) in reservoirs, there is an urgent need for advanced EOR techniques that can improve oil recovery. The findings of this study provide insights into the performance of the immiscible CO2-LSWAG injection technique and identify the parameters that most significantly affect oil RF, which can then be optimized for better EOR results.

The utilization of machine learning algorithms in this study offers a more efficient and accurate approach to analyzing complex reservoir engineering data sets, reducing time and cost compared to numerical simulations. This makes it an appealing option for the petroleum industry. Moreover, the study's findings can be applied in field trial applications of immiscible CO2-LSWAG injection, leading to faster decision-making and higher oil recovery factors, crucial for meeting global oil production demand until 2050. The implications of this research are vital for enhancing EOR techniques and maximizing oil recovery from reservoirs, especially as conventional reserves deplete, and petroleum demand rises. This study's contribution holds significant potential for the petroleum industry, exerting a substantial impact on global energy production.

Conclusions

1. This study demonstrated the efficacy of machine learning in conducting sensitivity analysis for low salinity waterflood alternating immiscible CO2 injection (immiscible CO2-LSWAG) in oil-wet sandstone reservoirs.

2. By integrating commercial numerical simulation software with machine learning algorithms, the research identified the top five significant parameters influencing oil recovery factor (RF) in immiscible CO2-LSWAG injection.

3. CatBoost and LightGBM algorithms provided high predictive accuracy, with test R2 values of 0.983 and 0.970, respectively, and low error metrics (MAE, MSE, and RMSE all less than 5%), along with faster processing times.

4. The major contributors to oil RF were water injection rate, Na+ injection composition, Ca2+ and Mg2+ concentrations in reservoir brine, and gas injection rate.

5. This study offers a quick decision-making tool for applying immiscible CO2-LSWAG injection in field trials or pilot projects by focusing on the top five significant factors.

6. Limitations of this study include its focus on oil-wet sand in pilot or field trial projects, necessitating further research.

7. Future research could extend this method to water-wet conditions, develop a field-scale model, or apply it to other Enhanced Oil Recovery (EOR) methods to broaden its applicability and strengthen its impact.