Modeling CO2 solubility in water using gradient boosting and light gradient boosting machine

Mahmoudzadeh, Atena; Amiri-Ramsheh, Behnam; Atashrouz, Saeid; Abedi, Ali; Abuswer, Meftah Ali; Ostadhassan, Mehdi; Mohaddespour, Ahmad; Hemmati-Sarapardeh, Abdolhossein

doi:10.1038/s41598-024-63159-9

Modeling CO₂ solubility in water using gradient boosting and light gradient boosting machine

Article
Open access
Published: 12 June 2024

Volume 14, article number 13511, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Modeling CO₂ solubility in water using gradient boosting and light gradient boosting machine

Download PDF

Atena Mahmoudzadeh¹,
Behnam Amiri-Ramsheh¹,
Saeid Atashrouz²,
Ali Abedi³,
Meftah Ali Abuswer³,
Mehdi Ostadhassan⁴,
Ahmad Mohaddespour⁵ &
…
Abdolhossein Hemmati-Sarapardeh¹

506 Accesses
Explore all metrics

Abstract

The growing application of carbon dioxide (CO₂) in various environmental and energy fields, including carbon capture and storage (CCS) and several CO₂-based enhanced oil recovery (EOR) techniques, highlights the importance of studying the phase equilibria of this gas with water. Therefore, accurate prediction of CO₂ solubility in water becomes an important thermodynamic property. This study focused on developing two powerful intelligent models, namely gradient boosting (GBoost) and light gradient boosting machine (LightGBM) that predict CO₂ solubility in water with high accuracy. The results revealed the outperformance of the GBoost model with root mean square error (RMSE) and determination coefficient (R²) of 0.137 mol/kg and 0.9976, respectively. The trend analysis demonstrated that the developed models were highly capable of detecting the physical trend of CO₂ solubility in water across various pressure and temperature ranges. Moreover, the Leverage technique was employed to identify suspected data points as well as the applicability domain of the proposed models. The results showed that less than 5% of the data points were detected as outliers representing the large applicability domain of intelligent models. The outcome of this research provided insight into the potential of intelligent models in predicting solubility of CO₂ in pure water.

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Article Open access 22 December 2021

Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state

Article Open access 09 September 2021

Modeling solubility of CO2–N2 gas mixtures in aqueous electrolyte systems using artificial intelligence techniques and equations of state

Article Open access 07 March 2022

Find the latest articles, discoveries, and news in related topics.

Artificial Intelligence

Introduction

Phase equilibria calculations of carbon dioxide (CO₂) solubility in water holds significant importance in a variety of research areas due to the rising worldwide application of this gas for a broad range of purposes. CO₂ is one of the greenhouse gases that cause major ecological issues, including global warming and climate change effects. Over the last few years, carbon capture and utilization (CCU) and carbon capture and storage (CCS) techniques have attracted significant interest from researchers as effective greenhouse gas mitigating strategies^1,2. Underground geological formations, such as depleted oil and gas reservoirs, deep saline aquifers, and salt domes have been identified as reliable sites for the CCS process³. Depleted gas and oil reservoirs have been widely regarded as a highly promising choice, primarily due to their large storage capacity and the vast number of available data attained from exploration and production operations^4,5,6. The oil industry also takes the advantage of CO₂ injection into oil and gas reservoirs as a promising enhanced oil recovery (EOR) method^7,8. Moreover, the combination of water and CO₂ injection, which is known as water-alternating-gas (WAG), is considered a successful technique because of its ability to control gas mobility and reduce the viscous fingering phenomenon^9,10. As seen in many fields, the interactions between water and CO₂ play an important role and the dissolution of CO₂ could affect the ultimate storage capacity in CCS^11,12 and the performance of EOR methods. From an environmental perspective, it could also pose a leakage risk by contaminating underground water^13,14,15,16.

Accordingly, CO₂ dissolution in water has been intensively investigated using a variety of experimental techniques^17,18,19,20. One of the earliest studies that examined the solubility of CO₂ in water at sequestration conditions was conducted by Wiebe and Gaddy²¹ at temperatures between 323 and 373 K and pressures up to almost 71 MPa. According to the literature, the maximum pressure at which the solubility of CO₂ gas in water has been experimentally measured is 350 MPa which was reported in the studies of Todheide & Franck²² and Takenouchi and Kennedy²³. In 1999, Dhima et al. measured the CO₂ solubility in pure water at 344.25 K and pressures up to 100 MPa²⁴. Ahmadi and Chapoy²⁵ utilized a high-pressure setup to examine the CO₂ solubility in several salt solutions at temperatures between 300 and 424 K and pressures up to 41 MPa. They conducted experiments with deionized water under equal conditions to verify the validity of their method. Wang et al.²⁶ conducted experiments to determine the solubility of CO₂ in water under high pressure and temperature conditions. They investigated the effect of the vapor phase of H₂O. A summary of literature experimental research os presented in Table 1, comprising the respective year of study, applied pressure and temperature ranges, and the employed equipment.

Table 1 Literature experiments studies on the determination of CO₂ solubility in water.

Full size table

CO₂ solubility in water can also be estimated by thermodynamic modeling, which is more cost-effective than experimental measurements. In recent years, the potential of theoretical models to effectively represent influential characteristics such as pressure, temperature, and electrolyte concentrations has attracted the interest of researchers attempting to develop models for different systems²⁷. Duan and Sun²⁸ proposed a model to estimate CO₂ solubility in water and aqueous NaCl solutions using an extensive databank containing about 1500 data points at temperatures ranging from 273 to 533 K and pressures between 0 and 2000 bar. Spycher et al.²⁹ utilized a calculating approach based on determining new Redlich-Kwong Equation of State (EoS) parameters¹⁹ for mutual solubility between CO₂ and water, using the CO₂ solubility data obtained at 285–373 K and up to 60 MPa. Yan et al.³⁰ measured CO₂ solubility in water and NaCl brine in three temperatures of 232, 373, and 413 K and a pressure range of 5 to 40 MPa. By comparing experimental data with Søreide and Whitson EoS model³¹, they indicated that a modification in the model by refitting the interaction parameter between water and CO₂ in the aqueous phase could lead to an acceptable solubility prediction. Recently, statistical associating fluid theory (SAFT) EoSs have also been applied to CO₂–electrolyte–water solutions^27,32,33. Yan and Chen³⁴ created a model for CO₂ solubility in NaCl solution employing a Perturbed-Chain SAFT (PC-SAFT) EoS with a maximum temperature and pressure of 473.15 K and 150 MPa. They determined Henry's constant by fitting the experimental CO₂ solubility in pure water at the same pressure and temperature range. Moreover, multiple thermodynamic models of CO₂ solubility have been constructed based on various pressures, temperatures, and water salinity conditions^25,35,36. Table 2 provides an informative summary of theoretical models found in the literature. It includes information such as the year of study, the pressure and temperature ranges that were examined, the techniques used for prediction, and the error.

Table 2 Literature theoretical models on the calculation of CO₂ solubility in water.

Full size table

In recent years, artificial intelligence (AI) has garnered significant attention from researchers and has become a potent tool for predicting tasks in a variety of fields^{37,38,39,40,41}. AI has been widely applied in the oil industry owing to its advantages over costly and time-consuming experimental procedures and sophisticated thermodynamic models⁴². AI was utilized in the assessment of a water-alternating-CO₂ process³⁷ for determining the minimum miscibility pressure (MMP) of CO₂³⁸ and predicting oil recovery in CO₂-EOR approaches^39,40. A considerable number of studies have been carried out in the field of environmental research, specifically in the area of CCS, which have provided valuable insights into the potential of CCS as a feasible approach to mitigate climate change. These studies have focused on predicting the efficiency of CO₂ storage⁴¹ and assessing the features of coal as an approach to carbon sequestration^42,43.

Ghasemian et al.⁴⁴ developed an artificial neural network (ANN) to estimate the solubility of CO₂ in water based on 105 data points from their experiments conducted at pressure and temperature ranges of 0.1–1 MPa and 278.15–348.15 K, respectively. They noticed that the ANN model outperformed the EoSs in CO₂ solubility prediction with an absolute average deviation (AAD) of 4%. Hemmati-Sarapardeh et al.⁴⁵ investigated the efficiency of four machine learning (ML) techniques, namely, radial basis function (RBF), multilayer perceptron (MLP), gene expression programming (GEP), and least-squares support vector machine (LSSVM) models, to estimate CO₂ solubility in pure water at high pressures and temperatures. The results showed that the highest level of accuracy was achieved with the LSSVM model optimized by the firefly optimization algorithm (FFA) with a root mean square error (RMSE) of 0.3261. Khoshraftar and Ghaemi utilized ANN and response surface methodology (RSM) to predict the solubility of CO₂ in water. Their model development was based on 240 measurements that were conducted within a pressure range of 0.5–200 MPa and a temperature between 313.15 and 473.15 K. The efficiency of MLP and RBF models were compared using an ANN technique. All the developed models accurately predicted solubility, although the RBF and MLP models exhibited the highest performance⁴⁶.

Jeon and Lee⁴⁷ developed an ANN based on 2406 data points to predict CO₂ solubility in pure water and single-salt aqueous solutions. In order to train the model, 80% of the data bank was used, and the remaining 20% of the data points of solubility in the complex or multi-component solutions were utilized for validation and testing. They stated that the developed ANN model could be extrapolated to predict CO₂ solubility in multi-component salt solutions. Over the past years, several investigations have been conducted on the prediction of pure and impure CO₂ solubility⁴⁸ in brine solutions containing different components with the aid of AI^49,50,51. Furthermore, research has been carried out to predict the solubility of CO₂ in the non-aqueous phase. Certain EOR techniques employ the injection of miscible CO₂ into the oil in order to enhance oil mobility by lowering the viscosity of the oil, the IFT, and oil swelling. Rostami et al. developed a gene expression programming (GEP) model for predicting the solubility of CO₂ in both dead and live oil, utilizing 106 and 74 data points, respectively. The error analysis revealed that the QEP-based model predicts the solubility of CO₂ in both dead oil and living oil accurately, with correlation coefficients (R²) of 0.9860 and 0.9844 for each⁵². Prior studies that have explored the application of AI to predict the solubility of gases in predicting the solubility of gases in different types of aqueous solutions and non-aqueous phase are summarized in Table 3. This table presents data regarding the year of the study, the applied AI technique, the kind of gas, the solution type, and the temperature and pressure ranges.

Table 3 Literature AI models on the prediction of gas solubility in aqueous and non-aqueous phase.

Full size table

The purpose of this study is to develop intelligent models that accurately predict CO₂ solubility in pure water. For this, a large data bank with 785 data points containing the values of pressure, temperature, and CO₂ solubility in water is gathered. Then, two powerful intelligent models, namely gradient boosting (GBoost) and light gradient boosting machine (LightGBM) are implemented to provide predictions for the CO₂ solubility as a function of temperature and pressure. Various statistical and graphical methods are employed to assess the validity and precision of the developed intelligent models. In addition, a trend analysis is undertaken to verify the developed models’ ability to detect physical trends. Lastly, the validity of the data bank and application domain of both models is examined by the Leverage method. As depicted in Table 3, although there are valuable artificial models for forecasting CO₂ solubility in aqueous solutions, blending data points of CO₂ solubility in pure water, single salt solutions, and diverse brines have been implemented in most predicted models. This study comprises an extensive database covering a broad range of temperature and pressures with reference to CO₂ solubility in water. This vast data collection resulted in a well-trained AI model with high accuracy. Both development models offered in this study demonstrate a high level of accuracy with a minimum determination coefficient of 0.995, indicating the reliability of the models. When employing EOR techniques or carbon storage technologies, we encounter a complex scenario involving the combination of gases in a water solution with varying salt levels and components. These validated and precise AI models can be employed in future research to assess complex systems, such as gas mixes and aqueous solutions of varying salinity levels. This not only improves the reliability of these models but also creates new opportunities for enhancing the effectiveness of these operations.

Methodology

Data gathering

In order to develop intelligent models, a trustworthy and broad data set was collected from various sources^{17,19,20,22,24,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87}. Each of the 785 data points in the database contained values for temperature, pressure, and CO₂ solubility in pure water. Temperature, pressure were regarded as the models’ inputs, while solubility was specified as the models’ output parameter. These data points have already been used by Hemmati-Sarapardeh et al.⁴⁵. The data set was randomly partitioned into 80% and 20% subsets for model training and testing, respectively. Table 4 describes the statistical features of the data bank.

Table 4 Statistical overview of the dataset in this study.

Full size table

Modeling techniques

Gradient boosting (GBoost)

Gradient boosting (GB) is a kind of ensemble supervised tree-based machine learning (ML) approaches that can be utilized for both regression and classification issues^88,89,90. It is called an ensemble because the ultimate model's prediction is produced based on various single models’ (decision trees) predictions⁸⁸. GB, which is portrayed in Fig. 1, is an iterative accumulation of sequentially organized tree-based models of weak learners or predictors that are converted to powerful learners^91,92. Commonly, boosting techniques combine weak predictors into a powerful one in an iterative path to minimize the loss function⁸⁹. This loss function is minimized similarly to an ANN in which weights are tuned⁹³.

To achieve this purpose, it is recommended to choose a function $h(x,{\theta }_{t})$ to be the most parallel to the negative gradient ${({g}_{t}\left({x}_{i}\right))}_{i=1}^{N}$. By selecting an iterative approach, we can defeat challenge posed by the prediction variables. The function ${g}_{t}\left(x\right)$ for every experimental data is calculated as below⁹⁴:

$${g}_{t}\left(x\right)={E}_{y} {\left[\frac{\partial \Psi (y,f(x))}{\partial f(x)}| x\right]}_{f\left(x\right)={\widehat{f}}^{t-1}(x)}$$

(1)

To permit the replacement of a hard optimizing problem, one can easily choose the new function increment to be the most matched with ${-g}_{t}\left(x\right)$ utilizing classic least-squares optimization as follows⁸⁹:

$$\left({\rho }_{t},{\theta }_{t}\right)={argmin}_{\rho ,\theta }\sum_{i=1}^{N}{\left[-{g}_{t}\left({x}_{i}\right)+\rho h\left({x}_{i},\theta \right)\right]}^{2}$$

(2)

The following stages show a general optimizing procedure of the GBoost⁹⁴:***

Initializing the ${\widehat{f}}_{0}$ as a constant;

Calculating the negative gradient of ${-g}_{t}\left(x\right)$;

Conforming a next base-learner function $h(x,{\theta }_{t})$;

Recognizing the optimal gradient descent step-size ${\rho }_{t}$ as:

$${\rho }_{t}={argmin}_{\rho }\sum_{i=1}^{N}\Psi \left[{y}_{i},{\widehat{f}}_{t-1}\left({x}_{i}\right)+\rho h\left({x}_{i},{\theta }_{t}\right)\right]$$

(3)

Updating the model's prediction:

$${\widehat{f}}_{t}\leftarrow {\widehat{f}}_{t-1}+{\rho }_{t}h\left(x,{\theta }_{t}\right)$$

(4)

This approach involves the base-learner phase, which consists of a single neuron and the loss function employed is the standard squared error. Through the process of training of the model, the optimum structure is earned.

Light gradient boosting machine (LightGBM)

LightGBM is a gradient-supervised technique based on decision trees and the idea of boosting algorithms⁹⁵. LightGBM technique, which includes several decision trees, is applicable in various ML tasks like regression, classification, and ranking^96,97,98. Each LightGBM technique employs a powerful learning framework to produce prediction values⁹⁹. Its principal differences from other tree-based models are that it accelerates the training stage by applying histogram-based techniques, decrease memory consumption and it uses a leaf-wise growth strategy with depth constraints⁹⁵. Figure 2 illutrates a schematic image of the LightGBM. The training process of the LightGBM is determined by the subsequent formula:

$$X={\left\{\left({x}_{j},{y}_{j}\right)\right\}}_{j=1}^{N}$$

(5)

Next, ${\widehat{f}}_{(x)}$ will forecast by minimizing the loss function $L$ as⁹⁵:

$$L\left(y,f\left(x\right)\right): \widehat{f}\left(x\right)=argmin {E}_{y,x}.L\left(y,f\left(x\right)\right)$$

(6)

Eventually, the training stage of every regression tree can be indicated as ${W}_{q(x)} , q \in \left\{\text{1,2},3,\dots ,N\right\}$; where W denotes a weight term of every leaf node, q shows utilized decision rules in each tree, and N indicates the number of leaves in a tree¹⁰⁰. Thus, by the employing of Newton's method for recognizing objective function, the training outcome of every stage is tuned by the following equation:

$${G}_{t}\cong \sum_{i=1}^{N}L[{y}_{i}, {F}_{t-1}\left({x}_{i}\right)+{f}_{t}({x}_{i})]$$

(7)

Results and discussion

Model's development

This research focused on proposing two intelligent models that accurately predict the solubility of CO₂ in pure water. The models were designed to predict the target variable as a function of temperature and pressure. The established models were trained with 628 data points (80% of the data bank) and tested with 157 data points (20% of the data bank). The accuracy of the intelligent models in this study compared to the EoSs is impressive. All statistical and graphical error analyses verify this issue. Furthermore, smart techniques require less input parameter information than EoSs. In our current research, we only utilized pressure and temperature to predict the solubility of CO₂ in pure water. In contrast, many EoSs necessitate additional properties. Moreover, the application of EoSs typically consumes a significant amount of time.

For the tuning process, various ranges of hyperparameters were tested to find the optimal value of each hyperparameter. Table 5 shows the optimum values of the GBoost and LightGBM hyperparameters, separately. Max depth refers a maximum depth of the tree. Learning rate determines the step size at each sequential iteration. Min sample split and Min sample leaf show the minimum number of samples needed for splitting an internal node and to form a leaf node, respectively. Also, N estimators represents the number of trees in the forest.

Table 5 Optimum values of hyperparameters of the models.

Full size table

Performance evaluation

To analyze the performance of the developed models, multiple statistical and graphical evaluations were utilized. This research implemented the root mean square error (RMSE) and the correlation coefficient (R²) for statistical evaluation. Equations (8) and (9) provide the mathematical formulations of these criteria.

$$RMSE= {\left(\frac{1}{n} \sum_{i=1}^{n}{\left({y}_{i}^{exp}-{y}_{i}^{cal}\right)}^{2}\right)}^{0.5}$$

(8)

$${R}^{2}=1-\frac{{\sum_{i=1}^{n}\left({y}_{i}^{exp}-{y}_{i}^{cal}\right)}^{2}}{{\sum_{i=1}^{n}\left({y}_{i}^{exp}-\overline{y }\right)}^{2}}$$

(9)

where ${y}_{i}^{exp}$ is the experimentally determined solubility of CO₂ in pure water and ${y}_{i}^{cal}$ is the value of solubility predicted by the models. Also, y represents the mean value of the measured data points. As Table 6 reflects, the two developed models showed strong agreement with experimental data. Although error indices indicate the great accuracy of both models, the GBoost model outperformed the LightGBM algorithm slightly.

Table 6 Statistical error factors of established models.

Full size table

Graphical assessments were also used to compare the reliability of the models. For this purpose, various kinds of plots, comprising cross-plots, error distribution plots, and cumulative frequency were plotted. Cross plots are one of the visual assessments that compare the predicted data with experimental measurements. The concentration of data near the Y = X line indicates the accuracy of the developed model. Figure 3 shows the cross plots of the GBoost and LightGBM models. As shown, the dense compactness of train and test data points was seen around the unit slope line. While the points were more distributed close to the Y = X line in the cross plot of the LightGBM model, the GBoost approach demonstrated a stronger concentration around this line.

Figure 4 demonstrates the error distribution diagrams of the established models, where the error is defined as the deviation of predicted data of the solubility from the experimental values. In this diagram, more aggregation around the zero line implies a more accurate model. The plots revealed that the majority of points were located close to the zero-error line and confirmed the accurate predictions. While the GBoost model has an accurate performance over the CO₂ solubility range, a slight deviation was observed in the LightGBM model in a few points, especially in high solubility values, indicating that there was a minor relative error at these values.

Group error analysis

Group error diagrams were used to assess the efficiency of the established models in various pressure and temperature ranges. Figure 5 depicts group error plots of the GBoost and LightGBM models in five equal temperature and pressure ranges. As Fig. 5a shows, both models demonstrated higher precision at temperatures below 454.13 K. It should be noted that the GBoost model demonstrated higher reliability than LightGBM throughout all temperature ranges. The effect of pressure on the efficiency of the model was shown in Fig. 5b. As shown, both models provided more accurate predictions at pressures lower than 700 bar. While GBoost demonstrated relatively consistent accuracy at all ranges, increasing the pressure reduced the efficiency of LightGBM.

Cumulative frequency

The cumulative frequency plot is an interactive visualization technique for evaluating the reliability of intelligent models. In these charts, the higher a model is positioned, the more accurate its predictions are. The cumulative frequency curves of the GBoost and LightGBM are illustrated in Fig. 6. The closeness of the two models to the vertical axis signifies that these models predicted the majority of data points with a low error. Considering that the error is defined as the absolute difference between measured experimental and predicted data points, 95% of data predicted by the GBoost model have an error of less than 0.29 mol/kg. By contrast, LightGBM reported 93% of the data at this error value. Both developed models demonstrated an acceptable level of certainty in predicting the solubility of CO₂, whilst GBoost showed relatively higher accuracy.

Sensitivity analysis

One of the sensitivity approaches employed to measure the effect of input parameters on the output is the relevancy factor. This factor is employed to determine the correlation between dependent and independent variables, and is defined below¹⁰¹:

$$r\left({I}_{k}, y\right)\frac{\sum_{i=1}^{n}{(I}_{i}^{k}-{I}_{ave}^{k})\left({O}_{i}-{O}_{ave}\right)}{\sqrt{\sum_{i=1}^{n}{\left({I}_{i}^{k}-{I}_{ave}^{k}\right)}^{2}\sum_{i=1}^{n}{\left({O}_{i}-{O}_{ave}\right)}^{2}}}$$

(10)

where $I$ and $O$ symbolize the the input and output variables, respectively. As stated earlier, temperature and pressure were considered input parameters of the developed models for predicting CO₂ solubility in water. ${I}_{i}^{k}$ represents the $i$ th value of $k$th input parameters, ${I}_{ave}^{k}$ stands for the average value of the $k$th input, and ${O}_{ave}$ indicates the average output. The relevancy factor shows a value between − 1 and 1. Where a positive number implies a direct correlation between the input and output variables, while a negative value shows a reverse one. The degree of influence is expressed by the absolute value of $r$ so that the greater the r, the more influence of the input. Figure 7 demonstrates the impact of pressure and temperature on CO₂ solubility for the GBoost and LightGBM models. As can be seen, the two models reflected relatively the same relevancy factor of input parameters. While a rise of both temperature and pressure would result in an increase in solubility, pressure with a relevancy factor of 0.87 had a stronger impact than temperature with a relevancy factor of almost 0.65.

Model trend analysis

As noted in previous sections, the GBoost approach provided more accurate CO₂ solubility predictions. To investigate the impact of pressure and temperature on the CO₂ solubility in pure water and to verify the model's engagement to a physically expected trend, the experimentally measured points were compared to the GBoost model’s predictions with respect to the trends of solubility at various temperatures and pressures by keeping one of them at a constant value. The temperature's impact on solubility at three constant pressures is depicted in Fig. 8. As demonstrated, the predicted points were accurately consistent with the experimental data and illustrated that increasing temperatures increases solubility strength. The solubility line at 3500 bar is positioned above the solubility lines at 2000 bar and 1200 bar in the figure, which illustrates the influence of pressure. In addition, the impact of pressure on solubility is represented in Fig. 9. As illustrated in the figure, a rise in pressure enhanced gas solubility at both high and low temperatures. Pressure rise causes compressibility in the gas phase, which leads to the release of more space for additional gas molecules to be dissolved in the water. In Fig. 9a, the solubility at 278 K was higher than that of 304.19 K. Although, in high-pressure regions, the solubility increased as the temperature rise. In other words, at low and medium pressures, heating the liquid or increasing the temperature results in a rise of the molecules’ kinetic energy, leading them to move faster and letting more of them escape the solvent. This indicates that thermal energy overcomes the intermolecular attraction force between carbon dioxide and pure water. However, when gas reaches the supercritical condition, which for CO₂ is above 304.13 K and 73.97 $bar$ of temperature and pressure, it exhibits a more complicated behavior. As seen in Figs. 8 and 9b demonstrate, CO₂ tends to behave more as a liquid, showing that both pressure and temperature positively affect solubility¹⁰². In conclusion, Figs. 8 and 9 confirmed the validity of the GBoost model and revealed that the model accurately captured the phenomenon’s existing physical trend.

Implementation of the Leverage method

Existing outlier data in a data bank might adversely impact the prediction’s efficiency and applicability of a model. Therefore, the detection of these measured points that differ from the bulk of data is a key step for model development. The Leverage method was employed to detect outliers in this study^103,104. According to a statistical perspective and a visual evaluation, this technique identifies outlier points. This method sketches the Williams plot based on a $H$ value and standardized residual. The $H$ value, refers to the elements of the Hat matrix, which is computed as below:

$$H=X{({X}^{T}X)}^{-1}{X}^{T} , X=N\times k$$

(11)

$$SR= \frac{{e}_{i}}{{RMSE \left(1-{H}_{ii}\right)}^{0.5}}$$

(12)

where $k$ and $N$ indicate the input parameters and number of data points, respectively, and $T$ refers to the transpose. The standardized residual is measured based on the ${e}_{i}$, which is the deviation between predicted and experimental data, root mean square error ($RMSE$), and $H$ vectors. In the Williams plot, valid data points are situated in the area bounded by the Leverage limit (${H}^{*}$), and the suspected limits. This valid zone represents the applicability of the model domain. The mathematical measurement of the Leverage limit is presented in Eq. (13), which corresponds to 0.0115 for the gathered databank. Suspicious limits are defined as a standard residual higher than 3 or less than − 3. In addition, the area with $H>{H}^{*}$ is classified into two areas based on its SR: good high leverage and bad high leverage. The good high leverage, which represented data points that predicted well but were outside the scope of the model's applicability, relates to the data as $-3\le SR\le 3$. The bad high leverage area belongs to points with $SR>3$ or $SR<-3$^105,106,107.

$${H}^{*}= \frac{3\times (k+1)}{N}$$

(13)

where k represents the number of inputs (here two) and N is the total number of data points (here 785).

Williams's plot is one of the outlier detection methods for examining the performance of developed models which is widely used in AI application studies^{45,108,109,110,111}. The William plots of the proposed models for predicting the CO₂ solubility in water are shown in Fig. 10. As the plots demonstrate, the majority of data points were placed in the valid region, 95.92% of the points for GBoost and 95.67% of them for the LightGBM. For both models, only 2.42% of data exceeded the leverage limit. Overall, as a negligible percentage of data were placed out of the valid zone, the Leverage approach approved the applicability of proposed models and the validity of experimental data points.

Conclusions

In this study, two tree-based models, GBoost and LightGBM, were developed to predict CO₂ solubility in pure water based on an extensive data bank including 785 experimental data points collected from diverse sources. Two parameters of pressure and temperature were considered as the input parameters, while solubility was defined as the output.

In order to validate models, statistical and graphical evaluations were implemented. Multiple validation procedures approved the high precision agreement between experimental and predicted solubility values. The findings indicated the outperformance of the GBoost model with R² and RMSE values of 0.9976 and 0.137 mol/kg, respectively.
The trend analysis was employed to assess the pressure and temperature effects on the solubility in comparison to the predictions of the model. The trend analysis revealed that the proposed models exhibited the high accuracy in comprehending understanding the physical trend of the problem.
For outlier detection, the Leverage approach was implemented which demonstrated the validity and reliability of the models on a large portion of data; nevertheless, only a few points were identified as suspected data.
The findings of this study demonstrated that both developed models could be considered potent and trustworthy tools for predicting the solubility of CO₂ in water.

Data availability

All the data have been collected from literature. We cited all the references of the data in the manuscript. However, the data will be available from the corresponding author on reasonable request.

References

Xu, T., Apps, J. A. & Pruess, K. Mineral sequestration of carbon dioxide in a sandstone–shale system. Chem. Geol. 217(3), 295–318. https://doi.org/10.1016/j.chemgeo.2004.12.015 (2005).
Article ADS CAS Google Scholar
Cuéllar-Franca, R. M. & Azapagic, A. Carbon capture, storage and utilisation technologies: A critical analysis and comparison of their life cycle environmental impacts. J. CO2 Utilization. 9, 82–102. https://doi.org/10.1016/j.jcou.2014.12.001 (2015).
Article CAS Google Scholar
S. Bachu & S. Bachu. Screening and ranking of hydrocarbon reservoirs for CO₂ storage in the Alberta Basin, Canada (2001).
Raza, A. et al. Preliminary assessment of CO₂ injectivity in carbonate storage sites. Petroleum 3(1), 144–154. https://doi.org/10.1016/j.petlm.2016.11.008 (2017).
Article Google Scholar
Raza, A. et al. Well selection in depleted oil and gas fields for a safe CO₂ storage practice: A case study from Malaysia. Petroleum 3(1), 167–177. https://doi.org/10.1016/j.petlm.2016.10.003 (2017).
Article Google Scholar
Underschultz, J. et al. CO₂ storage in a depleted gas field: An overview of the CO2CRC Otway Project and initial results. Int. J. Greenhouse Gas Control 5(4), 922–932. https://doi.org/10.1016/j.ijggc.2011.02.009 (2011).
Article CAS Google Scholar
L. Lake, R. T. Johns, W. R. Rossen, & G. A. Pope. Fundamentals of Enhanced Oil Recovery. Society of Petroleum Engineers.
L. Lake, M. Lotfollahi, & S. Bryant. Fifty years of field observations: Lessons for CO2 storage from CO2 enhanced oil recovery. in 14th Greenhouse Gas Control Technologies Conference Melbourne, 2018, pp. 21–26.
Aziz, H., Muther, T., Khan, M. J. & Syed, F. I. A review on nanofluid water alternating gas (N-WAG): Application, preparation, mechanism, and challenges. Arab. J. Geosci. 14(14), 1416. https://doi.org/10.1007/s12517-021-07787-9 (2021).
Article Google Scholar
D. Merchant. Enhanced oil recovery—The history of CO2 conventional wag injection techniques developed from lab in the 1950’s to 2017. in Carbon Management Technology Conference, 2017, vol. All Days, CMTC-502866-MS. https://doi.org/10.7122/502866-ms
Li, W., Nan, Y., Zhang, Z., You, Q. & Jin, Z. Hydrophilicity/hydrophobicity driven CO2 solubility in kaolinite nanopores in relation to carbon sequestration. Chem. Eng. J. 398, 125449. https://doi.org/10.1016/j.cej.2020.125449 (2020).
Article CAS Google Scholar
Ho, N. L., Perez-Pellitero, J., Porcheron, F. & Pellenq, R. J. M. Enhanced CO2 solubility in hybrid adsorbents: Optimization of solid support and solvent properties for CO2 capture. J. Phys. Chem. C 116(5), 3600–3607. https://doi.org/10.1021/jp2099625 (2012).
Article CAS Google Scholar
Frye, E., Bao, C., Li, L. & Blumsack, S. Environmental controls of cadmium desorption during CO2 leakage. Environ. Sci. Technol. 46(8), 4388–4395. https://doi.org/10.1021/es3005199 (2012).
Article ADS CAS PubMed Google Scholar
Xiao, T. et al. Potential chemical impacts of CO2 leakage on underground source of drinking water assessed by quantitative risk analysis. Int. J. Greenhouse Gas Control 50, 305–316. https://doi.org/10.1016/j.ijggc.2016.04.009 (2016).
Article CAS Google Scholar
Little, M. G. & Jackson, R. B. Potential impacts of leakage from deep CO2 geosequestration on overlying freshwater aquifers. Environ. Sci. Technol. 44(23), 9225–9232. https://doi.org/10.1021/es102235w (2010).
Article ADS CAS PubMed Google Scholar
Lu, J., Partin, J. W., Hovorka, S. D. & Wong, C. Potential risks to freshwater resources as a result of leakage from CO2 geological storage: A batch-reaction experiment. Environ. Earth Sci. 60(2), 335–348. https://doi.org/10.1007/s12665-009-0382-0 (2010).
Article ADS CAS Google Scholar
Wiebe, R. & Gaddy, V. L. The solubility of carbon dioxide in water at various temperatures from 12 to 40° and at pressures to 500 atmospheres. Critical phenomena*. J. Am. Chem. Soc. 62(4), 815–817. https://doi.org/10.1021/ja01861a033 (1940).
Article CAS Google Scholar
Nakayama, T., Sagara, H., Arai, K. & Saito, S. High pressure liquid–liquid equilibria for the system of water, ethanol and 1,1-difluoroethane at 323.2 K. Fluid Phase Equilibria 38(1), 109–127. https://doi.org/10.1016/0378-3812(87)90007-0 (1987).
Article CAS Google Scholar
King, M. B., Mubarak, A., Kim, J. D. & Bott, T. R. The mutual solubilities of water with supercritical and liquid carbon dioxides. J. Supercrit. Fluids 5(4), 296–302. https://doi.org/10.1016/0896-8446(92)90021-B (1992).
Article CAS Google Scholar
Teng, H. O. & Yamasaki, A. Pressure-mole fraction phase diagrams for CO2-pure water system under temperatures and pressures corresponding to ocean waters at depth to 3000 M. Chem. Eng. Commun. 189(11), 1485–1497. https://doi.org/10.1080/00986440214993 (2002).
Article CAS Google Scholar
Wiebe, R. & Gaddy, V. L. The solubility in water of carbon dioxide at 50, 75 and 100°, at pressures to 700 atmospheres. J. Am. Chem. Society 61(2), 315–318. https://doi.org/10.1021/ja01871a025 (1939).
Article CAS Google Scholar
K. Tödheide & E. U. Franck. Das Zweiphasengebiet und die kritische Kurve im System Kohlendioxid–Wasser bis zu Drucken von 3500 bar. 37(5_6), 387–401 (1963). https://doi.org/10.1524/zpch.1963.37.5_6.387.
Takenouchi, S. & Kennedy, G. C. The binary system H₂O-CO₂ at high temperatures and pressures. Am. J. Sci. 262(9), 1055. https://doi.org/10.2475/ajs.262.9.1055 (1964).
Article ADS CAS Google Scholar
Dhima, A., de Hemptinne, J.-C. & Jose, J. Solubility of hydrocarbons and CO2 mixtures in water under high pressure. Ind. Eng. Chem. Res. 38(8), 3144–3161. https://doi.org/10.1021/ie980768g (1999).
Article CAS Google Scholar
Ahmadi, P. & Chapoy, A. CO2 solubility in formation water under sequestration conditions. Fluid Phase Equilibria 463, 80–90. https://doi.org/10.1016/j.fluid.2018.02.002 (2018).
Article CAS Google Scholar
Wang, X. et al. Experiment based modeling of CO2 solubility in H2O at 313.15–473.15 K and 0.5–200 MPa. Appl. Geochem. 130, 105005. https://doi.org/10.1016/j.apgeochem.2021.105005 (2021).
Article CAS Google Scholar
Ramdan, D. et al. Prediction of CO2 solubility in electrolyte solutions using the e-PHSC equation of state. J. Supercrit. Fluids 180, 105454. https://doi.org/10.1016/j.supflu.2021.105454 (2022).
Article CAS Google Scholar
Duan, Z. & Sun, R. An improved model calculating CO2 solubility in pure water and aqueous NaCl solutions from 273 to 533 K and from 0 to 2000 bar. Chem. Geol. 193(3), 257–271. https://doi.org/10.1016/S0009-2541(02)00263-2 (2003).
Article ADS CAS Google Scholar
Spycher, N., Pruess, K. & Ennis-King, J. CO2-H2O mixtures in the geological sequestration of CO2. I. Assessment and calculation of mutual solubilities from 12 to 100°C and up to 600 bar. Geochimica et Cosmochimica Acta 67(16), 3015–3031. https://doi.org/10.1016/S0016-7037(03)00273-4 (2003).
Article ADS CAS Google Scholar
Yan, W., Huang, S. & Stenby, E. H. Measurement and modeling of CO2 solubility in NaCl brine and CO2–saturated NaCl brine density. Int. J. Greenhouse Gas Control 5(6), 1460–1477. https://doi.org/10.1016/j.ijggc.2011.08.004 (2011).
Article CAS Google Scholar
Søreide, I. & Whitson, C. H. Peng-Robinson predictions for hydrocarbons, CO2, N2, and H2S with pure water and NaCI brine. Fluid Phase Equilibria 77, 217–240. https://doi.org/10.1016/0378-3812(92)85105-H (1992).
Article Google Scholar
Rozmus, J., de Hemptinne, J.-C., Galindo, A., Dufal, S. & Mougin, P. Modeling of strong electrolytes with ePPC-SAFT up to high temperatures. Ind. Eng. Chem. Res. 52(29), 9979–9994. https://doi.org/10.1021/ie303527j (2013).
Article CAS Google Scholar
Ji, X., Tan, S. P., Adidharma, H. & Radosz, M. SAFT1-RPM approximation extended to phase equilibria and densities of CO2−H2O and CO2−H2O−NaCl systems. Ind. Eng. Chem. Res. 44(22), 8419–8427. https://doi.org/10.1021/ie050725h (2005).
Article CAS Google Scholar
Yan, Y. & Chen, C.-C. Thermodynamic modeling of CO2 solubility in aqueous solutions of NaCl and Na2SO4. J. Supercrit. Fluids 55(2), 623–634. https://doi.org/10.1016/j.supflu.2010.09.039 (2010).
Article CAS Google Scholar
Haghtalab, A., Asadi, E. & Shahsavari, M. High-pressure vapor-liquid equilibrium measurement of CO2 solubility into aqueous solvents of (diisopropylamine + l-lysine) and (diisopropylamine + piperazine + l-lysine) at different temperatures and compositions. J. Chem. Eng. Data 66(11), 4254–4271. https://doi.org/10.1021/acs.jced.1c00725 (2021).
Article CAS Google Scholar
Liu, Z., Cui, P., Cui, X., Wang, X. & Du, D. Prediction of CO2 solubility in NaCl brine under geological conditions with an improved binary interaction parameter in the Søreide-Whitson model. Geothermics 105, 102544. https://doi.org/10.1016/j.geothermics.2022.102544 (2022).
Article Google Scholar
Le Van, S. & Chon, B. H. Applicability of an artificial neural network for predicting water-alternating-CO2 performance. Energies 10(7), 842 (2017).
Article Google Scholar
Dong, P., Liao, X., Chen, Z. & Chu, H. An improved method for predicting CO₂ minimum miscibility pressure based on artificial neural network. Adv. Geo-Energy Res. 3(4), 355–364. https://doi.org/10.26804/ager.2019.04.02 (2019).
Article Google Scholar
Vo Thanh, H., Sugai, Y. & Sasaki, K. Application of artificial neural network for predicting the performance of CO2 enhanced oil recovery and storage in residual oil zones. Sci. Rep. 10(1), 18204. https://doi.org/10.1038/s41598-020-73931-2 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Le Van, S. & Chon, B. H. Evaluating the critical performances of a CO2-enhanced oil recovery process using artificial neural network models. J. Petrol. Sci. Eng. 157, 207–222. https://doi.org/10.1016/j.petrol.2017.07.034 (2017).
Article CAS Google Scholar
Kim, Y., Jang, H., Kim, J. & Lee, J. Prediction of storage efficiency on CO2 sequestration in deep saline aquifers using artificial neural network. Appl. Energy 185, 916–928. https://doi.org/10.1016/j.apenergy.2016.10.012 (2017).
Article ADS CAS Google Scholar
Ibrahim, A. F. Application of various machine learning techniques in predicting coal wettability for CO2 sequestration purpose. Int. J. Coal Geol. 252, 103951. https://doi.org/10.1016/j.coal.2022.103951 (2022).
Article CAS Google Scholar
Lv, Q. et al. On the evaluation of coal strength alteration induced by CO2 injection using advanced black-box and white-box machine learning algorithms. SPE J. https://doi.org/10.2118/218403-pa (2024).
Article Google Scholar
Ghasemian, N., Kalbasi, M. & Pazuki, G. Experimental study and mathematical modeling of solubility of CO2 in water: Application of artificial neural network and genetic algorithm. J. Dispersion Sci. Technol. 34(3), 347–355. https://doi.org/10.1080/01932691.2012.667293 (2013).
Article CAS Google Scholar
Hemmati-Sarapardeh, A., Amar, M. N., Soltanian, M. R., Dai, Z. & Zhang, X. Modeling CO2 solubility in water at high pressure and temperature conditions. Energy Fuels 34(4), 4761–4776. https://doi.org/10.1021/acs.energyfuels.0c00114 (2020).
Article CAS Google Scholar
Khoshraftar, Z. & Ghaemi, A. Prediction of CO2 solubility in water at high pressure and temperature via deep learning and response surface methodology. Case Stud. Chem. Environ. Eng. 7, 100338. https://doi.org/10.1016/j.cscee.2023.100338 (2023).
Article CAS Google Scholar
Jeon, P. R. & Lee, C.-H. Artificial neural network modelling for solubility of carbon dioxide in various aqueous solutions from pure water to brine. J. CO Utilization. 47, 101500. https://doi.org/10.1016/j.jcou.2021.101500 (2021).
Article CAS Google Scholar
Nakhaei-Kohani, R., Taslimi-Renani, E., Hadavimoghaddam, F., Mohammadi, M.-R. & Hemmati-Sarapardeh, A. Modeling solubility of CO2–N2 gas mixtures in aqueous electrolyte systems using artificial intelligence techniques and equations of state. Sci. Rep. 12(1), 3625. https://doi.org/10.1038/s41598-022-07393-z (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Mohammadian, E., Liu, B. & Riazi, A. Evaluation of different machine learning frameworks to estimate CO2 solubility in NaCl brines: Implications for CO2 injection into low-salinity formations. Lithosphere. 12, 1615832 (2022).
Article Google Scholar
Menad, N. A., Hemmati-Sarapardeh, A., Varamesh, A. & Shamshirband, S. Predicting solubility of CO2 in brine by advanced machine learning systems: Application to carbon capture and sequestration. J. CO2 Utilization. 33, 83–95. https://doi.org/10.1016/j.jcou.2019.05.009 (2019).
Article CAS Google Scholar
Raji, M., Dashti, A., Amani, P. & Mohammadi, A. H. Efficient estimation of CO2 solubility in aqueous salt solutions. J. Mol. Liquids 283, 804–815. https://doi.org/10.1016/j.molliq.2019.02.090 (2019).
Article CAS Google Scholar
Rostami, A., Arabloo, M., Kamari, A. & Mohammadi, A. H. Modeling of CO2 solubility in crude oil during carbon dioxide enhanced oil recovery using gene expression programming. Fuel 210, 768–782. https://doi.org/10.1016/j.fuel.2017.08.110 (2017).
Article CAS Google Scholar
Rostami, A., Arabloo, M., Lee, M. & Bahadori, A. Applying SVM framework for modeling of CO2 solubility in oil during CO2 flooding. Fuel 214, 73–87. https://doi.org/10.1016/j.fuel.2017.10.121 (2018).
Article CAS Google Scholar
Nakhaei-Kohani, R. et al. Chemical structure and thermodynamic properties based models for estimating nitrous oxide solubility in ionic Liquids: Equations of state and Machine learning approaches. J. Mol. Liquids 367, 120445. https://doi.org/10.1016/j.molliq.2022.120445 (2022).
Article CAS Google Scholar
Austin, W. H., Lacombe, E., Rand, P. W. & Chatterjee, M. Solubility of carbon dioxide in serum from 15 to 38 C. J. Appl. Physiol. 18(2), 301–304. https://doi.org/10.1152/jappl.1963.18.2.301 (1963).
Article CAS PubMed Google Scholar
Bando, S., Takemura, F., Nishio, M., Hihara, E. & Akai, M. Solubility of CO2 in aqueous solutions of NaCl at (30 to 60) °C and (10 to 20) MPa. J. Chem. Eng. Data 48(3), 576–579. https://doi.org/10.1021/je0255832 (2003).
Article CAS Google Scholar
Briones, J. A., Mullins, J. C., Thies, M. C. & Kim, B. U. Ternary phase equilibria for acetic acid-water mixtures with supercritical carbon dioxide. Fluid Phase Equilibria 36, 235–246. https://doi.org/10.1016/0378-3812(87)85026-4 (1987).
Article CAS Google Scholar
Chapoy, A., Mohammadi, A. H., Chareton, A., Tohidi, B. & Richon, D. Measurement and modeling of gas solubility and literature review of the properties for the carbon dioxide−water system. Ind. Eng. Chem. Res. 43(7), 1794–1802. https://doi.org/10.1021/ie034232t (2004).
Article CAS Google Scholar
Dodds, W. S., Stutzman, L. F. & Sollami, B. J. Carbon dioxide solubility in water. Ind. Eng. Chem. Chem. Eng. Data Series 1(1), 92–95. https://doi.org/10.1021/i460001a018 (1956).
Article CAS Google Scholar
Ferrentino, G., Barletta, D., Donsì, F., Ferrari, G. & Poletto, M. Experimental measurements and thermodynamic modeling of CO2 solubility at high pressure in model apple juices. Ind. Eng. Chem. Res. 49(6), 2992–3000. https://doi.org/10.1021/ie9009974 (2010).
Article CAS Google Scholar
Findlay, A. & Howell, O. R. XXXII—The solubility of carbon dioxide in water in the presence of starch. J. Chem. Soc. Trans. 107, 282–284. https://doi.org/10.1039/CT9150700282 (1915).
Article CAS Google Scholar
Findlay, A. & Williams, T. LXXI—The influence of colloids and fine suspensions on the solubility of gases in water. Part III. Solubility of carbon dioxide at pressures lower than atmospheric. J. Chem. Society Trans. 103, 636–645. https://doi.org/10.1039/CT9130300636 (1913).
Article CAS Google Scholar
Han, J. M., Shin, H. Y., Min, B.-M., Han, K.-H. & Cho, A. Measurement and correlation of high pressure phase behavior of carbon dioxide+water system. J. Ind. Eng. Chem. 15(2), 212–216. https://doi.org/10.1016/j.jiec.2008.09.012 (2009).
Article CAS Google Scholar
Koschel, D., Coxam, J.-Y., Rodier, L. & Majer, V. Enthalpy and solubility data of CO2 in water and NaCl(aq) at conditions of interest for geological sequestration. Fluid Phase Equilibria 247(1), 107–120. https://doi.org/10.1016/j.fluid.2006.06.006 (2006).
Article CAS Google Scholar
Li, Y.-H. & Tsui, T.-F. The solubility of CO2 in water and sea water. J. Geophys. Res. (1896–1977) 76(18), 4203–4207. https://doi.org/10.1029/JC076i018p04203 (1971).
Article ADS CAS Google Scholar
Li, Z., Dong, M., Li, S. & Dai, L. Densities and solubilities for binary systems of carbon dioxide + water and carbon dioxide + brine at 59 °C and pressures to 29 MPa. J. Chem. Eng. Data 49(4), 1026–1031. https://doi.org/10.1021/je049945c (2004).
Article CAS Google Scholar
Markham, A. E. & Kobe, K. A. The solubility of carbon dioxide and nitrous oxide in aqueous salt solutions. J. Am. Chem. Society 63(2), 449–454. https://doi.org/10.1021/ja01847a027 (1941).
Article CAS Google Scholar
Matouš, J., Šobr, J., Novak, J. & Pick, J. Solubility of carbon dioxide in water at pressures up to 40 atm. Collect. Czechoslovak Chem. Commun. 34(12), 3982–3985 (1969).
Article Google Scholar
Morgan, O. M. & Maass, O. An investigation of the equilibria existing in gas-water systems forming electrolytes. Can. J. Res. 5(2), 162–199. https://doi.org/10.1139/cjr31-062 (1931).
Article CAS Google Scholar
Morrison, T. J. & Billett, F. The salting-out of non-electrolytes. Part II. The effect of variation in non-electrolyte. J. Chem. Society (Resumed). 3819, 3822. https://doi.org/10.1039/JR9520003819 (1952).
Article Google Scholar
Müller, G., Bender, E. & Maurer, G. Das Dampf-Flüssigkeitsgleichgewicht des ternären Systems Ammoniak-Kohlendioxid-Wasser bei hohen Wassergehalten im Bereich zwischen 373 und 473 Kelvin. Berichte der Bunsengesellschaft für physikalische Chemie 92(2), 148–160. https://doi.org/10.1002/bbpc.198800036 (1988).
Article Google Scholar
Nighswander, J. A., Kalogerakis, N. & Mehrotra, A. K. Solubilities of carbon dioxide in water and 1 wt. % sodium chloride solution at pressures up to 10 MPa and temperatures from 80 to 200.degree.C. J. Chem. Eng. Data 34(3), 355–360. https://doi.org/10.1021/je00057a027 (1989).
Article CAS Google Scholar
Novak, J., Fried, V. & Pick, J. Löslichkeit des Kohlendioxyds in Wasser bei verschiedenen Drücken und Temperaturen. Collect. Czechoslovak Chem. Commun. 26(9), 2266–2270 (1961).
Article CAS Google Scholar
Qin, J., Rosenbauer, R. J. & Duan, Z. Experimental measurements of vapor-liquid equilibria of the H2O + CO2 + CH4 ternary system. J. Chem. Eng. Data 53(6), 1246–1249. https://doi.org/10.1021/je700473e (2008).
Article CAS Google Scholar
Sako, T. et al. Phase equilibrium study of extraction and concentration of furfural produced in reactor using supercritical carbon dioxide. J. Chem. Eng. Japan 24(4), 449–455. https://doi.org/10.1252/jcej.24.449 (1991).
Article ADS CAS Google Scholar
Servio, P. & Englezos, P. Effect of temperature and pressure on the solubility of carbon dioxide in water in the presence of gas hydrate. Fluid Phase Equilibria 190(1), 127–134. https://doi.org/10.1016/S0378-3812(01)00598-2 (2001).
Article CAS Google Scholar
Shedlovsky, T. & MacInnes, D. A. The first ionization constant of carbonic acid, 0 to 38°, from conductance measurements. J. Am. Chem. Society 57(9), 1705–1710. https://doi.org/10.1021/ja01312a064 (1935).
Article CAS Google Scholar
Takenouchi, S. & Kennedy, G. C. The binary system H2O–CO2 at high temperatures and pressures. Am. J. Sci. 262(9), 1055–1074 (1964).
Article ADS CAS Google Scholar
Teng, H., Yamasaki, A., Chun, M. K. & Lee, H. Solubility of liquid CO2 in water at temperatures from 278 K to 293 K and pressures from 6.44 MPa to 29.49 MPa and densities of the corresponding aqueous solutions. J. Chem. Thermodyn. 29(11), 1301–1310. https://doi.org/10.1006/jcht.1997.0249 (1997).
Article CAS Google Scholar
Valtz, A., Chapoy, A., Coquelet, C., Paricaud, P. & Richon, D. Vapour–liquid equilibria in the carbon dioxide–water system, measurement and modelling from 278.2 to 318.2K. Fluid Phase Equilibria 226, 333–344. https://doi.org/10.1016/j.fluid.2004.10.013 (2004).
Article CAS Google Scholar
Weiss, R. F. Carbon dioxide in water and seawater: The solubility of a non-ideal gas. Mar. Chem. 2(3), 203–215. https://doi.org/10.1016/0304-4203(74)90015-2 (1974).
Article CAS Google Scholar
Wiebe, R. & Gaddy, V. The solubility in water of carbon dioxide at 50, 75 and 100, at pressures to 700 atmospheres. J. Am. Chem. Society 61(2), 315–318 (1939).
Article CAS Google Scholar
Wiebe, R. & Gaddy, V. L. Vapor phase composition of carbon dioxide-water mixtures at various temperatures and at pressures to 700 atmospheres. J. Am. Chem. Society 63(2), 475–477. https://doi.org/10.1021/ja01847a030 (1941).
Article CAS Google Scholar
Yeh, S.-Y. & Peterson, R. E. Solubility of carbon dioxide, krypton, and xenon in aqueous solution. J. Pharm. Sci. 53(7), 822–824. https://doi.org/10.1002/jps.2600530728 (1964).
Article CAS PubMed Google Scholar
Zawisza, A. & Malesinska, B. Solubility of carbon dioxide in liquid water and of water in gaseous carbon dioxide in the range 0.2–5 MPa and at temperatures up to 473 K. J. Chem. Eng. Data 26(4), 388–391. https://doi.org/10.1021/je00026a012 (1981).
Article CAS Google Scholar
Zheng, D.-Q., Guo, T.-M. & Knapp, H. Experimental and modeling studies on the solubility of CO2, CHC1F2, CHF3, C2H2F4 and C2H4F2 in water and aqueous NaCl solutions under low pressures. Fluid Phase Equilibria 129(1), 197–209. https://doi.org/10.1016/S0378-3812(96)03177-9 (1997).
Article CAS Google Scholar
Liu, Y., Hou, M., Yang, G. & Han, B. Solubility of CO2 in aqueous solutions of NaCl, KCl, CaCl2 and their mixed salts at different temperatures and pressures. J. Supercrit. Fluids 56(2), 125–129. https://doi.org/10.1016/j.supflu.2010.12.003 (2011).
Article CAS Google Scholar
Belyadi, H. & Haghighat, A. Machine Learning Guide for Oil and Gas Using Python (Elsevier, New York, 2021).
Google Scholar
Natekin, A. & Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 7, 21 (2013).
Article PubMed PubMed Central Google Scholar
Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54(3), 1937–1967 (2021).
Article Google Scholar
Chahar, J., Verma, J., Vyas, D. & Goyal, M. Data-driven approach for hydrocarbon production forecasting using machine learning techniques. J. Petrol. Sci. Eng. 217, 110757 (2022).
Article CAS Google Scholar
Otchere, D. A., Ganat, T. O. A., Ojero, J. O., Tackie-Otoo, B. N. & Taki, M. Y. Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions. J. Petrol. Sci. Eng. 208, 109244 (2022).
Article CAS Google Scholar
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Ann. Stat. 1189–1232 (2001).
Larestani, A., Mousavi, S. P., Hadavimoghaddam, F. & Hemmati-Sarapardeh, A. Predicting formation damage of oil fields due to mineral scaling during water-flooding operations: Gradient boosting decision tree and cascade-forward back-propagation network. J. Petrol. Sci. Eng. 208, 109315 (2022).
Article CAS Google Scholar
Fan, J. et al. Light gradient boosting machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric. Water Manag. 225, 105758 (2019).
Article Google Scholar
Gu, Y. et al. Data-driven estimation for permeability of simplex pore-throat reservoirs via an improved light gradient boosting machine: A demonstration of sand-mud profile, Ordos Basin, northern China. J. Petrol. Sci. Eng. 217, 110909 (2022).
Article CAS Google Scholar
Gu, Y., Zhang, D., Lin, Y., Ruan, J. & Bao, Z. Data-driven lithology prediction for tight sandstone reservoirs based on new ensemble learning of conventional logs: A demonstration of a Yanchang member, Ordos Basin. J. Petrol. Sci. Eng. 207, 109292 (2021).
Article CAS Google Scholar
Taha, A. A. & Malebary, S. J. An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access 8, 25579–25587 (2020).
Article Google Scholar
Mahdaviara, M., Sharifi, M., Bakhshian, S. & Shokri, N. Prediction of spontaneous imbibition in porous media using deep and ensemble learning techniques. Fuel 329, 125349 (2022).
Article CAS Google Scholar
Zhou, B. et al. Pressure of different gases injected into large-scale coal matrix: analysis of time–space dependence and prediction using light gradient boosting machine. Fuel 279, 118448 (2020).
Article CAS Google Scholar
Hosseinzadeh, M. & Hemmati-Sarapardeh, A. Toward a predictive model for estimating viscosity of ternary mixtures containing ionic liquids. J. Mol. Liq. 200, 340–348. https://doi.org/10.1016/j.molliq.2014.10.033 (2014).
Article CAS Google Scholar
Sabirzyanov, A. N., Il’in, A. P., Akhunov, A. R. & Gumerov, F. M. Solubility of water in supercritical carbon dioxide. High Temperature 40(2), 203–206. https://doi.org/10.1023/A:1015294905132 (2002).
Article Google Scholar
P. Rousseeuw & A. Leroy. Robust regression and outlier detection: Wiley Interscience. New York (1987).
Rousseeuw, P. J. & van Zomeren, B. C. Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639. https://doi.org/10.1080/01621459.1990.10474920 (1990).
Article Google Scholar
C. R. Goodall. 13 computation using the QR decomposition. in Handbook of Statistics, vol. 9. (Elsevier, 1993), 467–508.
Gramatica, P. Principles of QSAR models validation: Internal and external. QSAR Combinatorial Sci. 26(5), 694–701 (2007).
Article CAS Google Scholar
A. H. Sarapardeh, A. Larestani, N. A. Menad, & S. Hajirezaie. Applications of Artificial Intelligence Techniques in the Petroleum Industry. (Gulf Professional Publishing, 2020).
Lv, Q. et al. Application of group method of data handling and gene expression programming to modeling molecular diffusivity of CO2 in heavy crudes. Geoenergy Sci. Eng. 237, 212789. https://doi.org/10.1016/j.geoen.2024.212789 (2024).
Article CAS Google Scholar
Lv, Q. et al. Modelling CO2 diffusion coefficient in heavy crude oils and bitumen using extreme gradient boosting and Gaussian process regression. Energy 275, 127396. https://doi.org/10.1016/j.energy.2023.127396 (2023).
Article CAS Google Scholar
Salehi, K., Rahmani, M. & Atashrouz, S. Machine learning assisted predictions for hydrogen storage in metal-organic frameworks. Int. J. Hydrogen Energy 48(85), 33260–33275. https://doi.org/10.1016/j.ijhydene.2023.04.338 (2023).
Article CAS Google Scholar
Zheng, H., Mahmoudzadeh, A., Amiri-Ramsheh, B. & Hemmati-Sarapardeh, A. Modeling viscosity of CO2–N2 Gaseous mixtures using robust tree-based techniques: extra tree, random forest, GBoost, and LightGBM. ACS Omega 8(15), 13863–13875. https://doi.org/10.1021/acsomega.3c00228 (2023).
Article CAS PubMed PubMed Central Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Petroleum Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
Atena Mahmoudzadeh, Behnam Amiri-Ramsheh & Abdolhossein Hemmati-Sarapardeh
Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Saeid Atashrouz
College of Engineering and Technology, American University of the Middle East, 54200, Egaila, Kuwait
Ali Abedi & Meftah Ali Abuswer
Institute of Geosciences, Marine and Land Geomechanics and Geotectonics, Christian-Albrechts-Universität, 24118, Kiel, Germany
Mehdi Ostadhassan
Department of Chemical Engineering, McGill University, Montreal, QC, H3A 0C5, Canada
Ahmad Mohaddespour

Authors

Atena Mahmoudzadeh
View author publications
You can also search for this author in PubMed Google Scholar
Behnam Amiri-Ramsheh
View author publications
You can also search for this author in PubMed Google Scholar
Saeid Atashrouz
View author publications
You can also search for this author in PubMed Google Scholar
Ali Abedi
View author publications
You can also search for this author in PubMed Google Scholar
Meftah Ali Abuswer
View author publications
You can also search for this author in PubMed Google Scholar
Mehdi Ostadhassan
View author publications
You can also search for this author in PubMed Google Scholar
Ahmad Mohaddespour
View author publications
You can also search for this author in PubMed Google Scholar
Abdolhossein Hemmati-Sarapardeh
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.M.: Writing – original draft, Methodology, Investigation, Visualization, data, Software B.A.: Writing – original draft, Methodology, Visualization, Software, S.A.: Writing-Review & Editing, Validation, Conceptualization, Methodology A.A.: Writing-Review & Editing, Validation, Conceptualization, M.A.: Validation, Conceptualization, data, M.O.: Methodology, Validation, Supervision, Writing-Review & Editing. A.M.: Writing-Review & Editing, Validation, Conceptualization, A.H.: Methodology, Validation, Supervision, Writing-Review & Editing, Software.

Corresponding authors

Correspondence to Mehdi Ostadhassan, Ahmad Mohaddespour or Abdolhossein Hemmati-Sarapardeh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Mahmoudzadeh, A., Amiri-Ramsheh, B., Atashrouz, S. et al. Modeling CO₂ solubility in water using gradient boosting and light gradient boosting machine. Sci Rep 14, 13511 (2024). https://doi.org/10.1038/s41598-024-63159-9

Download citation

Received: 07 September 2023
Accepted: 26 May 2024
Published: 12 June 2024
DOI: https://doi.org/10.1038/s41598-024-63159-9
Springer Nature Limited

Modeling CO₂ solubility in water using gradient boosting and light gradient boosting machine

Abstract

Similar content being viewed by others

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state

Modeling solubility of CO2–N2 gas mixtures in aqueous electrolyte systems using artificial intelligence techniques and equations of state

Introduction