Introduction

Natural gas is one of the major naturally occurring and efficient energy sources, consisting of a complex mixture of light hydrocarbon compounds (C1–C11, where methane > 90% v/v) and a minor inorganics (non-hydrocarbon) compound such as (CO2, N2, H2S, and He) as well as a few heavy metals1,2,3. It is considered a subcategory of petroleum fluids which is a versatile, abundant, and clean energy source for both domestic and industrial processes owing to smaller greenhouse gas emissions and air pollution4. Global energy demand increases annually which led to the utilization of virgin HPHT reservoirs5. A precise estimation of the natural gas thermophysical properties and other hydrocarbon components including gas viscosity is crucial for gas engineering problems and physicochemical calculations2. Viscosity denotes the fluid’s resistance to the shearing action caused by molecular diffusion, which transfers momentum between flow layers6. It is an efficient theoretical and practical property for gas processes optimization including phase behavior analysis, the study of single-phase equations, reservoir simulation and characterization, estimation of pressure gradient for gas wells, fluid-flow dynamics in transportation pipelines, optimal exploitation, multiphase flow in porous media through gas and oil reservoirs, enhanced oil recovery7,8,9, gas reserves estimation, and molecular interactions including transport of momentum occur in a fluid motion and tubing1,6,10. For HPHT gas reservoirs, low gas viscosity errors adversely affect the inflow performance relationship (IPR) curves and, as a result, the reserves estimates11. Gases exhibit non-traditional reactions at high and low pressures. Under lower pressure, the gas viscosity increases with temperature, since elevated temperature degrees increase the gas molecules’ interactions, leading to reduced gas stream flow. An increase in viscosity is indicative of this effect. Conversely, as gases compress and their molecular distances decrease, they exhibit liquid behavior at higher pressures. Furthermore, the tight bonds that hold gas molecules together loosen as the temperature rises, which reduces the viscosity of gas flow3. These variations in natural gas viscosity as a measure of reservoir temperature and pressure, in addition to an extensive spectrum of possible gas combinations, impede some degree of measurement error3,12. Therefore, it is calculated theoretically by leveraging equations of state (EOS) relevant to the reduced/critical properties and gas composition, or empirical correlations as a dependence of temperature, pressure, molecular weight, gas density, and gravity1,2. When working with gas mixtures, EOS may yield low accuracy, whereas empirical correlations can involve one or two-step techniques1.

Several pieces of literature discuss gas viscosity correlations and mathematical models. Fayazi et al.6 and Dargahi-Zarandi et al.3 summarize the most common gas viscosity correlations. In the following discussion, the authors detailed the published correlations that considered gas viscosity.

Bicher Jr and Katz13 introduced the first gas viscosity correlation as a function of pressure ranges from (400–500) psia, temperature (77–437 °F), and molecular weight with a 5.8% average deviation. They employed a rolling ball inclined tube viscometer to estimate the viscosities of CH4, C3H8, and four binary mixtures of CH4. Smith and Brown14 approximate the fluid viscosity through EOS. Their theorem is based on the similarity of the physical properties of different substances with similar values of reduced temperature (Tr) and pressure (Pr). Their correlation was based on replacing the molecular weight with the molar average molecular weight and replacing reduced temperature and pressure with pseudo-critical properties. Their results showed that the correlation is not applicable for methane and gives a better estimation for liquid than gas viscosities. Smith also stated that the correlation is well-fitted for C2H6 and higher paraffin viscosities. Comings et al.15 proposed a graphical correlation using data of CO2, N2, NH3, H2O, CH4, C3H8, and C4H10 viscosities to approximate the gas viscosity through reduced pressure (Pr) charts and viscosity ratio. They assessed the viscosity of N2, CO2, CH4, C2H6, and C3H8 at (P = 14.7–14,196 psia and, T = 14 to 374 °F) and improved their previous correlation by manipulating the analogy between the kinetic pressure and the viscosity. Their improved correlation was found to be applicable with pure gases and the results showed that the highest P and T for the CH4 viscosity are 2514 psia and 203 °F respectively. Also, their correlation gives a 10–12% error for C2H6 viscosity at (Pr > 2 psia). Carr et al.16 presented a correlation known as (CKB) as a function of Ppr, Tpr, and viscosity ratio, to predict the gas viscosity at (T = 32–400 °F, P < 1200 psia, and gas gravities = 0.55–1.55). Their correlation is built upon 30 data points including the measurements of pure components such as N2, CO2, CH4, C2H4, C3H8, and natural gases. Jossi et al.17 built a correlation referred to as JST for calculating the gas mixture’s viscosity based on experimental measurement of pure components including Ar, N2, O2, CO2, SiO2, (C1–C5) hydrocarbons. The correlation between the reduced density \(({\rho }_{r})\) and the residual viscosity modulus (\(({\mu }_{g}-{\mu }^{*})\varepsilon\) is the basis of the suggested correlation and the gas properties including Pc, Tc, density, and molecular weight are considered inputs. Lohrenz et al.18 developed an empirical correlation referred to as LBC for gas mixture viscosity estimation. Their model used the same equation and coefficient for pure fluid in the Jossi et al.17 correlation and they concluded that their model is highly reliant on the density measurements2. Dempsey19 approximates the gas viscosity ratio at a specified pressure to the gas viscosity at ambient conditions. Lee et al.20 derived a correlation relevant to gas density, temperature, and gas molecular weight to forecast gas viscosities at reservoir conditions2. The correlation is accurate for estimating the viscosity of natural gas below HPHT. Londono21 optimized both the JST and LGE correlations using a total of 13,656 data points and anticipated two models for gas viscosity prediction. Then another new optimized empirical formula for gas viscosity relevant to the gas density and temperature was developed. Jeje and Mattar22 paralleled the LGE correlation to Carr et al.16 for sour and sweet gases. The results showed that the two correlations give identical results for sweet gases. Whereas, for sour gases, there is a significant difference between the compared correlations. Sutton23 correlated viscosities of gas condensate and associated gases as a function of CH4, C3H8, CH4/C3H8, CH4/C4H10, CH4/C10H22, and natural gas viscosity. They estimated the gas condensate behavior by utilizing the CH4/C10H22 binary mixtures. Viswanathan24 implemented NIST values at (T = 100–400°F and P = 5000 to 30,000 psia) to improve LGE correlation at HPHT conditions. Ohirhian and Abu25 estimated the natural gas viscosity using experimental data values obtained from routine PVT analysis of the Nigerian crude oil at reservoir conditions (Pres = 144–4100 psia and Tres = 130–220 °F). The correlation was also compared with the viscosity values derived from Carr et al.16 charts and Lee et al.20 viscosity equation, and the results revealed that the new correlation showed precise results.

Recently, computational techniques, mathematical models, and equations of states (EOSs) have been implemented in various industrial applications including petroleum processes26,27. Various predictive models including conventional and hybrid ANNs models, and adaptive neuro-fuzzy inference system models6 have been estimated. Londono et al.28 established a gas viscosity correlation relevant to gas density, and reservoir temperature, considering non-hydrocarbon compounds including carbon dioxide, helium, and nitrogen, with an AARE of 3.05%. Quraishi and Shokir2 developed two gas viscosity and density models through genetic programming (GP) and alternating conditional expectation (ACE) algorithms using 4445 datasets with an AARE of 3.95% and 5.4%, respectively. Yang et al.29 predict gas viscosity for gas samples containing high carbon dioxide concentrations. The formulated correlation is built with 1,539 data points of 9 natural gas samples measured at 250 to 450 K and 0.10 to 140 MPa. The correlation reported an (ARE < 0.98%) between the calculated and the experimental values. Dargahi-Zarandi et al.3 developed a robust model algorithm based on the GMDH neural network for estimating sour and natural gas viscosity as a function of Tpr, Ppr, molecular weight, and density using more than 3800 data points reported in the literature. Rostami et al.1 modeled the natural gas viscosities using integrated smart intelligent strategies with optimized algorithms, where more than 3800 data points were screened for their (Tpr, Ppr, and molecular weight) to develop the required models.

The fuzziness, non-linearity, and complexity of the reservoir behavior necessitate a prevailing tool to overwhelm these challenges6,12. In this regard, soft computations and machine learning gained incremental spread in petroleum and chemical engineering issues12,30,31,32,33. These approaches include various predictive models such as ANN, GP, ANFIS, SVM, and their optimized hybrids1,34,35,36. ANNs used to outperform multivariate nonlinear regression processes37. This work introduces the utilization of machine learning algorithms to precisely estimate the viscosities of CH4, N2, and gas mixture under normal and harsh conditions relevant to pressure, temperature, and gas density based on the measured PVT database reported in the literature to develop and test the models. The network’s prediction was validated against the published one. The results displayed better performance for the developed models concerning viscosities of CH4, N2, and gas mixture. Seven machine learning techniques specifically Linear Regression, Random Forest (RF), Decision Trees (DT), Gradient Boosting, K-Nearest Neighbors (KNN), Nu Support Vector Regression (NuSVR), and artificial neural network (ANN) were applied for the prediction of methane (CH4), nitrogen (N2), and natural gas mixture viscosities. The reasons behind using the considered ML types in modeling the gas viscosity lie in the viscosities of gas mixtures exhibit non-linear behavior under different conditions. Moreover, using machine learning models can help uncover and model these complicated relationships. The considered Machine learning models offer flexibility in modeling different types of data and handling various inputs. Machine learning models can handle multiple input variables and their interactions efficiently, potentially providing accurate predictions across diverse scenarios. Furthermore, machine learning is a data-driven approach, meaning it can learn patterns and relationships directly from the available data.

Unlike most of the literature on ML models, the viscosity of three distinct gases is predicted using seven different machine-learning techniques. Previous research focuses only on the modeling of gas mixture viscosity2,38. In addition, seven different machine-learning techniques were utilized for the prediction of gas viscosity. The previous research was mostly directed only at artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFIS)1,2,39,40. To the best of the author’s knowledge limited studies have applied ML techniques for Methane and Nitrogen viscosity prediction. Almost all viscosity models developed using ML algorithms are suitable for gas mixtures. Moreover, no work has been quoted in the literature that predicts methane, nitrogen, and natural gas mixture viscosities under normal and harsh conditions using seven distinct machine-learning techniques.

Materials and methods

This article utilizes statistical approaches for analyzing the results of published PVT experiments for CH4, N2, and gas mixture at standard conditions as well as severe conditions relevant to high temperature and pressure. The goal is to clarify the significance of important elements (pressure, temperature, and gas density) on the viscosity of CH4, N2, and gas mixture during PVT experiments. The research’s methodological approach is shown in Fig. 1. Initially, the published experimental PVT data and results were gathered. Additionally, the database was shown and explained using a histogram including pressure, temperature, and gas density together with the viscosity of CH4, N2, and gas mixture. Moreover, relationships between viscosity versus pressure, temperature, and gas density were established using sensitivity analysis, analysis of variance, factorial design, contour plot, and main effects plot. Finally, ML algorithms were applied to predict the viscosity of CH4, N2, and gas mixture at standard and harsh conditions of temperature and pressure based on PVT data experiments.

Figure 1
figure 1

Framework of research methodology.

The novelty of this research exists in the approach to present an advanced Machine learning model to predict methane, nitrogen, and natural gas mixture viscosities under a wide range of data. More than 4304 datasets from real experimental data utilizing pressure, temperature, and gas density were employed for developing ML models. Moreover, the article exhibits additional statistical approaches by which relationships between viscosity versus pressure, temperature, and gas density were established. The article also provides three new ANN-based correlations related to methane, nitrogen, and natural gas mixture viscosities. Therefore, this research is remarkable in its approach, as it extensively predicts methane, nitrogen, and natural gas mixture viscosities using seven distinct machine-learning algorithms for the first time.

Exploratory data analysis

Establishing an accurate predictive machine learning model (ML) strongly relies on the quality of the input dataset. The accurate prediction capability of ML models is determined by the effect of input feature significance on the output parameters. Analysis of variance (ANOVA) test was employed using Minitab software, to analyze several variables of the generated model. The relationship between the viscosities of the methane, nitrogen, and natural gas mixture and the number of independent variables was examined using the values of P and F. If the P-value < 0.05, the factor’s statistical importance is proven. To highlight the impact of different independent parameters even more on the methane, nitrogen, and natural gas mixture viscosities, factorial, contour plots, and main effects plots were analyzed.

Machine learning (ML) models

Seven machine learning algorithms have been applied to predict methane, nitrogen, and natural gas mixture viscosities under normal and harsh conditions: Linear Regression, Decision Trees (DT), Gradient Boosting, Random Forest (RF), Nu Support Vector Regression (NuSVR), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN). The definitions of such ML are as follows:

  1. 1.

    Linear regression is a statistical technique that uses the association between two or more quantitative variables to predict a response or outcome variable based on the others30. This methodology is frequently utilized in numerous disciplines.

  2. 2.

    The decision tree (DT) is a flowchart-like model, where the internal node signifies an attribute test, each branch signifies a test result, and each leaf node provides a class label or prediction41. The DT is simple to grasp and analyze, and it can deal with both category and numerical data. Furthermore, the DT can be utilized for classification and regression problems, as well as a foundation for more complex ensemble approaches42.

  3. 3.

    Random forest is an ensemble learning method that allows forecasts by combining many Decision trees. It generates many Decision trees and outputs the mode or mean prediction of each tree42,43. By incorporating randomization into the tree-building process, the Random Forest avoids the overfitting propensity of Decision trees and increases generalization.

  4. 4.

    Gradient boosting is a popular ensemble method for integrating numerous weak models to build a strong prediction model44. Gradient boosting also contains regularization techniques to prevent overfitting, as well as parallel processing and tree pruning, making it efficient and very accurate. It has gained traction in numerous machine learning competitions and is frequently utilized in the industry.

  5. 5.

    Support vector regression is a machine learning theory that is founded on the notion of structural risk minimization45. Support vector regression methods are a promising data mining and knowledge-finding technique. In the 1990s, support vector regression was introduced as a nonlinear solution for regression analysis. It was created for pattern recognition tasks using the paradigm of statistical learning theory. Support vector regression has gathered considerable interest as it has been recognized to be very robust for modeling numerous complex problems42.

  6. 6.

    K-nearest neighbour (KNN) is a widely used statistical method for pattern recognition and classification that is quick and easy to use in several different fields46. K-Nearest Neighbour is built on the premise that similar input points will produce similar outputs. These related input points are first sorted into groups in n-dimensional space. For new point output, the k points that are closest to the new point are selected, and the group with the highest number of points near the new point is found by analysis. After that, the points are tallied, and the group with the greatest number of points within the hypercube is approximated. Finally, the new point is allocated to that group, and the cluster prediction method is used to anticipate its output.

  7. 7.

    Artificial neural network (ANN) is an ML model that was stimulated by the anatomy of the human brain30,47. The ANN is made up of linked neurons that are organized into layers31. Each neuron processes input data and sends it to the next layer, allowing for complicated computations48. The ANN may learn from a variety of data and be applied to tasks such as classification, regression, and pattern recognition. Fully connected neural networks are useful in petroleum-related applications where the inputs and outputs have nonlinear correlations32,49,50.

Models evaluation and error analysis

For examining the model applicability and precision, several traditional statistical measures and graphical error analyses are used to assess the accuracy, validity, and reliability of the produced models, and predict the performance of the created machine learning algorithms12,51. Furthermore, relative error distribution graphs and Cross-plots were also employed.

Statistical error analysis

It is a vital tool to assess the model’s performance relative to the experimental values52. In the current work, the developed paradigms’ reliability was screened by some statistical principles involving (ARE%)1, (AARE%), (RMSE), (SD), and (R2)12. The following formulas are used to compute these statistical parameters3,4,37.

$$ARE\% = \frac{100}{N}\sum\limits_{i = 1}^{N} {\left( {\frac{{\mu_{i}^{\exp } - \mu_{i}^{cal} }}{{\mu_{i}^{\exp } }}} \right)}$$
(1)
$$AARE\% = \frac{100}{N}\sum\limits_{i = 1}^{N} {\left| {\frac{{\mu_{i}^{\exp } - \mu_{i}^{cal} }}{{\mu_{i}^{\exp } }}} \right|}$$
(2)
$$SD = \left[ {\frac{1}{N - 1}\sum\limits_{i = 1}^{N} {\left( {\frac{{\mu_{i}^{\exp } - \mu_{i}^{cal} }}{{\mu_{i}^{\exp } }}} \right)}^{2} } \right]^{1/2}$$
(3)
$$RMSE = \left[ {\frac{{\sum\limits_{i = 1}^{N} {\left( {\mu_{i}^{\exp } - \mu_{i}^{cal} } \right)^{2} } }}{N}} \right]^{1/2}$$
(4)
$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{N} {(\mu_{i}^{\exp } - \mu_{i}^{cal} )^{2} } }}{{\sum\limits_{i = 1}^{N} {\left[ {\mu_{i}^{\exp } - avg(\mu_{i}^{cal} )} \right]^{2} } }}$$
(5)
$$MAD = \max \left( {\frac{{\mu_{i}^{\exp } - \mu_{i}^{cal} }}{{\mu_{i}^{\exp } }}} \right)$$
(6)

Graphical error evaluation

The model performance is visualized through graphical error analyses, where crossplots and error distribution are reported3. Three visualization techniques, including cross plots, cumulative frequency plots versus AARE%, and error distribution curves, were implemented to screen the model accuracy4.

Cross plot

The predicted/experimental data points are plotted versus each other to assess the model’s competence in the prediction of the experimental one3. The diagram capability is evaluated through the grade of data gathering around the equality line and the deviation from the 45° line.

Error distribution diagram

It displays the error distribution around the zero-error line3,4, where the benchmark is the neighboring data amount around the zero-deviation line1. The gathering of predictions around the zero-error line is very reasonable.

Cumulative frequency versus AARE%

It is the sum of classes in a frequency distribution (i.e. adding up a value and all of the values that came before it4.

Results and discussion

Data acquisition and analysis

Reliable data were gathered from the published literature51. The dataset collected from literature relevant to the viscosities of CH4, N2, and natural gas mixture with their correlating parameter ranges are indicated in Table 1. Methane and nitrogen viscosities are correlated relevant to reservoir temperature (Tres) and pressure (Pres), while the viscosity of the natural gas mixture is correlated relative to reservoir pressure, temperature, and gas density. The collected data were treated to develop an ANN model for viscosity calculation.

Table 1 Ranges of correlating parameters for methane, nitrogen, and natural gas viscosities.

The dataset was graphically displayed, and the sampling distribution was explained using histograms. Figure 2A–C shows the histograms of the data sets for methane case (pressure, temperature, and viscosity) respectively. The datasets encompass a variety of pressure from 14.7 psia up to 25,000 psia (harsh conditions). However, most of the data points range from 14.7 and 10,000 psi. Furthermore, the data on temperature showed a large range of values (77–482 °F). The majority dataset of temperature falls between 100 and 350 °F. Regarding methane viscosity, the most common values varied from 0.011 to 0.05 cp. Figure 2D–F shows the histograms of the data sets for nitrogen viscosity. A variety of pressures are covered by the datasets, ranging from 14.7 psia up to 24,000 psia (harsh conditions) but most of the data points lie between 14.7 and 12,000 psi. Likewise, a large range was present in the temperature data (− 343 to 1880 °F). Most temperature data is in the range of − 340 to 350 °F. The most frequent values for methane viscosity ranged from 0.006 to 0.0.07cp. Figure 2G–I depict the histograms of pressure, density, temperature, and viscosity for mixture gas from the reported studies. The databank covers a range of pressure from 14.7 psia up to 25,000 psia (harsh conditions). However, most of the data points range from 14.7 to 10,000 psi. Similarly, the data on temperature showed a large range of values. (− 8.6 to 360 °F). The most frequent values for gas mixture viscosity ranged from 0.01 to 0.1cp. Furthermore, the density of gas mixture data had an extensive range of values (0.02 to 0.6 lb/ft3).

Figure 2
figure 2

Histogram plots based on the gathered data from literature: (A) methane pressure, (B) methane temperature, (C) methane viscosity, (D) nitrogen pressure, (E) nitrogen temperature, (F) nitrogen viscosity, (G) natural gas mixture pressure, (H) natural gas mixture temperature, and (I) natural gas mixture viscosity data.

Sensitivity analysis

Sensitivity analysis relies on the absolute relevancy factor (r) analysis and is used to evaluate the effect of input parameters on the viscosities of methane, nitrogen, and natural gas mixture3,32. The effect of the input parameter is directly correlated with its relevancy factor’s absolute value as follows4:

$$r(I_{j} ,p) = abs\left( {\frac{{\sum\limits_{i = 1}^{N} {(I_{j,i} - I_{j} )(p_{i} - \overline{p})} }}{{\sqrt {\sum\limits_{i = 1}^{N} {(I_{j,i} - I_{j} )^{2} \sum\limits_{i = 1}^{N} {(p_{i} - \overline{p})^{2} } } } }}} \right)$$
(7)

(i)subscript, The data point index; Ij,i, is jth input parameter; Ij, Corresponds to the average value of Ij, I; P and, Correspond to the predicted value and its average, respectively.

Nitrogen and methane viscosities are directly proportional to the pressure with a correlation coefficient of 4% and 97% respectively, and inversely proportional to temperature with a correlation coefficient of 27% and 10% respectively. The viscosity of the natural gas mixture is directly proportional to the pressure with a correlation coefficient of 95%, temperature with a correlation coefficient of 52%, and gas density with a correlation coefficient of 80%. Figure 3a–c display the implications of independent correlating parameters (pressure, temperature, and gas density) on the viscosity values of N2, CH4, and gas mixture, where (P, T) has the highest impact in the case of N2, CH4 viscosity, while gas density greatly affects in case of gas mixture viscosity2,4. Correlation coefficients range from (0–1), whereas 1.0 indicates a strong correlation, while 0.0 indicates no correlation between the input and the targeted variables53,54.

Figure 3
figure 3

Sensitivity of (a) methane, (b) nitrogen, (c) natural gas mixture viscosities to pressure and temperature.

Sensitivity analysis aims to measure the model’s sensitivity towards the independent variables. The variable’s influence on the prediction decreases with decreasing percent values for the designated independent variables. The analysis results assign the selection of independent variable sets for more precise prediction. A low-impact variable can be removed. The relative impact of the trained set variables may vary more in different data sets. The statistical parameters of the utilized datasets modeling through this study are summarized in Table 2. The important parameters are skewness, and kurtosis which are defined as:

  1. i.

    Skewness which signifies the property asymmetry degree relevant to its mean value. Skewness ranges from (− 1) to (0) to (+ 1), whereas zero skewness corresponds to the normal distribution, while (+ 1 and − 1) correspond to non-normal probability distribution.

  2. ii.

    Kurtosis describes the distribution of data with the normal probability shape. The normal distribution has zero kurtosis. Furthermore, compared to the normal distribution, the positive and negative kurtosis correspond to a more peaked and flatter distribution1.

Table 2 Statistical parameters of datasets used to develop ANN model for methane, nitrogen, and natural gas mixture respectively.

Investigation and screening of the impact of key factors on the gas viscosity

The pressure and temperature values greatly influence the gas viscosity. Consequently, statistical approaches were utilized to screen the highest meaningful factors. These statistical analyses were performed using Minitab software55.

The analysis of variance (ANOVA)

ANOVA is a statistical approach used in analyzing many problems in the upstream oil industry34,56. In this regard, Methane, nitrogen, and mixture gas were the three types of gases whose viscosity was studied. ANOVA was used to obtain a quantitative interpretation of the investigated parameters. The results of ANOVA are exhibited in Tables 3, 4, and 5. As displayed in the first analysis (methane viscosity), P, and T have a significant impact on the viscosity of methane based on the P-value of to a lesser extent than 0.05. Furthermore, pressure displays the greatest F-value (119,178.47), which indicates the stronger influence of such a parameter compared with the temperature parameter. In contrast, for the second analysis (nitrogen viscosity), the temperature has a meaningful effect on the viscosity of nitrogen based on the P-value and the F-values of 0 and 25.99 respectively. According to the P-value of less than 0.05 for the third case (gas mixture), temperature, pressure, and gas density, all significantly affect the gas mixture’s viscosity. Furthermore, the pressure displays the highest F-value (33,426.02) which indicates the strongest influence of the pressure parameter compared with other parameters, which also reflects its significance on gas viscosity.

Table 3 Analysis of variance for methane viscosity.
Table 4 Analysis of variance for nitrogen viscosity.
Table 5 Analysis of variance for gas mixture viscosity.

Factorial design

Factorial design is an important statistical method for optimizing and simplifying laboratory experiments by investigating the effects of multiple controllable factors on interest responses34,57. These statistical evaluations of gas viscosity were conducted using Minitab software55. Additionally, the results of the factorial design are shown in the Pareto chart. The Pareto plot uses horizontal bars to show the influences of the parameters from maximum impact to smallest impact. To further indicate which factors are statistically significant, a reference line is drawn on the Pareto chart. In the current analysis, the assessment was achieved using pressure, temperature, and gas density. The factorial design was performed for the three cases: methane, nitrogen, and gas mixture. The factorial design-generated Pareto chart for the methane viscosity results is shown in Fig. 4a. Pressure is found to have the most significant effect on the performance of methane viscosity. Figure 4b describes the Pareto chart for the results of nitrogen viscosity, where temperature was found to have the most dominant performance in nitrogen viscosity. Figure 4c exhibits the Pareto chart generated by factorial design for the results of mixture gas viscosity. According to the Pareto graph, the factors that do better than the reference line are statistically significant. Therefore, it is apparent that pressure, gas density, and temperature were found to be the leading parameters respectively in the viscosity of the gas mixture. In conclusion, the factorial design statistical analysis agrees with the ANOVA test results.

Figure 4
figure 4

Pareto chart for results of (a) methane viscosity, (b) nitrogen viscosity, and (c) gas mixture viscosity.

Contour plots

A contour plot was employed to evaluate the influence of input variables on methane viscosity. Two parameters’ results were scrutinized concurrently on the methane viscosity values. Figure 5a shows the contour plot for methane viscosity to analyze the mutual effects of pressure and temperature. The contour plot reveals that the highest values of methane viscosity > 7.0 cp were found in the elevated pressure (above 22,000 psi) and low-temperature values. In contrast, the lowest values of methane viscosity < 0.02 cp were found in the low pressure (above 3000 psi) and all ranges of temperatures. Likewise, Fig. 5b shows the surface plot for nitrogen viscosity. The valley in the lower region of the graph represents the highest values of nitrogen viscosity > 0.06 cp, which represents all ranges of pressure and lower values of temperature. At the constancy of pressure and increasing temperature, the nitrogen viscosity moves to the lower regions of viscosity. Therefore, the upper valley of the graph represents the lowest values of nitrogen viscosity, which represents all ranges of pressure and the highest values of temperature. Likewise, Fig. 5c shows the mapping of pressure and temperature for mixture gas viscosity represented by a contour plot. As shown in the contour plot, the region of the left part of the graph represents the lowest values of nitrogen viscosity < 0.04 cp, which represents all ranges of temperature and lower values of pressure. Additionally, the corner in the lower right region of the graph represents the highest values of gas mixture viscosity > 0.1 cp, which represents higher values of pressure and low-mid values of temperature.

Figure 5
figure 5

Contour map of viscosity correlated with T and P: (a) methane viscosity, (b) nitrogen viscosity, and (c) gas mixture viscosity.

Main effect plots

Figure 6a shows the main effects plot for the methane viscosity. The red line horizontal line represents the mean value of methane viscosity. At a low-pressure level, the viscosity is below the mean value then methane viscosity increases with increasing pressure until reaches its maximum value at the highest level of pressure. In contrast, at a low-temperature level, the methane viscosity is above the mean value, then methane viscosity decreases with increasing temperature until reaches its lowest value at the highest level of temperature. The graph implies that pressure is the greatest powerful factor in methane viscosity. Likewise, the effects of P and T on nitrogen and gas mixture viscosity (Fig. 6b,c) respectively exhibit the same performance as in the case of methane. Furthermore, at a low level of gas density, the viscosity is lower than the mean value, then gas mixture viscosity improves with rising the pressure until reaches its maximum value at the highest level of gas density.

Figure 6
figure 6

Main effect plot: (a) methane viscosity, (b) nitrogen viscosity, and (c) gas mixture viscosity.

Development of machine learning (ML) models

In pursuit of accurate gas viscosity predictions crucial for optimizing petroleum processes, we undertook a comprehensive investigation employing a diverse set of machine learning models. These models were chosen for their ability to capture intricate nonlinear relationships and patterns within the available complex datasets. In the development of the current model, 1664, 1672, and 968 data points were collected to model the viscosities of CH4, N2, and natural gas mixture respectively. The collected database was divided randomly into three subgroups as follows4.

  1. i.

    Training set is used to control the weights and biases of the network to reduce training errors and identify the best predictive model. Once the difference between the model targets and outputs is minimized, the model adjusts biases and weights to generate the proper networks between the input and output.

  2. ii.

    Validation set to prevent the likelihood of overfitting in training by screening the generalization of the process, where the inputs and target were normalized in the range of [1, + 1] by applying the subsequent formula37:

    $$p_{n} = \frac{{2(p - p_{\min } )}}{{(p_{\max } - p_{\min } )}} - 1$$
    (8)
  3. iii.

    Testing set An invisible, independent set is used to evaluate the trained network model’s performance.

Linear regression

Linear Regression, a fundamental regression method, was applied to capture the potential linear relationship between the inputs and the target variables. Despite its simplicity, Linear Regression exhibited satisfactory results in predicting gas viscosity as displayed in Figs. 7, 8, 9, 10, 11, and 12, achieving a respectable R2 score of 0.9437 and 0.9331 on the testing set for methane and natural gas mixtures respectively. The model’s performance, while commendable, indicates the potential presence of nonlinear relationships that could be better captured by more complex models. However, Linear Regression’s performance varied significantly when applied to Nitrogen gas (Figs. 9 and 10). In this case, the model exhibited a notable drop in performance, achieving a very poor R2 score of 0.0736. This discrepancy in model performance suggests that Nitrogen’s viscosity is influenced by factors that Linear Regression, primarily designed for linear relationships, might not effectively capture. The presence of complex nonlinear interactions in the Nitrogen dataset could explain this divergence. Consequently, this underscores the importance of selecting models that are well-suited to the underlying patterns within specific datasets. In the case of Nitrogen, more advanced modeling techniques are required to accurately represent the intricate relationships impacting gas viscosity.

Figure 7
figure 7

Cross plot for actual versus predicted values of methane viscosity (linear regression model).

Figure 8
figure 8

Error distribution for linear regression model (methane case).

Figure 9
figure 9

Cross plots of actual versus predicted values for nitrogen viscosity (linear regression model).

Figure 10
figure 10

Error distribution with Q-Q plot for linear regression model (nitrogen case).

Figure 11
figure 11

Cross plots of actual versus predicted values for natural gas mixtures viscosity (linear regression model).

Figure 12
figure 12

Error distribution with Q-Q plot for linear regression model (natural gas mixtures case).

Decision tree (DT)

Decision Trees offer a powerful framework for capturing intricate interactions within the data. Our Decision Tree model displayed impressive performance on the testing set as displayed in Figs. 13, 14, 15, 16, 17, and 18, achieving an R2 score of 0.9962, 0.9844, and 0.9892 for methane, nitrogen, and natural gas mixtures respectively. This suggests that the inherent decision boundaries learned by the model align well with the data’s underlying patterns, allowing it to achieve accurate predictions even for challenging conditions.

Figure 13
figure 13

Cross plots of actual versus predicted values for methane viscosity (decision tree model).

Figure 14
figure 14

Error distribution for decision tree model (methane case).

Figure 15
figure 15

Cross plots of actual versus predicted values for Nitrogen viscosity (decision tree model).

Figure 16
figure 16

Error distribution for decision tree model (nitrogen case).

Figure 17
figure 17

Cross plots of actual versus predicted values for natural gas mixtures viscosity (decision tree model).

Figure 18
figure 18

Error distribution for decision tree model (natural gas mixtures case).

Random forest

Ensemble techniques, like Random Forest, aim to alleviate overfitting and enhance predictive precision by gathering multiple decision trees. The Random Forest model delivered exceptional results with an R2 score of 0.9981, 0.9951, and 0.9941 for CH4, N2, and natural gas mixtures respectively on the testing set as displayed in Fig. 19, 20, 21, 22, 23, and 24. The ensemble’s ability to capture diverse feature interactions and provide robust predictions in the presence of noise and outliers contributed to its superior performance.

Figure 19
figure 19

Cross plots of actual versus predicted values for Methane viscosity (random forest model).

Figure 20
figure 20

Error distribution for random forest model (methane case).

Figure 21
figure 21

Cross plots of actual versus predicted values for Nitrogen viscosity (random forest model).

Figure 22
figure 22

Error distribution for random forest model (nitrogen case).

Figure 23
figure 23

Cross plots of actual versus predicted values for natural gas mixtures viscosity (random forest model).

Figure 24
figure 24

Error distribution for random forest model (natural gas mixtures case).

Gradient boosting (GB)

Gradient Boosting is another sequential ensemble technique that iteratively improves predictions by focusing on samples that the previous iteration misclassified. GB adapts by adjusting subsequent models to correct for errors, culminating in a strong collective predictor. The Gradient Boosting model displayed the best performance, yielding an R2 score of 0.9988, 0.9969, and 0.9918 for methane, natural gas mixtures, and nitrogen, respectively on the testing set as displayed in Figs. 25, 26, 27, 28, 29, and 30. The iterative nature of boosting allows the model to focus on challenging instances, enhancing its predictive capabilities. A sequential ensemble technique that iteratively improves predictions by focusing on samples that previous iterations misclassified. GB adapts by adjusting subsequent models to correct for errors, culminating in a strong collective predictor.

Figure 25
figure 25

Cross plots of actual versus predicted values for methane viscosity (gradient boosting model).

Figure 26
figure 26

Error distribution for gradient boosting model (methane case).

Figure 27
figure 27

Cross plots of actual versus predicted values for nitrogen viscosity (gradient boosting model).

Figure 28
figure 28

Error distribution for gradient boosting model (nitrogen case).

Figure 29
figure 29

Cross plots of actual versus predicted values for natural gas mixtures viscosity (gradient boosting model).

Figure 30
figure 30

Error distribution for gradient boosting model (natural gas mixtures case).

Nu support vector regression (NuSVR)

NuSVR, a variant of Support Vector Regression, is adept at capturing complex nonlinear relationships. The NuSVR model achieved a respectable R2 score of 0.9926 for methane on the testing set. While not as high as some other models, NuSVR still demonstrated its ability to capture intricate patterns in the data and provide accurate predictions under various conditions. However, the performance of NuSVR did exhibit some degree of variability, most notably evident in its R2 scores of 0.8855 for natural gas mixtures and 0.0146 for nitrogen as displayed in Figs. 31, 32, 33, 34, 35, and 36. The relatively high R2 score for natural gas mixtures underscores NuSVR’s capacity to grasp the underlying intricacies within the data and make precise predictions. A score of 0.8855 implies that the model accounted for 88.55% of the variance in the natural gas mixtures’ viscosity, indicating a substantial level of accuracy. It is noteworthy that the dataset for natural gas mixtures could involve more predictable patterns or perhaps align more closely with the underlying assumptions of the NuSVR model. Conversely, the notably lower R2 score of 0.0146 for nitrogen suggests a diminished ability to capture the complex relationships specific to nitrogen’s viscosity, accounting for only 1.46% of the variance. This discrepancy is indicative of the model’s sensitivity to variations in data patterns and underscores the importance of understanding the specific characteristics of the dataset. In the case of nitrogen, the dataset may comprise more intricate, non-linear relationships that may challenge the model’s predictive capabilities. Overall, NuSVR’s performance emphasizes its adaptability, demonstrating a commendable ability to capture complex data patterns and provide precise predictions. However, the variation in its performance across different datasets underscores the model’s sensitivity to the specific characteristics of the data it encounters. This nuanced understanding can inform future applications of NuSVR in scenarios with distinct data profiles.

Figure 31
figure 31

Cross plots of actual versus predicted values for methane viscosity (NuSVR model).

Figure 32
figure 32

Error distribution for NuSVR model (methane case).

Figure 33
figure 33

Cross plots of actual versus predicted values for nitrogen viscosity (NuSVR model).

Figure 34
figure 34

Error distribution for NuSVR model (nitrogen case).

Figure 35
figure 35

Cross plots of actual versus predicted values for natural gas mixtures viscosity (NuSVR Model).

Figure 36
figure 36

Error distribution for NuSVR model (natural gas mixtures case).

K-nearest neighbours (KNN)

K-Nearest Neighbours leverages the proximity of data points to make predictions. Despite its simplicity, KNN yielded competitive results with an R2 score of 0.9837 for methane on the testing set. However, its performance fell slightly below that of more sophisticated models, potentially due to the inability to capture intricate nonlinear relationships. This is evident in its R2 scores of 0.8408 and 0.7901 for natural gas mixtures and nitrogen, respectively as displayed in Figs. 37, 38, 39, 40, 41, and 42. These findings underscore KNN’s proficiency in various scenarios, but its performance may vary based on the complexity of the dataset.

Figure 37
figure 37

Cross plots of actual versus predicted values for methane viscosity (KNN model).

Figure 38
figure 38

Error distribution for KNN model (methane case).

Figure 39
figure 39

Cross plots of actual versus predicted values for Nitrogen viscosity (KNN model).

Figure 40
figure 40

Error distribution for KNN model (nitrogen case).

Figure 41
figure 41

Cross plots of actual versus predicted values for natural gas mixtures viscosity (KNN model).

Figure 42
figure 42

Error distribution for KNN model (natural gas mixtures case).

Considering the outcomes of our extensive evaluation using Coefficient of Determination (R2), Mean Squared Error (MSE), and Mean Absolute Error (MAE) reported in Table 2, the ensemble techniques, Random Forest and Gradient Boosting, emerged as the most promising models for accurately predicting gas viscosity across varying conditions. The ensemble nature of these methods contributed to their robustness and adaptability to the complex interactions present in the data. These findings underscore the significance of leveraging sophisticated machine learning approaches to enhance the precision of viscosity predictions, thereby facilitating the optimization of gas processing and reservoir management in the petroleum industry.

Comparison of the performance of machine learning techniques (non-ANN models)

In this section, the performance of the RF, DT, Gradient Boosting, KNN, NuSVR, and linear regression were compared. Table 6 lists a summary of calculability efforts (such as CPU time and memories). It is shown that the proposed ML saves time. Moreover, Table 7 shows the tuned hyperparameters of each method in terms of the computational process of the developed model. The compared performance of RF, DT, Gradient Boosting, (KNN), and (NuSVR) was stated through cross plots and error distribution curves. Figure 43a displays the testing cross plots of methane using the mentioned ML techniques with their corresponding R2 values. The error performance is shown in Fig. 43b,c. The performance of Gradient Boosting and Random Forest was excellent during the testing process of the model. The Gradient Boosting and Random Forest methods resulted in R2 values of 0.993, and 0.991 during testing phases, respectively. The Decision Tree and NuSVR also resulted in superior performance with R2 values of 0.986 and 0.987 during testing. The KNN and Linear Regression resulted in good prediction, but they had the lowest performance with R2 values of 0.945 and 0.943 during the testing process. Similarly, Fig. 44a displays the testing cross plots of nitrogen using the mentioned ML techniques with their corresponding R2 values. The error performance was presented in Fig. 44b,c. The performance of Random Forest, Gradient Boosting, and Decision Tree was excellent during the testing process of the model. Moreover, the Random Forest, Gradient Boosting, and Decision tree resulted in R2 values of 0.995, 0.991, and 0.989 during the testing phases, respectively. The KNN resulted in intermediate performance with R2 values of 0.79 during testing. The NuSVR and Linear Regression resulted in very poor prediction with R2 values of 0.0146 and 0.0736 during the testing process. Likewise, the performance of the gas mixture was compared using the mentioned ML techniques. Figure 45a displays the testing cross plots of the gas mixture using the mentioned ML techniques with their corresponding R2 values. The error performance was presented in Fig. 45b,c for the prediction of the gas mixture viscosity. The performance of Random Forest, Gradient Boosting, and Decision Tree were excellent during the testing process of the model. The Random Forest, Gradient Boosting and Decision tree resulted in R2 values of 0.994, 0.996, and 0.984 during the testing phases, respectively. The KNN resulted in fair performance with R2 values of 0.79 during testing. The Linear Regression, NuSVR, and KNN resulted in the lowest performance with R2 values of 0.933, 0.885, and 0.84 during the testing process.

Table 6 Summary of run time and memory usage for machine learning models used in this study.
Table 7 Optimized hyperparameters for machine learning models used in this study.
Figure 43
figure 43

Prediction performance of the ML techniques (non-ANN) for viscosity prediction of methane: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Figure 44
figure 44

Prediction performance of the ML techniques (non-ANN) for viscosity prediction of Nitrogen: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Figure 45
figure 45

Prediction performance of the ML techniques (non-ANN) for viscosity prediction of Gas mixture: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Development of ANN models

An artificial neural network is a brain-like system that performs computational models on numeric inputs which are considered interconnected nodes (neurons) and generates a single complex functional output relevant to the inputs2,31. ANN is considered an artificial intelligence technique used in numerous science and technology fields that comprises learning, storing, and information recalling depending on the training dataset11. Intelligent algorithms such as (ANN), (PN), (GA), (GRN), and (GMDH)2 ease the complexity of computational calculations by altering complex formulations to simple ones3. One type of neural network is the (MLF), in which neurons of hidden layers are determined, while the (MFFNN) is the most used neural network for function approximation systems9. A three-layered MFFNN encompasses input, hidden, and one output layer displayed in Fig. 46 used to approximate complex functions37. The characteristics of the ANN model for modeling the viscosities of CH4, N2, and natural gas mixture, as well as the number of training, validation, and testing datasets for each model, are summarized in Table 8. To select the optimum model that represents the collected data, many trials have been done by changing the transfer function (Logistic-sigmoid/Tan-sigmoid), several neurons of the hidden layer (5, 6, 7, 8, 9, 10), and several hidden layers using the Levenberg–Marquardt technique as a training algorithm as discussed in the following sections. LM algorithms are one of the most popular MLPNN modeling which uses the nonlinear least-squares techniques1.

Figure 46
figure 46

Three-layered ANN model consists of (input, hidden, and output layers) for prediction viscosity of (a) methane, (b) nitrogen, (c) gas mixture.

Table 8 The ANN model’s characteristics of CH4, N2, and natural gas mixture viscosities.

Development of ANN model for calculating methane viscosity

A logistic sigmoid with eight neurons in the hidden layer is the optimum transfer function. The selected optimum model predicts the methane viscosity with high precision. This model achieves R2 = 0.99918, a standard deviation of 1.47, RMSE of 0.00038, MPE of − 0.01%, and MAPE of 1.10% as presented in Table 9. The complete description of the model as described in Table 8 indicates that the proposed model consists of 3 layers. The 1st layer has two neurons for the inputs (pressure and temperature), the hidden layer has 8-neurons, and the third layer has one neuron for the output (methane viscosity).

Table 9 Statistics for selecting the optimum ANN model of methane viscosity.

The proposed ANN model for methane viscosity calculation can be illustrated as follows:

For i = 1 to none of the neurons and for j = 1 to none of the inputs, the hidden layer inputs are computed using the following formula:

$$S_{i,j} = \sum\limits_{j = 1}^{N} {(w_{i,j} x_{j} } ) + b_{j}$$
(9)

where \(\text{represents}\) the normalized pressure and temperature respectively which can be formulated as:

$$P_{n} = 0.00008P - 1.001176$$
(10)
$$T_{n} = 0.004938T - 1.3802475$$
(11)

The viscosity of methane can be formulated as follows:

$$\mu = 0.055806\left[ {\sum\limits_{i = 1}^{N} {\left( {\frac{{w_{hoi} }}{{1 + \exp \left( { - S_{i} } \right)}}} \right) + b_{ho} } } \right] + 0.065545$$
(12)

The model weights and bias coefficients between the layers are explained in Table 10.

Table 10 Model coefficients between the layers for methane viscosity modeling.

Development of ANN model for calculating nitrogen viscosity

The optimum model includes two hidden layers with ten neurons for each and the transfer function is Logistic sigmoid. The selected optimum model predicts the nitrogen viscosity with high accuracy. This model achieves R2 = 0.997, a standard deviation of 35.05, RMSE of 0.0025, MPE of − 1.08%, and MAPE of 3.22% as presented in Table 11. The complete description of the model as presented in Table 8 includes that the proposed model consists of four layers. The first layer has two neurons for the inputs (pressure and temperature), two hidden layers have ten neurons for each, and the third layer has one neuron for the output (nitrogen viscosity).

Table 11 Statistics for selecting the optimum ANN model of nitrogen viscosity.

The proposed ANN model for calculating the nitrogen viscosity can be illustrated as develops. The inputs for the first hidden layer are calculated, where the normalized pressure and temperature can be expressed as:

$$P_{n} = 0.000081632P - 1.001184$$
(13)
$$T_{n} = 0.0008997T - 0.691408$$
(14)

The outputs from the first hidden layer are calculated using the Logistic-sigmoid function as follows:

$$H_{i} = \frac{1}{{1 + \exp \left( { - S_{i} } \right)}}$$
(15)

The input of the second hidden layer is calculated using the following expression:

$$SS_{i,j} = \sum\limits_{j = 1}^{N} {\left( {w_{i,j} H_{j} } \right)} + b_{j}$$
(16)

The outputs from the second hidden layer are computed as follows:

$$HH_{j} = \frac{1}{{1 + \exp \left( { - SS_{j} } \right)}}$$
(17)

The nitrogen viscosity can be formulated as:

$$\mu_{N2} = 0.19294\left[ {\sum\limits_{i = 1}^{N} {\left( {HH_{j} w_{hoij} } \right) + b_{ho} } } \right] + 0.19846$$
(18)

The ANN model weights and biases between the layers are explained in Tables 12, 13, and 14.

Table 12 Model coefficients between the layers between the input and the first hidden layer.
Table 13 Model coefficients between the first and the second hidden layers.
Table 14 Model coefficients between the second hidden layer and the output layer.

Development of ANN model for calculating gas mixture viscosity

The optimum model includes one hidden layer with ten neurons besides the input and output layers, and the transfer function is Logistic Sigmoid. The selected optimum model predicts the gas mixture viscosity with high accuracy. This model achieves R2 = 0.9998, a standard deviation of 1.75, RMSE of 0.00029, MPE of − 0.03%, and MAPE of 1.26. The complete description of the model as presented in Table 8 includes that the proposed model consists of three layers. The first layer has three neurons for the inputs (pressure, temperature, and gas density), one hidden layer has ten neurons, and the third layer has one neuron for the output (gas mixture viscosity). The proposed ANN model for calculating the gas mixture viscosity can be designated as follows. The inputs for the first hidden layer are calculated using Eq. (9). The normalized pressure, temperature, and gas density can be expressed as:

$$P_{n} = 0.00008P - 1.001176$$
(19)
$$T_{n} = 0.005692T - 1.048947$$
(20)
$$\rho_{n} = 0.166473\rho - 1.000385$$
(21)

The viscosity of the gas mixture can be calculated by the following function:

$$\mu = 0.055806\left[ {\sum\limits_{i = 1}^{N} {\left( {\frac{{w_{hoi} }}{{1 + \exp \left( { - S_{i} } \right)}}} \right) + b_{ho} } } \right] + 0.065545$$
(22)

The weights and biases between the layers are explained in Tables 15 and 16.

Table 15 Weights and biases between the input layer and the hidden layer.
Table 16 Weights and biases between the second hidden layer and the output layer.

Figures 47, 48, and 49 show the outputs regression plots for training, validation, testing, and all datasets for the methane, nitrogen, and gas mixture viscosity model respectively. When the ANN outcomes match the targets, the data points should be close to the line of the unit slope for an optimal fit. In this work, the estimated values versus the experimental one fit well closely around the 45° lines for all data sets, including training, validation, and testing data which implies the strength of the proposed model1. The coefficient of determination values is more than 0.999, which demonstrates an outstanding arrangement between the predicted and measured values for the three proposed models37.

Figure 47
figure 47

Cross plots of methane viscosity model (after this work).

Figure 48
figure 48

Cross plots of nitrogen viscosity model (after this work).

Figure 49
figure 49

Cross plots of gas mixture viscosity model (after this work).

Furthermore, Fig. 50a–c shows the relative errors between the developed model’s outputs and the consistent experimental values for the training, validation, and testing sets. Since there is no discernible error trend in the predictions, error distribution curves show that the suggested models are capable of accurately estimating natural gas viscosity over a broad range of P, and T conditions3. The accuracy of the developed model is indicated by a denser cloud of data points surrounding the zero-error line. Additionally, at high P, and T conditions, these curves show that the suggested model does not exhibit a significant error trend3. Furthermore, Fig. 51 illustrates the cumulative frequency of the ARE related to the developed ANN models. According to the figure, more than 90% of the estimated viscosity values for methane, nitrogen, and gas mixture by the ANN model had an absolute relative error of less than 2%. Hence, these results demonstrate that ANN models had acceptable performance and could successfully estimate the gas viscosity for CH4, N2, and natural gas mixture.

Figure 50
figure 50

Relative error distribution for (a) methane viscosity predicted by ANN model, (b) nitrogen viscosity predicted by ANN model, (c) gas mixture viscosity predicted by ANN model.

Figure 51
figure 51

Cumulative frequency versus ARE% for the developed ANN models.

The potential advantages and limitations of the present study

The present study potentially presents faster and more effective techniques for predicting the viscosities of methane, nitrogen, and natural gas mixtures under normal and harsh conditions, compared to traditional experimental approaches. The developed ML models can certainly be employed for new data, granting the potential to predict viscosities for various mixtures and conditions without the need for further experimental measurements. The established ML models in the present study can also serve as a foundation for further improvement of the predictive models. As more data becomes available, their accuracy and reliability can be enhanced. It is important to add that the proposed ML algorithms can be considered as a substitution of the experimental measurements of PVT properties if the experimental parameters are within the considered ranges. From a limitation point of view, the developed Machine learning models would be better at interpolating within the range of the training data rather than extrapolating beyond it. Therefore, the accuracy of the predictions may decrease when applied to conditions significantly different from those present in the training data.

Conclusion

Mathematical models based on machine learning models were used to predict the viscosities of methane, nitrogen, and gas mixtures at high temperatures and high-pressure conditions. Moreover, for effective gas viscosity prediction, several optimization tools including statistical and ML approaches were used. Afterward, several verification principles were implemented to evaluate the developed models. The ML models suggested in this research hold significant importance for researchers and industry specialists involved in natural gas simulation, processing, and recovery. In general, the following conclusion can be drawn:

  • The proposed ML models in this work effectively addressed the predicting gas viscosity under high-temperature and high-pressure circumstances.

  • A database consisting of 4304 sets of experimental data was collected from the literature before the development of the ML models.

  • Three novel correlations have also been developed for CH4, N2, and natural gas viscosities using ANN. The measured and ANN-estimated viscosity values showed excellent agreement with R2 = 0.99 for testing data sets.

  • The performances of the ANN, RF, and Gradient boosting were outstanding prediction modeling during the training and testing phases of CH4, N2 gas, and gas mixture viscosities.

  • The ANN modeling resulted in R2 values of 0.99 and 0.99 for the training and testing of the CH4, N2 gas, and gas mixture models, respectively.

  • Linear regression, NuSVR, and KNN were the least performance ML models for predicting CH4, N2 gas, and gas mixture viscosity models, whereas Linear regression and NuSVR models were unable to predict N2 gas viscosity.

  • The inclusive efficacy of the ML models, ordered from high ranking to lowest ranking, was ANN > Gradient Boosting > RF > DT > KNN > NuSVR > Linear Regression, suggesting that ANN, Gradient Boosting, RF, and DT perform better than KNN, NuSVR, and Linear Regression.

  • The study outcomes indicate that data-based ML could be a beneficial model for precisely predicting the viscosity of CH4, N2 gas, and gas mixture under normal and hard conditions, gaining enhancement in the modeling of natural gas operations.