Introduction

The stability of crude oil flow is often disrupted by the presence of asphaltene, a high-molecular-weight solid component of crude oil (Du et al. 2018; Setoodeh et al. 2018b). Asphaltene causes a series of challenges at various stages of crude oil processing (Setoodeh et al. 2018a). One serious challenge is the difficulty of transporting crude oil by pipeline over long distances under a variety of climatic conditions (Sivakumar et al. 2018). As a result, the presence of asphaltene in crude oil can lead to significant operational and production issues (Chala et al. 2018). Hence, considerable research effort has been devoted to investigating its effects and the possibilities of mitigating them (Khaleel et al. 2020; Enayat et al. 2021). Asphaltene deposition after CO2 injection in conventional oil reservoirs has been the subject of several investigations (Enayat et al. 2021). Studies have shown that below the CO2 minimum miscibility pressure (MMP), an increase in pressure increases asphaltene deposition in sandstone, whereas above the CO2 MMP, an increase in pressure decreases it (Soroush et al. 2014). In a similar study, Shen and Sheng (2018) used cyclic gas injection to study asphaltene formation in the Eagle Ford shale. Asphaltene precipitation and deposition were studied using filters of 30 nm, 100 nm, and 200 nm. The findings revealed substantial permeability degradation due to the presence of asphaltene.

Different approaches have been investigated to improve crude oil stability by mitigating the effect of asphaltene (Fakher et al. 2020). One such strategy is the use of an inhibitor to prevent or delay asphaltene precipitation in crude oil (Mohammadi et al. 2011). Studies have shown that, under static conditions, asphaltene particles in oil are more stable when exposed to TiO2 in organic-based nanofluids in an acidic environment (Mohammadi and Richon 2007; Ghasemi et al. 2020, 2021). In a further study, Mohammadi et al. (2011) explored the use of TiO2-, SiO2-, and ZrO2-based nanofluids as inhibitors of asphaltene precipitation in crude oil. The study revealed that n-heptane was much more effective in destabilizing colloidal asphaltene in the presence of TiO2 nanofluids at pH values below 4. The mechanism, modeling, and optimization of TiO2/SiO2 nanofluids as inhibitors for stabilizing asphaltene particles in crude oil have been reported by Mohammadi et al. (2017). The results show a strong correlation between asphaltene adsorption on the surface of the particles and the onset point for the nanocomposites made of 80% TiO2.

For efficient stabilization of the asphaltene in crude oil, it is important to study the relationship between the pH of the medium, the composition of the nanoparticles in the nanofluid, the salinity, and the organic solvents using machine learning algorithms such as Gaussian Process Regression (GPR). GPR is a nonparametric machine learning algorithm that applies a Bayesian approach to regression (Gao et al. 2018; Ebrahimi et al. 2021). It can operate effectively with limited datasets and provides uncertainty estimates for the predictions of the targeted output (Chen et al. 2018). GPR is a robust machine learning tool that has been used to model several processes (Taki et al. 2016; Liu et al. 2020; Van Dao et al. 2020). Its predictive capability has been demonstrated in predicting the future capacity and remaining useful life of batteries (Richardson and Leachman 2017). Shen and Sheng (2018) demonstrated the robustness of GPR in the short-term prediction of solar power to curb its volatile nature and minimize the level of uncertainty in power systems. Similarly, GPR has been used to model the relationship between capacity, storage temperature, and state-of-charge (Liu et al. 2020). The GPR model displayed robust prediction performance with high accuracy and good generalization ability (Mageed et al. 2020; Shnain et al. 2022). An extensive literature search shows that the use of GPR for modeling the relationship between parameters such as the pH, salinity, and the ratio of TiO2/SiO2 nanoparticles in the nanofluids and the amount of n-heptane in crude oil has not been reported. This study, therefore, focuses on the application of GPR to model the effect of pH, salinity, and the ratio of TiO2/SiO2 nanoparticles in the nanofluids on the amount of n-heptane required to stabilize crude oil flow.

Experimental and model configuration

A variety of techniques have been developed for measuring the onset of asphaltene flocculation (Mohammadi et al. 2017). A detailed description of the materials and the experimental procedure has been reported by Mohammadi et al. (2017). The flocculation onset of crude oils may be determined by introducing a precipitating agent, generally a straight-chain alkane such as n-heptane, to the crude oil. The onset point is defined as the minimum quantity of the precipitating agent at which asphaltenes flocculate out of the crude oil medium. As reported by Mohammadi et al. (2017), the n-heptane titration technique was used to determine the flocculation points of crude oils. The dataset for the modeling consists of the TiO2/SiO2 nanocomposite composition in the nanofluid, the salinity, the pH, and the maximum volume of n-heptane added to the crude oil. The dataset was generated using a central composite experimental design. The descriptive statistics of the parameters used for the modeling are summarized in Table 1. These statistics include the range, the minimum and maximum values, the mean, the standard deviation, and the variance.
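As a minimal sketch of how a central composite design lays out its runs, the snippet below generates coded design points for three factors. The choice of three center points and a rotatable axial distance are assumptions made purely for illustration (they happen to give 17 coded runs for three factors); the actual design settings are those reported by Mohammadi et al. (2017). The function name `central_composite` is hypothetical.

```python
from itertools import product


def central_composite(n_factors, n_center, alpha=None):
    """Return coded design points for a central composite design.

    alpha is the axial distance; if omitted, the rotatable value
    (number of factorial points) ** 0.25 is assumed.
    """
    if alpha is None:
        alpha = (2 ** n_factors) ** 0.25
    # Corner points of the full two-level factorial
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    # Axial (star) points: one factor at +/- alpha, the others at the center
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(point)
    # Replicated center points
    center = [[0.0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


# Three factors (e.g. nanoparticle ratio, salinity, pH) in coded units;
# n_center=3 is an illustrative assumption.
design = central_composite(n_factors=3, n_center=3)
print(len(design))
```

The coded levels would then be mapped linearly onto the physical ranges of each factor before running the experiments.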

Table 1 Descriptive statistics of the parameters used for the modeling

The choice of GPR, a Bayesian non-parametric machine learning approach, in this study can be attributed to its ability to learn complicated systems while dealing with uncertainty in a systematic manner (Alghamdi et al. 2020; Liu et al. 2020). In GPR, the regression learner acquires a distribution over functions by learning the mean and covariance functions at x, with the function represented by f(x) in Eq. (1). The GPR hyperparameters, namely the length scale and the marginal standard deviation, control how rapidly the functions vary and the magnitude of their range, respectively.

$$f\left( x \right) \sim GP\left( m\left( x \right), k\left( x, x^{\prime} \right) \right)$$
(1)

where m is the mean function, k is the kernel (covariance) function, and x is the matrix of the training features.
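To make the role of the two hyperparameters concrete, the following sketch evaluates a squared exponential kernel, used here only as an illustrative choice of k, for a short and a long length scale. The function names and the input points are hypothetical and are not taken from the study.

```python
import math


def squared_exponential(x, x_prime, length_scale=1.0, signal_std=1.0):
    """Squared exponential covariance between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return signal_std ** 2 * math.exp(-0.5 * sq_dist / length_scale ** 2)


def covariance_matrix(points, kernel, **params):
    """Build the covariance matrix K with entries k(x_i, x_j)."""
    return [[kernel(p, q, **params) for q in points] for p in points]


# Illustrative one-dimensional inputs (not data from the study)
points = [[0.0], [0.5], [2.0]]
K_short = covariance_matrix(points, squared_exponential, length_scale=0.3)
K_long = covariance_matrix(points, squared_exponential, length_scale=3.0)

# A longer length scale keeps nearby points more strongly correlated,
# which corresponds to smoother, more slowly varying functions.
print(K_short[0][1], K_long[0][1])
```

The marginal standard deviation (`signal_std`) only rescales the whole matrix, which is why it governs the magnitude of the function values and not how quickly they vary.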

The training dataset, which may consist of a set of N data points, is presented in Eq. (2):

$$\beta_{\text{train}} = \left( X, y \right) = \left\{ \left( x_{i}, y_{i} \right) \right\}_{i = 1}^{N}, \quad x_{i} \in \mathbb{R}^{d}, \; y_{i} \in \mathbb{R}$$
(2)

The test dataset, which may consist of a set of N data points, is represented in Eq. (3):

$$\beta_{\text{test}} = X_{*} = \left\{ x_{*, i} \right\}_{i = 1}^{N}, \quad x_{*, i} \in \mathbb{R}^{d}$$
(3)

where \({X}_{*}\) is the matrix of test points, y is the vector of training targets, and X is the matrix of training features.
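For reference, the standard textbook form of the GPR predictive distribution at the test points, conditioned on the training data, can be written as follows. This is a generic result and not necessarily the exact formulation implemented in the software used in this study. The symbol \(\sigma_{n}^{2}\) (observation noise variance), the identity matrix I, and K(·,·) (the matrix of kernel values between two sets of points) are introduced here as assumptions for illustration.

$$\mu_{*} = m\left( X_{*} \right) + K\left( X_{*}, X \right)\left[ K\left( X, X \right) + \sigma_{n}^{2} I \right]^{-1} \left( y - m\left( X \right) \right)$$

$$\Sigma_{*} = K\left( X_{*}, X_{*} \right) - K\left( X_{*}, X \right)\left[ K\left( X, X \right) + \sigma_{n}^{2} I \right]^{-1} K\left( X, X_{*} \right)$$

Here \(\mu_{*}\) is the vector of predicted means and the diagonal of \(\Sigma_{*}\) gives the predictive variances, which is the source of the uncertainty estimates that GPR provides alongside its predictions.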

The detailed steps involved in configuring the GPR model are represented in Fig. 1. These steps include data acquisition through the experimental design, data pre-processing, data uploading into the regression learning platform in the MATLAB (version 2019a, MathWorks Inc.) environment, model configuration by setting up the parameters, training of the model, and performance analysis using various metrics such as the coefficient of determination (R2), mean squared error (MSE), root mean square error (RMSE), and mean absolute error (MAE). Once the performance of the GPR model is satisfactory, the model is deployed to predict the appropriate amount of n-heptane required to stabilize the crude oil flow. A typical topology of the GPR network is depicted in Fig. 2. It consists of a set of predictors (x1–x3) and the targeted output (y1–y2). Each of the predictors is linked with the hidden layer using hidden neurons. The hidden neurons are assigned a weight (w) and bias (b). The predictors consist of the ratio of TiO2/SiO2 nanoparticles, salinity, and pH, while the targeted output consists of the amount of n-heptane added to stabilize the crude oil. A total of 17 experimental runs were employed to train and test the models.
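The four performance metrics named above follow their usual definitions, which can be sketched in a few lines. The observed and predicted values below are made-up illustrative numbers and not data from this study, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Compute R2, MSE, RMSE, and MAE from paired observations and predictions."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)        # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Illustrative values only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Two properties are useful sanity checks when reading a metrics table: RMSE is always the square root of MSE, and MAE can never exceed RMSE.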

Fig. 1 Stages involved in the configuration of the Gaussian process regression model

Fig. 2 The topology of the Gaussian process regression model showing the weight (w) associated with the input units

Results and discussion

The effect of the interaction of the parameters with one another is depicted in Fig. 3. The interaction effect of salinity and the ratio of TiO2/SiO2 nanoparticles in the nanofluid on the amount of n-heptane added to stabilize the crude oil is depicted in Fig. 3a. Both salinity and the ratio of TiO2/SiO2 nanoparticles in the organic solvent influence the amount of n-heptane added to stabilize the crude oil. The effect of salinity on the amount of n-heptane added to stabilize the crude oil was significant up to 6 wt%; thereafter, the amount of n-heptane added declined as the salinity increased. This implies that a salinity of up to 6 wt% is suitable for obtaining the right amount of n-heptane needed to stabilize the crude oil. On the other hand, an increase in the ratio of TiO2/SiO2 nanoparticles in the nanofluid significantly influences the amount of n-heptane required to stabilize the crude oil, which is similar to the trend observed in Fig. 3b for the interaction between the pH and the ratio of TiO2/SiO2 nanoparticles in the nanofluid. An increasing effect on the amount of n-heptane needed to stabilize the crude oil is noticeable in the pH range of 1–7. Thereafter, a declining effect was observed at pH greater than 7. The interaction between pH and salinity depicted in Fig. 3c does not have a regular pattern like those shown in Figs. 3a, b. Both pH and salinity have undulating effects on the amount of n-heptane required to stabilize the crude oil against the effect of asphaltene. Modeling the stability of crude oil against the effect of asphaltene requires detailed knowledge of how the various parameters are interrelated with the targeted outputs. The interaction effect of different nanoparticles such as SiO2, MgO, and Al2O3 on the adsorption of asphaltene has been reported by Madhi et al. (2017). The study revealed that the SiO2, MgO, and Al2O3 nanoparticles have a high affinity toward asphaltenes. Hence, the injection of such nanoparticles in porous media would significantly improve oil recovery.

Fig. 3 Effect of (a) salinity and TiO2/SiO2 nanoparticle ratio, (b) pH and TiO2/SiO2 nanoparticle ratio, and (c) pH and salinity on the n-heptane volume added to the crude oil

In this study, five different models were employed, namely squared exponential GPR (model 1), rational quadratic GPR (model 2), Matern 5/2 GPR (model 3), exponential GPR (model 4), and optimizable GPR (model 5). The performance of the various GPR models in terms of the statistical metrics is summarized in Table 2 and depicted in Figs. 4, 5, 6, 7, 8, 9. Figure 4 depicts the dispersion and regression plots obtained from the prediction of the amount of n-heptane required to stabilize the crude oil against the effect of asphaltene using model 1. The plot in Fig. 4a shows that the observed and predicted values are to a large extent correlated. With the squared exponential kernel, the GPR model predicted the amount of n-heptane required to stabilize the crude oil with an R2 of 0.820 (Fig. 4b). This implies that the GPR model with a squared exponential kernel explains about 82% of the variance in the observed data. The analysis of the errors shows that RMSE, MSE, and MAE values of 0.047, 2.27 × 10⁻², and 3.97 × 10⁻², respectively, were obtained for the prediction of the amount of n-heptane added to the crude oil. The performance of the GPR model using the squared exponential kernel reported in this study is consistent with that reported by Van Dao et al. (2020) for high-performance concrete compressive strength prediction, where an R2 of 0.882 was obtained with RMSE and MAE of 5.736 and 4.143, respectively. Also, the use of a squared exponential kernel for GPR in the prediction of fragmentation in surface mines has been reported by Gao et al. (2018). The model resulted in robust prediction with an R2 of 0.948 and a prediction error of 2.02 based on the RMSE.
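The four fixed kernel families named above can be sketched in their standard textbook forms as functions of the distance r between two inputs. This is an illustrative sketch and not the MATLAB implementation used in the study; the rational quadratic form is assumed to be what model 2 refers to, and the parameter names (`length_scale`, `signal_std`, `alpha`) and default values are assumptions.

```python
import math


def squared_exponential(r, length_scale=1.0, signal_std=1.0):
    """Very smooth kernel: correlation decays as a Gaussian in r."""
    return signal_std ** 2 * math.exp(-0.5 * (r / length_scale) ** 2)


def rational_quadratic(r, length_scale=1.0, signal_std=1.0, alpha=1.0):
    """Scale mixture of squared exponentials; alpha sets the tail weight."""
    base = 1.0 + r ** 2 / (2.0 * alpha * length_scale ** 2)
    return signal_std ** 2 * base ** (-alpha)


def matern52(r, length_scale=1.0, signal_std=1.0):
    """Matern kernel with smoothness parameter 5/2."""
    s = math.sqrt(5.0) * r / length_scale
    return signal_std ** 2 * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)


def exponential(r, length_scale=1.0, signal_std=1.0):
    """Rough kernel (Matern with smoothness 1/2): decays linearly in the exponent."""
    return signal_std ** 2 * math.exp(-r / length_scale)


# Compare how quickly each kernel's correlation decays with distance
for r in (0.0, 0.5, 1.0, 2.0):
    print(r, squared_exponential(r), rational_quadratic(r), matern52(r), exponential(r))
```

The kernels differ mainly in how smooth the fitted function is assumed to be, which is one reason the same small dataset can yield noticeably different fit statistics from one kernel to another.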

Table 2 Performance analysis of the various GPR models used in this study

Fig. 4 The performance of GPR with the squared exponential kernel based on (a) the dispersion plot and (b) the regression plot

Fig. 5 The performance of GPR with the rational quadratic kernel based on (a) the dispersion plot and (b) the regression plot

Fig. 6 The performance of GPR with the Matern 5/2 kernel based on (a) the dispersion plot and (b) the regression plot

Fig. 7 The performance of GPR with the exponential kernel based on (a) the dispersion plot and (b) the regression plot

Fig. 8 The minimum MSE obtained for the optimized GPR at each iteration

Fig. 9 The performance of the optimizable GPR based on (a) the dispersion plot and (b) the regression plot

The performance of GPR with the rational quadratic kernel in predicting the amount of n-heptane required to stabilize the crude oil, based on the dispersion and regression plots, is shown in Fig. 5. Figure 5a shows that the observed and predicted values are closely related. This is further supported by Fig. 5b, which gives an R2 of 0.990. This implies that the rational quadratic kernel enhances the ability of the GPR to learn the relationship between the input parameters and the targeted output. The robustness of the GPR with the rational quadratic kernel is also confirmed by the error analysis, as indicated by the low values of RMSE, MSE, and MAE shown in Table 2. The performance of the rational quadratic kernel-based GPR model used in this study is consistent with those reported in the literature. Gao et al. (2018) reported an R2 of 0.936 for the prediction of rock fragmentation in surface mines with an RMSE of 2.18. The predictive modeling of total acid and basic number in engine oils using rational quadratic GPR has been reported by Macián et al. (2021). The excellent performance of that model was shown by an R2 of 0.983 and RMSE of 0.105. Similar to the rational quadratic GPR, the GPR with the Matern 5/2 kernel also displayed robust performance, as indicated by the dispersion and regression plots in Fig. 6. The observed n-heptane added to the crude oil and the forecasted values lie close together, as displayed in Fig. 6a. This shows that the Matern 5/2 kernel also influences the performance of the GPR model in learning the relationship between the input parameters and the targeted output. The prediction accuracy of the model is further demonstrated in Fig. 6b with an R2 of 0.999. This is an indication that the GPR model with the Matern 5/2 kernel generalized well on the datasets, with RMSE, MSE, and MAE of 0.011, 1.43 × 10⁻⁴, and 9.15 × 10⁻², respectively. The tendency of the Matern 5/2 GPR model to predict well can be attributed to the ability of the spectral densities of the stationary kernel to create Fourier transforms of the input signals. The performance of the GPR model with the Matern 5/2 kernel used in this study is consistent with that reported in the literature. Alghamdi et al. (2020) employed Matern 5/2-based GPR for the prediction of non-invasive continuous blood pressure from cuff oscillometric signals with an R2 of 0.700 and RMSE of 7.500. The Matern 5/2 GPR model used in this study gave a better prediction than that reported by Alghamdi et al. (2020), probably due to the nature of the datasets. Elbeltagi et al. (2021) used the Matern 5/2 kernel for GPR to predict the combined terrestrial evapotranspiration index over a large river basin, which resulted in an R2 of 0.75 and RMSE of 0.39.

The performance of the GPR with the exponential kernel, based on the dispersion and regression plots, in predicting the amount of n-heptane required to stabilize the crude oil is depicted in Fig. 7. Similar to the GPR with the squared exponential and rational quadratic kernels, the use of the exponential kernel also enhances the performance of the GPR. Figure 7a reveals that the measured amount of n-heptane required to stabilize the crude oil strongly correlates with the predicted amount of n-heptane. The R2 of 0.999 estimated from Fig. 7b is an indication of the robustness of the exponential GPR in modeling the relationship between the input and output parameters. Hence, the model captures over 99% of the variation in the data and therefore predicts well. The robust performance of the exponential kernel has been reported in the literature. The use of GPR with the exponential kernel for the prediction of non-invasive continuous blood pressure from cuff oscillometric signals has been reported by Alghamdi et al. (2020). The R2 of 0.925 implies that the predicted output was strongly correlated with the observed values, with an RMSE of 4.883. Also, Amankwaa-Kyeremeh et al. (2021) reported a robust performance of exponential GPR used for the prediction of rougher copper recovery. An R2 of 0.981 and RMSE of 0.17 were obtained for the prediction of the copper recovery, which is an indication that the predicted values are strongly correlated with the observed values with minimum prediction error.

The GPR algorithm was optimized using a non-isotropic rational quadratic kernel function, as shown in Fig. 8. The optimized GPR was obtained after 5 iterations, the point at which the least MSE was obtained. The performance of the optimized GPR based on the dispersion and regression plots is presented in Fig. 9. Figure 9a shows that the observed amount of n-heptane added to stabilize the crude oil correlates significantly with the values forecast by the optimized GPR. This is further supported by Fig. 9b, with an R2 of 0.999. Compared to the other GPR models, the optimized GPR model displayed the lowest MSE of 1.107 × 10⁻⁴. The residual analysis for the prediction of the amount of n-heptane added to stabilize the crude oil is depicted in Fig. 10. The analysis shows residuals of \(\pm\) 0.08, \(\pm\) 0.025, \(\pm\) 0.013, \(\pm\) 0.015, and \(\pm\) 0.008 for model 1, model 2, model 3, model 4, and model 5, respectively (Figs. 10a–e). This further confirms that model 5, the optimized GPR, had the smallest residuals, an indication of its superior performance compared to the other models. The sensitivity analysis to determine the influence of the input parameters on the output predicted by the optimized model using the modified Garson algorithm is depicted in Fig. 11. The analysis shows that all three input parameters significantly influence the predicted amount of n-heptane added to stabilize the crude oil. However, the greatest influence on the predicted output was exerted by the ratio of the TiO2/SiO2 nanoparticles in the nanofluids, which is consistent with that reported in the literature.
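A non-isotropic kernel, as mentioned above for the optimized model, is commonly understood to assign a separate length scale to each input instead of a single shared one. The sketch below illustrates that idea for a rational quadratic kernel under this assumption. The length scale values are hypothetical and are not the fitted hyperparameters of the study, and the function name is illustrative.

```python
def ard_rational_quadratic(x, x_prime, length_scales, signal_std=1.0, alpha=1.0):
    """Rational quadratic kernel with one length scale per input dimension."""
    scaled_sq_dist = sum(
        ((a - b) / scale) ** 2 for a, b, scale in zip(x, x_prime, length_scales)
    )
    return signal_std ** 2 * (1.0 + scaled_sq_dist / (2.0 * alpha)) ** (-alpha)


# Hypothetical length scales for three scaled inputs
# (e.g. nanoparticle ratio, salinity, pH); not values from the study.
length_scales = [0.2, 1.0, 5.0]
base = [0.5, 0.5, 0.5]


def similarity_after_shift(index, step=0.5):
    """Kernel value between the base point and a copy shifted along one input."""
    shifted = list(base)
    shifted[index] += step
    return ard_rational_quadratic(base, shifted, length_scales)


# The same shift reduces similarity most along the input with the
# shortest length scale, i.e. the input the model is most sensitive to.
shifts = [similarity_after_shift(i) for i in range(3)]
print(shifts)
```

In such a kernel, a short fitted length scale for an input signals that the predicted output changes quickly along that input, which offers a complementary, kernel-based view of input influence.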

Fig. 10 The residual analysis for the prediction of the amount of n-heptane added to stabilize the crude oil using (a) model 1, (b) model 2, (c) model 3, (d) model 4, and (e) model 5

Fig. 11 The level of influence of the input parameters on the predicted output for the optimized model

Conclusion

This study examined the performance of Gaussian Process Regression for the prediction of the amount of n-heptane required to stabilize crude oil flow using different kernel functions, namely the squared exponential (model 1), rational quadratic (model 2), Matern 5/2 (model 3), exponential (model 4), and non-isotropic rational quadratic (model 5) kernels. The GPR models were robust in modeling the non-linear relationship between the inputs (the ratio of TiO2/SiO2 nanocomposite in the organic nanofluid, the salinity of the nanofluid, and the pH) and the amount of solvent (n-heptane) added to stabilize the crude oil flow. As indicated by the R2 > 0.8 for all the models, the predicted amount of n-heptane required to stabilize the crude oil flow is strongly correlated with the observed values. The error analysis using MSE, RMSE, and MAE shows very small prediction errors for the various models. However, the optimized GPR had the best performance based on the least prediction error. The three parameters were observed to strongly influence the model output according to the sensitivity analysis. The ratio of TiO2/SiO2 nanocomposite in the organic nanofluid had the most significant influence on the model output. The influence of the input parameters on the model output can be ranked as the ratio of TiO2/SiO2 nanocomposite in the organic nanofluid > salinity > pH. This study has demonstrated, from a modeling perspective, the importance of adding nanoparticles to the nanofluid used to stabilize crude oil flow.