Introduction

The conventional PVT properties for black oil type reservoir fluids are the solution gas–oil ratio, Rs; the oil formation volume factor, Bo; and the gas formation volume factor, Bg. According to Fattah et al. (2009), the difference between the modified black oil (MBO) PVT properties and the conventional black oil PVT properties lies in the handling of the liquid content of the gas phase, expressed by the oil–gas ratio, Rv. The MBO approach assumes that the stock tank liquid components can exist in both the liquid and gas phases at reservoir conditions. The liquid content of the gas phase is defined as a function of pressure called the vaporized oil–gas ratio, Rv. It is analogous to the solution gas–oil ratio, Rs, which describes how much gas is dissolved in the liquid phase. The MBO (or extended black oil) approach was first introduced by Spivak and Dixon (1973).

Whitson and Torp (1983) presented a procedure to estimate the MBO properties from constant volume depletion (CVD) test data for gas condensates. Coats (1985) introduced a different method for gas condensate reservoirs using commercial EOS PVT software and a regression package to fit the PVT data obtained in the laboratory. McVay (1994) extended Coats' method to volatile oil reservoirs. Walsh and Towler (1994) introduced another method to compute the MBO properties from CVD data of gas condensates. Fevang et al. (2000) presented guidelines to help engineers choose between the MBO and compositional approaches. In 2006, Fattah et al. presented a comparison between the Whitson and Torp and Coats methods using compositional simulation; the results showed a good match between the experimental PVT data and the EOS model. In 2009, Fattah et al. developed a new set of correlations to calculate the MBO properties of volatile oils and gas condensates. Alimadadi et al. (2011) predicted the PVT properties using an ANN model, with component mole fractions of the fluid sample, solution gas–oil ratio (Rs), bubble point pressure (Pb), reservoir pressure, API oil gravity, and temperature as input parameters. Their model processed the inputs using two parallel multilayer perceptron (MLP) networks before recombining the results.

Arief et al. (2017) proposed a technique that uses surrogate models and an available laboratory database to estimate fluid properties. Two surrogate models were studied in their work: universal kriging and NN.

González et al. (2003) developed NN models to estimate the dew point pressure for retrograde gas reservoirs, with a reported prediction error of 8.74%.

Osman and Al-Marhoun (2005) established ANN models to estimate several PVT properties: formation volume factor, isothermal compressibility, and brine density as functions of temperature, salinity, and pressure. They also predicted brine viscosity as a function of brine salinity and temperature. These models were developed using 1040 data points.

Oloso et al. (2009) predicted crude oil viscosity and gas–oil ratio using support vector machines and functional networks.

Ahmadi et al. (2015) utilized the NN technique to model the bubble point pressure as a function of fluid composition and other reservoir parameters. They used a back-propagation NN together with a particle swarm optimization algorithm to minimize the error.

Sahterri et al. (2015) developed an NN model to predict the gas compressibility factor (Z-factor) using a data set of 978 points. Their model, a Wilcoxon generalized radial basis function network, estimated the Z-factor with a 2.3% average relative error.

Adeeyo (2016) developed NN models to predict the bubble point pressure and the formation volume factor at the bubble point pressure for Nigerian crude oils. A trial-and-error approach was used to find the number of neurons that gave stable results.

Artificial intelligence techniques

Artificial neural network

An artificial neural network (ANN) is a data processing model analogous to biological nervous systems such as the brain. Its most important feature is the innovative structure it uses to process information: a large number of highly interconnected elements, or neurons, working together to solve specific problems. Like people, ANNs can learn by example. An ANN can be configured for a specific application, such as pattern recognition or data classification, through a learning process. In biological systems, this process involves adjustments to the synaptic connections between neurons.

For the last two decades, AI has been used extensively in several applications in the oil industry. A good number of studies have applied various computational intelligence (CI) schemes to forecast the characteristics of gas and oil flow through reservoirs and pipes, including logistic regression (LR) (Hosmer and Lemeshow 2000), multilayer perceptrons (MLP) (Wlodzisław et al. 1997), and radial basis functions (RBF) (Guojie 2004).

Functional networks

Functional networks (FNs) are a generalization of neural networks consisting of several layers of neurons connected through links. Each computing unit, or neuron, performs a simple calculation: a scalar function f of a weighted sum of its inputs is associated with each neuron, and well-known algorithms let the network learn from the data. The main idea of functional networks is to learn the functions f themselves while suppressing the weights. In addition, these multidimensional functions can be consistently replaced by functions of single variables. With n links leading from the last layer of neurons to an output unit, the output can be written in a different form for each link, which yields a system of n − 1 equations that can be written directly from the network topology. Solving this system of equations simplifies the initial functions f associated with the neurons. Castillo et al. (2001) provided a comprehensive demonstration of FN applications in engineering and statistics. It was observed in the literature, however, that FNs have seen little application in the oil industry.
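The idea of learning the neuron functions rather than the weights can be shown with a minimal sketch (illustrative Python, not the FN implementation of this study): each unknown function is represented by a low-order polynomial whose coefficients are found by linear least squares.

```python
# Functional-network sketch: model y ~= f1(x1) + f2(x2), where f1 and f2
# are the unknown neuron functions, represented here as polynomials.

def design_row(x1, x2, degree=2):
    row = [1.0]                                   # shared constant term
    row += [x1 ** d for d in range(1, degree + 1)]  # basis of f1
    row += [x2 ** d for d in range(1, degree + 1)]  # basis of f2
    return row

def fit_least_squares(rows, targets):
    # solve the normal equations A^T A c = A^T y by Gaussian elimination
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    for i in range(n):                            # elimination with pivoting
        p = max(range(i, n), key=lambda k: abs(ata[k][i]))
        ata[i], ata[p] = ata[p], ata[i]
        aty[i], aty[p] = aty[p], aty[i]
        for k in range(i + 1, n):
            f = ata[k][i] / ata[i][i]
            ata[k] = [a - f * b for a, b in zip(ata[k], ata[i])]
            aty[k] -= f * aty[i]
    coef = [0.0] * n
    for i in reversed(range(n)):                  # back substitution
        coef[i] = (aty[i] - sum(ata[i][j] * coef[j]
                                for j in range(i + 1, n))) / ata[i][i]
    return coef

# toy data from y = x1^2 + 3*x2; the learned functions should recover it
data = [(x1, x2, x1 ** 2 + 3 * x2) for x1 in range(-3, 4) for x2 in range(-3, 4)]
rows = [design_row(x1, x2) for x1, x2, _ in data]
coef = fit_least_squares(rows, [y for _, _, y in data])
predict = lambda x1, x2: sum(c * v for c, v in zip(coef, design_row(x1, x2)))
```

Because the target lies in the span of the chosen bases, the fitted functions reproduce it exactly; in practice the basis family is a modeling choice.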

Support vector machines

Support vector machines (SVMs) are a group of related supervised learning methods that have been applied to both classification and regression. They belong to a family of generalized linear classifiers and can also be treated as a special case of Tikhonov regularization. SVMs map the input vectors to a higher-dimensional space where a maximal separating hyperplane is constructed. Two parallel hyperplanes are constructed on either side of the hyperplane that separates the data, under the assumption that the larger the distance between these parallel hyperplanes, the better the generalization error of the classifier (Burges 1998).

SVMs have been used widely in many engineering fields, including defect prediction in software engineering (Elish and Elish 2007), surface tension prediction in chemistry (Jie Wang et al. 2007), geotechnical engineering (Anthony and Goh 2007), and oil and gas (Jian and Wenfen 2006), with very promising results.

Fluid samples used

Fattah (2005) reported PVT experimental data for thirteen reservoir fluid samples [eight gas condensates (GC) and five volatile oils (VO)]. These PVT data were used in this study. The samples were obtained from reservoirs at different locations and depths and were selected to cover a wide range of oil and gas fluid characteristics. Some samples represent near-critical fluids (VO 2, VO 5, GC 1, and GC 2) as classified by McCain and Bridges (1994). Table 1 presents the major properties of these thirteen fluid samples.

Table 1 Characteristics of fluid samples (Fattah et al. 2009)

EOS approach

For every sample in Table 1, an EOS model matching the experimental results of all available PVT laboratory experiments (CCE, DL, CVD, and separator tests) was derived. For consistency, all EOS models were developed using the Peng and Robinson (1976) EOS with volume shift correction (three-parameter EOS) (Fattah 2005). The procedure suggested by Coats and Smart (1986) was followed to match the laboratory results. The developed EOS model for each sample was then used to generate the MBO PVT properties at different separator conditions using the Whitson and Torp (1983) procedure. The MBO PVT properties include the four functions required for MBO simulation (Rv, Rs, Bo, and Bg). Our database of Rv data, generated with the PVTi module of Eclipse, consists of 1850 points from eight different gas condensate samples and 1180 points from five volatile oil samples.

Rv models using artificial intelligence techniques

Three Matlab codes, one for each tool, were written to develop these AI models; each reads the provided input data in Excel format and preprocesses it before implementation. The ANN model and the other models were developed using 13 actual reservoir fluid samples. Table 2 presents a statistical analysis of the input data. The input parameters for the three models developed in this study are reservoir pressure (psi), reservoir temperature (°R), reservoir bubble point pressure (psi), oil density and gas density at stock tank conditions (lb/cu ft), and condensate yield (bbl/MMscf), while the output parameter is the oil–gas ratio, Rv.
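The Matlab preprocessing code itself is not listed in the paper; the Python sketch below, with made-up sample values, illustrates the kind of read-and-preprocess step described above: collecting the input records and summarizing each column, as done for Table 2.

```python
import math

# Hypothetical sample records (pressure psi, temperature degR, Pb psi,
# oil density lb/cu ft, gas density lb/cu ft, condensate yield bbl/MMscf);
# the values are illustrative only, not the study's data.
samples = [
    (5000.0, 660.0, 4500.0, 48.0, 0.10, 120.0),
    (3500.0, 640.0, 3000.0, 50.0, 0.08,  80.0),
    (6200.0, 700.0, 5800.0, 46.0, 0.12, 150.0),
]

def log_normalize(rows):
    # take log10 of every (positive) input to compress the wide ranges
    return [[math.log10(v) for v in row] for row in rows]

def column_stats(rows):
    # per-column (min, max), the kind of summary reported in Table 2
    cols = list(zip(*rows))
    return [(min(c), max(c)) for c in cols]

normalized = log_normalize(samples)
stats = column_stats(normalized)
```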

Table 2 Input data statistics

Results and discussion

The ANN model architecture, in terms of number of neurons, number of layers, and type of connection function, was determined by trial and error, as this proved the most successful approach in developing the model. Two neural network types were used: feedforward backpropagation (Type 1, Fig. 1) and trainable cascade-forward backpropagation (Type 2, Fig. 2). Different transfer functions were tested, and the log-sigmoid function was found to be the best. The best learning algorithm for training was that of Type 2. Several problems were faced during training. The model became trapped at a local minimum, causing training to stop. To overcome this problem, the maximum number of validation failures was increased to 300 so the search could reach the global minimum. To train the network, 70% of the data was used, while 15% was used for validation and the remaining 15% for testing.
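The training setup described above (log-sigmoid transfer function, a 70/15/15 split, and a large validation-failure limit before stopping) can be sketched as follows. This is an illustrative single-neuron Python example under assumed toy data, not the Matlab networks of this study.

```python
import math
import random

random.seed(0)

def logsig(x):
    # log-sigmoid transfer function, the one found best in this study
    return 1.0 / (1.0 + math.exp(-x))

# toy data: learn y = logsig(2x - 1), which a single neuron can represent
data = [(x / 50.0, logsig(2 * (x / 50.0) - 1)) for x in range(100)]
random.shuffle(data)
n = len(data)
train = data[:int(0.70 * n)]                 # 70% training
valid = data[int(0.70 * n):int(0.85 * n)]    # 15% validation
test = data[int(0.85 * n):]                  # 15% testing

# gradient-descent training; stop when validation error fails to improve
# for `patience` consecutive epochs (cf. the 300 validation failures above)
w, b, lr, patience = 0.0, 0.0, 0.5, 300
best_err, fails = float("inf"), 0
for epoch in range(20000):
    for x, y in train:
        p = logsig(w * x + b)
        grad = (p - y) * p * (1 - p)         # gradient of squared error
        w -= lr * grad * x
        b -= lr * grad
    err = sum((logsig(w * x + b) - y) ** 2 for x, y in valid)
    if err < best_err - 1e-12:
        best_err, fails = err, 0
    else:
        fails += 1
        if fails >= patience:
            break

test_mse = sum((logsig(w * x + b) - y) ** 2 for x, y in test) / len(test)
```

A larger patience keeps training alive through flat stretches of the validation curve, which is the role the increased validation-failure limit played here.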

Fig. 1

Feedforward backpropagation, Type 1

Fig. 2

Trainable cascade-forward backpropagation, Type 2

Table 3 summarizes the results, indicating that the Type 2 neural network gives better predictions using three hidden layers. Using more hidden layers would increase the accuracy further, but at the cost of computation time.

Table 3 Neural network results

Table 4 compares the statistics of the Rv correlation (Fattah 2005) with the new models generated in this work. From this table, one can easily see that the NN and SVM models give the best match, with the lowest average absolute errors of 0.1496 and 0.1222%, respectively. To validate the developed models, a super test was performed on unseen data. According to this test, SVM gives the most accurate predictions, with an average relative error of 0.121%, followed by the NN model with an average relative error of 0.313%. FN was the worst model, with an average relative error of 27.3%.
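The ranking statistic used here, the average absolute relative error in percent, can be computed as in this short sketch (illustrative Python with made-up values, not the paper's code):

```python
def avg_abs_relative_error(actual, predicted):
    # mean of |predicted - actual| / |actual|, expressed in percent --
    # the statistic used in Table 4 to rank the models
    return 100.0 * sum(abs(p - a) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# toy example: two Rv-like values with 1% error each
err = avg_abs_relative_error([100.0, 200.0], [101.0, 198.0])
```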

Table 4 Statistical comparison of all models and correlations

Figures 3 and 4 present the results of the Type 1 neural network using two and three hidden layers, respectively, while Figs. 5 and 6 display the results of the Type 2 network.

Fig. 3

Results of type-1 neural network using two hidden layers

Fig. 4

Results of type-1 neural network using three hidden layers

Fig. 5

Results of type-2 neural network using two hidden layers

Fig. 6

Results of type-2 neural network using three hidden layers

Functional networks were applied using 70% of the data for training and 30% for testing. Since the ranges of the input parameters differ considerably, as shown in Table 2, the logarithms of the input parameters were taken as a normalization step to improve the accuracy of the FN technique. This tool gives correlation coefficients of 0.965 for training and 0.962 for testing. Figure 7 shows the correlation between the predicted and actual Rv for training, which shows a good match, while Fig. 8 shows the same for testing. Similarly, Fig. 9 displays the cross-plot of predicted versus actual Rv for training, whereas Fig. 10 displays the cross-plot for testing. The results show good agreement, but not as good as the ANN results for both training and testing; although FN is a type of ANN, it does not predict clustered data as well as the ANN does.
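The goodness-of-fit figure quoted above is the correlation coefficient between the predicted and actual Rv values. A minimal Python sketch, assuming the standard Pearson definition, is:

```python
import math

def correlation_coefficient(actual, predicted):
    # Pearson correlation between actual and predicted values, the
    # goodness-of-fit statistic reported for training and testing
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)
```

A value near 1 corresponds to the tight cross-plots of Figs. 9 and 10.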

Fig. 7

Predicted and actual Rv as a function of data points using FN for training

Fig. 8

Predicted and the actual Rv as a function of data points using FN for testing

Fig. 9

Cross-plot of Rv using FN for training

Fig. 10

Cross-plot of Rv using FN for testing

SVMs with different kernel functions (polynomial, Gaussian, polyhomog, htrbf, and rbf) were tried, using 70% of the data for training and 30% for testing. It was noted that for this type of data only the polynomial and Gaussian kernel functions worked. This technique gives correlation coefficients of 0.995 for training and 0.999 for testing. Figure 11 shows the correlation between the predicted and actual Rv for training, which shows a good match, while Fig. 12 shows the same for testing. Similarly, Fig. 13 displays the cross-plot of predicted versus actual Rv for training, whereas Fig. 14 displays the cross-plot for testing. The results show better prediction than FN, but not as good as the ANN results for both training and testing. The advantage of SVM over ANN is its shorter run time.
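The paper does not list its SVM code. As an illustrative stand-in, the Python sketch below uses kernel ridge regression, a close relative of SVM regression, to show how the polynomial and Gaussian (RBF) kernels enter the prediction; all names and toy values here are assumptions for illustration.

```python
import math

def poly_kernel(x, z, degree=2):
    # polynomial kernel, one of the two kernel choices that worked here
    return (1.0 + x * z) ** degree

def gaussian_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel, the other working choice
    return math.exp(-gamma * (x - z) ** 2)

def solve(a, b):
    # small dense linear solver (Gaussian elimination with pivoting)
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(a[k][i]))
        a[i], a[p] = a[p], a[i]
        b[i], b[p] = b[p], b[i]
        for k in range(i + 1, n):
            f = a[k][i] / a[i][i]
            a[k] = [u - f * v for u, v in zip(a[k], a[i])]
            b[k] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def kernel_fit(xs, ys, kernel, lam=1e-6):
    # dual coefficients alpha = (K + lam*I)^-1 y; prediction is a
    # kernel-weighted sum over the training points, as in SVM regression
    k = [[kernel(xi, xj) + (lam if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    alpha = solve(k, ys)
    return lambda x: sum(al * kernel(xi, x) for al, xi in zip(alpha, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [x ** 2 for x in xs]                  # toy target
model = kernel_fit(xs, ys, poly_kernel)
```

Swapping `poly_kernel` for `gaussian_kernel` changes only the similarity measure, which is why trying several kernels, as done in this study, is cheap.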

Fig. 11

Predicted and actual Rv as a function of data points using SVM for training

Fig. 12

Predicted and the actual Rv as a function of data points using SVM for testing

Fig. 13

Cross-plot of Rv using SVM for training

Fig. 14

Cross-plot of Rv using SVM for testing

Figure 15 presents a further comparison between the NN and SVM models in terms of average relative error percent over the pressure range, showing that SVM clearly outperforms NN.

Fig. 15

Error comparison between neural network and support vector machine models

Conclusions

  • The artificial neural network, support vector machine, and functional network techniques are effective for estimating the oil–gas ratio, Rv.

  • The input and output parameters were preprocessed using log normalization, which gave better results for the FN and SVM techniques.

  • Support vector machines give better results than functional networks, with average correlation coefficients of 0.9970 and 0.9935, respectively.

  • Since the data analysis indicated that most of the input and output parameters are clustered, the ANN and SVM models give the best results, with average relative errors of 0.15 and 0.12%, respectively, because these models are more flexible in handling such data.

  • The super-test results also confirm these conclusions.