Introduction

Saturation pressure is a crucial property of reservoir fluids and plays a significant role in petroleum engineering calculations such as material balance, reserve estimation, well testing, reservoir simulation, and production planning [1]. Because accurate measurement of saturation pressure is expensive and time-consuming, it is often preferable to estimate it from empirical models. Several researchers have proposed empirical correlations to estimate saturation pressure from PVT data [36]. Asoodeh and Kazemi [7] integrated three of these correlations in a power-law structure and developed an enhanced model. A quantitative formulation between gas chromatography data and saturation pressure is a powerful tool for estimating saturation pressure cheaply, quickly, and precisely. The Soave–Redlich–Kwong (SRK) and Peng–Robinson (PR) equations of state (EOSs) can be employed for such estimations. Elsharkawy [8] developed an empirical model to estimate saturation pressure from gas chromatography data and reported that it outperformed the SRK and PR EOSs. In this study, a novel strategy is proposed to develop a quick, convenient, and accurate method for estimating saturation pressure from gas chromatography data. It uses the committee machine concept to integrate the Elsharkawy, SRK EOS, and PR EOS models into a single model that combines the benefits of all three. A hybrid genetic algorithm-pattern search (GA-PS) tool is embedded in the structure of the committee machine to extract the optimal linear combination of the Elsharkawy, SRK EOS, and PR EOS models. Results indicate that the committee machine performs better than any of the individual models acting alone.

Committee machine

A committee machine is an integrated method that produces its final output through an optimal linear combination of the outputs of other methods. Inspired by the divide and conquer principle, the committee machine breaks the problem down into computationally simpler parts and reaps the benefits of each part by integrating the solutions to these parts. The idea behind the committee machine is to fuse the knowledge acquired by different methods to arrive at an overall output that is more accurate than that of any individual method acting alone [9]. Suppose K different methods each produce a prediction vector $O_i$, such that $O_1$ contains the predictions of the first method, $O_2$ those of the second method, and so on. Given the target vector $T$, the prediction error is evaluated from the following equation.

$$e_i = O_i - T, \quad i = 1, \ldots, K \tag{1}$$

The expectation of the squared error for the $i$th method ($O_i$) is:

$$E_i = \xi\left[(O_i - T)^2\right] = \xi\left[e_i^2\right], \quad i = 1, \ldots, K \tag{2}$$

where $\xi[\cdot]$ denotes the expectation. Therefore, the average mean square error (MSE) of prediction for the K methods is computed as follows:

$$E_{\mathrm{avg.}} = \frac{1}{K}\sum_{i=1}^{K} E_i = \frac{1}{K}\sum_{i=1}^{K} \xi\left[e_i^2\right] \tag{3}$$

Expanding the above equation yields:

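$$E_{\mathrm{avg.}} = \xi\left[\frac{1}{K^2}\left(e_1^2 + e_2^2 + \cdots + e_K^2\right)\left(1 + 1 + \cdots + 1\right)\right] = \xi\left[\frac{1}{K}\left(e_1^2 + e_2^2 + \cdots + e_K^2\right)\right] \tag{4}$$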

Taking simple averaging as the simplest committee machine, the output of the committee machine is given by the following equation:

$$O_{\mathrm{CM}} = \frac{1}{K}\sum_{i=1}^{K} O_i \tag{5}$$

Hence, the MSE of the committee machine is defined as:

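$$E_{\mathrm{CM}} = \xi\left[\left(O_{\mathrm{CM}} - T\right)^2\right] = \xi\left[\left(\frac{1}{K}\sum_{i=1}^{K} O_i - T\right)^2\right] = \xi\left[\left(\frac{1}{K}\sum_{i=1}^{K} e_i\right)^2\right] \tag{6}$$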

The above equation can be rewritten as follows:

$$E_{\mathrm{CM}} = \xi\left[\frac{1}{K^2}\left(e_1 \times 1 + e_2 \times 1 + \cdots + e_K \times 1\right)^2\right] \tag{7}$$

Consider the following Cauchy inequality:

$$\left(a_1 b_1 + a_2 b_2 + \cdots + a_K b_K\right)^2 \le \left(a_1^2 + a_2^2 + \cdots + a_K^2\right)\left(b_1^2 + b_2^2 + \cdots + b_K^2\right) \tag{8}$$

Comparing Eqs. 4 and 7 with Cauchy's inequality (taking $a_i = e_i$ and $b_i = 1$), it follows that the error of the committee machine is less than or equal to the average error of the individual methods acting alone.

$$E_{\mathrm{CM}} \le E_{\mathrm{avg.}} \tag{9}$$
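The bound in Eq. 9 can be checked numerically. The following is a minimal sketch on synthetic data; the target values, noise levels, and variable names are assumptions made for illustration and are not taken from the study's dataset. Expectations are replaced by sample means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): a target vector T and K = 3 noisy predictors O_i.
n_samples, K = 200, 3
T = rng.uniform(313.0, 6880.0, size=n_samples)
noise_std = np.array([[150.0], [300.0], [450.0]])   # one noise level per method
O = T + rng.normal(0.0, noise_std, size=(K, n_samples))

e = O - T                          # Eq. 1: errors of each method, shape (K, n_samples)
E_i = np.mean(e ** 2, axis=1)      # Eq. 2: sample estimate of each expected squared error
E_avg = E_i.mean()                 # Eq. 3: average MSE of the K methods

O_cm = O.mean(axis=0)              # Eq. 5: simple-averaging committee machine
E_cm = np.mean((O_cm - T) ** 2)    # Eq. 6: MSE of the committee machine

print(f"E_avg = {E_avg:.1f}, E_CM = {E_cm:.1f}")
assert E_cm <= E_avg               # Eq. 9
```

Because Cauchy's inequality holds for every sample, the assertion holds for any choice of predictors, not only for this random draw.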

In this study, the Elsharkawy model and the SRK and PR EOS models are employed to form the inputs of the committee machine. To develop the committee machine, a hybrid GA-PS tool is embedded in its structure. The hybrid GA-PS extracts the optimal coefficient for each model so as to constitute an optimal linear combination of them. To this end, the following fitness function is supplied to the GA-PS tool.

$$\mathrm{MSE}_{\mathrm{CM}} = \frac{1}{K}\sum_{j=1}^{K}\left(w_1\,\mathrm{El}_j + w_2\,\mathrm{SRK}_j + w_3\,\mathrm{PR}_j - T_j\right)^2 \tag{10}$$

where K is the number of training data points; $w_i$ ($i$ = 1, 2, 3) is the coefficient assigned to each model; and $T_j$ is the corresponding target value. GA-PS finds an appropriate w for each model such that the MSE of the committee machine reaches its minimum value. The constructed committee machine is expected to perform better than the individual models (Eqs. 1 through 9). For more detail on the hybrid GA-PS technique, refer to Asoodeh and Bagheripour [10]. A genetic algorithm can be used for the following purposes:

  (i) For curve fitting and finding an optimal match function over an input/output data space

  (ii) For constructing an optimal linear combination of different overlapping models to enhance the accuracy of the final prediction (current study)

  (iii) For aggregating separate models that are constructed on separate subspaces of a specific problem

  (iv) As an alternative for training intelligent systems (e.g., neural networks, fuzzy logic, support vector machines, and so on)
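The fitness function of Eq. 10 and the idea of a global search followed by a local pattern search can be sketched as below. This is a toy illustration under stated assumptions, not the GA-PS implementation used in the study: the data are synthetic, and the helper names (`committee_mse`, `toy_ga`, `pattern_search`), the population size, the mutation scale, and the truncation selection are all hypothetical choices. A least-squares solution is included only as a reference, since Eq. 10 is a linear least-squares problem in the weights.

```python
import numpy as np

def committee_mse(w, preds, target):
    """Eq. 10: MSE of the weighted combination; preds has shape (n, 3)."""
    return float(np.mean((preds @ w - target) ** 2))

def toy_ga(fitness, dim, rng, pop_size=40, generations=200, crossover_fraction=0.72):
    """Small real-coded GA with elitism, scattered crossover, and Gaussian mutation."""
    pop = rng.uniform(0.0, 1.0, size=(pop_size, dim))        # initial range [0, 1]
    n_cross = int(round(crossover_fraction * (pop_size - 2)))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)]                        # best individuals first
        elite = pop[:2]                                      # two best kept unchanged
        parents = pop[: pop_size // 2]                       # truncation selection
        children = []
        for k in range(pop_size - 2):
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            if k < n_cross:
                mask = rng.random(dim) < 0.5                 # scattered crossover
                child = np.where(mask, p1, p2)
            else:
                child = p1 + rng.normal(0.0, 0.1, size=dim)  # Gaussian mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(scores)]

def pattern_search(fitness, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Compass search: poll +/- step along each axis, halve the step on failure."""
    x = np.array(x0, dtype=float)
    fx = fitness(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                f_trial = fitness(trial)
                if f_trial < fx:                             # accept only improvements
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5
    return x, fx

# Synthetic stand-ins (assumption) for the El, SRK, and PR estimates and the targets.
rng = np.random.default_rng(0)
n = 200
target = rng.uniform(313.0, 6880.0, size=n)
preds = target[:, None] + rng.normal(0.0, [150.0, 300.0, 450.0], size=(n, 3))

fitness = lambda w: committee_mse(w, preds, target)
w_ga = toy_ga(fitness, dim=3, rng=rng)                       # global search stage
w_opt, mse_opt = pattern_search(fitness, w_ga)               # local refinement stage

# Reference: the exact minimizer of Eq. 10 from linear least squares.
w_ls, *_ = np.linalg.lstsq(preds, target, rcond=None)
print("GA-PS sketch weights:", w_opt, "MSE:", mse_opt)
print("least-squares weights:", w_ls, "MSE:", committee_mse(w_ls, preds, target))
```

The pattern search only accepts improving moves, so its result is never worse than the GA starting point. How closely it approaches the least-squares reference depends on the step tolerance and iteration budget.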

Results and discussion

The modeling and testing data for this study originate from crude oil samples from the Middle East, the North Sea, and North America collected by Elsharkawy [8]. These data contain the mole fractions of hydrocarbon and non-hydrocarbon components (C1–C6, C7+, H2S, CO2, and N2), the specific gravity of C7+, the molecular weight of C7+, the reservoir temperature, and the saturation pressure. A statistical analysis of these datasets is given by Kazemi et al. [11]. The saturation pressure values estimated by the Elsharkawy, SRK EOS, and PR EOS models are given by Elsharkawy [8]. At this stage of the study, Eq. 10 was supplied to the hybrid GA-PS technique as the fitness function for constructing a committee machine that predicts saturation pressure by combining the empirical models. The GA-PS algorithm was set to 1,000 generations, a crossover fraction of 72 % with the scattered crossover function, and an initial range of [0, 1]. The GA-PS run (Fig. 1) finds the global minimum of the committee machine MSE function, which Fig. 1 indicates to be 82,142.6 psi². The committee machine gives the following equation for estimating saturation pressure through the optimal linear combination of the Elsharkawy, SRK, and PR EOS models.

Fig. 1 Run of the hybrid genetic algorithm-pattern search tool for the MSE fitness function of the committee machine. This plot shows the best and mean fitness values after 1,000 generations for saturation pressure prediction

$$\text{Saturation pressure} = 0.2140\,\mathrm{El} + 0.0855\,\mathrm{SRK} + 0.7716\,\mathrm{PR} \tag{11}$$
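Applying Eq. 11 is a single weighted sum. The sketch below assumes that the three inputs are the saturation pressure estimates of the Elsharkawy, SRK EOS, and PR EOS models in the same pressure unit. The function name and the input values are hypothetical and are not taken from the dataset.

```python
def committee_saturation_pressure(el, srk, pr):
    """Eq. 11: weighted combination of the three model estimates (same pressure unit)."""
    return 0.2140 * el + 0.0855 * srk + 0.7716 * pr

# Hypothetical estimates in psi for one oil sample (illustrative values only).
estimate = committee_saturation_pressure(el=2500.0, srk=2400.0, pr=2450.0)
print(round(estimate, 2))
```

The three coefficients sum to 1.0711 rather than 1, so the combination is not a convex average and the result can lie outside the range spanned by the three inputs.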

After the committee machine was constructed, the test data were input to it to evaluate the performance of the constructed model. Figure 2 shows the crossplot of measured versus committee machine predicted saturation pressure. The high correlation coefficient in this figure confirms the success of the committee machine in estimating saturation pressure. Figure 3 shows the cumulative probability of the error distribution for the committee machine predictions. This figure indicates that the error for most data points lies close to zero: 68 % of the data points have errors in the range of −5.8280 ± 287.6436 psi, which is acceptable compared with the range of the saturation pressure data, i.e., [313, 6880] psi. Finally, the committee machine was compared with its elements, namely the Elsharkawy empirical model and the SRK and PR EOS models. Table 1 gives the results of this comparison, which indicate the superiority of the committee machine over its elements. In almost all petroleum-related problems, where performing a real task may take days, months, or even years, algorithm running time is of minor importance. Running the genetic algorithm may take a couple of minutes, which is negligible. On the other hand, with little additional computation the committee machine significantly enhances the accuracy of the final prediction.
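The evaluation measures mentioned above (correlation coefficient, RMSE, and the mean and standard deviation of the error) can be computed as in the following sketch. The function name `evaluate` and the sample values are assumptions for illustration. The error is taken as predicted minus measured, in line with Eq. 1, and the share of points within one standard deviation of the mean error is measured empirically rather than assumed to be 68 %.

```python
import numpy as np

def evaluate(measured, predicted):
    """Return R, RMSE, and error statistics for a set of predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - measured                      # same sign convention as Eq. 1
    mean_err = error.mean()
    std_err = error.std(ddof=1)                       # sample standard deviation
    return {
        "R": float(np.corrcoef(measured, predicted)[0, 1]),
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "mean_error": float(mean_err),
        "std_error": float(std_err),
        # Empirical fraction of points with error inside mean +/- one std.
        "within_1std": float(np.mean(np.abs(error - mean_err) <= std_err)),
    }

# Hypothetical measured and predicted pressures in psi (illustrative values only).
measured = [1200.0, 2500.0, 3100.0, 4800.0, 6100.0]
predicted = [1150.0, 2650.0, 3000.0, 4900.0, 6050.0]
stats = evaluate(measured, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```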

Fig. 2 Crossplot of measured versus committee machine predicted saturation pressure. The high correlation coefficient (0.97338) indicates the robustness of the committee machine modeling

Fig. 3 Cumulative probability of the error distribution for the committee machine model used to estimate saturation pressure. The small mean and standard deviation (STD) of the error reflect the high performance of the committee machine modeling. This distribution indicates that 68 % of predicted values have errors in the range of −5.8280 ± 287.6436

Table 1 Comparison of the committee machine (CM) with its elements, namely the Elsharkawy (El) empirical model, the Soave–Redlich–Kwong (SRK) equation of state (EOS), and the Peng–Robinson (PR) EOS, using the correlation coefficient (R) and root mean square error (RMSE)

Conclusions

Owing to the significance of saturation pressure, different empirical correlations have been established between gas chromatography data and saturation pressure. The Elsharkawy (El) empirical model, the SRK EOS, and the PR EOS are among the most successful models for estimating saturation pressure. This study showed that the committee machine concept, inspired by the divide and conquer principle, can effectively enhance the precision of the final prediction by combining the aforementioned models. A hybrid GA-PS tool embedded in the committee machine was employed to extract the optimal coefficient for each model, which indicates that model's contribution to the overall prediction of saturation pressure. The hybrid GA-PS was capable of minimizing the MSE of prediction so that the performance function of the committee machine is located at its global minimum. The proposed methodology combines the advantages of the three valuable models, El, SRK EOS, and PR EOS, in one model. Implementing the proposed methodology can effectively save time and money.