Gas–oil ratio correlation (Rs) for gas condensate using genetic programming

A new correlation for the solution gas–oil ratio (Rs) of gas condensate reservoirs was developed in this paper using the genetic programming algorithm of a commercial software package (Discipulus). After PVT experimental data were matched with an equation of state model, a commercial simulator (Eclipse) was used to calculate the Rs values used in this study. More than 1,800 Rs values were used, obtained from the analysis of eight gas condensate fluid PVT laboratory reports selected to cover a wide range of reservoir temperature and pressure, composition and condensate yield. Comparisons of the results showed that currently published Rs correlations gave poor estimates for gas condensate: the average absolute error was 63.48 % with a standard deviation (SD) of 0.724 for the Standing correlation, 61.19 % with a SD of 0.688 for the Glaso correlation, 52.22 % with a SD of 0.512 for the Vasquez and Beggs correlation, 56.34 % with a SD of 0.519 for the Marhoun correlation, and 18.6 % with a SD of 0.049 for the Fattah et al. correlation. The proposed correlation substantially reduced the average absolute error for gas condensate fluids, to 10.54 % with a SD of 0.035. The hit-rate (R2) of the new correlation was 0.9799 and the fitness variance was 0.012. The new correlation is important because it depends only on production data readily available in the field, and it can be widely applied when representative PVT lab reports are not available.


Introduction
The material balance equation is a useful tool for reservoir performance analysis. It is routinely used to estimate oil and gas reserves and to predict future reservoir performance. Schilthuis, in 1936, was among the first to formulate and apply material balance analysis. As time progressed, more sophisticated material balance models evolved, each striving for greater generality.
Application of the two-hydrocarbon-component, zero-dimensional material balance model had been restricted to black-oil or dry-gas reservoirs. As exploration of gas condensate reservoirs increases, there has been a growing need to address this limitation. Spivak and Dixon (1973) introduced the modified black oil (MBO) simulation approach. The PVT functions for MBO simulation and material balance calculations of gas condensate are the condensate-gas ratio (Rv), the solution gas-oil ratio (Rs), the oil formation volume factor (Bo), and the gas formation volume factor (Bg). The MBO approach assumes that the stock-tank liquid component can exist in both the liquid and gas phases under reservoir conditions in a gas condensate reservoir.
A few authors have addressed the question of how best to generate the PVT properties for gas condensate. Whitson and Torp (1983) used laboratory constant volume depletion (CVD) data to calculate the "MBO" PVT fluid properties Bo, Rs, Bg and Rv for gas condensate fluids. Coats (1985) suggested a different approach from that of Whitson and Torp (W&T) to calculate the MBO properties for gas condensates. Walsh and Towler (1994) suggested a new, simple method to compute the black-oil PVT properties of gas condensate reservoirs. Fevang et al. (2000) presented guidelines to help engineers choose between the MBO and compositional approaches. Fattah et al. (2009) presented new correlations to develop MBO PVT properties when PVT fluid sample reports are not available.
Most of the methods in the literature for generating MBO PVT fluid properties (Bo, Rs, Bg and Rv) for gas condensate need a combination of lab experiments and elaborate calculation procedures.
This study has two parts. The first part compares the different correlations used to calculate the solution gas-oil ratio (Rs) for gas condensate in order to determine the most accurate one. The second part develops a new correlation to calculate Rs for gas condensate reservoirs using genetic programming. The new correlation is validated by comparing the Rs values it predicts with the Rs values generated by the Whitson and Torp method from PVT lab data.

Fluid samples
Eight gas condensate (GC) samples were used in this study. The samples were obtained from reservoirs at different locations and depths, and were selected to cover a wide range of gas condensate fluid characteristics. Some samples represent near-critical fluids as explained by McCain and Bridges (1994). Table 1 presents a description of the major properties of these eight fluid samples.
A commercial simulator (Eclipse) was used to develop an EOS model for each sample in Table 1. Each EOS model was tuned to match, as closely as possible, the experimental results of all available PVT laboratory experiments (CCE, DL, CVD, and separator tests). The procedure suggested by Coats and Smart (1986) to match the laboratory results was followed. For consistency, all EOS models were developed using the Peng and Robinson (1976) EOS with volume shift correction (3-parameter EOS).

Approach
The EOS model developed for each sample in Table 1 was used to output MBO PVT properties (Rv, Rs, Bo, and Bg) at six different separator conditions using the Whitson and Torp (1983) procedure. The extracted MBO PVT data comprise 1,836 points from the eight gas condensate samples. The first part of the work compared the extracted Rs with the most common Rs correlations to determine the most accurate one. The second part developed a new correlation to calculate Rs for gas condensate reservoirs using a genetic programming system.

Genetic programming
Genetic algorithms, evolution strategies and genetic programming belong to the class of probabilistic search procedures known as evolutionary algorithms, which use computational models of natural evolutionary processes to develop computer-based problem-solving systems. Solutions are obtained using operations that simulate the evolution of individual structures through mechanisms of reproductive variation and fitness-based selection. Because of their reported robustness in practical applications, these techniques are gaining popularity and have been used in a wide range of problem domains. The main difference between genetic programming and a genetic algorithm is the representation of the solution: genetic programming creates computer programs as the solution, whereas a genetic algorithm creates a string of numbers to represent the solution. Genetic programming is based on the Darwinian principle of reproduction and survival of the fittest and on analogs of naturally occurring genetic operations such as crossover and mutation (Koza 1992). Genetic programming uses four steps to solve a problem (Koza 1997):
1. Generate an initial population of random compositions of the functions and terminals (inputs) of the problem.
2. Execute each program in the population and assign it a fitness value.
3. Create a new offspring population of computer programs by copying the best programs and creating new ones by mutation and crossover.
4. Designate the best computer program in the generation.
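The four steps can be illustrated with a toy symbolic-regression example. This is only a minimal sketch under stated assumptions: the expression-tree representation, the function and terminal sets, the population size, and the mutation rate are illustrative choices, and none of it reproduces the internals of Discipulus, which evolves machine-code programs.

```python
import operator
import random

# Function and terminal sets are illustrative assumptions, not Discipulus settings.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]


def random_tree(rng, max_depth=3):
    """Build a random composition of functions and terminals."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def evaluate(tree, x):
    """Execute a program (expression tree) for one input value."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, data):
    """Mean squared error against the targets (lower is fitter)."""
    errors = [evaluate(tree, x) - y for x, y in data]
    return sum(e * e for e in errors) / len(errors)


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def random_subtree(rng, tree):
    """Pick a subtree by walking down from the root a random number of levels."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = rng.choice(tree[1:])
    return tree


def replace_random(rng, tree, new):
    """Return a copy of tree with one randomly chosen subtree replaced by new."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return new
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, replace_random(rng, left, new), right)
    return (name, left, replace_random(rng, right, new))


def evolve(data, pop_size=60, generations=30, seed=0, max_depth=5):
    rng = random.Random(seed)
    # Step 1: initial population of random programs.
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: execute every program and assign a fitness value.
        ranked = sorted(population, key=lambda t: fitness(t, data))
        history.append(fitness(ranked[0], data))
        # Step 3: copy the best programs, then create offspring.
        offspring = ranked[:2]
        while len(offspring) < pop_size:
            # Tournament selection: the fitter of two random programs wins.
            mother = min(rng.sample(population, 2), key=lambda t: fitness(t, data))
            father = min(rng.sample(population, 2), key=lambda t: fitness(t, data))
            child = replace_random(rng, mother, random_subtree(rng, father))  # crossover
            if rng.random() < 0.2:
                child = replace_random(rng, child, random_tree(rng, 2))  # mutation
            # Reject oversized children so programs stay small.
            offspring.append(child if depth(child) <= max_depth else mother)
        population = offspring
    # Step 4: designate the best program of the final generation.
    best = min(population, key=lambda t: fitness(t, data))
    return best, history + [fitness(best, data)]


# Toy target y = x^2 + x, sampled at five points.
data = [(x, x * x + x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best, history = evolve(data)
print(best, history[-1])
```

Because the two best programs are copied unchanged into each new generation, the best fitness recorded in `history` can never get worse from one generation to the next. The depth limit plays a role loosely similar to the maximum program size discussed later in the paper.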

Solution gas-oil ratio (Rs) correlations
This part presents the comparison between the common correlations in the literature used to calculate the solution gas-oil ratio (Rs) for gas condensate. Comparing the Vasquez and Beggs (1980) correlation with the observed Rs for gas condensate results in an average absolute error of 52.22 % with a SD equal to 0.512.

[Figure 1]

Developed gas-oil ratio correlation (Rs) for gas condensate using genetic programming
The second part of this study involved the development of the new correlation to calculate Rs for gas condensate reservoirs using genetic programming. A commercial genetic programming system called Discipulus was used to develop the new Rs correlation (Foster 2001; Francone 2004). "Discipulus is a steady state genetic programming system, using tournament selection in which two pairs of individuals compete each round for reproduction. All the usual parameters can be adjusted with Discipulus: crossover rate, mutation rate, population size, instruction set, distribution of initial program sizes, termination criteria, and parsimony pressure (fitness advantages for smaller programs)." (Foster 2001). The default settings for a Discipulus project work quite well for almost all projects. In fact, Discipulus automatically sets, randomizes, and optimizes the genetic programming parameters for the runs that comprise a project. The default settings were therefore used in our run: the selection method is tournament selection, the mutation frequency is 90 %, the crossover frequency is 50 %, and the population size (the number of programs in the population that Discipulus will evolve) is 500. Two parameters control the size of the programs evolved using Discipulus. Initial Program Size sets the size of the programs in the first population created by Discipulus at the start of a run (80 bytes in our project). Maximum Program Size sets the maximum length of the body of an evolved program in the population (512 bytes in our project). The genetic programming algorithm uses a "fitness function" to determine which evolved programs survive and reproduce. The fitness function used depends on whether a classification problem or a regression problem is presented to Discipulus; our problem is a regression problem. Generally speaking, the better an evolved program models the training data, the fitter it is. Discipulus calculates the fitness of evolved programs by determining how closely the outputs of the evolved program match the target outputs in the training data. The closer the match, the fitter the evolved program. The two parameters used as fitness measurements are the hit-rate (R2) of the best genetic program and the fitness variance. The input data files for this software are divided into three groups of roughly equal size: "training data", "validation data" and "applied data". These input files include the measured input and output parameters for our correlation.
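The division of the data into three groups of roughly equal size can be sketched as follows. The paper does not say how records were assigned to each group, so the seeded random shuffle and the function name `split_three_ways` are assumptions made purely for illustration.

```python
import random


def split_three_ways(records, seed=0):
    """Shuffle records and deal them into three groups of near-equal size.

    The random assignment is an assumption; the paper only states that the
    three groups are semi-equal.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    names = ("training", "validation", "applied")
    # Dealing every third record to each group keeps sizes within one of each other.
    return {name: shuffled[i::3] for i, name in enumerate(names)}


# With the 1,836 data points of this study, each group would hold 612 records.
groups = split_three_ways(range(1836))
print({name: len(rows) for name, rows in groups.items()})
```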
The input parameters for our correlation are:
• Pressure (P), psi;
• Reservoir temperature (T), °R;
• API gravity of the reservoir fluid;
• Specific gravity of surface gas (SGg);
• Specific gravity of surface oil (SGo).
The output is the solution gas-oil ratio, Rs. Discipulus provides different types of data and charts that show how the run in progress is improving its performance. From the given data files, Discipulus creates thousands of models (programs) that predict outputs from similar inputs, and for each model it reports the performance [the hit-rate (R2) and the fitness variance]. At the end of the run, the best model (program) is chosen on the basis of its hit-rate (R2) and fitness variance to calculate the solution gas-oil ratio, Rs. Figure 6 shows the fitness improvement of the best genetic program for our correlation with time. The hit-rate (R2) of the best genetic program (the new correlation) was 0.9799 and the fitness variance was 0.012. Figures 7, 8 and 9 present the match between the observed Rs and the calculated Rs for the new correlation. Each figure shows the match between the observed Rs at all input points and the Rs calculated for the same points by the best program developed by the software, by the best team (during a project, Discipulus assembles the best programs into teams; the outputs of all the programs in a team are assembled into one collective output that is frequently better than that of any particular member of the team), and by the selected program (which is almost always the best program if the software defaults are not changed). The results for our case indicate that the new correlation (the best program output from the software) almost completely matches the observed Rs data. The model outputs from the software are created as computer programs in Java, C++, or assembler. The correlation is therefore a regression model in the form of program code, so the second step in using it is to run the code produced by the genetic program to obtain the value of Rs. The output C++ code of the genetic model from Discipulus used to calculate the new Rs correlation is given in the Appendix. This code was used with a C++ compiler to develop a Windows interface program to calculate the Rs value (Fig. 10). The code can be modified to generate a solution gas-oil ratio array for different temperature and pressure values of a given reservoir.
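Generating such an array amounts to evaluating the correlation over a grid of pressures and temperatures while the fluid gravities are held fixed. The sketch below assumes the evolved program has been wrapped in a callable that takes the five inputs listed above. The name `rs_table`, the argument order, and `placeholder_model` are hypothetical; the placeholder is not the correlation of this paper and exists only to make the example runnable.

```python
def rs_table(rs_model, pressures, temperatures, api, sg_gas, sg_oil):
    """Evaluate rs_model on a grid: one row per temperature, one column per pressure."""
    return [
        [rs_model(p, t, api, sg_gas, sg_oil) for p in pressures]
        for t in temperatures
    ]


def placeholder_model(p, t, api, sg_gas, sg_oil):
    """Stand-in callable only. NOT the paper's correlation; values are meaningless."""
    return p / t


# Pressures in psi and temperatures in degrees Rankine, as in the input list.
table = rs_table(placeholder_model, [1000.0, 2000.0, 3000.0], [600.0, 650.0],
                 api=50.0, sg_gas=0.7, sg_oil=0.78)
for row in table:
    print(row)
```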
For further model validation, cross-plots of observed versus calculated Rs were drawn (Fig. 11), and the average absolute error and the SD of the new correlation were calculated as 10.54 % and 0.035, respectively. Table 2 summarizes the statistical comparison between the published correlations and the new correlation. The table shows that the new correlation gives the best match.
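The statistics used in this comparison can be computed along the following lines. The paper does not give its formulas, so the definitions here are assumptions: the average absolute error is taken as the mean absolute relative error in percent, the SD as the sample standard deviation of the absolute relative errors expressed as fractions, and R2 as the usual coefficient of determination.

```python
import statistics


def error_statistics(observed, calculated):
    """Return (average absolute error in %, SD of absolute relative error, R2).

    All three definitions are assumed, since the paper does not state them.
    """
    abs_rel = [abs(c - o) / o for o, c in zip(observed, calculated)]
    aae_percent = 100.0 * statistics.mean(abs_rel)
    sd = statistics.stdev(abs_rel)
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return aae_percent, sd, 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only; they are not data from this study.
aae, sd, r2 = error_statistics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(aae, sd, r2)
```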
The new correlation presented in this work can be used with other sets of correlations to generate MBO PVT properties for material balance calculations or reservoir simulation, without the need for fluid samples or elaborate EOS calculation procedures. These correlations are of particular importance when representative fluid samples are not available.

Conclusions
Based on the work presented in this study, the following conclusions were made:

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix
This appendix gives the output C++ code of the best genetic program from Discipulus used to calculate the new Rs correlation. This program is a sequential type program.