New proxy models for predicting oil recovery factor in waterflooded heterogeneous reservoirs

To predict the recovery factor (RF) in waterflooded layered oil reservoirs, two empirical relationships were derived. Both correlations use four independent variables. These are reservoir heterogeneity (characterized by permeability variation coefficient), permeability anisotropy (ratio of vertical to horizontal permeability), viscosity of the injected water, and water injection rate. One of the correlations estimates RF at water breakthrough time (RFBT) and the other evaluates RF at the end of project (RFEOP). Each correlation comes in an expanded form with more parameters and a reduced form with fewer parameters. Both models are based on the global linear model. Eclipse black-oil simulation was used to determine RF for generic reservoirs with different combinations of permeability variation, permeability anisotropy, injected water viscosities, and water injection rates. A total of 192 data sets have been generated. Out of these, 144 data sets (about 75% of the generated sets) were used for model development and 48 data sets (about 25% of the generated sets) were used for model testing and validation. The expanded forms of the new developed correlations gave reliable estimates of RFBT and RFEOP with absolute average percent difference (AAPCD) of 6.9 and 1.02, respectively. The reduced forms yielded slightly higher AAPCDs of 8.30 and 1.04, respectively. When tested against 48 simulation-generated data sets, the expanded forms yielded excellent fits for RFBT and RFEOP with AAPCDs of 14 and 6.5, respectively. The reduced forms showed comparable fit with AAPCDs of 16.9 and 6.70, respectively. The highest RFEOP of 50.6% was achieved for a generic reservoir with a permeability variation in V = 0.1 and a permeability anisotropy of kz/kx = 1.0. This particular reservoir needs to be waterflooded using a water viscosity of µw = 1.0 cp and a water injection rate of qi = 10,000 bpd. Finally, when tested against the Guthrie–Greenberger and the API statistical study, using a single field data set, the proposed correlations gave higher absolute percent difference of 22.9 and 22.7 compared to 0.758 and 19.2 for Guthrie–Greenberger and the API statistical study, respectively.


Introduction
Solving oil recovery problems requires the use of reservoir models. The computing time involved increases with increasing model complexity. One way of handling the increased computing time is to develop parallel algorithms that takes care of the heterogeneity of the computing system. However, every time the initial data used for predicting oil recovery is changed the calculations must be changed which, could significantly increase the computing time and resources. Another way of doing things is to use machine learning methods (MLM), which can be used for big data processing. The application of MLM in the oil industry has been expanding (Li et al. 2020). There were many attempts to apply MLM to oil recovery estimation in waterflooding systems. Guthrie and Greenberger (1955) related oil recovery by water drive empirically to reservoir rock and fluid properties. The authors considered 73 sandstone reservoirs. Some had a water drive, others were under a combination drive. Oil recovery was expressed as a function of permeability, porosity, oil viscosity, formation thickness, connate water saturation, depth, oil reservoir volume factor, drainage area, and well spacing. The developed correlation (Eq. 1) produced an excellent fit. It predicted oil recovery within 6.2% of the reported value in 50% of the time and 9.0% of the reported value in 75% of the time.
In Guthrie-Greenberger's empirical model, RF is the recovery factor, k is the absolute permeability in md, S wi is the initial water saturation, ϕ is the porosity, h is the formation thickness in ft, and µ o is the oil viscosity in cp. This equation infers that water drive recovery efficiency is lower in reservoirs of higher porosity.
An API statistical study used a database from 312 reservoirs (Arps et al. 1967) undergoing water drive to develop an empirical equation to predict recovery efficiency. The correlation (Eq. 2) yielded a coefficient of determination, R 2 , of 0.958, indicating an impeccable data fit.
In the API empirical equation, RF is the recovery factor, ϕ is the porosity, µ w is the water viscosity in cp, µ o is the oil viscosity in cp, k is the absolute permeability in md, S wi is the initial water saturation, p i is the initial pressure in psia, p a is the abandonment pressure in psia, and B oi is the oil formation volume factor at initial reservoir pressure in rb/stb. Another secondary recovery model was developed to predict waterflood performance for different reservoir properties and under various design conditions (Khan 1971). Multiple regression equations used eight input variables to predict injection rate, ultimate secondary reserves, response time, peak oil rate, peak production year, and the production profile as a function of time. Historical data from 12 waterfloods located in the San Jorge Basin were used to develop the regional predictive model. To generate five output variables, six geometrical factors and two reservoir quality parameters were used. Input variables are reservoir depth, net sand thickness, pore volume, number of sand layers, number of injectors and producers, porosity, and primary recovery factor. Output variables are injectivity, secondary recovery factor, response time, project life, and recovered reserves (1) after injecting 28% of the required number of pore volumes (R28). Recently, a new correlation for estimating oil recovery factor under waterflooding in core samples at constant water injection rate was proposed (Balhasan and Jumaa 2017). The coefficients of correlation and determination were calculated using nonlinear regression analysis. The empirical model (Eq. (3)) correlated RF to reservoir-to-surface temperature ratio, oil-to-water viscosity ratio, and oil specific gravity.
where RF is the recovery factor, %, T r is the reservoir temperature, °F, T s is the surface temperature, °F, μ o is the oil viscosity, cp, μ w is the water viscosity, cp, γ o is the oil specific gravity. The authors noted that oil recovery factor increased to 48.8% at 194 °F, compared to 38% at 95 °F when one pore volume was injected. Furthermore, the authors indicated that the developed correlation was also applicable for three sandstone reservoirs in Libya and one sandstone reservoir in Kuwait.
Artificial intelligence (AI) was also used in the estimation of the oil recovery factor in waterflooded reservoirs (Mahmoud et al. 2019). In this study, the authors collected a data set of 173 cases and analyzed it statistically. Data outliers within ± 0.3 standard deviation (SD) were removed. Five cases were therefore discarded, and the remaining 168 cases were used to develop the AI models. These models were trained using 77% of the data. The remaining 23% were used for model testing and validation. The parameters used are divided into four groups (asset size, rock parameters, fluid properties, and reservoir energy). The authors noted that their developed model outperformed all the available empirical equations in the literature. The model yielded a coefficient of determination of 0.94 compared to only 0.55 obtained from Gulstad's correlation (Gulstad 1995). Gulstad's is considered as the industry standard. Sharma et al. (2010) used Tertiary Oil Recovery Information Systems (TORIS) for oil reservoirs and Gas Information System (GASIS) for gas reservoirs to match multivariant linear regression. These authors highlighted the accuracy of their linear model. Using descriptions of 34 low-permeability reservoirs, Han and Bian (2018) illustrated the application of a model based on support vector machine in combination with the particle swarm optimization (PSO-SVM) technique for oil recovery. Aliyuda and Howell (2019) demonstrated their successful application of support vector machine using a data set of 93 reservoirs descriptions from the Norwegian continental shelf. Using a huge data set of more than 2000 reservoirs, Makhotin et al. (2020) applied advanced MLMs to estimate oil recovery factor. They concluded that their proposed data-driven approach overcomes several limitations of the traditional methods and that, it is suitable for reliable and rapid estimation of oil recovery factor. In this work, new models were developed for predicting oil recovery factors at water breakthrough time and at end of project in waterflooded multilayered reservoirs. The new models incorporate four key parameters believed to impact the performance of such reservoirs. The statistical database was generated using ECLIPSE black-oil simulation, and the new empirical models were developed through multiple regression analysis using Global Linear Model.

Data preparation for the simulator and assumptions
The Dykstra-Parsons method (Ahmed 2002) was applied to generate permeability distribution across reservoir thickness for different reservoir heterogeneities. Each reservoir heterogeneity corresponds to a specific value of Dykstra-Parsons permeability variation coefficient (V). By setting an average permeability of a 10-layer reservoir (k 50 ) equal to 68 md and selecting several values of V, the corresponding permeability distributions in a descending order are determined using Eqs. 4 and 5. For each selected value of V, Eq. 4 is solved for k 84.1 (at one standard deviation from the average value) as depicted in Eq. 5. Results of these calculations are presented in Table 1. Using a log-probability graph paper with permeability on the log scale and percent cumulative frequency on the probability scale and assuming log-normal permeability distribution, Fig. 1 is generated. The permeability distribution for each selected value of V was then deduced from  Table 2.  Corey's correlations (Ahmed 2002) were used to generate the relative permeability curves as presented in Eqs. (6) and (7), and the results of calculations are shown in Fig. 2.
Additional input data for the simulator were necessary to generate the various recovery performances for various scenarios. The assumed data are listed in Table 3.
The basic assumptions made in this work are as follows: (1) water-wet reservoir, (2) no free gas saturation at all times during the flooding process, (3) waterflooding affects one quadrant of a five spot pattern with one injector and one producer, (4) capillary pressure effect is ignored, (5) constant porosity, thickness and initial water saturation for all layers, (6) low volatility black oil, homogeneous individual layers, and (7) a lateral sweep efficiency of 100%.

Combination scenarios of various key parameters
Eclipse black-oil simulation was used to generate production performance profiles for various combination scenarios. In this work, four key parameters with a dominant impact on waterflooding projects were considered. These parameters are water injection rate, water viscosity, reservoir anisotropy, and reservoir heterogeneity. Table 4 presents values of key parameters used in the combination scenarios and thus, a total of 192 combination scenarios were obtained.

Generation of simulation data for statistical analysis
In addition to other performance profiles, the main output from eclipse after running the above combination scenarios includes water cut versus time and oil recovery factor versus time. A complete set of results of simulator calculations can be found in Ref. Al-Jifri (2020).

Application of minitab to develop the new empirical correlation
The simulator-generated data were then used as input data for the Minitab software for further statistical analysis. The objective of using the GLM is to generate the proposed empirical correlation for oil recovery factor in terms of the four key parameters listed in Table 4. The Global Linear Model (GLM) was implemented because it predicts values for new observations, identify the combination of predictor values that jointly optimize one or more fitted values,  7   78  130  190  400  76  110  160  300  74  97  130  220  73  93  120  170  71  87  110  140  70  84  95  130  69  78  87  105  68  75  78  87  67  72  72  75  66 68 65 65

Fig. 2
Relative permeability curves generated by Corey's correlations and create surface plots, contour plots, and factorial plots. Also, it can identify the key parameters of highest and lowest impact on the predicted oil recovery, water cut, cumulative oil production, and reservoir pressure.

Validation of the new empirical correlation
The empirical correlation (s) thus developed by the GLM using the simulator-generated data were then validated using simulation-generated data not considered in their development. Further validation was accomplished using field data.

Simulated reservoir performance during waterflooding
Plots of specific performances for selected combination scenarios are shown in Figs. 3,4,5,6,7,8,9,and 10. These plots show the simulated performances of oil recovery factor and water cut during the waterflooding project.

Effect of water viscosity
Selected combination scenario: V = 0.5, k z /k x = 1, q i = 10,000 stb/day, and µ w = 0.25-1.0 cp. This selected scenario investigates the effect of changing water viscosity on the waterflood performance. In this scenario, all parameters were held constant and only water viscosity was changed and the results of the predicted performances are shown in Figs. 3 and 4. As can be observed from these plots, the most effective case is when the water viscosity approaches the oil viscosity which leads to favorable mobility ratio of one or less than one. Under such conditions, the highest RF of 45% is achieved for µ w = 1.0 cp as shown in Fig. 3 at end of project. In addition, a water viscosity of 1 cp has been found to yield later breakthrough time and lower water cut than those predicted for the lower water viscosities as depicted in Fig. 4. Thus, favorable mobility ratio can be achieved as the water viscosity approaches the oil viscosity resulting in improved overall waterflooding performance.

Effect of water injection rate
Selected combination scenario: V = 0.7, k z /k x = 1, µ w = 1 cp, and q i = 2000-10,000 stb/day. In this scenario, the effect of water injection rate on oil recovery factor and water cut was investigated as illustrated in Figs. 5 and 6. In these plots, the water injection rate was changed between 2000 and 10,000 stb/d. From Fig. 5, it can be observed that the oil recovery factor is increased from 23 to 51% as the water injection increased from 2000 to 10,000 stb/day. The negative aspect of injecting water at high rate, however, is revealed in Fig. 6, as it leads to earlier water breakthrough at the producing end and significant increase in the produced water cut. This negative aspect of high water injection rate of 10,000 stb/d may well be counterbalanced by the significant increase in cumulative oil production of around 40 million stb at the end of the project (not shown). Therefore, the feasibility of any waterflooding project should be assessed based on similar performances as described above, and the final decision would be a compromise of the above effects.

Effect of reservoir heterogeneity
Selected combination scenario: k z /k x = 1, µ w = 0.5 cp, q i = 10,000 stb/day, and V = 0.1-0.7 In this scenario, the effect of changing permeability variation coefficient on waterflooding performance was investigated and the results are plotted in Figs. 7 and 8. In these plots, V varied between 0.1 and 0.7 and other parameters  were held constants to observe the significance of reservoir heterogeneity on waterflooding performances. From Fig. 7, it can be observed that the highest RF of 46% is achieved in a homogeneous reservoir with V = 0.1 and that the lowest RF of 33% is realized in a heterogeneous reservoir with V = 0.7. The effect of reservoir heterogeneity on water cut performance is not as pronounced as shown in Fig. 8. The selected combination scenario addresses conditions of full cross-flow between layers and high water injection rate. In addition, the descending order of permeability across reservoir leads to what is known as coarsening upward situation in layered reservoirs which would normally lead to the development of piston-like displacement across the macroscopic section (Dake 2001). The high injection rate of water, however, reduces the effect of gravity and thus the displacement front will not become so perfect which may explain the lower RF and later BT for V = 0.7 as shown in Fig. 8.
In this scenario, the effect of changing permeability anisotropy on the performance of waterflood projects was studied. It can be observed in Figs. 9 and 10 that changing k z / k x from 0.1 (limited cross-flow between layers) to one (full  cross-flow between layers) can have significant impacts on the various performances considered in this work. For example, in Fig. 9, the RF is increased from 35 to 45% for k z /k x values of 0.1 and 1, respectively. In addition, higher water cuts are observed limited cross-flow which is indicative of unfavorable waterflood performances as shown in Fig. 10. The coarsening upward of permeability and the water injection rate-gravity relationship dictates the favorable displacement in layered reservoirs with cross-flow.

Predicting RF at water breakthrough time (RF BT )
A total of 144 simulation data sets have been generated, 96 data sets (about 75% of the generated sets) were used to develop the correlations and 48 data sets (about 25% of the generated sets) were used to test for the accuracy of the derived empirical models. The first correlation (Eq. 8), the expanded form, uses the four key parameters considered in this work. The second correlation (Eq. 9), the reduced form, which is based on the GLM analysis, still uses the same independent variables but considers fewer parameters.
In Eqs. 8 and 9, RF BT is the oil recovery factor at breakthrough, k z /k x is the anisotropy, µ w is the water viscosity in cp, V is the permeability variation coefficient, and q i is the water injection rate in Mstb/d.
The results of Minitab software in terms of significance of parameters and model summary for RF BT are illustrated in Table 8 and Fig. 19 of "Appendix".

Generating the new empirical correlation(s)
To develop new correlations for the oil recovery factor, simulation-generated data were used in Minitab.

Fig. 14 Comparison between predicted and simulated values of RF EOP
The results of Minitab software in terms of significance of parameters and model summary for RF EOP are illustrated in Table 9 and Fig. 20 of "Appendix".

Validation of the expanded forms using 144 data points
The accuracy of the proposed correlations developed in the previous section was tested by comparing the values of RF generated by the simulator with those predicted by the new correlations. The results of these comparisons are illustrated in Figs. 11 and 12.

Validation of the reduced forms using 144 data points
Similar comparisons were performed for testing the reduced forms expressed in Eqs. 9 and 11 and the results are shown in Figs. 13 and 14. The 45° line in Figs. 11,12,13,14,15,

Validation of the expanded forms using the remaining 48 data points
The second part of the validation was accomplished by considering the remaining 48 data of the total 192 data points. These 48 data points represent data which were not used in the development of the new correlations. Equation 8 is applied to calculate the RF BT for the 48 data points and the results were compared with the simulation-generated results. The results of this comparison are shown in Fig. 15.
A similar comparison between RF EOP predicted by Eq. 10 and by simulation is illustrated in Fig. 16.
In addition, the reduced forms, Eqs. 9 and 11, have been applied for the 48 data points and the results of the comparison are shown in Figs. 17 and 18. A summary of the results of calculations of average absolute percent difference (AAPCD) for the individual combination scenarios of RF BT and RF EOP is listed in Table 5.

Validation of the new correlations using field data
The new correlations were also validated using Field A data listed in Table 6 (Espinel 2010). The recovery factor at end of project, RF EOP , for this field has been determined by Visual Basic for Applications (VBA) program and found equal to 0.396. The value of RF BT is not available for this field and thus, the only possible comparison was between the predicted values of RF EOP by various methods and the above field value.
The value of RF EOP was estimated by three empirical correlations, namely Guthrie-Greenberger correlation, Eq. 1, API Statistical Study, Eq. 2, and the proposed new correlations, Eqs. 10 and 11. The predicted values of RF EOP were then compared with the field observation and the absolute percent difference for each method was calculated. The results of these calculations are illustrated in Table 7.

Eclipse simulation results
The relative permeability curves generated by Corey's correlations (Fig. 2) intersect at water saturation of 0.65 which is indicative of a water-wet system. The reservoir is assumed to consist of ten layers which have different permeabilities and that there is a significant variation in the values of permeability of these layers across the reservoir thickness. This variation is illustrated in Table 2 which shows that as the permeability variation coefficient (V) increases from 0.1 to 0.7, the reservoir becomes more heterogeneous. For all selected values of V, the permeabilities of the ten layers were arranged in a descending order which represents a situation known as coarsening upward across the reservoir (Dake 2001). Such permeability arrangement scenario would promote gravity effects during the process of waterflooding provided that cross-flow exists between layers. The net effect of gravity and rate of water injection would significantly impact the shape of the advancing displacement front during the process of waterflooding. To a lesser importance is the water-oil end point mobility ratio impact on the development of the shape of the displacement front in flooding heterogeneous reservoirs. These factors, namely rate of water  injection, gravity, layering, cross-flow, and end point mobility ratio, are the key parameters in waterflooding projects. The success of a waterflood project significantly relies on the right combination of these parameters. Accordingly, these key parameters were considered in this work to develop the new correlations and to interpret the simulator results presented in Figs. 3,4,5,6,7,8,9,and 10. For the hypothetical layered reservoir considered in this study, the lateral sweep efficiency has been assumed equal to 1.0.

Validity of proposed correlations (GLM results)
The GLM was used to generate the new correlations for predicting RF BT and RF EOP . The program can also identify the relative importance of the four key parameters which were included in the developed correlations. Using Global Linear Model (GLM), the water injection rate has been found as the most effective parameter and that water viscosity as the least effective parameter as far as the recovery factor is concerned. The accuracy of the expanded forms of the new proposed correlations (Eq. 8 for predicting RF BT , and Eq. 10 for predicting RF EOP ), which include all four key parameters, was tested against: (1) the 144 simulation-generated data points used in their development, (2) against the remaining 48 simulation-generated data points not included in the development of the new correlations, and (3) against real field case data. Similarly, the reduced forms of the new proposed correlations (Eq. 9 for predicting RF BT and Eq. 11 for predicting RF EOP ) were tested as described above. The results of comparisons of (1) and (2) are shown in Figs. 11,12,15,and 16. From Figs. 11 and 12,one can observe that the nearly perfect match between the predicted values and simulatorgenerated values of RF BT and RF EOP , respectively. From Figs. 15 and 16, however, the new correlation Eqs. 8 and 10 seem to generally under predict the values of RF BT and RF EOP , respectively. Similar results were observed for Eqs. 9 and 11 as shown in Figs. 13,14,17,and 18. A summary of the AAPCD of comparisons of (1) and (2) is shown in Table 5. For both validity cases considered in this study, and as expected, it can be observed that the proposed expanded forms yield more accurate results than the proposed reduced forms. For the 144 data points case, an AAPCD as low as 1.02 has been obtained for predicting RF EOP with Eq. 10.
The reliability of the proposed correlations was further tested using published data of a waterflood project with data listed in Table 6 (Espinel 2010). The value of field RF EOP was compared with RF EOP predicted by three methods, namely Guthrie-Greenberger method, API Statistical Study, and the proposed correlations. The results of these comparisons are presented in Table 7. These results are not indicative of the superiority of any of the methods considered in this study simply because a single field data point has been used. Nevertheless, the high APCD values of the proposed correlations are comparable to those of the API method. The fact that the permeability variation coefficient for this field data (V = 0.8) is higher than the maximum value considered in the development of the proposed correlations (V = 0.7) may explain the relatively high APCD values in Table 7. Another reason for the relatively high APCD obtained by the new proposed correlations can be realized by observing Figs. 16 and 18. In these two figures, the proposed new correlation under predicts the RF EOP and noticeably at higher values of this parameter. Additional field data are necessary to examine the proposed correlations.

Limitations of the proposed correlations
The new empirical correlations developed in this study are based on four key parameters believed to impact the overall performance of waterflood operations. Therefore, the proposed correlations will depend very much on the availability of these key parameters, which puts a limitation on their application. Moreover, the selected four parameters may not be enough to evaluate the effectiveness of the waterflood performance in terms of oil recovery and water cut profiles. Other reservoir characteristics, such as wettability preference, initial free gas saturation, dip angle, and rock consolidation, could very much affect the accuracy of the proposed correlations. The proposed empirical correlations were developed for specific ranges of key parameters as shown in Table 4. Therefore, the application of Eqs. 8 through 11 outside these ranges, and specifically for V and q i , may not yield accurate RF BT and RF EOP .

Conclusion
Based on the results of this study, the following conclusions are drawn: • Two sets of new empirical correlations have been developed to predict the performance of a five-spot waterflood in a stratified reservoir. These correlations encompass four key parameters believed to significantly affect the oil recovery factors in waterflood operations. These key parameters include water injection rate, water viscosity, permeability anisotropy, and reservoir heterogeneity. • When tested against 144 simulation-generated data points, the expanded forms of the new correlations have been found to give reliable estimates of RF BT and RF EOP with AAPCD of 6.9 and 1.02, respectively. The reduced forms were found to yield slightly higher AAPCDs for the same data set.
• When tested against 48 simulation-generated data points representing ranges of key parameters outside the ones used in their development, the expanded forms of the new correlations gave fairly good estimates of RF BT and RF EOP with AAPCDs of 14 and 6.5, respectively. • The new correlations gave more accurate estimates of RF EOP than for RF BT . The highest RF EOP of 50.6% was achieved for a combination scenario with q i = 10,000 bpd, µ w = 1.0 cp, k z /k x = 1.0, and V = 0.1. • When tested against two published empirical correlations, using a single field data point, the proposed correlations were found to produce a comparable fit with the API statistical study. • The new developed correlations can be used to get quick and reliable estimates of RF BT and RF EOP . • The water injection rate has been found as the most effective parameter in predicting the oil recovery factor, followed by the Dykstra-Parson coefficient, followed by the k-anisotropy. The water viscosity has been found to be the least effective parameter as far as the recovery factor is concerned.

Recommended measures to improve the accuracy of the proposed correlations
To improve the accuracy and reliability of the proposed correlations, it is recommended to: • Include the effects of free gas saturation, angle of dip, and wettability preference indicator. • Benchmark with other analytical methods and simulation results using more field data. • Increase the total number of simulation-generated data points and consider additional combination scenarios.

Results of GLM for RF EOP
See Table 9 and Fig. 20. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.