The back propagation based on the modified group method of data-handling network for oilfield production forecasting

Guo, Jia; Wang, Hongmei; Guo, Fajun; Huang, Wei; Yang, Huipeng; Yang, Kai; Xie, Hong

doi:10.1007/s13202-018-0582-9


Original Paper - Production Engineering
Open access
Published: 19 November 2018

Volume 9, pages 1285–1293, (2019)

Journal of Petroleum Exploration and Production Technology


Jia Guo ORCID: orcid.org/0000-0002-1007-480X¹,
Hongmei Wang¹,
Fajun Guo¹,
Wei Huang¹,
Huipeng Yang¹,
Kai Yang¹ &
…
Hong Xie²


Abstract

In this paper, a novel hybrid forecasting model combining the modified group method of data handling (GMDH) and back propagation (BP) is introduced for time series oilfield production forecasting. The proposed model takes advantage of both the effective parameter selection of the modified GMDH network and the excellent nonlinear mapping of the BP network, and it provides a robust simulation ability for oilfield production with higher precision. Various production parameters of an actual oilfield were used to analyze and test the annual output predicted by the proposed model (modified GMDH-BP). The performance of the proposed model was compared with multiple linear regression (MLR), GMDH, modified GMDH, BP, and the hybrid model combining the group method of data handling and back propagation (GMDH-BP) using time series annual production data. The relative error, correlation coefficient (R), root mean square error, mean absolute percentage of error, and scatter index were used to investigate the performance of the presented models. The evaluation results indicate that the hybrid model provides more accurate production forecasts than the other models and exhibits a robust simulation ability for capturing the nonlinear relations in complex oilfield production time series.


Introduction

Oilfield production is a fundamental indicator that provides a decision-making basis for investment in oil production and for development adjustments. Oilfield productivity is a complex nonlinear system that reflects geological understanding, development policies, and production performance together; it has become an important research topic and has generated abundant results (Lia et al. 1997; Bardi 2005; Wang et al. 2015). Forecasting methods for oilfield production fall into two categories: traditional, knowledge-driven approaches and data-driven artificial intelligence algorithms (Samsudin et al. 2011). The traditional methods include empirical formulas (Tang et al. 2010), physical simulation (Liu and McVay 2009), and curve fitting (Arps 1945; Hubbert 1980; Wang and Feng 2016), and they have been well developed over the past decades. These methods are widely used because their principles are simple and their calculations are convenient, but they give larger prediction errors when dealing with complex nonlinear systems.

Recent years have witnessed the vigorous development of the artificial neural network (ANN), which is widely applied in engineering, computer science, and information science (Haykin 1998). Although it lacks physical interpretation and insight into production rules, it can provide sufficiently accurate and reliable results. In the field of oil exploitation, ANN models have recently been accepted as an effective tool to predict reservoir properties such as permeability (Ahmadi and Shadizadeh 2012; Ahmadi et al. 2013; Ahmadi 2012; Ahmadi and Goudarzi 2013), minimum miscible pressure (Ahmadi 2012), asphaltene precipitation (Ahmadi and Golshadi 2012), and condensate-to-gas ratio (Zendehboudi et al. 2012), and to forecast oil flow (Jr et al. 2009; Mollaiy Berneti and Shahbazian 2013). In particular, the back propagation (BP) neural network, one of the most popular ANN algorithms, has proved to have clear advantages in describing reservoir dynamic performance, from single pattern recognition (Balch et al. 1999; Tapias et al. 2001) to multi-factor forecasting (Yi-Bao et al. 2005; Yu et al. 2008). For complex nonlinear systems, the major advantages of BP are strong adaptability, good fault tolerance, and high fitting accuracy. However, this method is sensitive to the network topology, and different types and quantities of input factors may lead to different results (Yu et al. 2008).

It is very difficult to describe a nonlinear system, since a priori knowledge is needed for system identification (Kashyap 1973). The group method of data-handling (GMDH) algorithm, however, can select variables automatically and offers higher forecasting ability without any prior knowledge (Mori and Tsuzuki 1990). The GMDH algorithm was first developed by Ivakhnenko as a tool for identifying the relationship between the input layer and the output layer in a nonlinear system. In the past decades, the GMDH model has been successfully applied in a wide range of fields such as engineering (Najafzadeh and Barani 2011), education (Abdel-Aal and El-Alfy 2009), medicine (Abdel-Aal 2005), and economics (Parks et al. 1975). In petroleum engineering, the GMDH model has received little attention, and only a few applications to predicting oil price (Mohsen et al. 2010) and determining physical properties (Al-Ajmi et al. 2012; Ghorbani et al. 2014) have been carried out. However, the conventional GMDH has two shortcomings: it is difficult to determine the best partition of the data sets, and effective parameters may be eliminated prematurely. Both lead to subtle differences in model precision.

In this paper, the GMDH algorithm is improved with a random drawing method and an original variable preservation method to improve its selection performance. The optimized GMDH model is then combined with the back propagation (BP) neural network, and the effective parameters selected by the modified GMDH algorithm are used as the input neurons of the BP network. The excellent ability of the BP network to model the mappings of complex systems makes it suitable for accurately predicting oil production.

Definition of parameters for reservoir production forecasting

Oilfield production is one of the most direct indicators of oilfield development, which is a complicated systems engineering problem with various development indexes. This paper makes a qualitative analysis of the factors influencing oil production and initially determines the representative factors based on reservoir engineering principles. In theory, the production of an oilfield passes successively through rising, stable, and declining periods during the development process, as shown in Fig. 1a. For a particular oilfield, the actual oil production in recent years generally followed this rule, with the stable and declining periods contributing most of the oil production (Fig. 1b). It is well known that the factors influencing reservoir production can be divided into geological factors and development factors. For a reservoir with given reserves, the dynamic oil production is closely related to the development technology policy. Specifically, the development technology policy involves the exploitation method, well type, well density, injection strength, deployment scale for new wells, and other factors, each of which has a different impact on oil production.

The dynamic change of oil production is mainly affected by changes in liquid production, water injection rate, and reservoir water cut, which have a lasting influence over the whole life of petroleum recovery. In the later period of development in particular, liquid production is basically maintained in a stable state while the rate of oil production decline keeps rising. Thus, the change of oil production in this period is only affected by the change of water cut. The production composition changes dramatically in the medium-to-late stage of oilfield development, which manifests mainly in two aspects: first, the contribution of new wells to oil production decreases significantly as the growth in the number of oil and water wells slows, as shown in Fig. 1c; second, the principal development strategy shifts from preventing the water cut from rising too fast to optimizing the recovery percentage of geological reserves and the oil recovery rate, as shown in Fig. 1d. Based on the analysis above, we initially take the following influencing factors into consideration: the total number of oil and water wells (x₁), the number of active wells (x₂), the number of new wells (x₃), the injection rate of the last year (x₄), the water cut (x₅), the oil production rate (x₆), the recovery percentage of reserves (x₇), and the oil production of the last year (x₈).

Methodology

Modified group method of data handling

The group method of data handling (GMDH) is a feed-forward neural network for modeling and identifying complex systems, which was proposed by Ivakhnenko in the 1960s (Ivakhnenko 1971). The general form of the network can be expressed by a complicated polynomial series in the form of the Volterra series, known as the Kolmogorov–Gabor polynomial:

$$\bar {y}={a_0}+\sum\limits_{{i=1}}^{n} {{a_i}{x_i}} +\sum\limits_{{i=1}}^{n} {\sum\limits_{{j=1}}^{n} {{a_{ij}}{x_i}{x_j}} } +\sum\limits_{{i=1}}^{n} {\sum\limits_{{j=1}}^{n} {\sum\limits_{{k=1}}^{n} {{a_{ijk}}{x_i}{x_j}{x_k}} } } + \cdots $$

(1)

where $x$ represents the inputs of the system, $n$ is the number of inputs, and $a$ denotes the coefficients.

In general, however, the polynomial above is used in the form of multiple two-variable quadratic equations in each layer, such as

$$\bar {y}={a_1}x_{1}^{2}+{a_2}x_{2}^{2}+{a_3}{x_1}+{a_4}{x_2}+{a_5}{x_1}{x_2}+{a_6}.$$

(2)

All pairs of neurons in each layer are evaluated in the form (2), and the difference between the actual output $y$ and the fitted value $\bar {y}$ is then obtained. By introducing the mean square error (MSE) as the screening criterion for each layer, this difference is reduced layer by layer until it stops decreasing:

$${\text{MSE}}=\frac{1}{N}\sum\limits_{{i=1}}^{N} {{{({y_i} - {{\bar {y}}_i})}^2}} .$$

(3)
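The fitting of a single neuron can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes the six coefficients of Eq. (2) are estimated by ordinary least squares (the usual choice in GMDH, though the text does not state it), and the function names and synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(x1, x2):
    """Regressors in the order of Eq. (2): x1^2, x2^2, x1, x2, x1*x2, 1."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([x1**2, x2**2, x1, x2, x1 * x2, np.ones_like(x1)])

def fit_neuron(x1, x2, y):
    """Least-squares estimate of a1..a6 in Eq. (2) (assumed estimator)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(x1, x2),
                                 np.asarray(y, dtype=float), rcond=None)
    return coeffs

def predict_neuron(coeffs, x1, x2):
    """Fitted value of Eq. (2) for the given coefficients."""
    return design_matrix(x1, x2) @ coeffs

def mse(y, y_fit):
    """Mean square error of Eq. (3)."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    return float(np.mean((y - y_fit) ** 2))

# Synthetic check: data generated from a known quadratic should be recovered.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 30)
x2 = rng.uniform(0.0, 1.0, 30)
true_coeffs = np.array([0.5, -0.3, 1.0, 0.2, 0.7, 0.1])
y = design_matrix(x1, x2) @ true_coeffs

coeffs = fit_neuron(x1, x2, y)
fit_error = mse(y, predict_neuron(coeffs, x1, x2))
print(coeffs, fit_error)
```

In a GMDH layer this fit is repeated for every pair of candidate inputs, and the MSE of each fitted neuron is what the screening step compares.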

Compared with the traditional neural network, the GMDH network has two significant benefits: (1) it automatically determines the number of network layers and the number of neurons in each layer, which reduces subjective intervention in the simulation process, and (2) it builds the connection between the selected parameters and the output in the form of polynomials, unlike other neural networks, which are black-box models. However, different ways of dividing the data sets lead to different results, so a global optimum solution cannot be guaranteed. Meanwhile, the fluctuating threshold of the selection rules makes it possible for useful parameters to be eliminated prematurely in some layers.

In this section, the GMDH algorithm is improved in two specific ways. First, to avoid the case in which different divisions of the data sets lead to constructed models with distinct differences, we use a random drawing method to divide the data sets into training sets and testing sets. The constructed model can then ultimately reach the global optimum with higher precision than with the traditional partitioning manner. Second, the intermediate polynomials in each layer are related only to the preceding layer through screening with the external criterion, so the selection of effective variables is independent among the network layers. As a result, some variables are judged to have little influence and are eliminated prematurely in some layers. To address this situation, we introduce an original variable preservation method to optimize the selection of the variables that establish the GMDH network. More details of the optimization method are presented in the literature (Guo et al. 2017). In this investigation, the role of the modified GMDH algorithm is to provide effective variables as the input of the back propagation network.

Hybrid modified GMDH-BP algorithm

The modified GMDH network is good at estimating the relationship between the effective variables and the output with high precision, and the back propagation network has great advantages in regression problems. In this section, a hybrid model combining the modified group method of data handling (GMDH) and back propagation (BP) is proposed to improve the precision of oil production forecasting; it overcomes the shortcomings of back propagation in variable selection. First, the input parameters are selected by the modified GMDH network, whose ability to screen variables has been enhanced by the improved algorithm. Then, based on the selected variables, the BP network is used to forecast the output of the oilfield. The whole procedure of the proposed hybrid model can be described as follows:

Step 1: The original data sets are first normalized and then separated into training sets and testing sets with the random drawing method.

Step 2: From the input variables $\{ {x_1},{x_2}, \ldots ,{x_m}\} $, all pairwise combinations are generated, giving $C_{m}^{2}=\frac{{m(m - 1)}}{2}$ combinations. The value $\bar {y}$ calculated by formula (2) is compared with the true value $y$ to determine the new input variables $\{ {x_{11}},{x_{12}}, \ldots ,{x_{1j}}\} $ for the next layer.

Step 3: Merge the new input variables $\{ {x_{11}},{x_{12}}, \ldots ,{x_{1j}}\} $ and the original variables $\{ {x_1},{x_2}, \ldots ,{x_m}\} $ into a new set of input variables and repeat step 2. Sort the MSE of all pairs of neurons in each layer and pass the variables with smaller errors to the next layer.

Step 4: The iteration stops when the smallest MSE of a layer no longer decreases; the MSE of the last layer is recorded. Return to step 1 and repeat steps 1–4 until the number of iterations reaches the upper limit or the required MSE precision is obtained.

Step 5: The modified GMDH network terminates and exports the selected effective variables $\{ {x_1},{x_2}, \ldots ,{x_n}\} $ to the BP network. The network is designed with a single hidden layer, and the number of hidden nodes is computed according to the actual situation.

Step 6: The input variables are propagated forward through each layer of the designed network; when they reach the output layer, the difference between the output and the true value is calculated by a loss function. The error is then propagated from the output layer back through the same network to complete the weight update. The configuration of the proposed hybrid model is shown in Fig. 2.
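Steps 1–4 can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' code: it assumes min-max normalization, least-squares fitting of Eq. (2), the testing-set MSE as the external criterion, and a fixed number `keep` of neurons passed to the next layer. All function names, the `keep` parameter, and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def min_max(a):
    """Scale each column to [0, 1] (assumed normalization scheme)."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo)

def quad_features(u, v):
    """Regressors of Eq. (2): u^2, v^2, u, v, u*v, 1."""
    return np.column_stack([u**2, v**2, u, v, u * v, np.ones_like(u)])

def gmdh_layer(Z_tr, y_tr, Z_te, y_te, keep):
    """Step 2: fit Eq. (2) on every column pair and rank by testing-set MSE."""
    results = []
    for i, j in itertools.combinations(range(Z_tr.shape[1]), 2):
        A = quad_features(Z_tr[:, i], Z_tr[:, j])
        coeffs, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
        out_te = quad_features(Z_te[:, i], Z_te[:, j]) @ coeffs
        err = float(np.mean((y_te - out_te) ** 2))
        results.append((err, A @ coeffs, out_te))
    results.sort(key=lambda r: r[0])
    best = results[:keep]
    return (best[0][0],
            np.column_stack([r[1] for r in best]),
            np.column_stack([r[2] for r in best]))

def modified_gmdh(X, y, train_fraction=0.8, keep=3, max_layers=10, seed=0):
    """Return the best MSE of each accepted layer for one random split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                # Step 1: random drawing
    n_train = int(round(train_fraction * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    cur_tr, cur_te = X[tr], X[te]
    history = []
    for _ in range(max_layers):
        err, new_tr, new_te = gmdh_layer(cur_tr, y[tr], cur_te, y[te], keep)
        if history and err >= history[-1]:
            break                                # Step 4: MSE stopped decreasing
        history.append(err)
        # Step 3: original variable preservation, merge neurons with raw inputs
        cur_tr = np.hstack([new_tr, X[tr]])
        cur_te = np.hstack([new_te, X[te]])
    return history

# Synthetic stand-in for the 27 yearly records with 8 candidate parameters.
rng = np.random.default_rng(1)
X = min_max(rng.uniform(size=(27, 8)))
y = min_max(X[:, 2] * X[:, 7] + 0.5 * X[:, 7])

history = modified_gmdh(X, y)
# Outer loop of Step 4: repeat the random split and keep the best final MSE.
best_mse, best_seed = min((modified_gmdh(X, y, seed=s)[-1], s) for s in range(20))
print(history, best_mse, best_seed)
```

Stacking the original columns back onto the surviving neurons is what lets a variable rejected in an early layer re-enter later, which is the point of the preservation method described above.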

Application of hybrid algorithm based on modified GMDH and BP networks

Data collection

The data sets provided by Guo (2009) were used in this study; they describe the dynamic development process of a low-permeability reservoir over the period 1980–2006. Building on the research conducted by Wang et al. on using multiple linear regression (MLR) to predict the output of an oilfield, Guo built an improved MLR model for predicting the annual output of the oilfield by analyzing the statistics and the important information in the regression parameters. The detailed data sets are shown in Table 1, and the performance of the MLR model is shown in Table 2.

Table 1 Original data sets of parameters for production forecasting of oilfield


Table 2 Summary of the evaluation results for different models


Effective parameter selection based on the GMDH-type algorithm

After the normalization procedure, the data sets were used in the proposed hybrid model discussed above. The modified GMDH algorithm was used to perform effective parameter selection, and the corresponding polynomials of the oilfield dynamic output for the selected parameters are as follows:

$$\begin{aligned} Q_{4}^{1} & = - 0.24729N_{{{\text{new}}}}^{2} - 0.75502Q_{{{\text{last}}}}^{2}+0.1695{N_{{\text{new}}}} \cdot {Q_{{\text{last}}}} \\ & \quad +0.062865{N_{{\text{new}}}}+1.2909{Q_{{\text{last}}}} - 0.014572 \\ \end{aligned} $$

(10)

$$Q_{{12}}^{1}={N_{{\text{act}}}}$$

(11)

$$\begin{aligned} Q_{4}^{2} & =0.31998{(Q_{4}^{1})^2}+0.12064{(Q_{{12}}^{1})^2}+1.1412Q_{4}^{1} \cdot Q_{{12}}^{1} \\ & \quad +0.51057Q_{4}^{1}+0.031821Q_{{12}}^{1}+0.0079427 \\ \end{aligned} $$

(12)

$$Q_{{17}}^{2}=R$$

(13)

$$\begin{aligned} Q_{3}^{3} & = - 0.28403{(Q_{4}^{2})^2} - 0.070727{(Q_{{17}}^{2})^2}+0.31146Q_{4}^{2} \cdot Q_{{17}}^{2} \\ & \quad +0.93718Q_{4}^{2}+0.022181Q_{{17}}^{2} - 0.0010487 \\ \end{aligned} $$

(14)

$$Q_{{18}}^{3}={Q_{{\text{last}}}}$$

(15)

$$\begin{aligned} Q_{1}^{4} & = - 3.7867{(Q_{3}^{3})^2} - 3.4284{(Q_{{18}}^{3})^2}+7.2835Q_{3}^{3} \cdot Q_{{18}}^{3} \\ & \quad +1.1992Q_{3}^{3} - 0.22921Q_{{18}}^{3}+0.002322 \\ \end{aligned} $$

(16)

$$Q_{{16}}^{4}=\upsilon $$

(17)

$$\begin{aligned} Q_{2}^{5} & = - 0.0067055{(Q_{1}^{4})^2}+0.0037029{(Q_{{16}}^{4})^2} - 0.010687Q_{1}^{4} \cdot Q_{{16}}^{4} \\ & \quad +0.99635Q_{1}^{4} - 0.0086437Q_{{16}}^{4}+0.004273 \\ \end{aligned} $$

(18)

$$Q_{{11}}^{5}={N_{{\text{total}}}}$$

(19)

$$\begin{aligned} Q_{1}^{6} & = - 0.63468{(Q_{2}^{5})^2} - 0.7048{(Q_{{11}}^{5})^2}+1.3469Q_{2}^{5} \cdot Q_{{11}}^{5} \\ & \quad +0.94474Q_{2}^{5}+0.052598Q_{{11}}^{5} - 0.00012446. \\ \end{aligned} $$

(20)
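The chain of Eqs. (10)–(20) can be evaluated directly by composing the six quadratic neurons. The sketch below copies the coefficients from the equations above; the function names are hypothetical, and the inputs are assumed to be the normalized parameters (the normalization scheme itself is not specified in the text).

```python
def quad(u, v, c):
    """Two-input quadratic neuron; c follows the printed term order
    (u^2, v^2, u*v, u, v, constant)."""
    a_uu, a_vv, a_uv, a_u, a_v, a_0 = c
    return a_uu * u**2 + a_vv * v**2 + a_uv * u * v + a_u * u + a_v * v + a_0

# Coefficients copied from the equations of the modified GMDH network.
LAYERS = [
    (-0.24729, -0.75502, 0.1695, 0.062865, 1.2909, -0.014572),          # Eq. (10)
    (0.31998, 0.12064, 1.1412, 0.51057, 0.031821, 0.0079427),           # Eq. (12)
    (-0.28403, -0.070727, 0.31146, 0.93718, 0.022181, -0.0010487),      # Eq. (14)
    (-3.7867, -3.4284, 7.2835, 1.1992, -0.22921, 0.002322),             # Eq. (16)
    (-0.0067055, 0.0037029, -0.010687, 0.99635, -0.0086437, 0.004273),  # Eq. (18)
    (-0.63468, -0.7048, 1.3469, 0.94474, 0.052598, -0.00012446),        # Eq. (20)
]

def modified_gmdh_output(n_total, n_act, n_new, v, r, q_last):
    """Evaluate the six-layer polynomial chain on normalized inputs."""
    q = quad(n_new, q_last, LAYERS[0])                  # first layer, Eq. (10)
    # Each later layer pairs the previous neuron with one preserved original
    # variable, in the order given by Eqs. (11), (13), (15), (17), (19).
    for second, coeffs in zip((n_act, r, q_last, v, n_total), LAYERS[1:]):
        q = quad(q, second, coeffs)
    return q

first_layer_at_ones = quad(1.0, 1.0, LAYERS[0])
example = modified_gmdh_output(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
print(first_layer_at_ones, example)
```

Written this way, each layer visibly adds exactly one preserved original variable to the running estimate, which is how the six effective parameters enter the final polynomial.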

For comparison, the traditional GMDH algorithm was also used to model the relationship between the output and the parameters, and the results are as follows:

$$\begin{aligned} Q_{4}^{1} & = - 0.24729N_{{{\text{new}}}}^{2} - 0.75502Q_{{{\text{last}}}}^{2}+0.1695{N_{{\text{new}}}} \cdot {Q_{{\text{last}}}} \\ & \quad +0.062865{N_{{\text{new}}}}+1.2909{Q_{{\text{last}}}} - 0.014572 \\ \end{aligned} $$

(21)

$$\begin{aligned} Q_{7}^{1} & =2.1287N_{{{\text{new}}}}^{2}+0.86016{\upsilon ^2}+0.69915{N_{{\text{new}}}} \cdot \upsilon \\ & \quad - 0.87815{N_{{\text{new}}}} - 1.4141\upsilon +0.57713 \\ \end{aligned} $$

(22)

$$\begin{aligned} Q_{{10}}^{1} & =0.92075N_{{{\text{new}}}}^{2}+0.4325Q_{{{\text{inj}}}}^{2}+0.16661{N_{{\text{new}}}} \cdot {Q_{{\text{inj}}}} \\ & \quad - 0.38459{N_{{\text{new}}}}+0.63473{Q_{{\text{inj}}}}+0.0014875 \\ \end{aligned} $$

(23)

$$\begin{aligned} Q_{1}^{2} & =0.65114{(Q_{4}^{1})^2}+0.02469{(Q_{7}^{1})^2}+0.19724Q_{4}^{1} \cdot Q_{7}^{1} \\ & \quad +0.69265Q_{4}^{1} - 0.012697Q_{7}^{1}+0.01097 \\ \end{aligned} $$

(24)

$$\begin{aligned} Q_{{10}}^{2} & = - 0.15054{(Q_{4}^{1})^2} - 0.13066{(Q_{{10}}^{1})^2}+0.74814Q_{4}^{1} \cdot Q_{{10}}^{1} \\ & \quad +0.55802Q_{4}^{1}+0.03793Q_{{10}}^{1}+0.0075964 \\ \end{aligned} $$

(25)

$$\begin{aligned} Q_{1}^{3} & =6.569{(Q_{1}^{2})^2}+5.9129{(Q_{{10}}^{2})^2} - 12.4899Q_{1}^{2} \cdot Q_{{10}}^{2} \\ & \quad +0.77332Q_{1}^{2}+0.22963Q_{{10}}^{2} - 8.1951 \times {10^{ - 5}}. \\ \end{aligned} $$

(26)

As can be seen from the analytical equations, the modified GMDH algorithm runs for six rounds of iteration, with the MSE of each layer decreasing from 0.1999 to 0.0231. The effective parameters selected by the modified GMDH algorithm are $\{ {N_{{\text{total}}}},{N_{{\text{act}}}},{N_{{\text{new}}}},\upsilon ,R,{Q_{{\text{last}}}}\} .$ By contrast, the traditional GMDH algorithm runs for three rounds of iteration, with the MSE decreasing from 0.1999 to 0.0631, and the selected parameters are $\{ {N_{{\text{new}}}},{Q_{{\text{inj}}}},\upsilon ,{Q_{{\text{last}}}}\} $.

The output prediction based on the BP algorithm

Different types and numbers of input nodes lead to different outcomes. As mentioned above, the effective parameters selected by the traditional and modified GMDH networks are fed into the BP network for analysis and prediction.

The BP network adopts a single-hidden-layer structure. The number of input nodes equals the number of parameters selected by the respective GMDH-type model. Determining the number of nodes in the hidden layer is a crucial point for BP network prediction: either too many or too few hidden nodes will increase the simulation error. The major methods for solving this problem include the following rules:

Liu (2008):

$$l=2m+1$$

(27)

or

$$l=\sqrt {mn} $$

(28)

Fu and Zhao (2010):

$$l=\sqrt {n+m} +a.$$

(29)

Here $l$ is the number of nodes in the hidden layer, $n$ and $m$ are the numbers of nodes in the input and output layers, respectively, and $a$ is a constant ranging from 0 to 10. In this paper, the number of nodes in the hidden layer is computed from Eq. (29) as $\sqrt {6+1} +10 \approx 12.6$, which is rounded to 13.
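The three rules are easy to compare numerically. The snippet below is a small sketch that follows the notation as given in Eqs. (27)–(29); the function names are hypothetical, and rounding up to the next integer is an assumption, since the text only reports the final value of 13.

```python
import math

def hidden_liu_linear(m):
    """Eq. (27): l = 2m + 1."""
    return 2 * m + 1

def hidden_liu_geometric(n, m):
    """Eq. (28): l = sqrt(m * n)."""
    return math.sqrt(m * n)

def hidden_fu_zhao(n, m, a):
    """Eq. (29): l = sqrt(n + m) + a, with a between 0 and 10."""
    return math.sqrt(n + m) + a

# Six selected inputs, one output, and a = 10 as used in the paper.
n_inputs, n_outputs, a = 6, 1, 10
raw = hidden_fu_zhao(n_inputs, n_outputs, a)
hidden_nodes = math.ceil(raw)   # assumed rounding rule
print(raw, hidden_nodes)
```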

For the hybrid model of the modified GMDH and BP networks, a total of 22 records were drawn from the 27 records of production history from 1980 to 2006 for oilfield output prediction. The learning rate was 0.1, the stop criterion of the error function was set to 0.001, and the maximum number of iterations was 1000. The initial weights and thresholds were randomly generated by the computer. During model operation, 22 data samples (81%) were randomly selected for training the BP network and the remaining 5 data samples (19%) were used as the testing data set for model evaluation. The average residual errors obtained from the six prediction models are presented in Table 2.
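A single-hidden-layer BP network with these settings can be sketched with plain NumPy. The following is a minimal illustration under stated assumptions, not the authors' implementation: the tanh hidden activation, linear output, MSE loss, full-batch gradient descent, and Gaussian initialization are all assumptions, and the data are synthetic placeholders for the normalized oilfield records.

```python
import numpy as np

def train_bp(X, y, hidden=13, lr=0.1, tol=1e-3, max_iter=1000, seed=0):
    """Train a one-hidden-layer network by back propagation.
    Returns a predict function and the final training MSE."""
    rng = np.random.default_rng(seed)
    # Randomly generated initial weights and thresholds (biases).
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = rng.normal(0.0, 0.5, hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = rng.normal(0.0, 0.5, 1)
    target = y.reshape(-1, 1)

    def forward(inputs):
        H = np.tanh(inputs @ W1 + b1)        # hidden layer (assumed tanh)
        return H, H @ W2 + b2                # linear output layer (assumed)

    for _ in range(max_iter):
        H, out = forward(X)
        err = out - target
        if float(np.mean(err ** 2)) < tol:   # stop criterion on the error
            break
        # Backward pass: propagate the error from the output to the input.
        g_out = 2.0 * err / len(target)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    final_loss = float(np.mean((forward(X)[1] - target) ** 2))
    return (lambda inputs: forward(inputs)[1].ravel()), final_loss

# Synthetic stand-in: 27 normalized records with 6 selected parameters.
rng = np.random.default_rng(42)
X = rng.uniform(size=(27, 6))
y = 0.6 * X[:, 5] + 0.3 * X[:, 2] * X[:, 3]

idx = rng.permutation(27)                    # random 22 / 5 split
train_idx, test_idx = idx[:22], idx[22:]
predict, train_loss = train_bp(X[train_idx], y[train_idx])
test_pred = predict(X[test_idx])
print(train_loss, test_pred)
```

Because the split and the initial weights are both random, results vary between runs; fixing the seeds as above only makes one particular run reproducible.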

Results and discussion

The outputs predicted by the hybrid model are presented in Fig. 3, compared with the traditional equations (MLR) and the artificial neural network models (GMDH, modified GMDH, BP, GMDH-BP). The comparison chart indicates that the hybrid model combining the modified GMDH network and the BP algorithm is closer to the actual production of the oilfield than the other models listed. To analyze the precision in more depth, the error (mean relative error), R (correlation coefficient), RMSE (root mean square error), MAPE (mean absolute percentage of error), and SI (scatter index) are used to investigate the performance of the presented models:

$${\text{error}}=\frac{1}{M}\sum\limits_{{i=1}}^{M} {\frac{{\left| {{Y_{i({\text{Actual}})}} - {Y_{i({\text{Model}})}}} \right|}}{{{Y_{i({\text{Actual}})}}}}} $$

(30)

$$R=\frac{{\sum\nolimits_{{i=1}}^{M} {({Y_{i({\text{Actual}})}} - {{\bar {Y}}_{({\text{Actual}})}})({Y_{i({\text{Model}})}} - {{\bar {Y}}_{({\text{Model}})}})} }}{{\sqrt {\sum\nolimits_{{i=1}}^{M} {{{({Y_{i({\text{Actual}})}} - {{\bar {Y}}_{({\text{Actual}})}})}^2} \cdot \sum\nolimits_{{i=1}}^{M} {{{({Y_{i({\text{Model}})}} - {{\bar {Y}}_{({\text{Model}})}})}^2}} } } }}$$

(31)

$${\text{RMSE}}={\left[ {\frac{{\sum\nolimits_{{i=1}}^{M} {{{({Y_{i({\text{Model}})}} - {Y_{i({\text{Actual}})}})}^2}} }}{M}} \right]^{1/2}}$$

(32)

$${\text{MAPE}}=\frac{1}{M}\left[ {\frac{{\sum\nolimits_{{i=1}}^{M} {|{Y_{i({\text{Model}})}} - {Y_{i({\text{Actual}})}}|} }}{{\sum\nolimits_{{i=1}}^{M} {{Y_{i({\text{Actual}})}}} }} \times 100} \right]$$

(33)

$${\text{SI}}=\frac{{\sqrt {(1/M)\sum\nolimits_{{i=1}}^{M} {{{(({Y_{i({\text{Model}})}} - {{\bar {Y}}_{({\text{Model}})}}) - ({Y_{i({\text{Actual}})}} - {{\bar {Y}}_{({\text{Actual}})}}))}^2}} } }}{{(1/M)\sum\nolimits_{{i=1}}^{M} {{Y_{i({\text{Actual}})}}} }},$$

(34)

where ${Y_{i({\text{Model}})}}$ and ${Y_{i({\text{Actual}})}}$ are the forecasted and observed values, respectively, ${\bar {Y}_{({\text{Model}})}}$ and ${\bar {Y}_{({\text{Actual}})}}$ are the averages of the forecasted and observed values, and $M$ is the total number of samples.
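The five indicators can be computed together as follows. This is a sketch with a hypothetical function name; it reads Eq. (30) as an average over all $M$ samples, as the name "mean relative error" suggests, and it implements MAPE exactly as written in Eq. (33), which differs from the more common per-sample definition. The sample values are invented for the check.

```python
import numpy as np

def evaluate(y_actual, y_model):
    """Return error, R, RMSE, MAPE and SI following Eqs. (30)-(34)."""
    ya = np.asarray(y_actual, dtype=float)
    ym = np.asarray(y_model, dtype=float)
    m = len(ya)
    da = ya - ya.mean()                      # centered observations
    dm = ym - ym.mean()                      # centered forecasts
    return {
        "error": float(np.mean(np.abs(ya - ym) / ya)),
        "R": float(np.sum(da * dm) / np.sqrt(np.sum(da**2) * np.sum(dm**2))),
        "RMSE": float(np.sqrt(np.mean((ym - ya) ** 2))),
        "MAPE": float(np.sum(np.abs(ym - ya)) / np.sum(ya) * 100.0 / m),
        "SI": float(np.sqrt(np.mean((dm - da) ** 2)) / ya.mean()),
    }

# Invented values: a forecast that is uniformly 10 units too high.
actual = [100.0, 200.0, 300.0]
shifted = [110.0, 210.0, 310.0]
scores = evaluate(actual, shifted)
print(scores)
```

A constant offset leaves R at 1 and SI at 0 because both are computed on centered values, which is why the text reports the relative error and RMSE alongside them.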

The statistical results of the traditional equations and the artificial intelligence approaches with training and testing data are presented in Table 2. The hybrid model combining the modified GMDH network and the BP algorithm (modified GMDH-BP) is clearly more accurate than the other models, with higher correlation (R = 0.9986) and lower error (error = 0.0099, RMSE = 13.7979, MAPE = 2.9889, SI = 0.0376). In general, the neural network tools perform better, with relatively higher correlation and lower prediction error. Compared with the hybrid model combining the traditional GMDH network and the BP algorithm (GMDH-BP), the proposed model (modified GMDH-BP) improves the overall forecasting precision: the correlation rises from 0.9949 to 0.9986, and the errors fall from 0.0197 to 0.0099 (error), from 24.8495 to 13.7979 (RMSE), from 5.4415 to 2.9889 (MAPE), and from 0.0675 to 0.0376 (SI).

Figure 4 compares the six models mentioned above using time series and scatter plots of the predicted oilfield production. The results derived from all six models are in agreement with the actual production, indicating that these prediction algorithms are applicable to modeling oilfield production series data. However, the dashed line generated by the modified GMDH-BP model is closer than those of the other models to the solid line, which indicates the actual output of the oilfield. When the correlation coefficient is used to evaluate the degree of fitting, the hybrid model combining the modified GMDH network and the BP algorithm (modified GMDH-BP) is slightly superior to the other models considered. The performance obtained in this paper indicates that the hybrid model (modified GMDH-BP) is a powerful tool for simulating oilfield production time series and is able to provide better prediction performance. In conclusion, the evaluation results suggest that the best performance is obtained by the hybrid model (modified GMDH-BP), followed in turn by the modified GMDH, BP, GMDH-BP, MLR, and GMDH models.

Conclusion

Yearly production estimation is vital in oilfield development programming, and many models for predicting the dynamic output have been proposed in recent years. In this paper, we have demonstrated systematically how the yearly production of an oilfield can be represented by a hybrid model combining the modified GMDH and BP models. To illustrate the capability of the hybrid model (modified GMDH-BP), an actual oilfield with various production parameters was analyzed and used to test the annual output predicted by six models. Forecasting oilfield production is a complex issue involving various parameters that affect one another in different ways. Therefore, the first step of model construction is effective parameter selection using the modified GMDH network. The modified GMDH network proposed in this paper performs better in selecting input variables, owing to the introduction of the random drawing method for the original data sets and the original variable preservation method for optimizing the selection of variables in each layer. In addition, the excellent precision of the BP algorithm is another favorable factor in obtaining the best degree of fitting for the modified GMDH-BP model. Among the six models compared, the proposed hybrid model (modified GMDH-BP) provides the best forecast precision, with the highest correlation (R = 0.9986) and the lowest error (error = 0.0099, RMSE = 13.7979, MAPE = 2.9889, SI = 0.0378). The hybrid model (modified GMDH-BP) thus provides a robust ability to capture the nonlinear relations in complex oilfield production time series and produces more accurate forecasts.

References

Abdel-Aal RE (2005) GMDH-based feature ranking and selection for improved classification of medical data. J Biomed Inform 38(6):456–468
Abdel-Aal RE, El-Alfy ESM (2009) Constructing optimal educational tests using GMDH-based item ranking and selection. Neurocomputing 72(4):1184–1197
Ahmadi MA (2012) Withdrawn: prediction of minimum miscible pressure by using neural network based hybrid genetic algorithm and particle swarm optimization. J Pet Sci Eng. https://doi.org/10.1016/j.petrol.2012.08.015
Ahmadi MA, Golshadi M (2012) Neural network based swarm concept for prediction asphaltene precipitation due to natural depletion. J Pet Sci Eng 98–99(6):40–49
Ahmadi MA, Goudarzi A (2013) Retracted article: combining artificial neural network and unified particle swarm optimization for oil flow rate prediction: case study. Neural Comput Appl 23(2):565–565
Ahmadi MA, Shadizadeh SR (2012) Permeability prediction of carbonate reservoir by combining neural network and shuffled frog-leaping. J Am Sci 8(2):529–534
Ahmadi MA, Zendehboudi S, Lohi A, Elkamel A, Chatzis I (2013) Reservoir permeability prediction by neural networks combined with hybrid genetic algorithm and particle swarm optimization. Geophys Prospect 61(3):582–598
Al-Ajmi RM, Abou-Ziyan HZ, Mahmoud MA (2012) Performance evaluation of 14 neural network architectures used for predicting heat transfer characteristics of engine oils. Int J Comput Methods Eng Sci Mech 13(1):60–75
Arps JJ (1945) Analysis of decline curves. Pet Trans 160:228–247
Balch RS, Stubbs BS, Weiss WW, Wo S (1999) Using artificial intelligence to correlate multiple seismic attributes to reservoir properties. Soc Petrol Eng
Bardi U (2005) The mineral economy: a model for the shape of oil production curves. Energy Policy 33(1):53–61
Fu H, Zhao H (2010) Application design of MATLAB neural network. Machinery Industry Press, Beijing
Ghorbani B, Ziabasharhagh M, Amidpour M (2014) A hybrid artificial neural network and genetic algorithm for predicting viscosity of Iranian crude oils. J Nat Gas Sci Eng 18(18):312–323
Guo L (2009) Application of improved multivariate linear regression to output prediction of an oilfield. J Xidian Univ 17(7):1251–1257
Guo J, Huang W, Mao Q, Wang X, Wang X, Song T (2017) Modified GMDH networks for oilfield production prediction. Geosyst Eng (7), 1–9
Haykin S (1998) Neural networks: a comprehensive foundation, 3rd edn. Macmillan, London
Hubbert MK (1980) Techniques of prediction as applied to production of oil and gas. In: Proceedings, US Department of Commerce symposium, Washington, DC, vol 16(6), pp 8–20
Ivakhnenko AG (1971) Polynomial theory of complex systems. IEEE Trans Syst Man Cybern SMC-1(4):364–378
Jr OL, Nunes U, Rui A, Schnitman L, Lepikson HA (2009) Applications of information theory, genetic algorithms, and neural models to predict oil flow. Commun Nonlinear Sci Numer Simul 14(7):2870–2885
Kashyap R (1973) System identification. Autom Control IEEE Trans 18(1):85–86
Lia O, Omre H, Tjelmeland H, Holden L, Egeland T (1997) Uncertainties in reservoir production forecasts. AAPG Bull 81(5):775–802
Liu X (2008) Artificial neural network and particle swarm optimization. Beijing University of Posts and Telecommunications Press, Beijing
Liu C, McVay DA (2009) Continuous reservoir simulation model updating and forecasting using a Markov chain Monte Carlo method. Soc Petrol Eng
Mohsen M, Nafiseh B, Mehdi A, Mohsen M (2010) Forecasting volatility of crude oil price using the GMDH neural network. Quart Energ Econ Rev 7(25):89–112
Mollaiy Berneti S, Shahbazian M (2013) An imperialist competitive algorithm artificial neural network method to predict oil flow rate of the wells. Int J Comput Appl 26(10):47–50
Mori H, Tsuzuki S (1990) Comparison between backpropagation and revised GMDH techniques for predicting voltage harmonics. In: IEEE international symposium on circuits and systems, vol 2, pp 1102–1105. IEEE, New York
Najafzadeh M, Barani GA (2011) Comparison of group method of data handling based genetic programming and back propagation systems to predict scour depth around bridge piers. Sci Iran 18(6):1207–1213
Parks PC, Ivakhnenko AG, Boichuk LM, Svetalsky BK (1975) A self-organizing model of the British economy for control with optimal prediction using the balance-of-variables criterion. Int J Comput Inf Sci 4(4):349–379
Samsudin R, Saad P, Shabri A (2011) River flow time series using least squares support vector machines. Hydrol Earth Syst Sci 15:1835–1852
Tang X, Zhang B, Höök M, Feng L (2010) Forecast of oil reserves and production in Daqing oilfield of China. Energy 35(7):3097–3102
Tapias O, Soto CP, Sandoval J, Perez HH, Bejarano A (2001) Reservoir engineer and artificial intelligence techniques for data analysis. Soc Petrol Eng
Wang J, Feng L (2016) Curve-fitting models for fossil fuel production forecasting: key influence factors. J Nat Gas Sci Eng 32:138–149
Wang J, Feng L, Steve M, Xu T, Gail TE, Mikael H (2015) China’s unconventional oil: a review of its resources and outlook for long-term production. Energy 82:31–42
Li YB, Zhang XY, Ma JG, Wang LJ (2005) Study of improving algorithms based on the BP neural network. J Hefei Univ Technol 28(6):668–671
Yu S, Zhu K, Diao F (2008) A dynamic all parameters adaptive BP neural networks model and its application on oil reservoir prediction. Appl Math Comput 195(1):66–75
Zendehboudi S, Ahmadi MA, James L, Chatzis I (2012) Prediction of condensate-to-gas ratio for retrograde gas condensate reservoirs using artificial neural network with particle swarm optimization. Energy Fuels 26(6):3432–3447

Acknowledgements

The authors gratefully acknowledge the support received from the Major Science and Technology Projects of PetroChina Co Ltd under Grant number 2017E-1505.

Author information

Authors and Affiliations

Exploration and Development Institute, PetroChina Huabei Oilfield Company, Renqiu, China
Jia Guo, Hongmei Wang, Fajun Guo, Wei Huang, Huipeng Yang & Kai Yang
CNPC Bohai Drilling Engineering Company Second Logging Company, Renqiu, China
Hong Xie

Corresponding author

Correspondence to Jia Guo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Guo, J., Wang, H., Guo, F. et al. The back propagation based on the modified group method of data-handling network for oilfield production forecasting. J Petrol Explor Prod Technol 9, 1285–1293 (2019). https://doi.org/10.1007/s13202-018-0582-9

Received: 07 August 2018
Accepted: 02 November 2018
Published: 19 November 2018
Issue Date: June 2019
DOI: https://doi.org/10.1007/s13202-018-0582-9
