Chaos-generalized regression neural network prediction model of mine water inflow

Artificial neural networks (ANNs) provide a new way to predict mine water inflow. However, the effectiveness of ANN prediction cannot be guaranteed when the factors influencing water inflow are difficult to quantify or when only a few observations are available. Chaos theory can recover the rich dynamic information hidden in a time series. By reconstructing the water inflow time series in phase space, a multi-dimensional matrix can be obtained, in which each column represents an influencing factor of water inflow and its values represent the change of that factor over time. Therefore, when data on the influencing factors are lacking, a new prediction model of mine water inflow can be established by combining ANN with chaos theory. In the present study, the No. 12 Coal Mine of Pingdingshan, China, was selected as the study site. A Chaos-GRNN model and a Chaos-BPNN model of mine water inflow were established using water inflow data from February 1976 to December 2013, and the models were verified using the water inflow values for the 24 months of 2014 and 2015. The embedding dimension (m) determined by phase space reconstruction was 7, meaning that there were 7 influencing factors of water inflow and hence 7 neurons in the GRNN input layer; the time delay was 13 months. The values of the GRNN input layer neurons were determined accordingly. The maximum Lyapunov exponent was 0.0530, and the prediction step size of the GRNN was 19 months. The two models were evaluated using four evaluation indices (R, RMSE, MAPE, NSE) and violin plots. Both models can realize long-term prediction of water inflow, and the Chaos-GRNN model predicts more effectively than the Chaos-BPNN model.


Introduction
In China, more than 200 water inrushes from Ordovician limestone have occurred in North China coalfields over the past 40 years, resulting in economic losses of more than 30 billion yuan and 1300 deaths [22,25]. Water inrush problems become increasingly serious as mining depth increases. Mine water inflow, the total amount of surface water, fissure water, karst water, water from old workings (old kiln water), and other water flowing into the mine, is one of the basic technical conditions for coal mine development. Accurate forecasting of coal mine water inflow helps to prevent coal mine accidents and to ensure safe coal production and efficient use of mine water [34,40]. Accordingly, scholars have gradually moved mine water inflow prediction from deterministic methods (such as the hydrogeological analytical method, the water balance method, and numerical methods) to non-deterministic stochastic methods (such as the analog method, fuzzy mathematical models, and gray system theory) [1,24,33,41]. However, mine water inflow is related to many factors, such as strata pressure, mining disturbance, geologic structures, and water pressure, and these factors have obvious nonlinear characteristics [11,16]. Moreover, as coal mining goes deeper, the hydrogeological conditions of mine water filling become more complex and the number of factors controlling mine water inflow increases. Consequently, these prediction methods are usually time-consuming and less reliable.
An artificial neural network (ANN) can learn the underlying relationship between the input and output signals of a sequential process without explicitly taking physical rules into consideration [18]. Many scholars have therefore used ANNs to predict mine water inflow [3], and these models have achieved high prediction accuracy. The Generalized Regression Neural Network (GRNN) is a radial basis function network based on mathematical statistics [5]. It has a simple model structure, few parameters to adjust, strong approximation ability, fast learning, and high prediction accuracy, and it has been widely used in many fields [5,36]. Wang et al. [31] selected three main influencing factors of mine water inflow as the input neurons of a GRNN to predict mine water inflow. When a GRNN is used to predict mine water inflow, the number of input layer neurons and their values are key to the accuracy of the model; both are determined by the choice of influencing factors and their values. However, data on many influencing factors of mine water inflow are usually difficult to collect and quantify, and it is difficult to identify the key factors among them.
Fortunately, chaos theory holds that the time series of mine water inflow contains a great deal of information about the related factors, and that the number and values of these factors can be obtained through the reconstructed phase space [19,29]. Chaos theory is an effective tool for studying complex systems [13] and can restore the rich dynamic information hidden in a single variable of a system [27].
This study therefore combines chaos theory with GRNN: the reconstructed phase space of the mine water inflow time series is used to determine the number of GRNN input layer neurons and their values (reducing the subjectivity and limitations of the original GRNN), and a Chaos-Generalized Regression Neural Network (Chaos-GRNN) model is established to predict mine water inflow more effectively. The model is then used to predict mine water inflow over a long horizon (24 months) to verify its practicality. The paper is organized as follows. Section 2 briefly introduces the basic theory of phase space reconstruction and GRNN and outlines the proposed Chaos-GRNN approach. Section 3 describes the application of the Chaos-GRNN model to mine water inflow prediction. Section 4 discusses the rationality and parameter ranges of the Chaos-GRNN model and compares its predictions with those of the Chaos-BPNN model. Section 5 presents the main conclusions.

Reconstructed phase space
Takens' time-delay embedding theorem (1981) paved the way for the analysis of chaotic time series: in a reconstructed m-dimensional space, the regular trajectory of the dynamical system can be restored [8,10]. The key to phase space reconstruction is determining the time delay τ and the embedding dimension m. In this paper, m was determined by the Cao method [2] and τ by the autocorrelation function method [7,38]. Further details about chaos theory can be found in Karunasingha et al. [13] and Huang et al. [12].
(1) Time delay τ. Let $x_i$ denote the observed mine water inflow, $\bar{x}$ its mean, n the number of data points considered, and τ the lag (number of time steps shifted). The autocorrelation function $C(\tau)$ can be expressed as

$$C(\tau)=\frac{\sum_{i=1}^{n-\tau}(x_i-\bar{x})(x_{i+\tau}-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \tag{1}$$

The curve of $C(\tau)$ is drawn against τ. When $C(\tau)$ drops to $(1-1/e)$ of its initial value, the corresponding τ is taken as the time delay for reconstructing the phase space.
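(2) Embedding dimension m. First, the change $a(i,m)$ in the distance between each point and its nearest neighbour when the embedding dimension increases from m to m + 1 is calculated by formula (2); then the mean $E(m)$ of $a(i,m)$ is calculated by formula (3), and $E_1(m)$ by formula (4):

$$a(i,m)=\frac{\lVert Y_i(m+1)-Y_{n(i,m)}(m+1)\rVert}{\lVert Y_i(m)-Y_{n(i,m)}(m)\rVert},\quad i=1,2,\dots,n-m\tau \tag{2}$$

$$E(m)=\frac{1}{n-m\tau}\sum_{i=1}^{n-m\tau}a(i,m) \tag{3}$$

$$E_1(m)=\frac{E(m+1)}{E(m)} \tag{4}$$

where $Y_i(m)$ is the i-th reconstructed vector in dimension m, $Y_{n(i,m)}(m)$ is its nearest neighbour, and $\lVert\cdot\rVert$ is the maximum norm. Finally, $E_1(m)$ is plotted against m; the value of m at which $E_1(m)$ stops changing and becomes stable is the required embedding dimension. With τ and m determined, the reconstructed phase-space vectors are

$$Y_i=\left(x_i,\,x_{i+\tau},\,\dots,\,x_{i+(m-1)\tau}\right),\quad i=1,2,\dots,n-(m-1)\tau \tag{5}$$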
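Both parameter choices described above can be illustrated in plain Python. This is a minimal sketch, not the authors' MATLAB code, and every function name is hypothetical. `select_delay` evaluates $C(\tau)$ as in formula (1) and returns the first lag at which it falls to a given fraction of $C(0) = 1$; the default follows the $(1-1/e)$ criterion stated above, and because many texts use $1/e$ instead, the threshold is exposed as the assumed parameter `fraction`. `cao_E1` builds the delay vectors of formula (5) and averages the nearest-neighbour distance ratios $a(i,m)$ of formulas (2) and (3) under the maximum norm; as a simplification, the neighbour search is restricted to vectors that exist in both dimensions m and m + 1.

```python
import math


def autocorrelation(x, tau):
    """C(tau) of formula (1): lag-tau autocovariance over the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + tau] for i in range(n - tau))
    return num / sum(d * d for d in dev)


def select_delay(x, fraction=1 - 1 / math.e, max_tau=None):
    """First lag at which C(tau) has dropped to `fraction` of C(0) = 1."""
    max_tau = max_tau or len(x) // 2
    for tau in range(1, max_tau + 1):
        if autocorrelation(x, tau) <= fraction:
            return tau
    return None


def embed(x, m, tau):
    """Delay vectors Y_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}) of formula (5), 0-based."""
    return [[x[i + j * tau] for j in range(m)]
            for i in range(len(x) - (m - 1) * tau)]


def max_norm(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


def cao_E(x, m, tau):
    """E(m) of formula (3): mean of the ratios a(i, m) of formula (2)."""
    Y_m, Y_next = embed(x, m, tau), embed(x, m + 1, tau)
    N = len(Y_next)  # n - m*tau vectors exist in both dimensions
    ratios = []
    for i in range(N):
        # Nearest neighbour n(i, m) of Y_i(m); exact duplicates (distance 0) are skipped.
        best_j, best_d = None, math.inf
        for j in range(N):
            if j != i:
                d = max_norm(Y_m[i], Y_m[j])
                if 0 < d < best_d:
                    best_j, best_d = j, d
        if best_j is not None:
            ratios.append(max_norm(Y_next[i], Y_next[best_j]) / best_d)
    return sum(ratios) / len(ratios)


def cao_E1(x, tau, max_m):
    """E1(m) = E(m+1) / E(m) of formula (4), for m = 1..max_m."""
    E = [cao_E(x, m, tau) for m in range(1, max_m + 2)]
    return [E[k + 1] / E[k] for k in range(max_m)]
```

In use, one would call `select_delay` on the inflow series first, then plot `cao_E1(x, tau, max_m)` against m and read off where it levels out. The double loop is O(n²) per dimension, which is acceptable for a few hundred monthly values; a k-d tree would be the usual choice for longer series.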

The maximum Lyapunov exponent $\lambda_{\max}$
The maximum Lyapunov exponent $\lambda_{\max}$ is not only the parameter that identifies chaotic behaviour (if $\lambda_{\max} > 0$, the system has chaotic characteristics) but also determines the step size R ($R = 1/\lambda_{\max}$) of the prediction model [30,32].
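After the phase space is reconstructed, the distance $L_i$ between a point $Y(i)$ and its nearest neighbour $Y(j)$ in the m-dimensional space is calculated by formula (6):

$$L_i=\lVert Y(i)-Y(j)\rVert \tag{6}$$

$\lambda_{\max}$ is then obtained by formula (7) from the average exponential rate at which these neighbouring trajectories diverge, and the step size R of the prediction model follows from formula (8):

$$R=\frac{1}{\lambda_{\max}} \tag{8}$$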
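Since the text does not spell out formula (7), the sketch below swaps in one common estimator, a Rosenstein-style small-data method, to illustrate how $\lambda_{\max}$ and the step size R of formula (8) can be obtained; it is an assumption, not necessarily the method used in this study. It pairs each delay vector with its nearest neighbour (excluding temporally close points, via the hypothetical parameter `min_sep`), tracks the mean log separation of each pair over `horizon` steps, and takes the least-squares slope of that curve as $\lambda_{\max}$ per time step. All function and parameter names are hypothetical.

```python
import math


def largest_lyapunov(x, m, tau, min_sep, horizon):
    """Rosenstein-style estimate of lambda_max, in units of 1/time step (assumed method)."""
    # Delay vectors of formula (5), 0-based.
    Y = [[x[i + j * tau] for j in range(m)] for i in range(len(x) - (m - 1) * tau)]
    M = len(Y)

    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    # Nearest neighbour of each Y[i] (distance L_i), skipping points closer than min_sep in time.
    nn = []
    for i in range(M):
        best_j, best_d = None, math.inf
        for j in range(M):
            if abs(i - j) > min_sep:
                d = dist(Y[i], Y[j])
                if 0 < d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)

    # Mean log separation of each neighbour pair after k steps, k = 0..horizon.
    curve = []
    for k in range(horizon + 1):
        logs = []
        for i, j in enumerate(nn):
            if j is not None and i + k < M and j + k < M:
                d = dist(Y[i + k], Y[j + k])
                if d > 0:
                    logs.append(math.log(d))
        curve.append(sum(logs) / len(logs))

    # Least-squares slope of the divergence curve = lambda_max per time step.
    k_mean = horizon / 2
    c_mean = sum(curve) / len(curve)
    num = sum((k - k_mean) * (c - c_mean) for k, c in enumerate(curve))
    den = sum((k - k_mean) ** 2 for k in range(horizon + 1))
    return num / den


def prediction_step(lam_max):
    """Step size R = 1 / lambda_max of formula (8), rounded to whole time steps."""
    return round(1 / lam_max)
```

With the value reported later for this mine, `prediction_step(0.0530)` returns 19, since 1/0.0530 ≈ 18.87. The `horizon` should stay within the initial, roughly linear part of the divergence curve, before separations saturate at the size of the attractor.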
Let $f(x, y)$ be the joint probability density function (PDF) of two random variables x and y. When $f(x, y)$ is known, the conditional mean of y given $x_0$ can be calculated by

$$E[y\mid x_0]=\frac{\int_{-\infty}^{\infty}y\,f(x_0,y)\,dy}{\int_{-\infty}^{\infty}f(x_0,y)\,dy} \tag{9}$$

In practice, $f(x, y)$ is unknown for many systems. GRNN uses a group of measured x and y values to estimate $f(x, y)$ with the consistent estimator proposed by Parzen [26,36]. For measured sample data $\{x_i, y_i \mid i = 1, 2, \dots, n\}$, the estimated PDF can be expressed as

$$\hat{f}(x,y)=\frac{1}{(2\pi)^{(m+1)/2}\sigma^{m+1}}\cdot\frac{1}{n}\sum_{i=1}^{n}\exp\!\left[-\frac{(x-x_i)^{\mathsf T}(x-x_i)}{2\sigma^2}\right]\exp\!\left[-\frac{(y-y_i)^2}{2\sigma^2}\right]$$

where n is the number of measured samples, m is the dimension of the vector random variable x, and σ is the smoothing parameter of the Gaussian function (also called the spread factor).
Substituting the estimated PDF into the conditional mean in formula (9) and interchanging the order of integration and summation yields the desired conditional mean of y for a given $x_0$:

$$\hat{Y}(x_0)=\frac{\sum_{i=1}^{n}y_i\exp\!\left(-\dfrac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n}\exp\!\left(-\dfrac{D_i^2}{2\sigma^2}\right)},\qquad D_i^2=(x_0-x_i)^{\mathsf T}(x_0-x_i)$$

Structurally, a GRNN is composed of an input layer, a pattern layer, a summation layer, and an output layer (Fig. 1). The training set consists of values of the inputs x, each with a corresponding value of the output y. This regression method produces the estimate of y that minimizes the squared error [14]. The network has a simple structure, few parameters, and fast prediction speed; its disadvantage is that the selection of input layer neurons is subjective.
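The GRNN estimate described above reduces to a Gaussian-weighted average of the training targets. The following minimal sketch assumes scalar targets (vector outputs can be handled component by component); the name `grnn_predict` is hypothetical, and it works with σ directly, because the exact mapping between σ and the `spread` argument of MATLAB's `newgrnn()` is not reproduced here.

```python
import math


def grnn_predict(X_train, y_train, x0, sigma):
    """GRNN output for input x0: Gaussian-weighted mean of the training targets."""
    # Squared Euclidean distances D_i^2 between x0 and each training input.
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x0)) for xi in X_train]
    # Shifting by the smallest D_i^2 multiplies every weight by the same factor,
    # so the ratio is unchanged but the largest weight becomes exp(0) = 1 (no underflow).
    d_min = min(d2)
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    # Summation layer (weighted sum and plain sum) followed by the output-layer division.
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)
```

The shift matters in practice: with σ around 0.1 and unnormalised inflows of order 100 m³/h, every unshifted weight would underflow to zero and the division would fail. A small σ makes the estimate approach the target of the nearest training input, while a large σ pulls it toward the mean of all targets.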
MATLAB R2015b was used to analyse the chaotic characteristics and to implement the GRNN for mine water inflow estimation.

Chaos-GRNN model
The Chaos-GRNN prediction model for mine water inflow (Fig. 2) organically combines chaos theory and GRNN so as to exploit the respective advantages of both.
The steps to establish the Chaos-GRNN model are as follows. First, construct the GRNN sample set using chaos theory: calculate the time delay τ, the embedding dimension m, and the maximum Lyapunov exponent $\lambda_{\max}$ of the time series $\{x_1, x_2, \dots, x_n\}$. By formula (5), the time series is transformed into the sample set of reconstructed vectors $\{Y_1, Y_2, \dots, Y_{n-(m-1)\tau}\}$, which is then divided into training samples and test samples.
Second, determine the input layer neurons and pattern layer neurons of the GRNN. The embedding dimension m is taken as the number of factors affecting mine water inflow and equals the number of input layer neurons of the Chaos-GRNN model, while the pattern layer contains one neuron per training sample; the values of the input layer neurons are the components of the reconstructed vectors $Y_i$ given by formula (5). Third, optimize the smoothing parameter of the GRNN: using the Neural Network Toolbox in MATLAB, cross-validation is used to find the optimal spread parameter of the newgrnn() function, which gives the smoothing factor σ.
Fourth, build the model and forecast water inflow: the trained input layer, the pattern layer data, and the optimized parameter are combined to construct the optimized Chaos-GRNN model, into which the data are imported for prediction. The step size of the Chaos-GRNN model is determined by formula (8).
Last, evaluate the model using the four performance metrics described below. Let $x_o(i)$ and $x_e(i)$ denote the observed and estimated mine water inflow, respectively, $\bar{x}_o$ and $\bar{x}_e$ their means, and n the number of data points considered.
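(1) The coefficient of correlation (R) can be expressed as

$$R=\frac{\sum_{i=1}^{n}\left(x_o(i)-\bar{x}_o\right)\left(x_e(i)-\bar{x}_e\right)}{\sqrt{\sum_{i=1}^{n}\left(x_o(i)-\bar{x}_o\right)^2\sum_{i=1}^{n}\left(x_e(i)-\bar{x}_e\right)^2}}$$

The closer R is to 1, the stronger the positive linear relationship. (2) The root mean squared error (RMSE) can be expressed as

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_o(i)-x_e(i)\right)^2}$$

(3) The mean absolute percentage error (MAPE) can be expressed as

$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_o(i)-x_e(i)}{x_o(i)}\right|$$

The closer RMSE and MAPE are to 0, the more reliable the model.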
(4) The Nash-Sutcliffe efficiency (NSE) can be expressed as

$$\mathrm{NSE}=1-\frac{\sum_{i=1}^{n}\left(x_o(i)-x_e(i)\right)^2}{\sum_{i=1}^{n}\left(x_o(i)-\bar{x}_o\right)^2}$$

NSE ranges from −∞ to 1. An NSE close to 1 indicates a highly reliable model; an NSE close to 0 indicates that the model is generally reliable but the simulation error is large; an NSE far below 0 indicates that the model is not credible.
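The four indices can be computed together. This is a minimal sketch of the R, RMSE, MAPE, and NSE definitions above, with MAPE in percent; the function name `evaluate` is hypothetical, and it assumes that no observed value is zero (MAPE is undefined otherwise).

```python
import math


def evaluate(obs, est):
    """Return R, RMSE, MAPE (%) and NSE for observed and estimated inflow sequences."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    ss_obs = sum((o - mo) ** 2 for o in obs)      # spread of the observations
    ss_est = sum((e - me) ** 2 for e in est)      # spread of the estimates
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    return {
        "R": cov / math.sqrt(ss_obs * ss_est),
        "RMSE": math.sqrt(sse / n),
        "MAPE": 100.0 / n * sum(abs((o - e) / o) for o, e in zip(obs, est)),
        "NSE": 1.0 - sse / ss_obs,
    }
```

For example, observations (100, 120, 130, 110) m³/h and estimates (110, 110, 140, 100) m³/h give RMSE = 10 m³/h and NSE = 1 − 400/500 = 0.2.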
Many performance metrics can be used to evaluate a model; these four were chosen because they are the most basic and most widely used [6].

Sources of mine water inflow of the study sites
Pingdingshan No. 12 Coal Mine is located in Pingdingshan City, Henan Province, China (Fig. 3). The study area, approximately 12.87 km², is part of the Pingdingshan Mining Area. To date, 9 water inrush accidents have occurred in Pingdingshan No. 12 Coal Mine.
At present, the normal mine water inflow is about 129 m³/h. The direct sources of mine water inflow are water accumulated in goafs from early mining, sandstone water in the coal seam roof, and limestone water in the coal seam floor. The indirect sources are atmospheric precipitation and surface water.

Characteristics of water inflow
Monthly mine water inflow data were obtained from Pingdingshan No. 12 Coal Mine for February 1976 to December 2015, so the time series contains 479 points (Fig. 4).
The embedding dimension m = 7 indicates that 7 factors affect the mine water inflow, so the Chaos-GRNN model has 7 input layer neurons. Since $\lambda_{\max} = 0.053 > 0$, the mine water inflow time series has chaotic characteristics. $R = 1/\lambda_{\max} \approx 18.87 \approx 19$, so the prediction step size of the Chaos-GRNN model is 19 steps.
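The phase space of the mine water inflow time series of Pingdingshan No. 12 Coal Mine is reconstructed with a time delay of 13 months, giving the 7-dimensional vector space Y:

$$Y=\begin{bmatrix}Y_1\\Y_2\\\vdots\\Y_{401}\end{bmatrix}=\begin{bmatrix}x_1&x_{14}&x_{27}&x_{40}&x_{53}&x_{66}&x_{79}\\x_2&x_{15}&x_{28}&x_{41}&x_{54}&x_{67}&x_{80}\\\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\x_{401}&x_{414}&x_{427}&x_{440}&x_{453}&x_{466}&x_{479}\end{bmatrix} \tag{18}$$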
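The index bookkeeping of the reconstructed matrix (formula (18)) can be checked mechanically. The hypothetical helpers below list, for each row $Y_i$, the 1-based indices of the inflow values it contains, and find the first row whose last entry falls in the held-out 2014 to 2015 period; they manipulate indices only, not inflow data.

```python
def delay_indices(n, m, tau):
    """1-based indices of the entries of each row Y_i of the reconstructed matrix."""
    rows = n - (m - 1) * tau
    return [[i + j * tau for j in range(m)] for i in range(1, rows + 1)]


def first_row_with_unknown(index_rows, n_known):
    """1-based number of the first row whose last entry lies beyond the known data."""
    for row_number, row in enumerate(index_rows, start=1):
        if row[-1] > n_known:
            return row_number
    return None


# Feb 1976 to Dec 2015 gives 479 monthly values; Jan 2014 onward (index 456+) is held out.
rows = delay_indices(n=479, m=7, tau=13)
print(len(rows))                                   # 401
print(rows[0])                                     # [1, 14, 27, 40, 53, 66, 79]
print(rows[-1])                                    # [401, 414, 427, 440, 453, 466, 479]
print(first_row_with_unknown(rows, n_known=455))   # 378
```

The last line reproduces the constraint used in the forecast: $Y_{378}$ is the first vector containing an unknown value ($x_{456}$), so training vectors must come from before it.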

Water inflow forecast
Because the test samples use the water inflow data from January 2014 to December 2015, the values $x_{456}, \dots, x_{479}$ must be assumed unknown. That is, in formula (18), the last-column entries of $Y_{378}$ to $Y_{401}$ are unknown, so the training samples must be selected from vectors before $Y_{378}$. In addition, the step size of the Chaos-GRNN model is 19 steps. The detailed procedure to predict mine water inflow is as follows. Step one: forecast $x_{456}$.
$(Y_{360}, Y_{361}, \dots, Y_{377})^{\mathsf T}$ is selected as the training sample of the GRNN. The output layer gives $\hat{Y}_{378}$, the predicted value of $Y_{378}$; its last component, $\hat{x}_{456}$, is the predicted value of $x_{456}$.
Step two: forecast $x_{457}$. The output layer gives $\hat{Y}_{379}$, the predicted value of $Y_{379}$; its last component, $\hat{x}_{457}$, is the predicted value of $x_{457}$.
Step three: forecast $x_{458}$. (…)$^{\mathsf T}$ is selected as the training sample of the GRNN. The output layer gives $\hat{Y}_{380}$, the predicted value of $Y_{380}$. Step four: repeat the above steps to obtain the predicted values of $x_{459}, \dots, x_{479}$, i.e., the mine water inflow of Pingdingshan No. 12 Coal Mine for the verification period from January 2014 to December 2015.
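The procedure above can be sketched as a rolling forecast loop. How the training pairs are formed is not fully spelled out in the text, so this sketch assumes one plausible reading: each GRNN input is a full delay vector $Y_j$ (m inputs), its target is the last component of the next vector $Y_{j+1}$, training uses the `window` vectors immediately before the target row (window = 18 reproduces $Y_{360}, \dots, Y_{377}$ when forecasting $x_{456}$), and each forecast is fed back into the series for later steps. `rolling_forecast`, `grnn_predict`, and their parameters are hypothetical, indices are 0-based, and the per-step tuning of σ is omitted.

```python
import math


def grnn_predict(X_train, y_train, x0, sigma):
    """Gaussian-weighted mean of the training targets (GRNN estimate)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x0)) for xi in X_train]
    d_min = min(d2)  # shift exponents to avoid underflow; ratio is unchanged
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)


def rolling_forecast(series, n_train, m, tau, window, sigma):
    """Forecast series[n_train:] one value at a time (assumed pairing, see text)."""
    x = list(series[:n_train])   # known history; held-out values are never read
    span = (m - 1) * tau         # offset of the last component within a delay vector
    forecasts = []
    for _ in range(len(series) - n_train):
        t = len(x)               # 0-based index of the value to forecast
        k = t - span             # 0-based row whose last component is x[t]
        if k - window < 0:
            raise ValueError("not enough history for the chosen window")
        X_tr, y_tr = [], []
        for j in range(k - window, k - 1):
            X_tr.append([x[j + q * tau] for q in range(m)])  # input: Y_j
            y_tr.append(x[j + 1 + span])                      # target: last entry of Y_{j+1}
        query = [x[k - 1 + q * tau] for q in range(m)]        # input Y_{k-1}
        x_hat = grnn_predict(X_tr, y_tr, query, sigma)
        forecasts.append(x_hat)
        x.append(x_hat)          # feed the forecast back for later steps
    return forecasts
```

For this mine, the corresponding call would be `rolling_forecast(inflow, 455, 7, 13, 18, sigma)`, with `inflow` holding the 479 monthly values. Because GRNN outputs are weighted averages of training targets, such a forecast cannot leave the range of values seen in the training window.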
In the prediction process, the smoothing factor σ is searched from 0.1 to 2 in increments of 0.1. With cross-validation in MATLAB, the optimal smoothing factors selected for the 24 models range from 0.1 to 0.9 (Table 1 and Fig. 6).
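The σ search described above can be illustrated with leave-one-out cross-validation over the grid 0.1, 0.2, …, 2.0. This is a sketch under assumptions: the source states only that cross-validation was used, so the leave-one-out scheme and the squared-error criterion are our choices, and `loo_select_sigma` is a hypothetical name. The GRNN estimator is repeated so the block runs on its own.

```python
import math


def grnn_predict(X_train, y_train, x0, sigma):
    """Gaussian-weighted mean of the training targets (GRNN estimate)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x0)) for xi in X_train]
    d_min = min(d2)  # shift exponents to avoid underflow; ratio is unchanged
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)


def loo_select_sigma(X, y, grid=None):
    """Return the sigma in `grid` with the smallest leave-one-out squared error."""
    grid = grid or [round(0.1 * k, 1) for k in range(1, 21)]  # 0.1, 0.2, ..., 2.0
    best_sigma, best_err = None, math.inf
    for s in grid:
        err = 0.0
        for i in range(len(X)):
            # Predict sample i from all other samples.
            X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            err += (grnn_predict(X_rest, y_rest, X[i], s) - y[i]) ** 2
        if err < best_err:
            best_sigma, best_err = s, err
    return best_sigma
```

Here `X` is a list of input vectors and `y` a list of scalar targets. Because GRNN has a single free parameter, an exhaustive grid search like this is cheap, which is one reason the method suits short monthly series.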

Chaos parameters of mine water inflow time series
Parameters such as the time delay τ, the embedding dimension m, and the maximum Lyapunov exponent $\lambda_{\max}$ are very important for describing the chaotic characteristics of mine water inflow and for predicting it. In this paper, we calculated the chaotic parameters of mine water inflow for several coal mines in China (Table 2).
Table 2 shows the following. (1) The maximum Lyapunov exponents $\lambda_{\max}$ of all the coal mines are greater than 0. By chaos theory, $\lambda_{\max} > 0$ means that the mine water inflow time series has chaotic characteristics, which lays the foundation for using the Chaos-GRNN model to predict mine water inflow.
(2) The values of τ, m, and $\lambda_{\max}$ differ between mines, reflecting the diversity and complexity of the factors related to mine water inflow. According to chaos theory, the embedding dimension m of a mine water inflow time series equals the number of factors affecting the mine water inflow [15,19,29]. In Table 2, m lies between 4 and 8, indicating that roughly 4 to 8 factors influence mine water inflow in the different coal mines. This reflects the difficulty of accurately predicting mine water inflow; on the other hand, it also shows the advantage of chaos theory in mine water inflow prediction, since the influencing factors and their values are quantified by $Y_i$. (3) However, the components of $Y_i$ are still values of mine water inflow rather than values of the individual influencing factors, so this quantification is incomplete, which has slowed the application of chaos theory.
An ANN simulates the way neurons in the human brain process information. It has powerful self-adaptive and self-learning abilities for samples, which is exactly what is needed to process this information. Therefore, chaos theory and ANN are coupled to establish the Chaos-ANN model (Fig. 7), which provides a powerful tool for effective prediction of mine water inflow.

Comparison of Chaos-BPNN and Chaos-GRNN prediction models
An artificial neural network (ANN), with its powerful adaptive and self-learning abilities for samples, can effectively obtain outputs from input neurons. Therefore, chaos theory and ANN can be organically combined (the number and value of input neurons and predicted …). Among artificial neural networks, the backpropagation (BP) neural network (BPNN) is currently the most widely used. To compare with the prediction effect of the Chaos-GRNN model, we therefore also established a Chaos-BPNN model to predict the water inflow of Pingdingshan No. 12 Coal Mine.
The Chaos-BPNN model consists of three parts, an input layer, a hidden layer, and an output layer, connected by weights. Since the embedding dimension m = 7, the Chaos-BPNN model has 7 input neurons and 1 output neuron. The number of hidden layer neurons, determined by an empirical equation [42] and trial and error [23,37], is 7.
After the data were normalized, the network was trained in MATLAB. The prediction results are shown in Fig. 8.
It can be seen from Table 3 … that the prediction effectiveness of the Chaos-GRNN model is better than that of the Chaos-BPNN model.

Conclusions
Chaos theory can recover the rich dynamic information hidden in a time series. By reconstructing the water inflow time series in phase space, a multi-dimensional matrix can be obtained, in which each column represents an influencing factor of water inflow and its values represent the change of that factor over time. Therefore, when data on the influencing factors of water inflow are lacking, a new prediction model of mine water inflow can be established by combining ANN with chaos theory. The Chaos-GRNN and Chaos-BPNN models were evaluated using four evaluation indices (R, RMSE, MAPE, NSE) and violin plots. Both models can realize long-term prediction of water inflow, and the Chaos-GRNN model predicts more effectively than the Chaos-BPNN model.
However, it is not yet clear whether all mine water inflow time series have chaotic characteristics. Therefore, before the Chaos-GRNN model is used, the chaotic characteristics of the water inflow time series must first be verified. If the water inflow time series does not have chaotic characteristics, other prediction methods should be used.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.