Height prediction of water-flowing fracture zone with a genetic-algorithm support-vector-machine method

Prediction of the height of a water-flowing fracture zone (WFFZ) is the foundation for evaluating roof water-inrush conditions in coal mining. Taking the Binchang mining area as the study area, we studied in depth the influence of coal seam thickness, burial depth, working face length, and roof category on WFFZ height. We proposed that the proportion of hard rock in different roof ranges be used to characterise the influence of roof category on WFFZ height. Based on field observations of WFFZ height and its influencing factors, a prediction model for WFFZ height was established by combining a genetic algorithm with a support-vector machine. The reliability and superiority of the prediction model were verified by a comparative study and an engineering application. The results show that the main factors affecting WFFZ height in the study area are coal seam thickness, burial depth, working face length, and roof category. Compared with multiple-linear-regression and back-propagation neural-network approaches, the genetic-algorithm support-vector-machine model has higher prediction accuracy and is more suitable for WFFZ prediction in the mining area.


Introduction
When a coal seam is mined, the roof rock moves and a fracture zone forms (Gao 1996). Fractures in this zone may provide a channel for water stored in overlying aquifers. This not only destroys groundwater resources in the aquifer but also allows groundwater to flow into the mine along the water-flowing fracture zone (WFFZ), threatening mine safety and production (Wu et al. 2000, 2016; Li and Li 2012; Garritty 1983; Zhang and Yang 2018; Peng and Zhang 2007). Therefore, researchers should pay attention not only to predicting the water abundance of roof aquifers but also to predicting WFFZ height.
To date, many important results on WFFZ height in the roof after coal mining have been obtained through similar-material simulations (Lin et al. 2010; Gao and Wu 2011; Zhao et al. 2011; Sui et al. 2015; Dong et al. 2015; Zhang et al. , 2017), numerical simulation experiments (Liu et al. 2018a; Wu et al. 2014; Liu et al. 2018b; Meng et al. 2018; Zhu et al. 2018; Yin et al. 2016; Wu et al. 2015), theoretical analysis (Xu and Sun 2008; Shi et al. 2012; Liu et al. 2018b; Qiao et al. 2017; Yin et al. 2016), physical exploration, and drilling tests (Luan et al. 2010; Sui et al. 2015; Dong et al. 2015; Zhang and Peng 2005; Yin et al. 2016). However, these methods mainly predict WFFZ height under specific geological and mining conditions and are difficult to apply under complex and variable geological and mining conditions (Wang 2006).
In recent years, many scholars have proposed multifactor-analysis methods for predicting WFFZ height under complex and variable geological conditions. Regarding the main factors that affect WFFZ height, Liu (1995) first considered two controlling factors: roof category and coal seam thickness. Later studies found that the height of the roof WFFZ is also controlled by coal face length, coal seam pitch, and coal seam burial depth (Hu et al. 2012; Liu 2010; Du and Weng 1997). Using multifactor analysis, early researchers described the relationship between WFFZ height and its main controlling factors with linear approaches such as multiple linear regression (MLR) and the entropy method (Hu et al. 2012; Wang et al. 2018; Xu and Sui 2013). With the development of systems theory and computer technology, nonlinear methods such as back-propagation neural networks (BPNNs) were applied (Chen et al. 2005). Multifactor analysis of WFFZ height has thus been studied extensively: the main controlling factors have been identified, and various advanced methods have been applied to prediction. In these studies, however, roof category is usually divided into only three types: weak, medium-hard, and hard, whereas its actual influence on WFFZ height is considerably more complex. Accurate prediction therefore requires a quantitative characterisation of roof category, and determining a reasonable index for it is essential. In addition, although nonlinear methods such as BPNNs overcome the limitations of linear analysis, they require large amounts of observational data. A new method is therefore needed to build a WFFZ-height prediction model from small amounts of observational data.
In this study, we used single-factor analysis to examine the influence of coal seam thickness, burial depth, working face length, and roof category on WFFZ height and proposed an index to characterise roof category. Using measured values of WFFZ height and its influencing factors, we established a prediction model for WFFZ height based on a genetic algorithm (GA) and a support-vector machine (SVM). The aim was to obtain better predictions from small amounts of observational data and to provide a theoretical basis for the prediction and prevention of roof-gushing (inrush) water disasters in coal mining.

Description of the study area
The Binchang mining area covers about 670 km². It is located in the middle of the Huanglong Jurassic Coalfield in Shaanxi Province and is one of the most important coal-industry bases in China (Fig. 1). The landform consists of loess gullies and hills, and the terrain is high in the northwest and low in the southeast. The coal-bearing strata belong to the Jurassic Yan'an Group, which contains eight coal seams. Coal seam No. 4 is currently the main mined seam, with an average thickness of about 10 m.
The Jing, Hei, and Hongya rivers constitute the main river system in the study area. The Jing River flows through the study area from northwest to southeast. The Hei River enters the study area from the west and joins the Jing River, and the Hongya River enters from the east and also joins the Jing River. Groundwater in the study area occurs in loose-pore unconfined aquifers and bedrock pore-fracture confined aquifers. The bedrock-fracture aquifer of the Cretaceous Luohe Group is the most important: it is the main source of domestic and industrial water in the study area and also an inrush source that threatens safe mine production.

Data preparation

In situ observation of WFFZ height
During coal mining, the overburden collapses under gravity and the stress in the surrounding rock is redistributed. This causes tensile failure above the mined-out area and shear failure above the coal wall, which together form the WFFZ. Rock-mass fractures in a WFFZ are therefore more developed than those in intact rock. When an exploration borehole is drilled into a WFFZ, the consumption of flushing fluid increases markedly and the water level in the borehole keeps falling. WFFZ height can therefore be determined from the flushing-fluid consumption and the water-level behaviour in the borehole. A total of 28 boreholes were drilled in the study area to observe WFFZ height (Fig. 1). The observational results (Table 1) show that WFFZ height in the study area is mainly between 100 and 240 m.

Fig. 1 Location of the study area and distribution of the WFFZ height observation points
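The flushing-fluid criterion can be illustrated with a minimal sketch. It assumes a flushing-fluid loss log sampled downhole and a hypothetical threshold rule: the WFFZ top is taken as the first depth where the loss rate exceeds `jump_factor` times the median loss of the uppermost `baseline_n` readings. The function name, threshold, and baseline window are illustrative assumptions rather than the criterion used in the observation programme, and the water-level signal is not modelled.

```python
from statistics import median
from typing import Optional, Sequence


def wffz_height_from_fluid_loss(
    depths_m: Sequence[float],
    loss_rates: Sequence[float],
    coal_roof_depth_m: float,
    jump_factor: float = 3.0,
    baseline_n: int = 5,
) -> Optional[float]:
    """Hypothetical rule: WFFZ top = first depth whose flushing-fluid loss
    exceeds jump_factor x the median loss of the first baseline_n readings."""
    if len(depths_m) != len(loss_rates) or len(depths_m) <= baseline_n:
        raise ValueError("logs must match in length and exceed the baseline window")
    # Baseline loss rate in (assumed) intact rock near the top of the log.
    baseline = median(loss_rates[:baseline_n])
    for depth, rate in zip(depths_m, loss_rates):
        if rate > jump_factor * baseline:
            # WFFZ height is measured upward from the coal roof.
            return coal_roof_depth_m - depth
    return None  # no anomaly: the borehole never entered the WFFZ


# Hypothetical borehole log (depth in m, loss rate in arbitrary units).
depths = [300, 320, 340, 360, 380, 400, 420, 440, 460, 480]
losses = [0.10, 0.12, 0.10, 0.11, 0.10, 0.12, 0.90, 1.50, 2.00, 2.20]
print(wffz_height_from_fluid_loss(depths, losses, coal_roof_depth_m=560))
```

In practice the water-level record would be checked alongside the loss log before accepting a depth as the WFFZ top.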

Factors of WFFZ height
The factors affecting WFFZ height must be analysed before the height can be predicted. Based on previous research and practical experience, the height of the roof WFFZ after coal mining is mainly controlled by coal seam thickness, working face length, coal seam burial depth, and roof category (Table 1). The data show that the strata in the study area dip to the north with little variation in pitch, so the influence of pitch on WFFZ height is small. The influence of the other factors on WFFZ height is described below.

Coal seam thickness
After coal mining, the extent of roof displacement and failure is directly controlled by coal seam thickness. Representative data with similar working face lengths were selected from Table 1 to analyse the relationship between WFFZ height and coal seam thickness (Fig. 2a). The results show that WFFZ height increases significantly with coal seam thickness, so coal seam thickness is an important factor affecting WFFZ height. In addition, for similar coal seam thicknesses, a longer working face is associated with a greater WFFZ height.

Working face length
Working face length is an important factor in the degree of roof displacement and failure in coal mining. When the working face is short, a stable arch structure forms around the roof-failure area, which limits WFFZ height. Before full extraction is reached, WFFZ height increases with working face length; after full extraction, it no longer increases. The observational data from the study area (Fig. 2b) show that, for similar coal seam thickness, roof category, and other factors, WFFZ height increases with working face length.

Coal seam burial depth
Coal seam burial depth controls the in situ stress of the surrounding rock: the deeper the seam, the higher the in situ stress. During mining, the overburden collapses under gravity and the stress in the surrounding rock is redistributed, causing tensile failure above the mined-out area and shear failure above the coal wall and forming the WFFZ. Accordingly, as burial depth increases, the extent of unloading and fracturing of the surrounding rock after mining gradually increases. The observational data (Fig. 2c) show that WFFZ height increases markedly as burial depth increases from 300 to 500 m, and only slowly when burial depth exceeds 500 m.

Roof classification
WFFZ height is closely related to roof classification (Xu et al. 2009, 2012). Near the coal seam, a hard rock layer forms connected fissures when it breaks and causes the soft rock within a certain range to fracture synchronously, promoting WFFZ development. Farther from the coal seam, hard rock layers are less affected by mining and tend to inhibit WFFZ development. Consequently, characterising the lithologic combination of the roof by a single hard-rock proportion for the whole WFFZ obscures the different effects of hard rock at different positions on the overburden failure process and may lead to large deviations in the prediction results. In this study, the roof is therefore divided into three parts, A, B, and C (Fig. 3), corresponding to the strata 0-7, 7-10, and 10-15 times the coal seam thickness above the coal roof. The observational data (Fig. 2d) show that WFFZ height increases significantly with the hard-rock proportion in roof A (0-7 times the coal seam thickness above the coal roof). With increasing hard-rock proportion in roof B, WFFZ height increases slowly (Fig. 2e).
There is a negative correlation between hard-rock proportion in roof C and WFFZ height (Fig. 2f). To describe the effect of the lithologic combination of the roof at different positions in the overburden failure process, the hard-rock proportions of roofs A, B, and C are proposed to jointly characterise the influence of roof category on WFFZ height.
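The roof-range index can be computed from a borehole lithology log. The sketch below assumes that each layer above the coal roof is already classed as hard or not hard and that the proportion is thickness-weighted within each range; both the weighting and the function name `hard_rock_proportions` are illustrative assumptions, and only the 0-7, 7-10, and 10-15 times seam-thickness ranges come from the text.

```python
from typing import List, Sequence, Tuple

# (thickness in m, True if the layer is classed as hard rock), listed upward from the coal roof.
Layer = Tuple[float, bool]

# Roof ranges A, B, C as multiples of coal seam thickness above the coal roof.
ROOF_RANGES = ((0.0, 7.0), (7.0, 10.0), (10.0, 15.0))


def hard_rock_proportions(
    layers: Sequence[Layer],
    seam_thickness_m: float,
    ranges: Sequence[Tuple[float, float]] = ROOF_RANGES,
) -> List[float]:
    """Thickness-weighted hard-rock proportion in each roof range."""
    proportions = []
    for lo_mult, hi_mult in ranges:
        lo, hi = lo_mult * seam_thickness_m, hi_mult * seam_thickness_m
        hard = total = 0.0
        top = 0.0  # height of the current layer top above the coal roof
        for thickness, is_hard in layers:
            bottom, top = top, top + thickness
            # Part of this layer that falls inside [lo, hi].
            overlap = max(0.0, min(top, hi) - max(bottom, lo))
            total += overlap
            if is_hard:
                hard += overlap
        # NaN if the log does not reach this range.
        proportions.append(hard / total if total > 0 else float("nan"))
    return proportions


# Seam 2 m thick: A = 0-14 m, B = 14-20 m, C = 20-30 m above the roof.
log = [(10.0, True), (10.0, False), (10.0, True)]
print(hard_rock_proportions(log, seam_thickness_m=2.0))
```

If the log stops partway through a range, the proportion is computed only over the logged part, which may bias thin or incomplete records.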

Methodology
The SVM is a prediction model based on small-sample learning and can be divided into support-vector classification and support-vector regression (SVR) (Vapnik 1995; Fang et al. 2018). The main idea of SVR is to find an optimal regression surface, based on Vapnik-Chervonenkis generalisation theory, that minimises the structural risk of the model (Vapnik 1998). GA is a simulated evolutionary algorithm proposed by Professor Holland of the University of Michigan in 1969 and developed by DeJong, Goldberg, and others (Holland 1975; DeJong 1975; Goldberg 1989). The algorithm draws on Darwin's theory of evolution, Weismann's theory of species selection, and Mendel's theory of population genetics. GA is a self-organising, adaptive artificial-intelligence technique that simulates the process and mechanism of biological evolution in nature to solve extreme-value (optimisation) problems.
SVM parameters strongly affect the predictive ability of an SVM model; therefore, reasonable parameter selection is important when establishing one. GA is an effective parameter-selection method owing to its global optimisation and computational stability (Lin et al. 2013; Lewis and Randall 2017; Ouyang et al. 2017). GA is therefore used to select the SVM parameters, which avoids the subjectivity of manual parameter selection and improves the predictive ability of the SVM. Compared with other methods, the GA-optimised SVM (GA-SVM) depends only weakly on the number of training samples and has strong predictive ability.

Results and verification
Using the observed WFFZ height, coal seam thickness, working face length, coal seam burial depth, and roof hard-rock proportions of boreholes ZK1-ZK21 as training samples, a prediction model for WFFZ height was established by GA-SVM. Data from boreholes ZK22-ZK28 were used as verification samples. The superiority and reliability of the model were then verified through a comparative analysis and an engineering application.

Application of the GA-SVM method
In the GA-SVM method, the fitness function is defined as the reciprocal of the sum-squared error of the verification samples. After the population has evolved for 80 generations, the sum-squared error of the verification samples and the fitness function reach their optimum (Fig. 4), giving the optimal parameters C = 8.001 (penalty factor) and g = 0.0884 (kernel parameter).
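A minimal real-coded GA for the (C, g) search is sketched below. The text fixes only the fitness definition (1 / SSE on the verification samples) and the 80 generations; the operators used here (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism), the population size, and the search bounds are assumptions. `toy_sse` is a hypothetical stand-in for "train an SVR with (C, g) on the training boreholes and sum the squared errors on the verification boreholes".

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def ga_select_parameters(
    sse: Callable[[float, float], float],
    bounds: Bounds,
    pop_size: int = 20,
    generations: int = 80,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Real-coded GA maximising fitness = 1 / SSE over (C, g)."""
    rng = random.Random(seed)

    def fitness(ind: List[float]) -> float:
        # Reciprocal of the sum-squared error; the epsilon guards against SSE == 0.
        return 1.0 / (sse(ind[0], ind[1]) + 1e-12)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=scores.__getitem__)]

        def tournament() -> List[float]:
            # Binary tournament selection on this generation's scores.
            i, j = rng.sample(range(pop_size), 2)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = [elite[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover
                child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the search bounds.
                    child[k] = min(max(child[k] + rng.gauss(0.0, 0.05 * (hi - lo)), lo), hi)
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best[0], best[1]


def toy_sse(c: float, g: float) -> float:
    # Hypothetical error surface standing in for SVR training plus verification.
    return (c - 8.0) ** 2 + 100.0 * (g - 0.09) ** 2


c_opt, g_opt = ga_select_parameters(toy_sse, bounds=[(1.0, 20.0), (0.01, 1.0)])
print(round(c_opt, 2), round(g_opt, 3))
```

Because the fitness is evaluated on the verification samples, those samples also steer parameter selection; a separate hold-out set would be needed for a fully independent check.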
The optimal parameters C and g are assigned to the SVM, and the SVM model is established from the training samples. In this model, X_i and W_i are the support vectors and their coefficients, respectively (Table 2), X is the normalised vector of the prediction-sample factors, and ||X_i - X|| is the two-norm distance between a support vector and the normalised vector of the prediction-sample factors.
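Evaluating such a model for a new sample can be sketched as follows, assuming the standard RBF-kernel SVR form f(X) = Σ W_i exp(-g ||X_i - X||²) + b used by LIBSVM, which is consistent with the two-norm distance and the parameter g above. The bias term b, the min-max scaling to [0, 1], and the function names are assumptions not stated in the text, and the support vectors below are made up rather than taken from Table 2. If the target was also normalised during training, the output must be inverse-scaled back to metres.

```python
import math
from typing import List, Sequence


def minmax_normalise(
    x: Sequence[float], x_min: Sequence[float], x_max: Sequence[float]
) -> List[float]:
    """Scale each factor to [0, 1] using training-set minima and maxima (assumed scheme)."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, x_min, x_max)]


def svr_predict(
    x_norm: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    coefficients: Sequence[float],
    g: float,
    b: float = 0.0,
) -> float:
    """f(X) = sum_i W_i * exp(-g * ||X_i - X||^2) + b  (RBF-kernel SVR)."""
    total = b
    for sv, w in zip(support_vectors, coefficients):
        sq_dist = sum((s - v) ** 2 for s, v in zip(sv, x_norm))  # ||X_i - X||^2
        total += w * math.exp(-g * sq_dist)
    return total


# Hypothetical two-factor sample and two made-up support vectors.
x = minmax_normalise([5.0, 400.0], x_min=[0.0, 300.0], x_max=[10.0, 500.0])
print(svr_predict(x, [[0.5, 0.5], [0.0, 1.0]], [1.0, -0.5], g=0.0884, b=0.1))
```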

Verification
To verify the reliability and superiority of the GA-SVM prediction model, MLR and BPNN prediction models were established with the same training samples. All sample data were then input to the three models to predict WFFZ height (Fig. 5a), and the residuals were obtained from the predicted and observed heights (Fig. 5b).
The coefficient of determination R² largely reflects how well a prediction model fits the data. R² is calculated following Gujarati (2003) from Y, Ȳ, and Ŷ, the observed, mean, and predicted values of WFFZ height, respectively, together with the sample size n and the number of variables p.
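The two common textbook forms of the coefficient of determination are sketched below: the ordinary R² and the degrees-of-freedom-adjusted R², which uses n and p. Which exact expression the study applied is not reproduced here, so both are shown as assumptions; note that the adjusted form is only defined when n > p + 1.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Ordinary coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)  # Y-bar
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(observed: Sequence[float], predicted: Sequence[float], p: int) -> float:
    """Adjusted R^2 with n samples and p explanatory variables (needs n > p + 1)."""
    n = len(observed)
    if n <= p + 1:
        raise ValueError("adjusted R^2 is undefined when n <= p + 1")
    return 1.0 - (1.0 - r_squared(observed, predicted)) * (n - 1) / (n - p - 1)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.0, 3.0, 4.0, 4.5]
print(r_squared(obs, pred), adjusted_r_squared(obs, pred, p=1))
```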
The calculation results show that R² of the GA-SVM-based prediction model is 0.925 for the training samples and 0.974 for the verification samples. The corresponding values are 0.859 and 0.970 for the MLR-based model and 0.986 and 0.581 for the BPNN-based model. The GA-SVM- and MLR-based models therefore fit both the training and verification samples closely and are suitable for WFFZ-height prediction in the study area (Fig. 5a). For the BPNN-based model, R² for the training samples is much larger than that for the verification samples, indicating that the model over-fits the training data rather than capturing the effect of the main factors on WFFZ height. Compared with GA-SVM and MLR, BPNN therefore performs poorly in WFFZ-height prediction in the study area. The residual plot (Fig. 5b) shows that the residuals of the GA-SVM model range from -28.69 to 15.57 m across all samples, and 22 samples (78.57% of the total) have residuals between -5 and 5 m. The residuals of the MLR model range from -30.66 to 14.79 m, and 14 samples (50% of the total) have residuals between -5 and 5 m. Compared with the MLR model, the GA-SVM model is thus less affected by outliers and fits most of the samples closely. The GA-SVM model therefore has clear advantages for predicting WFFZ height.
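The residual statistics above can be reproduced with a short helper. It assumes residual = observed - predicted and treats the ±5 m band as inclusive; neither convention is stated in the text, and the sample values below are hypothetical.

```python
from typing import Sequence, Tuple


def residual_summary(
    observed: Sequence[float], predicted: Sequence[float], band_m: float = 5.0
) -> Tuple[float, float, int, float]:
    """Return (min residual, max residual, count within +/-band_m, share in %).

    Residual is taken as observed - predicted (assumed sign convention)."""
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    within = sum(1 for r in residuals if -band_m <= r <= band_m)
    return min(residuals), max(residuals), within, 100.0 * within / len(residuals)


observed_m = [150, 180, 210, 120]        # hypothetical observed WFFZ heights
predicted_m = [152, 176, 215.5, 119]     # hypothetical model predictions
print(residual_summary(observed_m, predicted_m))
```

With the counts reported above, 22 of 28 samples gives 100 × 22 / 28 ≈ 78.57%, and 14 of 28 gives 50%.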
To further verify the reliability of the model, the GA-SVM-based model was applied to the first mining area of the Wenjiapo coal mine (Fig. 6).
The Wenjiapo coal mine is located to the east of the study area and covers about 79.69 km². Coal seam No. 4 is currently the main mined seam, with a thickness of 1.08-15.00 m and a burial depth of 300-500 m. The main aquifer is the bedrock-fracture aquifer of the Luohe Group. In the first mining area of the mine, the combined aquifuge between coal seam No. 4 and this aquifer is 180-235 m thick (Fig. 7); from northwest to southeast, its thickness first increases and then decreases.
Using drilling data from the first mining area, thematic maps of coal seam thickness, burial depth, and hard-rock proportions in different roof ranges were drawn (Fig. 8). The thematic map of coal seam thickness (Fig. 8a) shows that coal seam No. 4 in the first mining area is 3-15 m thick, with an average of 11.3 m; it is thinner in the east and thicker in the west, reaching its maximum in the southwest. The thematic map of burial depth (Fig. 8b) shows that the seam is buried at 250-550 m, with an average of 395 m; from northwest to southeast, the seam is deep, then shallow, then deep again. The thematic maps of hard-rock proportion for the strata 0-7, 7-10, and 10-15 times the coal seam thickness above the coal roof show that the hard-rock proportion in each range is mostly between 0.3 and 0.9. In addition, mining records show that the coal face length in the first mining area is 240 m. The thematic maps of coal seam thickness, burial depth, and hard-rock proportion for roofs A, B, and C were input to the established GA-SVM prediction model. A connectivity map of the roof aquifer after mining was then drawn by comparing the predicted WFFZ height with the thickness of the combined aquifuge in the first mining area (Fig. 9).
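The connectivity map amounts to a cell-by-cell comparison of two grids. The sketch assumes that the predicted WFFZ heights and the aquifuge thicknesses are sampled on the same grid, and that a cell counts as connected when the predicted height is at least the aquifuge thickness; no safety margin is applied, and the grid values are hypothetical.

```python
from typing import List, Sequence


def connectivity_map(
    predicted_height_m: Sequence[Sequence[float]],
    aquifuge_thickness_m: Sequence[Sequence[float]],
) -> List[List[bool]]:
    """True where the predicted WFFZ reaches the aquifer, i.e. height >= aquifuge thickness."""
    if len(predicted_height_m) != len(aquifuge_thickness_m):
        raise ValueError("grids must have the same shape")
    result = []
    for h_row, t_row in zip(predicted_height_m, aquifuge_thickness_m):
        if len(h_row) != len(t_row):
            raise ValueError("grids must have the same shape")
        result.append([h >= t for h, t in zip(h_row, t_row)])
    return result


heights = [[150.0, 190.0], [230.0, 170.0]]   # hypothetical GA-SVM predictions per cell (m)
aquifuge = [[200.0, 185.0], [220.0, 235.0]]  # hypothetical aquifuge thickness per cell (m)
print(connectivity_map(heights, aquifuge))
```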
The connectivity map of the roof aquifer after mining (Fig. 9) shows that, of the three coal faces mined in the first mining area, the WFFZs above coal faces 4101 and 4102 have not reached the Luohe Group roof aquifer, whereas the WFFZ formed after mining coal face 4103 has connected to it.
According to the measured mine-water inflow of the working faces (Fig. 10), since mining of coal face 4103 began in August 2018, the mine-water inflow from coal faces 4101 and 4102 has been about 10 m³/h and 90 m³/h, respectively, whereas that from coal face 4103 has reached 180 m³/h and continues to increase. This indicates that after coal faces 4101 and 4102 were mined, the Luohe Group roof aquifer was not connected to the goaf and the mine-water inflow remained relatively small. In contrast, mining coal face 4103 connected the Luohe Group aquifer to the goaf, which led to a significant increase in mine-water inflow. This result is basically consistent with the model prediction and further shows that the GA-SVM-based prediction model is reliable.

After completing the WFFZ-height prediction in the study area, additional information on the influence of each factor on WFFZ height was obtained. Fig. 11 shows the six variables sorted by mean decrease in accuracy, a measure of each variable's contribution to the SVM model: the more the accuracy of the SVM model decreases when a variable is excluded, the more important that variable is. By this measure, the most important variable is coal seam thickness, followed by working face length and the hard-rock proportion in the range 0-7 times the coal seam thickness above the coal roof.
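A mean-decrease-in-accuracy ranking can be approximated by permutation importance, sketched below. This swaps in a specific technique as an assumption: instead of excluding a variable and refitting, each factor column is randomly shuffled and the drop in R² (used here as the accuracy measure) is averaged over repeats. The function names, the R² choice, and the toy data are illustrative, not the study's procedure.

```python
import random
from typing import Callable, List, Sequence


def _r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in R^2 when each factor column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = _r2(y, [predict(list(row)) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between factor j and WFFZ height
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - _r2(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the "model" uses only factor 0, so factor 1 should score zero.
X_demo = [[float(i), 1.0] for i in range(10)]
y_demo = [2.0 * i for i in range(10)]
print(permutation_importance(lambda row: 2.0 * row[0], X_demo, y_demo))
```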

Discussion
In the prediction of roof-gushing (inrush) conditions, the study of WFFZ height is important. This research shows that coal seam thickness, burial depth, working face length, and roof category all contribute to roof WFFZ height in coal mining. A coal seam roof often comprises a combination of brittle and plastic rock layers, and hard rock in different roof ranges affects WFFZ height differently after mining. Therefore, roof category cannot simply be divided into three types (weak, medium-hard, and hard); instead, the hard-rock proportions in different roof ranges can be used to characterise its influence on WFFZ height. Among the factors, the most important variable is coal seam thickness, followed by working face length and the hard-rock proportion in the range 0-7 times the coal seam thickness above the coal roof.
In addition, the results show that, compared with the MLR and BPNN methods, the GA-SVM-based prediction model fits both the training and verification samples well and can be used to determine WFFZ height in the study area. This is because, compared with other methods, GA-SVM may meet the accuracy requirements when few training samples are available, and it may also effectively reduce the influence of outliers on the model.

Conclusion
Using the Binchang mining area as a study area, we used mathematical statistics to analyse the factors that influence WFFZ height during coal seam mining and established a GA-SVM-based prediction model for WFFZ height. The main conclusions are as follows.
(1) The factors that influence WFFZ height in the study area are coal seam mining thickness, burial depth, working face length, and roof category. WFFZ height is negatively correlated with the hard-rock proportion in the range 10-15 times the coal seam thickness above the coal roof, and positively correlated with coal seam thickness, working face length, coal seam burial depth, and the hard-rock proportions in the ranges 0-7 and 7-10 times the coal seam thickness above the coal roof.
(2) Among the variables, the most important is coal seam thickness, followed by working face length and the hard-rock proportion in the range 0-7 times the coal seam thickness above the coal roof.
(3) Compared with MLR and BPNN, the GA-SVM model is more accurate for the verification samples and is more suitable for predicting roof WFFZ height in coal mining.
This research not only proposes a new index to characterise the effect of coal seam roof category on WFFZ height but also provides a new method for predicting WFFZ height in coal mining, which is of great significance for predicting roof-gushing (inrush) conditions in coal mining.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 9 Connectivity of roof aquifer after mining