Introduction

Mine water has always been a major threat to coal mine safety. According to statistics, more than 25 billion tons of coal resources in China are at risk of water inrush1. With increasing mining depths in recent years, the hydrogeological conditions of mining have become more complicated, and water inrushes from coal-roof strata have become increasingly serious2,3. During coal extraction, the strata overlying the coal seams move significantly downward under rock pressure, forming multiple fractures and fissures in the coal-roof strata. Once these fractures interconnect and the impermeability of the aquitards is destroyed, water bodies above the coal roof, including surface water, goaf water and aquifer water, flow into the mining area through the fractures, resulting in water-inrush accidents that may cause tremendous loss of life and property. Therefore, to prevent water inrushes effectively and ensure safe production in coal mines, it is essential to accurately predict the height of the fractured water-conducting zone (FWCZ) of the coal-roof strata.

Scholars have proposed many methods for predicting the height of the FWCZ, including empirical formulas, field measurement, theoretical calculation and numerical simulation4,5,6,7,8,9,10,11,12. In the early 1980s, Liu4 proposed an empirical formula by regression analysis of limited field-measured data collected from several large-scale coal mines in North China. However, the formula considers only a few factors, so it cannot precisely reflect the complicated development mechanism of water-conducting fractures. To address this problem, Hu5 summarized the nonlinear statistical relation between the FWCZ and multiple mining factors such as mining height, hard-rock lithology ratio, working-face length and mining depth. Shi6 analyzed the movement characteristics of the overlying strata and the “four zones” division theory, and then proposed theoretical formulas that consider multiple mining factors. To ensure the mining safety of shallow coal seams under water-rich aquifers and determine the development of the fractured water-conducting zone, Liu7 built a numerical model in FLAC3D to analyze the distribution of the damage zone. Mathematical theories have also gradually been adopted in this field. For instance, Yang8 used the analytic hierarchy process and fuzzy cluster analysis to predict the height of the FWCZ. In general, theoretical calculation models and numerical simulations suffer from over-idealization and from the difficulty of selecting mechanical parameters. These methods have also been combined to determine the height of the FWCZ. Based on the traditional empirical formula, a crack-measuring system and a borehole television detection system, Gao9 combined theoretical calculation, quantitative analysis and detection of the FWCZ of coal-roof strata.
At present, the most effective method is field measurement using a borehole video camera system, a water injection system or other direct monitoring equipment. However, these systems require a large amount of engineering work at considerable expense.

In recent years, with the rapid development of artificial intelligence technologies, machine learning algorithms (MLAs) such as decision trees (DT), support vector machines (SVM) and artificial neural networks (ANN) have increasingly been applied to predicting the height of the FWCZ13,14,15,16,17. For example, Sun13 proposed a synthetic calculation system that coupled a genetic algorithm (GA) with support vector regression (SVR). This system effectively reflected the relationship between the height of the FWCZ and the mining factors. Wu14 presented a radial basis function neural network (RBFNN) model to predict the height of the FWCZ for fully mechanized longwall mining with sublevel caving. However, these MLAs also have limitations in practical applications. For instance, the DT model requires extensive data pre-treatment and tends to fall into local optima, while the ANN model suffers from over-learning, slow convergence and local minima2.

Considering the aforementioned problems, this paper proposes a model for predicting the height of the FWCZ based on random forest regression (RFR), a non-parametric regression approach introduced by Breiman18 in 2001. RFR is a nonlinear modeling tool that couples the main advantages of two major learning techniques: bagging and random feature selection18. It is suitable for problems with unclear prior knowledge and incomplete data. Further, unlike simple DTs and neural networks (NNs), RFR runs efficiently on high-dimensional data sets. DTs, whose objective is to find the interactions between variables, do not perform well when there are many irrelevant variables, and the weakness of NNs is their inability to explain their reasoning process and basis18. Compared with traditional intelligent algorithms such as ANN and SVM, RFR has high prediction accuracy and good tolerance to outliers and noise18,19. Because of its superior performance, RFR has been widely applied in recent years in fields such as biology, medicine, economics, management and remote sensing19,20,21,22,23,24,25. RBFNN has a strong nonlinear fitting ability: it can map any complex nonlinear relation, and its learning rule is simple and easy to implement on a computer. However, its theory and learning algorithm need further improvement26,27,28,29. The group method of data handling (GMDH)-type neural network does not need a preset network structure, and its classification rules are expressed by simple polynomials. However, the GMDH training algorithm obtains good results only when the noise and interference follow a Gaussian distribution; otherwise, the training algorithm often overfits the network30,31. In the current study, RFR was used to predict the height of the FWCZ. To verify the effectiveness of the generated RFR model, it was applied to the Hongliu Coal Mine in Northwest China.
Also, an SVM model was constructed for comparison. The results indicate that the RFR model performs better and that its predictions are in good agreement with the field data measured using a borehole video camera system (BVCS).

Material and Methods

Study area

The Hongliu Coal Mine is situated in the central-eastern part of the Ningxia Hui Autonomous Region in Northwest China, approximately 80 km northwest of Yinchuan City (Fig. 1). The mine field extends in the NW-SE direction and covers an area of 79.55 km², with a length of 15 km and a width of 5.5 km. The general elevation is approximately 1400 m above sea level. Topographically, the mine lies in the western part of the Mu Us Desert, and the landform of the study area is classified as hilly terrain. The area has a semiarid desert continental monsoon climate with a mean annual precipitation of 216.3 mm.

Figure 1 Location of the study area and geological structure map.

Most of the mine is covered by Quaternary aeolian sand, except for sporadic bedrock exposures in the southwest of the mine field. According to drilling data, the main strata include the Shangtian Formation of the Upper Triassic, the Yan’an Formation of the Middle Jurassic, the An’ding Formation of the Upper Jurassic, the Qingshuiying Formation of the Paleogene (Oligocene) and the Quaternary.

Hydrogeologically, aquifers in the mine area can be divided into five groups according to the type of aquifer media and void: Quaternary loose alluvial pore aquifer group; Cretaceous rock pore and fractured aquifer group; Yan’an formation (Jurassic System) rock pore and fractured aquifer group; Upper Triassic fractured aquifer group; Permian sandstone and Carboniferous thin limestone aquifer group.

Structurally, the overall structural complexity in this area is moderate. In general, the Hongliu Coal Mine takes on a linear structure in NW direction. The crisscrossed faults are widely distributed in the study area. According to statistics, 44 faults and five large-scale folds have been exposed by drilling in the study area.

The coal-measure strata in the mining area belong to the Yan’an Formation of the Middle Jurassic. There are 18 coal seams. The main stable and minable coal seams are No. 2 and No. 4, with average thicknesses of 4.61 m and 2.97 m, respectively.

Division zones of the coal-roof strata after mining

After a coal seam is mined, the coal-roof strata are damaged to varying degrees and show obvious zoning. According to the degree of damage and the movement characteristics, the coal-roof strata are divided into three zones: the caved zone, the fractured zone and the continuous bending zone4,17,32, as illustrated in Fig. 2. The fractured water-conducting zone studied in this paper consists of the caved zone and the fractured zone.

Figure 2 Division zones of the coal-roof strata after mining.

Caved zone

The caved zone lies at the bottom of the overlying strata. As the working face advances, the immediate roof strata are subjected to unbalanced stress. When the load on the strata exceeds their bearing capacity, fractures develop. Finally, the strata break up, and the rock falls irregularly into the mined-out void until it is filled. Thus, if an aquitard is located within the caved zone, it loses its impermeability to varying degrees, and the caved zone provides ideal passages for water from the overlying aquifers to reach the working face.

Fractured zone

The fractured zone lies above the caved zone, and the strata in this zone retain a certain continuity compared with those in the caved zone. Vertical fractures, inclined fractures and horizontal abscission layers are heavily developed in the rocks at the bottom of this zone. The extent of damage gradually decreases from the bottom of the fractured zone upward, so the fractures become fewer toward the intact rocks above. Fractures in this zone can connect with the aquifers and cause water inrushes from the coal-roof strata. This zone is the main part of the water-conducting zone.

Continuous bending zone

The continuous bending zone comprises the strata between the fractured zone and the ground surface. The strata in this zone, especially soft rock and loose soil, move downward as a whole without developing fractures. The movement of the strata hardly affects the impermeability of the aquitards in this zone, so the zone protects the aquitards. A few fissures may appear at certain tensile positions, but in general the strata remain continuous17.

Random forest regression (RFR)

RFR, introduced by Breiman in 2001, is an ensemble learning algorithm built from multiple regression trees. Compared with simple decision trees, RFR runs efficiently on high-dimensional data sets and is more accurate and more robust to noise18,19. RFR also has advantages over traditional intelligent algorithms18,19,20,21,22,23,24. On the one hand, it learns quickly and can handle a large number of input variables while assessing their importance. On the other hand, it estimates the generalization error internally while the forest is being built, and it can estimate missing data and maintain accuracy even when a large proportion of the data is missing.

RFR predicts the value of a variable with an ensemble of regression trees (RTs). It draws multiple bootstrap samples from the original samples and constructs a regression tree for each of them. The final prediction is the average of the predictions of all trees18. Figure 3 shows a sketch of the RFR structure, and the RFR algorithm is implemented as follows:

Figure 3 Sketch map of the RFR structure.

(1) Draw k bootstrap samples randomly, with replacement, from the original training set X (N samples); k regression trees are then constructed from them. In this process, the probability that a given sample is not drawn is \(p={(1-1/N)}^{N}\). As N tends to infinity, p tends to 1/e ≈ 0.37, which indicates that about 37% of the samples in the original training set X are not drawn. These samples are called out-of-bag (OOB) data and can be used as test samples.

(2) For the k bootstrap samples, k unpruned regression trees are grown. During tree growth, m attributes are randomly selected from the total M attributes as candidate split variables at each node (m < M). Then, according to the minimum Gini index principle, the optimal attribute among the m attributes is selected as the split variable to grow the branches.

(3) The k regression trees constitute the final random forest regression model. The estimation performance of the model can be evaluated with two indices: the mean square error of the OOB data (\(MS{E}_{OOB}\)) and the coefficient of determination (\({R}_{RF}^{2}\)).

$$MS{E}_{OOB}=\frac{\sum _{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{n}$$
(1)
$${R}_{RF}^{2}=1-\frac{MS{E}_{OOB}}{{\hat{\sigma }}_{y}^{2}}$$
(2)

where n is the total number of OOB samples; \({y}_{i}\) is the observed output value; \({\hat{y}}_{i}\) is the output predicted by the generated RFR model; and \({\hat{\sigma }}_{y}^{2}\) is the predicted variance of the OOB output.
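The bootstrap and OOB bookkeeping of step (1), together with Equations (1) and (2), can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function names are hypothetical, and the variance in Equation (2) is taken here as the population variance of the observed OOB outputs, which is an assumption.

```python
import math
import random


def oob_probability(n_samples):
    """Probability that one given sample is absent from a bootstrap draw of size N."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def bootstrap_split(n_samples, rng):
    """Draw N indices with replacement; return (in-bag indices, OOB indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]
    return in_bag, oob


def mse_oob(y_obs, y_pred):
    """Equation (1): mean squared error over the OOB samples."""
    return sum((y - p) ** 2 for y, p in zip(y_obs, y_pred)) / len(y_obs)


def r2_rf(y_obs, y_pred):
    """Equation (2): 1 - MSE_OOB / variance of the output.

    Assumption: the variance is the population variance of y_obs.
    """
    mean_y = sum(y_obs) / len(y_obs)
    var_y = sum((y - mean_y) ** 2 for y in y_obs) / len(y_obs)
    return 1.0 - mse_oob(y_obs, y_pred) / var_y


if __name__ == "__main__":
    # The OOB fraction approaches 1/e as N grows.
    for n in (10, 60, 1000):
        print(n, round(oob_probability(n), 4), round(math.exp(-1), 4))
    in_bag, oob = bootstrap_split(60, random.Random(0))
    print("OOB share in one draw:", len(oob) / 60)
```

For a training set of 60 cases, as used later in this paper, the expected OOB share per tree is already close to the limiting value of about 37%.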

Variables importance measures

The RFR model provides two ways to calculate the importance degree of each variable index: mean decrease in Gini index and mean decrease in accuracy18,19,20.

The mean decrease in Gini index measures the total decrease in node impurity attributable to each variable. The method evaluates the importance of a variable by calculating the Gini index according to Equation (3) at each tree node split on that variable, and then accumulating the impurity decrease over all the trees.

$${I}_{Gini}=1-\sum _{i=1}^{N}{p}_{i}^{2}$$
(3)

where \({p}_{i}\) is the probability of the samples belonging to the i-th leaf; N is the number of leaves; and \({I}_{Gini}\) is the Gini index.
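Equation (3) can be evaluated directly from a list of proportions. The sketch below is illustrative only, with hypothetical function names. Note that random forest implementations for regression commonly measure node impurity by the residual sum of squares rather than by the Gini index; the Gini form is shown here because it is the one given in the text.

```python
def gini_index(proportions):
    """Equation (3): I_Gini = 1 - sum(p_i^2) for proportions p_i that sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return 1.0 - sum(p * p for p in proportions)


def gini_from_counts(counts):
    """Convert raw group counts to proportions, then apply Equation (3)."""
    total = sum(counts)
    return gini_index([c / total for c in counts])


if __name__ == "__main__":
    print(gini_index([1.0]))         # a pure node has zero impurity
    print(gini_index([0.5, 0.5]))    # an even two-way mix
    print(gini_from_counts([3, 1]))  # three samples in one group, one in another
```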

The basic principle of the OOB error estimation method is that when noise is added to a relevant feature that plays an important role in accuracy, the prediction accuracy of the RFR decreases significantly. The main procedure is as follows. First, for the generated RFR, the OOB error \({e}_{t}\) of each decision tree is calculated from its OOB data. Second, the values of the j-th feature \({X}^{j}\) in the OOB data are changed randomly (that is, noise is added artificially). Then, the perturbed OOB data are used to test the RFR, and a new OOB error \({e}_{t}^{j}\) is obtained. Finally, the importance of the variable \({X}^{j}\) is calculated according to Equation (4):

$$I({X}^{j})=\frac{1}{n}\sum _{t=1}^{n}({e}_{t}^{j}-{e}_{t})$$
(4)

where \({X}^{j}\) is the j-th feature of the OOB data; \({e}_{t}\) is the initial OOB error of tree t; \({e}_{t}^{j}\) is the OOB error of tree t with noise; n is the number of decision trees; and \(I({X}^{j})\) is the importance of the variable \({X}^{j}\). The greater the increase in OOB error caused by changing the variable \({X}^{j}\), the greater the decrease in accuracy, and the more important the variable is.
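The procedure behind Equation (4) can be sketched as below. The sketch assumes that the noise is introduced by randomly permuting the values of feature j within each tree's OOB data, which is the usual realization of this idea, and that the error is the mean squared error. The "trees" are represented as plain callables, and `toy_tree` is a hypothetical stand-in for a fitted regression tree.

```python
import random


def mean_squared_error(y_obs, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_obs, y_pred)) / len(y_obs)


def permutation_importance(trees, oob_sets, feature_j, rng):
    """Equation (4): mean over trees of (perturbed OOB error - initial OOB error).

    trees    : list of callables, each mapping a feature row (list) to a prediction
    oob_sets : list of (rows, targets) pairs holding the OOB data of each tree
    """
    total = 0.0
    for predict, (rows, targets) in zip(trees, oob_sets):
        base = mean_squared_error(targets, [predict(r) for r in rows])
        column = [r[feature_j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        noisy_rows = [
            r[:feature_j] + [v] + r[feature_j + 1:] for r, v in zip(rows, column)
        ]
        noisy = mean_squared_error(targets, [predict(r) for r in noisy_rows])
        total += noisy - base
    return total / len(trees)


def toy_tree(row):
    """Hypothetical predictor that uses only feature 0."""
    return 2.0 * row[0]


if __name__ == "__main__":
    rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
    targets = [2.0, 4.0, 6.0, 8.0]
    forest = [toy_tree] * 3
    oob = [(rows, targets)] * 3
    rng = random.Random(1)
    print("feature 0:", permutation_importance(forest, oob, 0, rng))
    print("feature 1:", permutation_importance(forest, oob, 1, rng))
```

Because `toy_tree` ignores feature 1, perturbing that feature leaves the OOB error unchanged and its importance is zero, whereas perturbing feature 0 can only increase the error.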

Construction of the main controlling factors system

The development of the FWCZ of coal-roof strata is influenced by multiple factors and has a complex nonlinear relationship with the geological features of the strata, rock mechanics and mining conditions17. Based on a large number of field observations for fully mechanized mining and on theoretical studies, five main controlling factors were selected: mining depth, mining height, lithology type of the overlying strata, working-face length and coal-seam dip angle. Each factor is briefly described below.

Mining depth

According to the theories of mining engineering geology and rock mechanics, the in situ stress of the strata around an underground excavation has a great impact on the extent of damage to the surrounding rock. Generally, the primary rock stress of the surrounding rock is proportional to the mining depth. As the mining depth of the coal seam increases, the in situ stresses and the displacement of the overlying rock gradually increase, leading to more fractures in the coal-roof strata.

Mining height

Mining height is the decisive factor for the height of the fractured zone. The greater the mining height, the larger the plastic zone in the coal roof and the greater the space available for the caving rock, which results in a greater height of the fractured zone. In the traditional empirical formula method, mining height is the only factor that controls the FWCZ height.

Lithology type

When the overlying rock is disturbed by mining, brittle rock with higher hardness (such as limestone and sandstone) is apt to crack and develop fractures, whereas soft rock (such as mudstone and shale) mainly undergoes plastic deformation and rarely fractures. After the coal seam is extracted, the compressive strength of the overlying rock directly affects the degree of rock failure: rock with a greater compressive strength is less prone to failure. Generally, according to the uniaxial compressive strength of the rock, the lithology of the overlying strata is classified into four types13,14,15: hard, medium hard, medium soft and soft, with quantitative values of 4, 3, 2 and 1, respectively.

Working-face length

Working-face length, like mining height, is an index that reflects the influence of the size of the mined-out space on the fractured water-conducting zone. According to the mechanics of materials, the curvature of a rock beam fixed at both ends is proportional to its span. The longer the working face, the greater the downward curvature of the coal-roof strata. Thus, the rock beam is more likely to break, resulting in a greater height of the fractured zone.

Coal-seam dip angle

The influence of the coal-seam dip angle on the overlying strata is mainly reflected in the different failure forms of the strata. When the coal seam is horizontal, the fractured zone is nearly symmetrical and saddle-shaped. As the dip angle increases, the failure form of the overlying rock gradually develops into parabolic and arch shapes.
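The five controlling factors can be gathered into one numeric input row per case. The sketch below is only one possible encoding: the class name, field names and the example values are hypothetical, while the lithology codes (4, 3, 2, 1) are those given in the text.

```python
from dataclasses import dataclass

# Lithology classes and their quantitative values as given in the text.
LITHOLOGY_CODE = {"hard": 4, "medium hard": 3, "medium soft": 2, "soft": 1}


@dataclass(frozen=True)
class MiningCase:
    """One observation of the five main controlling factors (hypothetical layout)."""

    mining_depth_m: float
    mining_height_m: float
    lithology: str
    working_face_length_m: float
    dip_angle_deg: float

    def to_features(self):
        """Return [depth, height, lithology code, face length, dip angle] as floats."""
        return [
            float(self.mining_depth_m),
            float(self.mining_height_m),
            float(LITHOLOGY_CODE[self.lithology]),
            float(self.working_face_length_m),
            float(self.dip_angle_deg),
        ]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the datasets of this study.
    case = MiningCase(300.0, 3.5, "medium hard", 180.0, 8.0)
    print(case.to_features())
```

Encoding lithology as an ordered integer treats the four classes as equally spaced, which is a modelling choice inherited from the quantitative values in the text.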

Results and Discussion

Datasets used

Collecting the datasets is a crucial part of any machine learning study. In this study, 85 field-measured datasets for fully mechanized mining were collected from several large-scale coal mines in North China, drawing on previous research13,14,15,16,17. Each case contains the field-measured values of the five main controlling factors described above and the height of the FWCZ. Of the 85 datasets, 60 (about 70%) were randomly selected for training (Table 1), and the remaining 25 (about 30%) were used for model testing. Figure 4 shows the detailed flowchart of the methodology used in this study.
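A random 60/25 partition of the 85 cases can be sketched as follows. The function name and the seed are assumptions made for reproducibility of the illustration; the text does not state how the random selection was performed.

```python
import random


def split_cases(cases, n_train, seed=0):
    """Randomly partition cases into non-overlapping training and test lists."""
    if not 0 < n_train < len(cases):
        raise ValueError("n_train must be between 1 and len(cases) - 1")
    indices = list(range(len(cases)))
    random.Random(seed).shuffle(indices)
    train = [cases[i] for i in indices[:n_train]]
    test = [cases[i] for i in indices[n_train:]]
    return train, test


if __name__ == "__main__":
    case_ids = list(range(1, 86))  # placeholder identifiers for the 85 cases
    train, test = split_cases(case_ids, n_train=60)
    print(len(train), len(test))
```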

Table 1 Field measured sample datasets for model training.
Figure 4 Detailed flowchart of the proposed methodology.

Establishment of the RFR model

An RFR model requires two parameters to be defined: the number of trees in the forest (ntree) and the number of variables randomly sampled as candidates at each split node (mtry). To maximize the model accuracy, the combination of mtry and ntree must be optimized18. When ntree is too small, the RFR prediction error is unstable and the model cannot reach its best performance. Conversely, if ntree is too large, the computation time and required memory increase accordingly. Repeated runs showed that when ntree = 200, the \(MS{E}_{OOB}\) becomes stable and the model does not tend to overfit. According to Breiman18, mtry < M. In this case study there are five variables, namely M = 5. To find the optimal value of mtry, three RFR models were created with mtry = 1, mtry = 2 and mtry = 3 (Fig. 5). Figure 5 shows how the error changes with the number of trees ntree. The results show that the model error is stable at ntree = 200 and that the \(MS{E}_{OOB}\) is lowest, at about 6.9 m², when mtry = 1. Therefore, considering both accuracy and computation cost, the optimized parameters of the RFR are mtry = 1 and ntree = 200.
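One way to make the "error becomes stable" criterion explicit is to look for the smallest forest size after which the OOB error stays within a small relative band around its final value, and then to pick the mtry with the lowest final error. The sketch below is an assumption about how such a rule could be written, not the procedure used in the study; the tolerance, the synthetic error curve and the error values for mtry = 2 and mtry = 3 are placeholders.

```python
def smallest_stable_ntree(oob_curve, tol=0.02):
    """Smallest tree count from which the OOB error stays within tol of its final value.

    oob_curve[k] is the OOB error of a forest with k + 1 trees.
    """
    final = oob_curve[-1]
    stable_from = len(oob_curve)
    for k in range(len(oob_curve) - 1, -1, -1):
        if abs(oob_curve[k] - final) > tol * abs(final):
            break
        stable_from = k + 1
    return stable_from


def best_mtry(final_errors):
    """Return the mtry with the lowest final OOB error from a {mtry: error} mapping."""
    return min(final_errors, key=final_errors.get)


if __name__ == "__main__":
    # Synthetic, monotonically decaying curve used only to exercise the function.
    curve = [6.9 + 10.0 / k for k in range(1, 501)]
    print("stable from ntree =", smallest_stable_ntree(curve))
    # 6.9 is the value reported in the text for mtry = 1; the others are placeholders.
    print("best mtry =", best_mtry({1: 6.9, 2: 7.5, 3: 8.1}))
```

In practice, the curve would come from evaluating the OOB error as trees are added, separately for each candidate mtry.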

Figure 5 The OOB error of the RFR model.

The contribution of each factor to the generated RFR model is shown in Fig. 6. The importance of each factor is measured in two ways: the mean decrease in Gini index and the OOB mean decrease in accuracy. According to the Gini index, mining height and mining depth have the highest importance, followed by coal-seam dip angle and working-face length, while lithology type has the lowest importance. The order of importance obtained from the OOB mean decrease in accuracy is consistent with that obtained from the Gini index. By both measures, mining height and mining depth are the two most important of the five factors, which suggests that they contribute overwhelmingly to the development of the FWCZ height.

Figure 6 Importance degree of the main controlling factors determined in two ways: (a) Mean decrease in Gini index; (b) Mean decrease in accuracy. (MH: mining height; MD: mining depth; CSDA: coal-seam dip angle; WFL: working-face length; LT: lithology type).

SVM model for comparison

For comparison, a support vector machine (SVM) regression model was also used to predict the height of the fractured water-conducting zone. SVM performs well in data modeling and function optimization in various fields because of its ability to represent nonlinearities23. The radial basis function (RBF) was adopted as the kernel function, and the two main parameters, the RBF kernel coefficient γ and the penalty coefficient C, were set to 0.1 and 0.5, respectively. The SVM regression model was then constructed using the same training data.
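For reference, the RBF kernel is commonly written as k(x, z) = exp(−γ‖x − z‖²). The sketch below evaluates this form with γ = 0.1; whether the software used in the study parameterizes γ in exactly this way is an assumption, and the example feature vectors are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.1):
    """Common RBF kernel form: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Hypothetical, already-scaled feature vectors for two cases.
    a = [0.2, 0.5, 1.0, 0.3, 0.1]
    b = [0.4, 0.5, 0.75, 0.3, 0.2]
    print(rbf_kernel(a, a))  # identical inputs give the maximum similarity
    print(rbf_kernel(a, b))
```

Because the kernel depends on Euclidean distance, factors measured on large scales (for example, mining depth in metres) dominate unscaled inputs; the text does not state whether the inputs were scaled.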

Model evaluation

Model evaluation is an important step before model application. The root mean square error (RMSE) and the coefficient of determination \({R}^{2}\) were used to evaluate the performance of the two regression models. RMSE measures the residual errors and reflects the difference between the observed and modeled values; the lower the RMSE, the better the model performs. \({R}^{2}\) measures how well the predicted output of the regression model fits the observed data. Its value generally lies between 0 and 1, and a higher \({R}^{2}\) indicates a better fit. The two indices are defined as follows:

$${\rm{RMSE}}=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}$$
(5)
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}$$
(6)

where n is the total number of test samples; \({y}_{i}\) is the observed output value of the i-th test sample; \({\hat{y}}_{i}\) is the value predicted by the generated models; and \(\bar{y}\) is the average observed output value of the test samples.
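Equations (5) and (6) translate directly into code. The following is a minimal sketch with hypothetical function names; the example values are illustrative and are not taken from Table 2.

```python
import math


def rmse(y_obs, y_pred):
    """Equation (5): root mean square error."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_obs, y_pred)) / len(y_obs)
    )


def r_squared(y_obs, y_pred):
    """Equation (6): 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [40.0, 55.0, 62.0, 70.0]   # illustrative heights in metres
    predicted = [42.0, 53.0, 63.0, 69.0]
    print("RMSE:", rmse(observed, predicted))
    print("R2:", r_squared(observed, predicted))
```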

Table 2 lists the 25 sample datasets used for model testing. Using the two regression models generated above, the heights of the FWCZ for the 25 cases were predicted. Figure 7 plots the predicted values against the observed data for the test set using SVM and RFR, respectively. Based on Equations (5) and (6), the RMSE and \({R}^{2}\) of the two models were calculated, as listed in Table 3. The RFR model has the lower RMSE and the higher \({R}^{2}\), with values of 2.363 and 0.968, respectively (compared with 4.396 and 0.902 for the SVM model). Therefore, both models are reasonable, and RFR performs better than SVM.

Table 2 Field measured sample datasets for model testing.
Figure 7 Comparison of the observed and predicted height with the test data by using: (a) SVM; (b) RFR.

Table 3 RMSE and \({R}^{2}\) of the RFR and SVM models.

Model application

Engineering background and predicted results

The No. 1121 working face of the No. 2 coal seam, located in the center of Hongliu Coal Mine, is the initial mining face of the mine. The length of the working face is 1379 m, and the average mining depth is 265 m. The fully-mechanized longwall mining method is adopted in the mining process. The No. 2 coal seam belongs to the Yan’an Formation of the Jurassic System, with an average thickness of 5.28 m. The dip angle of the No. 2 coal seam varies from 5° to 15°, with the average value of 10°.

Figure 8 displays the typical geological column of the strata overlying the No. 1121 working face. As shown, the strata directly overlying the coal mainly consist of siltstones and fine sandstones of the lower Zhiluo Formation, which are considered aquitards. The average total thickness of these strata is 52.2 m. According to the lithology classification described above, the sandstone is considered hard rock, so the lithology type of the strata is quantified as 4. The first aquifer overlying the No. 2 coal seam lies about 51.28 m above the coal. It consists of grit sandstones of great thickness and is rich in water. Thus, to evaluate the risk of water inrush from the strata overlying the coal seam and take corresponding measures, it is necessary to predict the height of the FWCZ precisely. Applying the SVM and RFR models generated above to the No. 1121 working face of the No. 2 coal seam gives predicted FWCZ heights of 64.17 m and 62.96 m, respectively.

Figure 8 Typical geological column of the No. 1121 working face overlying strata.

Practical situation of the No. 1121 working face

When the No. 1121 working face had advanced about 56 m, a large amount of water from the overlying aquifer leaked into the working face. The maximum water inflow reached 1817 m³/h, so mining had to be stopped. To design water-inrush prevention measures on a sound basis, the borehole video camera system (BVCS) was used to observe the height of the fractured water-conducting zone. BVCS is an optical exploration technology that allows the inside of a borehole to be observed directly. The system can be used to observe strata lithology, geological structures, fracture-zone development, groundwater level changes and so on10.

In this study, the BVCS was used to observe how the degree of fracture development changes with borehole depth and to determine the top boundaries of the fractured zone and the caved zone. Figure 9 shows the video camera images of borehole HL-1. According to the images, the rocks above 279.07 m are interbedded sandstone and mudstone, and they are relatively intact except for a few small horizontal cracks at certain local positions (Fig. 9a).

Figure 9 Video camera images of the borehole HL-1: (a) Intact rock without fracture; (b) Fractured zone with various forms of fractures; (c) Caved zone.

Figure 9b shows the fractured zone images with various forms of fractures: a nearly vertical fracture with a small width appears at the borehole depth of 279.07–279.27 m; there are many abscission-layer phenomena between 286.3 m and 294.68 m; the crisscrossed fractures with large displacements are distributed in the rocks below 294 m. Therefore, according to the fractures development conditions described above, the depth of 279.07 m is considered as the top boundary of the fractured zone.

As shown in Fig. 9c, the rocks below a depth of 298.9 m were seriously damaged, and there is a large void with obvious mining-collapse characteristics. The depth of 298.9 m is therefore taken as the top boundary of the caved zone.

Based on the formula proposed by China Coal Industry Bureau20, the height of FWCZ of coal-roof strata can be calculated as follows:

$$H=H^{\prime} -h-M$$
(7)

where H is the maximum height of FWCZ (m); H′ is the depth of the coal-seam floor (m); h is the depth of the fractured zone’s top boundary (m); M is the thickness of the mining coal seam (m).

According to the drilling data and the video camera images of borehole HL-1, the depth of the No. 2 coal-seam floor is 345.88 m, and the thickness of the coal seam is 5.28 m. Therefore, from Equation (7), the height of the FWCZ is H = 345.88 − 279.07 − 5.28 = 61.53 m. Table 4 compares the field-measured data with the prediction results obtained by the proposed methods.
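The arithmetic of Equation (7) for borehole HL-1 can be checked with a few lines of code. The function name is hypothetical; the three input depths are those reported in the text.

```python
def fwcz_height(floor_depth_m, fractured_top_depth_m, seam_thickness_m):
    """Equation (7): H = H' - h - M, with all quantities in metres."""
    return floor_depth_m - fractured_top_depth_m - seam_thickness_m


height = fwcz_height(345.88, 279.07, 5.28)
print(f"{height:.2f} m")  # prints "61.53 m"
```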

Table 4 Comparison between the field measured data and the prediction results obtained by the SVM and RFR.

The results show that, relative to the field-measured data, the SVM and RFR methods have relative errors of 4.29% and 2.32%, respectively. Both predictions are therefore in good agreement with the field observation, and the RFR model performs better in the study area, which is consistent with the conclusion drawn from the test data.
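The relative errors follow from the predicted heights (64.17 m for SVM, 62.96 m for RFR) and the measured height (61.53 m). The short sketch below reproduces them, assuming the relative error is defined as the absolute difference divided by the measured value; the function name is hypothetical.

```python
def relative_error_percent(predicted, measured):
    """Absolute difference between prediction and measurement, in percent of the measurement."""
    return abs(predicted - measured) / measured * 100.0


measured_height = 61.53  # m, from borehole HL-1
for name, predicted in (("SVM", 64.17), ("RFR", 62.96)):
    print(f"{name}: {relative_error_percent(predicted, measured_height):.2f}%")
# prints "SVM: 4.29%" then "RFR: 2.32%"
```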

Summary and Conclusions

To ensure safe production in coal mines, this study proposed a model for predicting the height of the FWCZ based on RFR. RFR is a robust machine learning method that can be used both to evaluate variable importance and to predict the height of the FWCZ. Compared with traditional MLAs, RFR has numerous advantages, in particular high prediction accuracy and suitability for problems with unclear prior knowledge and incomplete data. For the practical difficulties faced in this study, such as the small number of data samples, the RFR model still maintained a high degree of accuracy. The RFR model was then applied to the Hongliu Coal Mine in Northwest China. The main conclusions are as follows.

(1) Five variables were selected to construct the main controlling factors system. According to the importance measured by the mean decrease in Gini index and the OOB mean decrease in accuracy, mining height and mining depth are the two most important of the five variables.

(2) For comparison with the RFR model, an SVM model was also constructed using the same training datasets. In validation, the RFR model had the lower RMSE and the higher \({R}^{2}\), with values of 2.363 and 0.968, respectively (compared with 4.396 and 0.902 for the SVM model).

(3) The two models were applied to the No. 1121 working face in the Hongliu Coal Mine to verify their effectiveness. The FWCZ heights predicted by RFR and SVM are 62.96 m and 64.17 m, respectively. The height measured in the field with the borehole video camera system is 61.53 m, so RFR and SVM have relative errors of 2.32% and 4.29%, respectively. It is concluded that RFR performs better than SVM in the study area.

(4) This study demonstrates a potential new approach to predicting the height of the FWCZ. The results provide a reference for water-inrush risk management, prevention and reduction in the study area.