Introduction

Coal is the main primary energy source in China, accounting for more than 70% of total energy consumption (Qiu et al. 2019; He et al. 2020a; Liu et al. 2020), and it has contributed to the rapid development of the social economy. However, coal mining causes many safety problems. Owing to geological conditions and man-made operations, coal mines are prone to floods, fires and gas explosions during mining. Among these, floods cause the largest direct economic losses and rank second in both casualties and frequency of occurrence. Among coal mine floods, accidents caused by water inrush are particularly typical (Rui et al. 2018; Zhang et al. 2016, 2020c). In recent years, with the development of coal mine disaster prediction technology, the number of coal mine disasters and casualties has gradually decreased. However, the situation remains serious: from January 2015 to December 2019, there were 223 coal mine accidents, which led to 1656 deaths, including 52 water inrush accidents with 321 deaths (Qiu et al. 2020). Because the sedimentary conditions of coal seams differ, only some countries and regions are subject to water inrush problems (Bai and Miao 2009; Miao et al. 2010; Yu et al. 2016; Liu et al. 2019; He et al. 2020b). Therefore, studies on water inrush are regionally restricted. In addition, water inrush is influenced by the interaction of artificial inducements and natural disturbances, which makes water inrush in coal mines complex to study.

A number of studies on water inrush prediction have been conducted using data mining, neural networks, support vector machines (SVM), extreme learning machines (ELM) and other technologies (Zhang et al. 2022a, b). Du et al. (2014) improved the decision tree model of the CART algorithm for data mining and used it to establish a water inrush prediction model, which overcame the high computational workload and low accuracy of the original algorithm. Yan et al. (2008) designed a new method of top-down classification and construction of H-SVMs based on the maximum interval of SVM to predict water inrush in coal mines. Wu and Zhou (2008) built a water inrush prediction model based on the vulnerability index method. Liu et al. (2009) applied a binary logistic regression model to analyze the influence of the main controlling factors on water inrush from the coal seam floor. Li (2010) developed a coal mine floor water inrush prediction model using a geographic information system (GIS). Liu et al. (2011) and Jin et al. (2011) proposed a real-time monitoring system for water inrush in a coal mine and established a prediction model based on data mining classification technology. Wang et al. (2012) developed a water inrush prediction method based on rock mass limit equilibrium theory. Zhao et al. (2013) combined principal component analysis (PCA) and ELM to predict water inrush. These studies improved coal mine water inrush prediction models, but they still have some limitations: (1) Linear models such as the Fisher model and the Bayes model generally have low accuracy in discriminating multi-factor, nonlinear and discrete coal mine water inrush accidents (Zhang et al. 2021b). (2) Artificial neural networks (Zhang et al. 2020a), such as backpropagation neural networks (BPNN) and support vector machines (SVM), have low convergence efficiency and easily fall into local minima.

The ELM algorithm randomly generates the connection weights between the input layer and the hidden layer and the thresholds of the hidden layer neurons, and these need no adjustment during training. Once the number of hidden layer neurons is set, a unique optimal solution is obtained. Compared with traditional training methods, ELM offers fast learning and good generalization performance. However, the original ELM model suffers from over-fitting, slow convergence and weight deviation caused by random parameter allocation, which affect its prediction accuracy (Zhang et al. 2021a, c). In view of these shortcomings of neural network algorithms, the mainstream approach is to optimize and repair them. Among the available optimizers, the simulated annealing particle swarm optimization (SAPSO) algorithm has simple rules, parallel computation, global optimization, easy implementation, high precision and fast convergence; it has attracted attention in academia and has demonstrated its advantages in solving practical problems.

Therefore, this paper proposes using the SAPSO algorithm to optimize the connection weight ω and threshold b in ELM, so as to accelerate the convergence of the iterative calculation and avoid local optima and overfitting. Considering that coal mine water inrush is a complex process affected by the synergistic effects of various factors, this paper divides these factors into 5 main indicators and 14 branch factors based on long-term accumulated data of mine water inrush cases (as shown in Fig. 1). Then, the factor analysis (FA) method is used to add comprehensive factor indexes that enrich the data and reduce the dimension of the influencing factors of water inrush. The discriminant factors obtained from FA are used as the input variables of the SAPSO-ELM model to create a new intelligent prediction model of coal mine water inrush, and its performance on the same data set is compared with that of traditional algorithms, as described below.

Fig. 1 The division of the influencing factors of coal mine water inrush (Zhang and Yang 2021)

Algorithm theory

Extreme learning machine

The ELM algorithm is a learning algorithm for single hidden layer feedforward neural networks (see Fig. 2). Traditional feedforward neural networks have defects such as slow training, a tendency to fall into local minima and sensitivity to initial values (Zhang et al. 2020b, 2021a, c). In contrast, the ELM algorithm randomly generates the connection weights between the input layer and the hidden layer and the thresholds of the hidden layer neurons, and these need no adjustment during training; only the number of hidden layer neurons has to be set to obtain the unique optimal solution. Compared with traditional training methods, ELM offers fast learning and good generalization performance. For setting the key parameters of ELM, it is generally assumed that the input layer, hidden layer and output layer of the network have \(n\), \(L\) and \(m\) neurons, respectively, and that the activation function of the hidden layer, g(x), is of Sigmoid type. The mathematical model of ELM is then

$$t_{j} = \sum\limits_{i = 1}^{L} {\beta_{i} } g_{i} \left( {x_{j} } \right) = \sum\limits_{i = 1}^{L} {\beta_{i} } g\left( {\omega_{i} x_{j} + b_{i} } \right),\quad j = 1,2, \cdots ,N$$
(1)

where \(x_{j} = \left( {x_{j1} ,x_{j2} , \cdots ,x_{jn} } \right)^{T} \in R^{n}\) is the \(j\)-th input vector; \(t_{j} = \left( {t_{j1} ,t_{j2} , \cdots ,t_{jm} } \right)^{T} \in R^{m}\) is the corresponding output vector; \(\omega_{i} = \left( {\omega_{i1} ,\omega_{i2} , \cdots ,\omega_{in} } \right)\) is the connection weight between the neurons in the input layer and the \(i\)-th neuron in the hidden layer; \(\beta_{i} = \left( {\beta_{i1} ,\beta_{i2} , \cdots ,\beta_{im} } \right)^{T}\) is the connection weight between the \(i\)-th neuron in the hidden layer and the neurons in the output layer; \(b_{i}\) is the threshold (bias) of the \(i\)-th neuron in the hidden layer; and \(i = 1,2, \cdots ,L\).

Fig. 2 The structure of ELM model

The output of the neural network is

$$H\beta =T$$
(2)

where, in matrix form,

$$\begin{array}{*{20}l} {H = \left( {\begin{array}{*{20}c} {g\left( {\omega_{1} x_{1} + b_{1} } \right)} & {g\left( {\omega_{2} x_{1} + b_{2} } \right)} & \cdots & {g\left( {\omega_{L} x_{1} + b_{L} } \right)} \\ {g\left( {\omega_{1} x_{2} + b_{1} } \right)} & {g\left( {\omega_{2} x_{2} + b_{2} } \right)} & \cdots & {g\left( {\omega_{L} x_{2} + b_{L} } \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {g\left( {\omega_{1} x_{N} + b_{1} } \right)} & {g\left( {\omega_{2} x_{N} + b_{2} } \right)} & \cdots & {g\left( {\omega_{L} x_{N} + b_{L} } \right)} \\ \end{array} } \right)_{N \times L} } \\ {\beta = \left( {\beta_{1}^{T} ,\beta_{2}^{T} , \cdots ,\beta_{L}^{T} } \right)^{T} } \\ {T = \left( {t_{1}^{T} ,t_{2}^{T} , \cdots ,t_{N}^{T} } \right)^{T} } \\ \end{array}$$
(3)

and H is the output matrix of the hidden layer in the neural network.
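The model in Eqs. (1) to (3) can be illustrated with a minimal Python sketch. It is not the authors' Matlab program. It assumes that the output weight \(\beta\) is taken as the least-squares solution of \(H\beta = T\) via the Moore-Penrose pseudo-inverse, which is the usual ELM choice but is not stated in the text, and the names `elm_fit`, `elm_predict` and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation g(x), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, T, n_hidden, rng):
    """Train a single-hidden-layer ELM.

    X: (N, n) inputs, T: (N, m) targets, n_hidden: number of hidden neurons L.
    """
    n_features = X.shape[1]
    # Input weights and hidden thresholds are drawn at random and never tuned.
    omega = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ omega + b)        # hidden layer output matrix, N x L
    # Assumption: beta is the least-squares solution of H beta = T.
    beta = np.linalg.pinv(H) @ T
    return omega, b, beta


def elm_predict(X, omega, b, beta):
    """Network output H beta for new inputs."""
    return sigmoid(X @ omega + b) @ beta


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # hypothetical toy inputs
T = np.sin(X.sum(axis=1, keepdims=True))         # hypothetical toy target
omega, b, beta = elm_fit(X, T, n_hidden=20, rng=rng)
train_mse = float(np.mean((elm_predict(X, omega, b, beta) - T) ** 2))
```

Only `beta` is computed from the data; `omega` and `b` stay at their random values, which is why training reduces to a single linear solve.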

Simulated annealing particle swarm optimization algorithm (SAPSO)

Particle swarm optimization (PSO) is a swarm-based intelligent optimization algorithm (Cai et al. 2021). Because it works on real numbers and has few parameters to adjust, PSO is a general-purpose global search algorithm. However, PSO is prone to falling into local extrema and converges slowly in the later stage of evolution, so the idea of simulated annealing is introduced here. Simulated annealing (SA), which models an optimization problem on a thermodynamic system, has the following properties (Yuchi et al. 2021). In the early stage, the temperature is high and SA has a strong global search capability. As the iteration progresses, the temperature decreases and a fine search is conducted following the Metropolis sampling rule with probabilistic jumps, which effectively avoids falling into a local minimum.
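The temperature dependence described above can be sketched in a few lines of Python. This is the textbook Metropolis rule with geometric cooling, written under the assumption that a worse candidate (positive fitness change) is accepted with probability \(\exp(-\Delta E/T)\); the function names and the test values are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a candidate with fitness change delta_e is accepted."""
    if delta_e < 0:
        return True                      # improvements are always accepted
    # Worse candidates are accepted with probability exp(-delta_e / T).
    return math.exp(-delta_e / temperature) > rng.random()


def cool(temperature, alpha):
    """Geometric cooling: the temperature shrinks by the factor alpha."""
    return alpha * temperature


rng = random.Random(0)
# Count how often a candidate that is worse by 1.0 is accepted in 1000 trials.
hot = sum(metropolis_accept(1.0, 200.0, rng) for _ in range(1000))
cold = sum(metropolis_accept(1.0, 0.1, rng) for _ in range(1000))
```

At a high temperature almost every worse candidate is accepted, which gives the broad global search; at a low temperature almost none is, which gives the fine local search.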

Steps of optimized ELM with SAPSO

The main process of the SAPSO algorithm is particle swarm optimization. By introducing the solid annealing principle and combining it with the adaptive inertia weight adjustment strategy and the group fitness variance, the global and local search capabilities of the PSO algorithm are balanced. Because ELM generates its input weights and hidden layer thresholds randomly, it needs more hidden layer neurons, which leads to over-fitting. The fitness variance of the population is used as the basis for judging premature convergence of the particle swarm. Combined with SAPSO’s global search and its ability to jump out of local optima, it is used to dynamically optimize the input weights and hidden layer thresholds of ELM, so that ELM needs fewer hidden layer neurons to achieve good prediction. This avoids falling into local extrema and improves the speed and accuracy of convergence in the later stage of evolution as well as the generalization of the network. The steps for optimizing the input layer weights and hidden layer thresholds of ELM with the SAPSO algorithm are shown in Fig. 3 and described as follows:

  • Step 1. Normalize the sample data; set the number of neurons and hidden nodes; select the activation function.

  • Step 2. Initialize the model parameters, including the initial population size M, the initial temperature T and cooling rate \(\alpha\), the velocity and position of all particles in the target space, the learning factors c1 and c2, and the inertia factor \(\omega\). Set the termination conditions of the iteration, such as the maximum number of iterations K and the minimum error standard E. In this paper, the minimum error is adopted as the termination condition, judged from the extreme value of the error; the selection also involves some manual judgment.

  • Step 3. Load the standardized training samples and then train according to Eqs. (2) and (3). Calculate the fitness function using the mean square error of the training samples according to Eq. (4), where \(y_{i}\) is the measured value and \(\hat{y}_{i}\) is the predicted value of the \(i\)-th training sample.

    $$E = \frac{1}{N}\sum\limits_{i = 1}^{N} {\sqrt {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }$$
    (4)
  • Step 4. In each iteration, a new solution x is generated, and the fitness function value of each particle is calculated. The increment is calculated by Eq. (5), where f(x) is the evaluation function and \(k\) is the iteration index.

    $$\Delta E = f^{k + 1} \left( {x_{i} } \right) - f^{k} \left( {x_{i} } \right)$$
    (5)
  • Step 5. If \(\Delta E < 0\), accept x as the new current solution and update the system temperature according to Eq. (7). Otherwise, accept x only if the Metropolis criterion in Eq. (6) is satisfied, where rand is a random number in [0, 1]; in this case the temperature remains unchanged. As the iteration proceeds, \(T\) gradually decreases and tends to 0.

    $$\exp \left( { - \frac{\Delta E}{T}} \right) > rand,\quad \Delta E \ge 0$$
    (6)
    $$T^{k + 1} = \alpha T^{k}$$
    (7)

    where \(\alpha\) is the cooling rate.

  • Step 6. Update the velocity and position of the particles according to Eqs. (8) to (10), adjust the adaptive inertia weight \(\omega\) according to Eq. (11) and calculate the fitness function of each particle. Find the individual best position of each particle \(p_{id}\) and the global best position \(p_{gd}\).

    $$V^{\prime}_{id} = \omega V_{id} + c_{1} r_{1} \left( {p_{id} - X_{id} } \right) + c_{2} r_{2} \left( {p_{gd} - X_{id} } \right)$$
    (8)
    $$X^{\prime}_{id} = X_{id} + V^{\prime}_{id}$$
    (9)
    $$\left\{ {\begin{array}{*{20}l} {V_{id} = V_{\max } } & {\text{if } V_{id} > V_{\max } } \\ {V_{id} = V_{\min } } & {\text{if } V_{id} < V_{\min } } \\ \end{array} } \right.$$
    (10)
    $$\omega_{t} = \omega_{\max } - t\frac{{\omega_{\max } - \omega_{\min } }}{K}$$
    (11)

    where \(i = 1,2,3, \cdots ,M;\) \(d = 1,2,3, \cdots ,D;\) \(\omega\) is the inertia weighting factor for adjusting the flying velocity of particles; \(X_{id}\) and \(V_{id}\) are the position and velocity of particle \(i\) in the \(d\)-th dimension; \(r_{1}\) and \(r_{2}\) are random numbers in [0, 1]; \(t\) is the current iteration number; and \(K\) is the maximum number of iterations.

  • Step 7. Determine whether the system meets the iteration termination condition. If so, the iteration stops and \({P}_{gd}\) at this time is the optimal \(\left(\omega ,b\right)\), which is substituted into the ELM network for training. The output weight \(\beta\) and the actual output matrix T are calculated according to Eqs. (2) and (3). Otherwise, continue the iteration from Step 3.
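The particle update in Step 6 can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab program. It uses the standard PSO form in which both attraction terms are measured from the particle's own position, it takes the denominator of the inertia weight formula to be the maximum number of iterations (an assumption), and it leaves out the annealing acceptance of Steps 4 and 5 to keep the sketch short. The `sphere` fitness, the velocity limits and the dimension `D` are hypothetical stand-ins; in the paper the fitness is the ELM training error.

```python
import numpy as np


def inertia_weight(t, K, w_max=0.9, w_min=0.5):
    """Linearly decreasing inertia weight; K is the iteration budget."""
    return w_max - t * (w_max - w_min) / K


def pso_step(X, V, p_best, g_best, w, c1, c2, v_min, v_max, rng):
    """One velocity and position update for all particles."""
    r1 = rng.random(X.shape)
    r2 = rng.random(X.shape)
    V_new = w * V + c1 * r1 * (p_best - X) + c2 * r2 * (g_best - X)
    V_new = np.clip(V_new, v_min, v_max)      # enforce the velocity limits
    return X + V_new, V_new


def sphere(X):
    """Hypothetical stand-in fitness, minimized at the origin."""
    return np.sum(X ** 2, axis=1)


rng = np.random.default_rng(1)
M, D, K = 20, 4, 200                  # swarm size, dimension (assumed), iterations
X = rng.uniform(-5.0, 5.0, size=(M, D))
V = np.zeros((M, D))
p_best, p_val = X.copy(), sphere(X)
initial_best = float(p_val.min())
for t in range(K):
    g_best = p_best[np.argmin(p_val)].copy()
    w = inertia_weight(t, K)
    X, V = pso_step(X, V, p_best, g_best, w, 2.6, 1.5, -1.0, 1.0, rng)
    val = sphere(X)
    improved = val < p_val            # particles that beat their own best
    p_best[improved] = X[improved]
    p_val[improved] = val[improved]
final_best = float(p_val.min())
```

To optimize ELM with this loop, each particle would encode one candidate \(\left(\omega ,b\right)\) and `sphere` would be replaced by a function that trains ELM with those values and returns the training error.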

Fig. 3 Optimization flow chart of SAPSO-ELM

Because particle swarm optimization (PSO) adjusts its parameters through communication among particles, the model fits the given training samples as closely as possible. The quality of the model can therefore only be judged by its prediction error on validation samples.

Data sources and preprocessing

Data sources

The process of coal mine water inrush is a complex dynamic system. This paper uses water inrush data collected from the working face floor and measured normal mining data from a mining area in China where water inrush has occurred. The original data were collected at different times and locations and had not been sorted or grouped. A total of 180 groups of measured water inrush data from a coal mine were used as the total data set (partially shown in Table 1) to establish a coal mine water inrush prediction model. Of these, 80% (144 groups) were used as the training set, and the remaining 36 groups were used as test samples to evaluate the learning effect of the model.
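The 80/20 partition can be sketched in Python as below. The text does not say how the 144 training groups were chosen, so the random shuffle and the seed are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)            # hypothetical seed
n_samples = 180                            # total number of data groups
order = rng.permutation(n_samples)         # assumed random assignment
n_train = int(round(0.8 * n_samples))      # 80% of 180 groups
train_idx = order[:n_train]
test_idx = order[n_train:]
```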

Table 1 Sample data of SAPSO-ELM

Pretreatment of influencing factors

Various factors control the process of water inrush in mining. The original data (influencing factors) need to be preprocessed to determine the correlation between the coal mine indicators and water inrush and to test whether dimension reduction can be carried out. All selected indicators were analyzed by Pearson correlation analysis in SPSS 17.0 (a data analysis software package) to reduce information overlap among indicators. The correlation coefficients, together with the significance (Sig.) values and t-test statistics, were calculated from the correlation matrix. The effects of these factors are described and explained in Fig. 4 (Zhang and Yang 2021).

Fig. 4 Description of influencing factors

The analysis results show that the absolute correlation coefficients between pairs of the 14 indicators (X1–X14 in Table 1) are all greater than 0.7, the Sig. values are less than 0.01 and the correlation coefficients are nonzero, indicating that these 14 indicators have overlapping influence on water inrush from the coal floor. The output data of the network are the water inrush results, which are classified as stable or water burst. The SAPSO-ELM algorithm is used to establish a coal mine water inrush prediction model. Based on this algorithm, the way in which the selected indicators affect water inrush was studied through the training sample data.
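The Pearson screening described above was run in SPSS; an equivalent check can be sketched in Python with NumPy. The data below are synthetic and only stand in for the 180 × 14 indicator table, which is not reproduced here, so the numbers it produces say nothing about the real indicators.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 180 x 14 indicator table (X1 to X14):
# a shared latent driver plus noise makes the columns mutually correlated.
latent = rng.normal(size=(180, 1))
data = latent + 0.3 * rng.normal(size=(180, 14))

R = np.corrcoef(data, rowvar=False)        # 14 x 14 Pearson correlation matrix
off_diag = ~np.eye(R.shape[0], dtype=bool)
strong = np.abs(R[off_diag]) > 0.7         # pairs above the 0.7 threshold
share_strong = float(strong.mean())        # fraction of strongly correlated pairs
```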

Influencing factor analysis

Pearson correlation analysis confirmed an apparent correlation among the 14 indicators. If these 14 indicators were used directly as input vectors of the SAPSO-ELM prediction model, data redundancy might reduce the reliability of the prediction model. Therefore, factor analysis (FA) is needed to improve the independence between discriminant indicators. To ensure the reliability of attribute selection, the suitability of the data for factor analysis must be checked beforehand. The Kaiser–Meyer–Olkin (KMO) test and Bartlett’s test were applied to these 14 inter-correlated indicators in SPSS, and the results are shown in Table 2. The KMO value is 0.843, indicating that the 14 indicators are correlated; the Sig. value of Bartlett’s test is 0.000 < 0.010. Both tests suggest that these 14 indicators are suitable for FA and that dimensionality reduction can be achieved.
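For readers without SPSS, the two suitability checks can be sketched with NumPy using their textbook definitions: the overall KMO measure compares squared correlations with squared partial correlations, and Bartlett's statistic is \(-\left(n - 1 - (2p + 5)/6\right)\ln |R|\) with \(p(p-1)/2\) degrees of freedom. The data are synthetic stand-ins, and the p-value (which would come from the chi-square distribution) is not computed here.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # partial correlations (off-diagonal)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return float(chi2), dof


rng = np.random.default_rng(0)
# Hypothetical stand-in for the 180 x 14 indicator table.
latent = rng.normal(size=(180, 1))
data = latent + 0.3 * rng.normal(size=(180, 14))
R = np.corrcoef(data, rowvar=False)
kmo_value = kmo(R)
chi2, dof = bartlett_sphericity(R, n=data.shape[0])
```

A KMO value close to 1 and a large chi-square statistic both indicate that the correlation matrix is far from the identity, which is the precondition for FA.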

Table 2 KMO and Bartlett test

Results and discussion

SAPSO-ELM water inrush prediction model based on FA

A total of 144 sets of measured coal mine water inrush data are used as training samples for the prediction model, and another 36 sets are used as test samples. In coal mine water inrush forecasting, to reduce information redundancy between indicators and improve their mutual independence, the 14 correlated discriminant indicators were analyzed by FA, and the total variance explained by the common factors was obtained, as shown in Table 3. Table 3 and Fig. 5 show that the first 8 common factors cumulatively account for 90.49% of the total variance; therefore, these 8 components, which explain most of the variance of the original variables, are taken as the common factors in the model.
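The selection of the number of common factors can be sketched as follows. The sketch assumes principal-component extraction from the correlation matrix and a cumulative-variance threshold; neither the extraction method nor the threshold is stated in the text, and the 2 × 2 matrix is a hypothetical toy example rather than the paper's data.

```python
import numpy as np


def n_factors_for_variance(R, threshold):
    """Smallest number of factors whose cumulative variance share reaches threshold."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]     # descending eigenvalues
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative


R_demo = np.array([[1.0, 0.8], [0.8, 1.0]])   # hypothetical correlation matrix
k_low, cum = n_factors_for_variance(R_demo, 0.85)
k_high, _ = n_factors_for_variance(R_demo, 0.95)
```

Applied to the 14 × 14 correlation matrix of the indicators, the same function would return the number of leading factors needed to reach a chosen share of the total variance.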

Table 3 Total variance explained by the common factors

Fig. 5 Total variance explained by the common factors

The original factors are so comprehensive that they cannot reflect the influence on water inrush independently. The factor loading matrix is therefore rotated to maximize the variance, so that the number of high-loading variables on each factor is minimized, which reduces the comprehensiveness of the factors and yields the rotated matrix (shown in Table 4). The absolute value of each entry in Table 4 indicates the degree of correlation between the common factor and the original variable. The common factor F1 reflects X1 and X2; F2 reflects X4 and X5; F3 reflects X6 and X7; F4 reflects X10 and X11; F5 reflects X3 and X8; F6 reflects X9; F7 reflects X12; F8 reflects X13 and X14. The scores of these 8 factors were obtained by the regression method and are shown in Table 1.
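A variance-maximizing rotation of this kind is commonly computed with the varimax iteration. The sketch below is a widely used SVD-based formulation, offered as an illustration rather than as the routine SPSS applies; the loading matrix is random and hypothetical, with the same 14 × 8 shape as the rotated component matrix.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=50, tol=1e-6):
    """Rotate a (p x k) loading matrix with the SVD-based varimax iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Diagonal matrix of column sums of squared rotated loadings.
        col_scale = np.diag(np.sum(rotated ** 2, axis=0))
        grad = loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ col_scale)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                  # nearest orthogonal matrix to grad
        d_new = float(np.sum(s))
        if d_old != 0.0 and d_new / d_old < 1.0 + tol:
            break                          # criterion no longer improving
        d_old = d_new
    return loadings @ rotation, rotation


rng = np.random.default_rng(3)
loadings = rng.uniform(-1.0, 1.0, size=(14, 8))   # hypothetical loadings
rotated, rotation = varimax(loadings)
```

Because the rotation is orthogonal, the communality of each indicator (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes.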

Table 4 Rotated component matrix

These 8 main discriminant factors (F1, F2, F3, F4, F5, F6, F7 and F8) are the input vectors of the coal mine water inrush prediction model, which is a SAPSO-ELM model established with FA. A Matlab training program was used. Its basic parameters were initialized based on experiments and trial calculations with the two original models, SAPSO and PSO: population size 20, maximum number of iterations 200, learning factors c1 = 2.6 and c2 = 1.5, inertia weights \({\omega }_{max}=\) 0.9 and \({\omega }_{min}=\) 0.5, starting temperature T0 = 200 and ending temperature T = 10, cooling rate \(\alpha =\) 0.9, and allowable error 0.01. A total of 144 sets of data from Table 1 were used for training, and a nonlinear model relating the discriminant factors to water inrush was established.

The fitness curves of PSO-ELM and SAPSO-ELM are shown in Fig. 6. For PSO-ELM, the system error stabilized at around 0.23 after 160 iterations, whereas for SAPSO-ELM it stabilized after 120 iterations at a lower value of 0.10. These results show that SAPSO-ELM has a faster optimization speed and higher convergence accuracy. The learning efficiency of the ELM model is significantly improved by SAPSO.

Fig. 6 Curve of the fitness values

Comparative analysis of SAPSO-ELM model

To verify the reliability of the SAPSO-ELM model for coal mine water inrush, the back-substitution estimation method was used to conduct multiple tests during the training of the model. The judgment of the model was largely consistent with the actual results of coal mine water inrush, with a misjudgment rate of 5.6%. To further verify the superiority of the SAPSO-ELM model, its prediction results are compared with those of the commonly used SVM and BPNN models; the comparison results are shown in Figs. 7 and 8. In addition, the coal mine water inrush prediction models based on ELM, PSO-ELM and SAPSO-ELM were compared in terms of running time and accuracy. The results are shown in Figs. 9 and 10.

Fig. 7 Comparison of prediction results of different models

Fig. 8 Comparison of measured data and predicted data in test data of different models

Fig. 9 Comparison of the running time of different models

Fig. 10 Accuracy comparison of the different models

Figs. 7 and 8 show that the prediction results of the SAPSO-ELM model are closer to the actual occurrence of coal mine water inrush. The predictions of the other two models are not as good as those of SAPSO-ELM. The accuracy of the SVM model on the test samples is 88.3%, and that of the BPNN model is only 72.2%. The misjudgment rate of SAPSO-ELM is 5.6%, which is much lower than those of SVM and BPNN. It can be concluded that the SAPSO-ELM prediction model established with FA is more accurate and better suited to the intelligent prediction of multi-factor, complex coal mine water inrush.

The same conclusion can be drawn from the comparative analysis of Figs. 9 and 10. After the original parameters were reduced by the FA method, the accuracy of the ELM, PSO-ELM and SAPSO-ELM models improved, and their running times were 0.112356 s, 0.243654 s and 0.203254 s, respectively, which are much shorter than with the original parameters. Therefore, the SAPSO-ELM model optimized by the FA method is efficient and accurate in coal mine water inrush prediction. The results indicate that the model can be applied to further coal mine water inrush prediction and that a more universal water inrush prediction system can be trained on a larger set of case data.

Conclusion

This paper proposes using SAPSO to optimize the connection weight \(\omega\) and threshold \(b\) in ELM. A SAPSO-ELM-based coal mine water inrush prediction model is then established, in which factor analysis (FA) is conducted to enrich the data. To reduce the dimensionality of the influencing factors of water inrush, the discriminant factors obtained from FA are used as the input variables of the SAPSO-ELM model. The following conclusions can be drawn from the above research results:

  1. The FA method is feasible for correlation analysis of multiple factors. The analysis shows that the X1–X14 indexes have a strong correlation with water inrush in coal mines.

  2. The comprehensive scoring factors F1, F2, F3, F4, F5, F6, F7 and F8 obtained by the FA method, used as the input of the SAPSO-ELM model, can accelerate the convergence of the SAPSO-ELM model.

  3. Compared with SVM, BPNN, PSO-ELM and ELM, the SAPSO-ELM model established with FA has higher accuracy and faster convergence. SAPSO significantly improved the learning efficiency of the ELM model.

Because the data set used for learning comes mainly from one coal mine and the data volume is relatively small, the superiority of the proposed model is not fully demonstrated, although the model still shows considerable application prospects. Further research needs more data, more complex geological conditions and more case data to improve the accuracy and efficiency of the prediction model.