Introduction

Coal is a primary source of energy in China, and this situation will persist for decades. However, with such high coal production, a series of disasters have occurred during coal mining. Previous studies have revealed that five kinds of disasters (water, fire, gas, dust and roof) have caused huge economic losses and loss of life (Wu et al. 2013), and these disasters have been attributed to complicated geological conditions, imperfect mining technology or negligent management.

In response, two types of strategies have been applied to prevent water inrush in coal mines: (1) the active strategy, which predicts the likelihood of water inrush based on an understanding of the hydrogeological conditions of the coal mine; and (2) the passive strategy, which minimizes losses after a water inrush has occurred (e.g., by draining the water and restoring coal production) (Xu and Bu 2015).

Apart from man-made factors, a series of natural factors have been identified as responsible for water inrush accidents in coal mines, e.g., the quantity of water in the aquifers, the channels for water flow, the hydraulic connection between aquifers and the thickness of the water-resisting layer (Gui and Chen 2007). Therefore, understanding the hydrogeological conditions of a coal mine before a water inrush is the most fundamental work for preventing inrush-related accidents, and obtaining reliable information about the source of the inrush water is the most important task after an inrush.

Groundwater in the aquifers carries a range of information, including water level, quantity, temperature, velocity, turbidity and hydrochemistry (Gui and Chen 2007), which makes it possible to distinguish water sources among different aquifers, and many methods have been proposed for this purpose (Liu et al. 2018; Shi et al. 2018; Wu et al. 2019; Wang et al. 2020). For example, the mixing ratios of water inrush sources in the Qidong coal mine, Anhui Province, China, were calculated by factor analysis, and the results provided theoretical guidance for the prevention and control of water disasters (Sun and Gui 2015). Moreover, the source of inrush water in the Zhaizhen coal mine, Shandong Province, China, was identified by means of Q-mode analysis and principal component analysis (Wang and Shi 2019). These studies confirmed that statistical analysis of groundwater chemistry is effective for identifying inrush water sources in coal mines.

Roof, gas, dust and other hazards have been thoroughly studied in the Qingdong coal mine, northern Anhui Province, China, but the sources of inrush water have not received comparable attention, which poses a potential threat to coal production (Liu et al. 2017; Si et al. 2018). Therefore, in this study, forty-six groundwater samples were collected from representative aquifer systems in the Qingdong coal mine, and a series of statistical methods (principal component, cluster and discriminant analyses, as well as the Unmix model) were applied to their major ion concentrations in order to: (1) identify the types of water–rock interaction that determine the chemical composition of the groundwater; (2) quantify the contributions of these water–rock interactions; and (3) establish a water source identification model for the coal mine.

Materials and methods

Study area

The coalfield in northern Anhui Province is an important energy base in China, with an annual coal output of over 100 million tons. The Qingdong coal mine is one of the thirty coal mines in the region. It is located 45 km west of Suzhou City, northern Anhui Province, China (Fig. 1), between longitudes 116°25′44″ E and 116°34′44″ E and latitudes 33°36′34″ N and 33°40′29″ N. The study area has a warm temperate, semi-humid monsoon climate, with a mean annual temperature of 14.6℃. The mean annual precipitation is 811.8 mm, concentrated mainly in July and August. The mine extends 3.5–6.5 km from south to north and 13 km from east to west, with a total area of 51.7 km².

Fig. 1 Location of the Qingdong coal mine

From shallow to deep, the aquifers in the study area can be divided into four systems: the loose layer aquifer (LA), the coal-bearing sandstone aquifer (CA), the Taiyuan Formation limestone aquifer (TA) and the Ordovician limestone aquifer (OA). The four aquifer systems are described below.

LA: The coal mine is covered by loose Tertiary and Quaternary sediments, with average thicknesses of 154 m and 92 m, respectively. The main rock types in the LA include clay, sandy clay and calcareous clay. The loose layer has a multi-layer composite structure, in which four secondary aquifers have been identified. Of these, the fourth aquifer is highly water-rich and can be recharged by rainfall; because it is in direct contact with the coal-bearing strata, its water poses a threat to safe coal mining.

CA: The Permian strata are the main coal-bearing strata in the area. The CA consists mainly of mudstone, siltstone, sandstone (mainly feldspathic quartz sandstone) and coal seams, and can be subdivided into three secondary aquifers. Water in this aquifer is stored primarily in sandstone fractures. In recent years, mining activities have disrupted the integrity of the formation and altered the original runoff conditions, allowing water to infiltrate into the mining tunnels. Fracture water in the sandstone is therefore the main threat from this aquifer.

TA and OA: By comparison, the TA and OA consist primarily of limestone, with lesser amounts of mudstone and sandstone. These two aquifers are highly water-rich, with high water pressure and, in the absence of human activities, poor runoff conditions. Therefore, karst water in the limestone is also a threat to safe coal mining. In summary, water from the fourth aquifer of the LA, the sandstone fractures of the CA and the karst of the limestone threatens coal mining safety, similar to the situation in the Renlou coal mine, which lies in a similar hydrogeological unit (Sun 2013).

Sampling and analysis

Forty-six groundwater samples were collected from three representative aquifer systems in the Qingdong coal mine: 7 from the LA, 16 from the CA and 19 from the TA. LA samples were pumped from wells, CA samples were collected directly in the mining tunnels, and TA samples were collected from boreholes. The remaining four samples (UN1 and UN2 from CA boreholes, and UN3 and UN4 from TA boreholes) were treated as samples of unknown source in order to verify the water source identification. The physico-chemical parameters and major ion concentrations of the water samples were measured as follows.

pH values were measured with a portable water analyzer with an accuracy of 0.01. TDS values were calculated by subtracting 0.5 times the HCO3− concentration from the sum of all ion concentrations (in mg/L). Eight major ions (Na+, K+, Ca2+, Mg2+, Cl−, SO42−, HCO3− and CO32−) were analyzed: Na+, K+, Ca2+, Mg2+, Cl− and SO42− by ion chromatography, and HCO3− and CO32− by acid–base titration. Because K+ concentrations were low, Na+ and K+ were combined and reported as (Na+ + K+). All analyses were conducted at the Engineering and Technological Research Center of Coal Exploration, Anhui Province, China.

After all ion concentrations had been measured, the analytical error (AE) of each sample was checked by calculating the charge balance with the following formula:

$$AE = 100 \times \frac{\sum \text{cations (meq/L)} - \sum \text{anions (meq/L)}}{\sum \text{cations (meq/L)} + \sum \text{anions (meq/L)}}$$

With the exception of one water sample, which had an AE of 12%, the AEs of all other water samples were within −5% to + 5%, indicating that the results of the analyses were reasonable and credible (Nordstrom et al. 1989).
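The TDS and AE calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the laboratory's procedure: the molar masses are standard values, and converting the merged (Na+ + K+) value to meq/L as if it were all Na+ is an assumption (K+ is minor here). The example input is the mean LA composition from Table 1, used only as a demonstration; a mean composition is not a real sample.

```python
# Molar mass (g/mol), absolute charge and role of each reported species.
# Assumption: the merged (Na+ + K+) value is converted as if it were all Na+.
IONS = {
    "Na+K": (22.990, 1, "cation"),
    "Ca": (40.078, 2, "cation"),
    "Mg": (24.305, 2, "cation"),
    "Cl": (35.453, 1, "anion"),
    "SO4": (96.062, 2, "anion"),
    "HCO3": (61.017, 1, "anion"),
    "CO3": (60.009, 2, "anion"),
}


def to_meq(ion: str, mg_per_l: float) -> float:
    """Convert mg/L to meq/L: concentration x charge / molar mass."""
    molar_mass, charge, _ = IONS[ion]
    return mg_per_l * charge / molar_mass


def analytical_error(sample: dict) -> float:
    """Charge-balance error (%) = 100 (cations - anions) / (cations + anions), in meq/L."""
    cations = sum(to_meq(i, c) for i, c in sample.items() if IONS[i][2] == "cation")
    anions = sum(to_meq(i, c) for i, c in sample.items() if IONS[i][2] == "anion")
    return 100.0 * (cations - anions) / (cations + anions)


def tds(sample: dict) -> float:
    """TDS (mg/L) = sum of all ion concentrations - 0.5 x HCO3-."""
    return sum(sample.values()) - 0.5 * sample["HCO3"]


if __name__ == "__main__":
    # Mean LA concentrations (mg/L) from Table 1, used only as an example input.
    la_mean = {"Na+K": 223, "Ca": 39, "Mg": 38, "Cl": 97,
               "SO4": 275, "HCO3": 370, "CO3": 21}
    print(f"AE  = {analytical_error(la_mean):.2f} %")
    print(f"TDS = {tds(la_mean):.0f} mg/L")
```

For this input the sketch gives AE ≈ −1.51%, inside the ±5% acceptance window, and TDS = 878 mg/L.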

Statistical analyses

In this study, a series of statistical methods was applied to the hydrochemical data of the groundwater: (1) to characterize water–rock interactions (mainly the sources of the major ions), factor analysis was used to obtain qualitative information, whereas the US EPA (US Environmental Protection Agency) Unmix model was used to obtain quantitative information; (2) to identify water sources, cluster and discriminant analyses were applied together with the source contributions generated by the Unmix model. These methods are described in detail below.

1. Correlation analysis: a statistical method for evaluating the strength of the association between two variables and its significance level. Previous studies have confirmed that elements that are significantly correlated usually have similar origins or are controlled by the same processes (Sunkari and Abu 2019; Zango et al. 2019).

2. Principal component analysis: an important statistical method used not only to simplify and classify data, but also to identify the most important variables in the data. In geochemical studies, factor analysis has long been used to trace the sources of elements (Yidana et al. 2018; Wang and Shi 2019; Gubran et al. 2019; Ramaroson et al. 2020).

3. Cluster analysis: a multivariate method used to find natural groupings of samples or sampling stations, in which similar objects are placed in the same class. It has long been applied in hydrochemical studies (Tay et al. 2015; Bencer et al. 2016; Chen et al. 2020).

4. Discriminant analysis: a method that uses linear or quadratic functions of the variables to best separate cases into two or more predefined groups. It is a simple and effective tool for establishing inrush water source identification models (Sun and Gui 2015).

5. Unmix model analysis: a mathematical receptor model used to quantify the sources of contaminants (Lang and Yang 2014; Liu and Sun 2015; Jain et al. 2018; Zhang et al. 2019). Source types are identified by comparing the resolved source profiles with measured profiles, whereas source contributions indicate how much each source contributes to a sample. The model is based on \(C_{m} = \sum_{k=1}^{n} X_{k} S_{k} = X_{1} S_{1} + X_{2} S_{2} + \cdots + X_{n} S_{n}\), where \(C_{m}\) is the measured concentration of a sample, \(S_{k}\) is the profile of source k, \(X_{k}\) is the portion (weight) of \(S_{k}\), and \(X_{k} S_{k} / \sum_{k=1}^{n} X_{k} S_{k}\) is the contribution of source k to \(C_{m}\). Groundwater hydrochemistry is likewise a multi-source problem: the different types of water–rock interaction can be regarded as sources, and the degrees of water–rock interaction as their contributions.
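To make the mixing equation of the Unmix model concrete, the sketch below evaluates it in the forward direction: given source profiles S_k and weights X_k for one sample, it reconstructs C_m and each source's share. It does not reproduce EPA Unmix, which estimates both the profiles and the weights from the data. The profiles and weights in the example are hypothetical, and summing over all species to report a single share per source is an assumption about how a per-sample percentage is formed.

```python
def mix(weights, profiles):
    """Forward mixing model C_m = sum_k X_k * S_k.

    weights  -- X_k, one non-negative weight per source
    profiles -- S_k, one list of species concentrations per source
    Returns (C_m, per-species shares, whole-sample shares).
    """
    n_species = len(profiles[0])
    # X_k * S_k: the amount of every species contributed by source k
    parts = [[x * s for s in prof] for x, prof in zip(weights, profiles)]
    c_m = [sum(part[j] for part in parts) for j in range(n_species)]
    # Share of source k in species j: (X_k S_k)_j / (C_m)_j
    per_species = [[part[j] / c_m[j] for j in range(n_species)] for part in parts]
    # Assumption: one share per source, taken over the summed species mass
    total = sum(c_m)
    whole_sample = [sum(part) / total for part in parts]
    return c_m, per_species, whole_sample


if __name__ == "__main__":
    # Hypothetical profiles for (Na+K, Ca, SO4), mg/L per unit weight
    s1 = [100.0, 10.0, 5.0]   # a "silicate-like" source
    s2 = [20.0, 80.0, 40.0]   # a "sulfate-like" source
    c_m, per_species, shares = mix([2.0, 1.0], [s1, s2])
    print(c_m)     # [220.0, 100.0, 50.0]
    print(shares)  # about 0.62 for source 1 and 0.38 for source 2
```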

Unmix model analysis was performed with the EPA Unmix software (version 6.0), and the correlation, principal component, cluster and discriminant analyses were conducted in RStudio (R Core Team 2013).

Results and discussion

Hydrochemistry

Major ion concentrations of the groundwater samples from the different aquifer systems are shown in Table 1 and Fig. 2. As shown in Table 1, the mean major ion concentrations in the LA decrease in the order HCO3− (370 mg/L) > SO42− (275 mg/L) > Na+ + K+ (223 mg/L) > Cl− (97 mg/L) > Ca2+ (39 mg/L) > Mg2+ (38 mg/L) > CO32− (21 mg/L); those in the CA decrease in the order SO42− (1000 mg/L) > Na+ + K+ (647 mg/L) > HCO3− (334 mg/L) > Cl− (184 mg/L) > Ca2+ (46 mg/L) > Mg2+ (23 mg/L) > CO32− (17 mg/L); and those in the TA decrease in the order SO42− (1556 mg/L) > Na+ + K+ (384 mg/L) > Ca2+ (321 mg/L) > HCO3− (258 mg/L) > Cl− (225 mg/L) > Mg2+ (133 mg/L) > CO32− (1 mg/L).

Table 1 Major ion concentrations (mg/L) of the groundwater samples
Fig. 2 Box plots of major ion concentrations and physico-chemical parameters of groundwater samples from different aquifers

As shown in Fig. 2, the Na+ + K+, Cl− and SO42− concentrations differ significantly among the three aquifers (p < 0.05, Fig. 2a, d and e), whereas the LA and CA samples have similar concentrations of Ca2+, Mg2+ and HCO3− (p > 0.05, Fig. 2b, c and f). Overall, the CA samples have the highest mean concentration of Na+ + K+ (647 mg/L) of the three aquifer systems, the TA samples have the highest mean concentrations of Ca2+ (321 mg/L), Mg2+ (133 mg/L), Cl− (225 mg/L) and SO42− (1556 mg/L), and the LA samples have the highest mean concentration of HCO3− (370 mg/L) (Table 1). Because the major ion concentrations of groundwater are generally closely related to the lithology of the surrounding rock, the high Na+ + K+ content of the CA samples is likely associated with silicate weathering, while the high Ca2+ and Mg2+ contents of the TA samples are likely associated with the dissolution of carbonates and sulfates. Moreover, from shallow to deep, the HCO3− concentration decreases while the SO42− and Cl− concentrations increase, implying poor runoff conditions in the CA and TA. As shown in Fig. 2g and h, the mean TDS increases from 884 mg/L (LA) to 2081 mg/L (CA) and 2751 mg/L (TA), suggesting that the degree of water–rock interaction increases with aquifer depth, and pH values range from 7.1 to 8.9, indicating a weakly alkaline environment.
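The text reports p-values for differences among the three aquifers but does not name the test. As a hedged illustration only, the sketch below implements the Kruskal–Wallis H test, a common nonparametric choice for comparing three groups of unequal size (here 7, 16 and 19 samples); treat the choice of test as an assumption. With three groups, H is compared with a chi-square distribution with 2 degrees of freedom, whose survival function reduces to exp(−H/2), so no statistics library is needed. The example values are hypothetical.

```python
import math


def kruskal_wallis_3(g1, g2, g3):
    """Kruskal-Wallis H test for exactly three groups (e.g. LA, CA, TA).

    Returns (H, p). With 3 groups, H is referred to a chi-square
    distribution with 2 degrees of freedom, whose survival function is
    exp(-H / 2). Ties get average ranks plus the usual tie correction.
    """
    groups = [list(g1), list(g2), list(g3)]
    values = [v for g in groups for v in g]
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_sum = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    # Sum of R_i^2 / n_i over the groups, using the pooled ranks
    stat, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        stat += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * stat - 3 * (n + 1)
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical Cl- values (mg/L) for three aquifers
    la = [80, 95, 110, 90, 100, 105, 98]
    ca = [150, 170, 190, 200, 185, 175]
    ta = [210, 230, 220, 240, 215]
    h, p = kruskal_wallis_3(la, ca, ta)
    print(f"H = {h:.2f}, p = {p:.4f}")
```

For real data, `scipy.stats.kruskal` computes the same tie-corrected statistic and chi-square p-value; note that the chi-square approximation is rough for very small groups.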

As groundwater flows from the recharge zone to the discharge zone, its hydrochemical type generally evolves from HCO3− to SO42− and then to Cl− type (Tóth 1999). Classifying hydrochemical types is therefore important for understanding the hydrochemical evolution of a groundwater system. In this study, groundwater was classified according to its cation and anion concentrations using the Piper diagram (Fig. 3). The Na-SO4 type is the most common, accounting for 71% of the LA, 100% of the CA and 42% of the TA samples. The remaining LA samples are of Na-HCO3 type (29%), and 47% and 11% of the TA samples are classified as Ca-SO4 and Mg-SO4 types, respectively. UN1 and UN2 are of Na-SO4 type, whereas UN3 and UN4 are of Mg-SO4 and Na-SO4 type, respectively. Fig. 3 also shows that UN1 and UN2 plot close to the CA samples, whereas UN3 and UN4 plot close to the TA samples.
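The Piper diagram is a graphical tool, but the type names used here (e.g. Na-SO4) can be approximated in text by taking the dominant cation and the dominant anion in meq/L. This is a simplifying assumption: some classifications use a 50% threshold or mixed types (e.g. Na·Ca-SO4), so samples near class boundaries may be named differently than on a plotted Piper diagram. As before, converting (Na+ + K+) as Na+ is an assumption.

```python
# Molar mass (g/mol) and charge. "Na" stands for the merged (Na+ + K+) value,
# converted as if it were all Na+ (assumption).
CATIONS = {"Na": (22.990, 1), "Ca": (40.078, 2), "Mg": (24.305, 2)}
ANIONS = {"Cl": (35.453, 1), "SO4": (96.062, 2),
          "HCO3": (61.017, 1), "CO3": (60.009, 2)}


def meq(mg_per_l, molar_mass, charge):
    """Convert mg/L to meq/L."""
    return mg_per_l * charge / molar_mass


def water_type(sample):
    """Name a water type from the dominant cation and anion in meq/L.

    sample -- dict of mg/L with keys Na, Ca, Mg, Cl, SO4, HCO3, CO3
    HCO3- and CO3^2- are pooled, as on the carbonate axis of a Piper diagram.
    """
    cat = {ion: meq(sample[ion], *CATIONS[ion]) for ion in CATIONS}
    an = {ion: meq(sample[ion], *ANIONS[ion]) for ion in ("Cl", "SO4")}
    an["HCO3"] = (meq(sample["HCO3"], *ANIONS["HCO3"])
                  + meq(sample["CO3"], *ANIONS["CO3"]))
    return f"{max(cat, key=cat.get)}-{max(an, key=an.get)}"


if __name__ == "__main__":
    # Mean CA concentrations (mg/L) from Table 1, used only as an example input
    ca_mean = {"Na": 647, "Ca": 46, "Mg": 23, "Cl": 184,
               "SO4": 1000, "HCO3": 334, "CO3": 17}
    print(water_type(ca_mean))  # Na-SO4
```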

Fig. 3 Piper diagram

Mechanism controlling groundwater chemistry

The Gibbs diagram (Gibbs 1970) can be used to relate the chemical composition of groundwater to the lithology of its aquifer (Sunkari et al. 2020; Yu et al. 2020). The diagram distinguishes three mechanisms controlling water chemistry: precipitation dominance, evaporation dominance and water–rock interaction dominance. The Gibbs ratios are defined as Gibbs I = Cl−/(Cl− + HCO3−) and Gibbs II = (Na+ + K+)/(Na+ + K+ + Ca2+), with concentrations in meq/L. In this study, Gibbs I ranged from 0.05 to 0.97 and Gibbs II from 0.38 to 0.99. As shown in Fig. 4, all samples fall in the water–rock interaction and evaporation dominance fields, indicating that water–rock interaction and evaporation are the main controls on groundwater chemistry. Meanwhile, TDS increases from the LA to the TA, indicating that the degree of water–rock interaction increases with aquifer depth.
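The two Gibbs ratios can be computed directly from mg/L data after converting to meq/L, as in the minimal sketch below (standard molar masses; converting (Na+ + K+) as Na+ is an assumption). Each ratio is then plotted against TDS on a logarithmic axis to produce the two Gibbs diagrams.

```python
# Molar mass (g/mol) and charge for the ions used in the Gibbs ratios.
# Assumption: the merged (Na+ + K+) value, keyed "Na", is converted as Na+.
MOLAR = {"Na": (22.990, 1), "Ca": (40.078, 2),
         "Cl": (35.453, 1), "HCO3": (61.017, 1)}


def meq(sample, ion):
    """Concentration of one ion in meq/L from a dict of mg/L values."""
    molar_mass, charge = MOLAR[ion]
    return sample[ion] * charge / molar_mass


def gibbs_ratios(sample):
    """Return (Gibbs I, Gibbs II) for one sample given in mg/L."""
    cl, hco3 = meq(sample, "Cl"), meq(sample, "HCO3")
    na, ca = meq(sample, "Na"), meq(sample, "Ca")
    gibbs_1 = cl / (cl + hco3)  # Cl- / (Cl- + HCO3-)
    gibbs_2 = na / (na + ca)    # (Na+ + K+) / (Na+ + K+ + Ca2+)
    return gibbs_1, gibbs_2


if __name__ == "__main__":
    # Mean TA concentrations (mg/L) from Table 1, used only as an example input
    ta_mean = {"Na": 384, "Ca": 321, "Cl": 225, "HCO3": 258}
    g1, g2 = gibbs_ratios(ta_mean)
    print(f"Gibbs I = {g1:.2f}, Gibbs II = {g2:.2f}")
```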

Fig. 4 Gibbs diagrams

Silicate (feldspar) weathering, halite dissolution and cation exchange are generally considered the major sources of Na+ in groundwater (Zango et al. 2019). As shown in Fig. 5a and b, most of the samples from the three aquifers plot above the 1:1 line, which may reflect intense cation exchange: when dissolved Ca2+ and Mg2+ replace Na+ adsorbed on clay minerals, Na+ is released and becomes enriched in the groundwater. Ca2+ and Mg2+ in groundwater are associated with the dissolution of carbonates and sulfates and with reverse ion exchange. As shown in Fig. 5c, the (Ca2+ + Mg2+)/(HCO3− + SO42−) ratio differs among the three aquifers: the TA samples plot close to the 1:1 line, and their (Ca2+ + Mg2+) and (HCO3− + SO42−) concentrations are significantly higher than those of the LA and CA samples, indicating the influence of carbonate and sulfate dissolution in the TA. Moreover, (Ca2+ + Mg2+)/(HCO3− + SO42−) < 1 in all samples indicates that cation exchange is widespread in the three aquifers, especially in the CA, which is consistent with Fig. 5a and b. SO42− in groundwater is usually derived from the dissolution of sulfate minerals and the oxidation of sulfides. The high SO42− content of the CA samples relative to Ca2+ and Mg2+ is attributed to the combined effects of cation exchange and sulfide oxidation (Fig. 5c and d). In the study area, the CA contains abundant pyrite, and coal mining has exposed the CA to air for long periods, which favors the oxidation of pyrite and thus the enrichment of SO42− in groundwater (4FeS2 + 13O2 + 2H2O → 4FeSO4 + 2H2SO4 + 2SO2↑).
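The ionic ratios used in this argument are simple meq/L quotients. The sketch below computes (Ca2+ + Mg2+)/(HCO3− + SO42−), which the text names explicitly, and (Na+ + K+)/Cl−, included here as a common halite diagnostic; the text does not state the axes of Fig. 5a and b, so treat that second ratio as an assumption. Converting (Na+ + K+) as Na+ is also an assumption.

```python
# Molar mass (g/mol) and charge; "Na" holds the merged (Na+ + K+) value (assumption).
MOLAR = {"Na": (22.990, 1), "Ca": (40.078, 2), "Mg": (24.305, 2),
         "Cl": (35.453, 1), "SO4": (96.062, 2), "HCO3": (61.017, 1)}


def to_meq(sample):
    """Convert a dict of mg/L concentrations to meq/L."""
    return {ion: sample[ion] * z / m for ion, (m, z) in MOLAR.items()}


def ion_ratios(sample):
    """Diagnostic ionic ratios on a meq/L basis."""
    q = to_meq(sample)
    return {
        # > 1: more Na+ than halite dissolution alone supplies
        # (silicate weathering or cation exchange)
        "Na/Cl": q["Na"] / q["Cl"],
        # ~1: carbonate/sulfate dissolution; < 1: Ca2+ + Mg2+ deficit
        # relative to the anions, consistent with cation exchange
        "(Ca+Mg)/(HCO3+SO4)": (q["Ca"] + q["Mg"]) / (q["HCO3"] + q["SO4"]),
    }


if __name__ == "__main__":
    # Mean CA concentrations (mg/L) from Table 1, used only as an example input
    ca_mean = {"Na": 647, "Ca": 46, "Mg": 23, "Cl": 184, "SO4": 1000, "HCO3": 334}
    for name, value in ion_ratios(ca_mean).items():
        print(f"{name}: {value:.2f}")
```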

Fig. 5 Major ion relationships in groundwater

Source of major ions

Correlation analysis

In this study, Spearman rank correlation was used because it is insensitive to outliers in the data; the results are shown in Table 2. Ca2+, Mg2+, Cl− and SO42− are significantly and positively correlated with one another, which can be explained by the dissolution of chloride and sulfate minerals that release these ions into the water. In addition, the significant negative correlation between Na+ + K+ and Mg2+ is usually related to cation exchange (Table 2).
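Spearman's coefficient is the Pearson correlation of the ranks, with tied values given the average of their positions; that is why a few extreme concentrations cannot dominate it. A dependency-free sketch follows. It returns the coefficient only; significance levels such as the 0.01 threshold in Table 2 need a p-value, which `scipy.stats.spearmanr` provides. The example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical Ca2+ and SO42- concentrations (mg/L); note the outlier
    ca = [39, 46, 120, 321, 300, 2000]
    so4 = [275, 1000, 900, 1556, 1400, 1700]
    print(f"rho = {spearman(ca, so4):.3f}")
```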

Table 2 Spearman correlation coefficients of the major ions in groundwater (** indicates significance at the 0.01 level)

Principal component analysis

In this study, two principal components with initial eigenvalues greater than one were extracted; the results are shown in Table 3 and Fig. 6. As shown in Table 3, PC1, which explains 52.6% of the total variance, has high positive loadings of Ca2+, Mg2+, Cl− and SO42− and a high negative loading of HCO3−, whereas PC2, which explains 23.6% of the total variance, has a high positive loading of Na+ + K+ and a negative loading of Mg2+. Previous investigations indicate that the main processes controlling groundwater chemistry in the study area are silicate weathering, cation exchange and the dissolution of carbonate, sulfate and chloride minerals (Sun and Gui 2015). PC2 can therefore be interpreted as silicate weathering and cation exchange, which raise the Na+ + K+ concentration in water, whereas PC1 is related to the dissolution of chloride and sulfate minerals, which release Ca2+, Mg2+, Cl− and SO42− into the water. Fig. 6 also shows that the three aquifers have significantly different PC1 and PC2 scores (p < 0.05): the LA samples have the lowest PC1 and PC2 scores, the CA samples have the highest PC2 scores, and the TA samples have the highest PC1 scores. This indicates that groundwater from the different aquifer systems has undergone different types and degrees of water–rock interaction: the CA is dominated by silicate weathering and cation exchange, the TA by the dissolution of chlorides and sulfates, and the LA shows a lower degree of water–rock interaction than the CA and TA.
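The extraction step described above (retain components whose eigenvalue exceeds one) can be sketched with NumPy as follows. Assumptions: the variables are standardized first, i.e. PCA is run on the correlation matrix, and no rotation (such as varimax, which many packages apply by default) is used; the text does not state these options. Scores here are plain projections of the z-scores, so their scale differs from the standardized factor scores some packages report, and the sign of each component is arbitrary. The demo data are random and hypothetical.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on standardized data, keeping components with eigenvalue > 1.

    data -- array-like, shape (n_samples, n_variables), e.g. ion concentrations
    Returns (eigenvalues, explained variance ratios, loadings, scores).
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # z-score each variable
    corr = np.corrcoef(z, rowvar=False)               # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)           # ascending for symmetric input
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                              # Kaiser criterion
    # Loadings: correlations between the variables and the retained components
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = z @ eigvecs[:, keep]                     # sample scores on retained PCs
    explained = eigvals / eigvals.sum()
    return eigvals, explained, loadings, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 42 samples x 4 variables, the first two correlated
    base = rng.normal(size=(42, 1))
    demo = np.hstack([base + 0.3 * rng.normal(size=(42, 1)),
                      base + 0.3 * rng.normal(size=(42, 1)),
                      rng.normal(size=(42, 2))])
    eigvals, explained, loadings, scores = pca_kaiser(demo)
    k = loadings.shape[1]
    print("eigenvalues:", np.round(eigvals, 2))
    print("retained:", k, "explained:", np.round(explained[:k], 3))
```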

Fig. 6 Principal component scores of groundwater samples from different aquifers

Table 3 Results of principal component analysis

The Unmix model approach

Two sources were identified by the Unmix model; the results are listed in Table 4 and shown in Fig. 7. As shown in Table 4, the two-source solution has Min Rsq = 0.98 and Min Sig/Noise = 3.72, above the model's minimum requirements (Min Rsq > 0.8 and Min Sig/Noise > 2), indicating that the solution is valid (Ai et al. 2014). As noted above, silicate weathering and the dissolution of carbonate and evaporite minerals are the main processes controlling groundwater chemistry in coal mining-related strata (Sun and Gui 2015). Therefore, Source 1, dominated by Na+ + K+, HCO3− and Cl−, can be interpreted as silicate weathering and chloride dissolution, and Source 2, dominated by Ca2+, Mg2+ and SO42−, as the dissolution of evaporite minerals (sulfates). This interpretation is supported by the source contributions (Fig. 7): the LA and CA samples receive the highest contributions from Source 1 (> 60%), consistent with these aquifer systems being richer in clastic rocks (dominated by silicate minerals) than the TA, whose samples are mainly contributed by Source 2 (> 50%). Of the samples of unknown source, UN1 and UN2 receive high contributions from Source 1 (similar to the LA and CA samples), whereas UN3 and UN4 receive high contributions from Source 2 (similar to the TA samples).
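The acceptance check and the comparison of UN samples with the known aquifers can be expressed compactly. The threshold values come from the text; reducing each sample to the source with the largest percentage contribution is a simplification of the visual comparison in Fig. 7, and the contribution percentages in the example are hypothetical.

```python
def unmix_solution_ok(min_rsq, min_sig_noise, rsq_limit=0.8, sn_limit=2.0):
    """Check the acceptance thresholds quoted in the text:
    Min Rsq > 0.8 and Min Sig/Noise > 2."""
    return min_rsq > rsq_limit and min_sig_noise > sn_limit


def dominant_source(contributions):
    """Return the 1-based index of the source with the largest % contribution."""
    best = max(range(len(contributions)), key=lambda i: contributions[i])
    return best + 1


if __name__ == "__main__":
    print(unmix_solution_ok(0.98, 3.72))  # True: the two-source solution passes
    # Hypothetical (Source 1 %, Source 2 %) contributions for two samples
    for name, contrib in {"sample A": [65.0, 35.0], "sample B": [30.0, 70.0]}.items():
        print(name, "-> Source", dominant_source(contrib))
```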

Table 4
Fig. 7 Variations of source contributions. Samples 1–7, 8–23 and 24–42 are the LA, CA and TA samples, respectively, and samples 43–46 are the UN samples

Source identification of inrush water

Cluster analysis

In this study, all samples were analyzed by Q-mode cluster analysis, with Ward linkage and Euclidean distance chosen to obtain better-defined groups (Fovell and Fovell 1993; Liu et al. 2015; Chen et al. 2020). As shown in Fig. 8, three groups can be identified: Group 1 contains all the LA samples and two CA samples, Group 2 contains the remaining CA samples (all except CA4 and CA11), and Group 3 contains all the TA samples. Thus, only two samples (CA4 and CA11) are misclassified, giving an accuracy of 95.7%. The figure also shows that UN1 and UN2 fall into Group 2 and UN3 and UN4 into Group 3, consistent with the principal component scores (Fig. 6) and the variations of source contributions (Fig. 7).
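Ward linkage merges, at each step, the two clusters whose union least increases the total within-cluster sum of squares; for clusters A and B with centroids c_A and c_B, this increase is n_A n_B / (n_A + n_B) · ||c_A − c_B||². The minimal pure-Python sketch below stops once a chosen number of groups remains (three here) rather than drawing the full dendrogram. Whether the concentrations were standardized before clustering is not stated, so any scaling is left to the caller; the example points are hypothetical. For real use, `scipy.cluster.hierarchy.linkage(X, method="ward")` performs the same clustering efficiently.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward linkage and Euclidean distance.

    points -- list of equal-length numeric sequences (one per sample)
    k      -- number of groups at which to stop merging
    Returns one integer group label per sample (label values are arbitrary).
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [[float(v) for v in p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d2 = sum((u - v) ** 2 for u, v in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * d2  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * u + nb * v) / (na + nb)
                        for u, v in zip(centroids[a], centroids[b])]
        clusters[a] += clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Hypothetical standardized (PC1, PC2)-like coordinates for six samples
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    print(ward_clusters(pts, 3))
```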

Fig. 8 Plot of Q-mode cluster analysis

Discriminant analysis

In this study, the LA, CA and TA samples were assigned to groups 1, 2 and 3, respectively, and the UN samples were left unassigned. The classical method was used for the calculation, and the results are shown in Table 5 and Fig. 9. As shown in Table 5, all samples of known source except one CA sample (18) were assigned to the correct group with high probability (92–100%), giving an overall correct classification rate of 97.6% (41/42) for the samples of known source. The discriminant functions are as follows (concentrations in mg/L):

1. Function 1 = 0.001 × (Na+ + K+) + 0.002 × Ca2+ − 0.039 × Mg2+ − 0.013 × Cl− − 0.002 × SO42− − 0.001 × HCO3− + 7.120;

2. Function 2 = 0.011 × (Na+ + K+) − 0.001 × Ca2+ + 0.018 × Mg2+ + 0.014 × Cl− − 0.002 × SO42− + 0.001 × HCO3− − 7.393.
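The two discriminant functions can be evaluated directly from mg/L concentrations, as in the sketch below. The scores alone do not assign a class: a sample is typically assigned to the group whose centroid in the (Function 1, Function 2) plane is nearest, or by the posterior probabilities reported in Table 5, and the group centroids are not given in the text, so that step is omitted here. The mean LA composition in the example is only an illustrative input.

```python
# Coefficients of the two discriminant functions reported above (mg/L).
# CO3^2- does not appear in either function.
COEFFS = {
    "F1": {"Na+K": 0.001, "Ca": 0.002, "Mg": -0.039, "Cl": -0.013,
           "SO4": -0.002, "HCO3": -0.001, "const": 7.120},
    "F2": {"Na+K": 0.011, "Ca": -0.001, "Mg": 0.018, "Cl": 0.014,
           "SO4": -0.002, "HCO3": 0.001, "const": -7.393},
}


def discriminant_scores(sample):
    """Evaluate Function 1 and Function 2 for one sample given in mg/L."""
    return {
        name: c["const"] + sum(c[ion] * sample[ion] for ion in c if ion != "const")
        for name, c in COEFFS.items()
    }


if __name__ == "__main__":
    # Mean LA concentrations (mg/L) from Table 1, used only as an example input
    la_mean = {"Na+K": 223, "Ca": 39, "Mg": 38, "Cl": 97, "SO4": 275, "HCO3": 370}
    scores = discriminant_scores(la_mean)
    print({name: round(value, 3) for name, value in scores.items()})
```

For this input the sketch gives Function 1 ≈ 3.758 and Function 2 ≈ −3.117.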

Table 5 Results of discriminant analysis
Fig. 9 Discriminant diagram

Based on the discriminant model established from the samples of known source, UN1 and UN2 are assigned to the CA with probabilities of 78% and 100%, respectively, and UN3 and UN4 are assigned to the TA with probabilities of 100% and 87%, respectively, consistent with the results of the cluster analysis.

Conclusions

Based on statistical analyses of the major ion concentrations of groundwater samples collected from three representative aquifer systems in the Qingdong coal mine, northern Anhui Province, China, the following conclusions were drawn: (1) The major ion concentrations of groundwater differ among the three aquifer systems, suggesting that the types and degrees of water–rock interaction in these systems differ. (2) Principal component analysis identified two processes responsible for the chemical variations of the groundwater, silicate weathering and evaporite dissolution, whose contributions differ among the aquifer systems and were quantified by the Unmix model. (3) Q-mode cluster and discriminant analyses correctly classified 95.7% and 97.6% of the samples of known source, respectively, and assigned the samples of unknown source with high probabilities (78–100%).