Statistical analyses of groundwater chemistry in the Qingdong coalmine, northern Anhui province, China: implications for water–rock interaction and water source identification

The hydrochemistry of groundwater is important in coal mines because it can be used to understand water–rock interaction and to identify the sources of inrush water. In this study, major ion concentrations of groundwater samples from the loose layer aquifer (LA), the coal-bearing aquifer (CA) and the Taiyuan Formation limestone aquifer (TA) in the Qingdong coal mine, northern Anhui province, China, were analyzed with a series of statistical methods to identify the sources of the chemical constituents in groundwater and the source of inrush water. The results indicate that the mean concentrations of the major ions in the LA follow the order HCO₃⁻ > SO₄²⁻ > Na⁺ + K⁺ > Cl⁻ > Ca²⁺ > Mg²⁺ > CO₃²⁻, those in the CA follow the order SO₄²⁻ > Na⁺ + K⁺ > HCO₃⁻ > Cl⁻ > Ca²⁺ > Mg²⁺ > CO₃²⁻, and those in the TA follow the order SO₄²⁻ > Na⁺ + K⁺ > Ca²⁺ > HCO₃⁻ > Cl⁻ > Mg²⁺ > CO₃²⁻. Most of the samples are of the Na-SO₄ and Ca-SO₄ types. The TDS content increases with aquifer depth, and pH values range from 7.1 to 8.9, indicating a weakly alkaline environment. Principal component analysis identified two sources responsible for the chemical variations of the groundwater: weathering of silicate minerals and dissolution of evaporite minerals. The contribution ratios of these sources were quantified with the Unmix model. Moreover, Q-mode cluster analysis and discriminant analysis correctly classified 95.7% and 97.6% of the samples with known sources, respectively, and assigned the samples with unknown sources with high probability (78–100%).


Introduction
Coal is a primary source of energy for China, and this situation will last for decades. However, the high production of coal has been accompanied by a series of disasters during mining. Previous studies have shown that five kinds of disasters (water, fire, gas, dust and roof hazards) have caused huge economic losses and loss of life (Wu et al. 2013). These disasters have been attributed to complicated geological conditions, imperfect mining technology or negligent management.
In response, two types of strategies have been applied to deal with water inrush in coal mines. (1) The active strategy predicts the likelihood of water inrush from an understanding of the hydrogeological conditions of the coal mine. (2) The passive strategy minimizes losses after a water inrush has occurred, for example by draining the water and restoring coal production (Xu and Bu 2015).
Apart from man-made factors, a series of natural factors have been identified as responsible for water inrush accidents in coal mines. These include the quantity of water in the aquifers, the channels for water flow, the hydraulic connection between aquifers and the thickness of the water-resisting layer (Gui and Chen 2007). Therefore, understanding the hydrogeological conditions of a coal mine before water inrush occurs is the most fundamental work for preventing inrush-related accidents. After an inrush, obtaining reliable information about the source of the inrush water is the most important task.
Groundwater in the aquifers carries a range of information, including water level, quantity, temperature, velocity, turbidity and hydrochemistry (Gui and Chen 2007). This information makes it possible to distinguish water sources among different aquifers, and a large number of methods have been proposed for this purpose (Liu et al. 2018; Shi et al. 2018; Wu et al. 2019; Wang et al. 2020). For example, the mixing ratios of water inrush sources in the Qidong coal mine, Anhui province, China, were calculated by factor analysis, and the results provided theoretical guidance for the prevention and control of water disasters (Sun and Gui 2015). Moreover, the source of inrush water in the Zhaizhen coal mine, Shandong province, China, was identified by means of Q-mode cluster analysis and principal component analysis (Wang and Shi 2019). These studies confirmed that statistical analysis of groundwater chemistry is effective for identifying inrush water sources in coal mines.
Roof, gas, dust and other hazards have been studied extensively in the Qingdong coal mine, northern Anhui province, China. However, the source of inrush water has not received comparable attention, and this gap poses a potential threat to coal production (Liu et al. 2017; Si et al. 2018). Therefore, in this study, forty-six groundwater samples were collected from the representative aquifer systems in the Qingdong coal mine. A series of statistical methods was applied to their major ion concentrations: principal component, cluster and discriminant analyses, together with the Unmix model. The aims were to (1) identify the types of water–rock interaction that determine the chemical composition of the groundwater, (2) quantify the contributions of these water–rock interactions and (3) establish a water source identification model for the coal mine.

Study area
The coalfield in northern Anhui province is an important energy base in China, with an annual coal output of over 100 million tons. The Qingdong coal mine is one of the thirty coal mines in the region. It is situated 45 km west of Suzhou City, northern Anhui province, China (Fig. 1), at 116°25′44″–116°34′44″ E and 33°36′34″–33°40′29″ N. The study area has a warm-temperate, semi-humid monsoon climate, with a mean annual temperature of 14.6 ℃. The mean annual precipitation is 811.8 mm and is concentrated mainly in July and August. The mine extends 3.5–6.5 km from south to north and 13 km from east to west, with a total area of 51.7 km².
From shallow to deep, the aquifers in the study area can be divided into four systems: the loose layer aquifer (LA), the coal-bearing sandstone aquifer (CA), the Taiyuan Formation limestone aquifer (TA) and the Ordovician limestone aquifer (OA). Details of the four aquifer systems are as follows.
LA: The coal mine is covered by loose Tertiary and Quaternary sediments, with average thicknesses of 154 m and 92 m, respectively. The main rock types in the LA are clay, sandy clay and calcareous clay. The loose layer has a multi-layer composite structure in which four secondary aquifers have been identified. Among them, the fourth aquifer is highly water-rich and can be recharged by rainfall. Because it is in direct contact with the coal-bearing strata, its water is a threat to the safety of coal mining.
CA: The Permian strata are the main coal-bearing strata in the area. The CA consists mainly of mudstone, siltstone, sandstone (mainly feldspathic quartz sandstone) and coal seams, and it can be subdivided into three secondary aquifers. Water in this aquifer is stored primarily in sandstone fractures. In recent years, mining activities have disrupted the integrity of the strata and altered the original runoff conditions, allowing water to infiltrate into the mining tunnels. Water in the sandstone fractures is therefore the main threat in this aquifer.
TA and OA: The primary rock type in the TA and OA is limestone, with lesser amounts of mudstone and sandstone. Under natural conditions, without human activities, these two aquifers are highly water-rich, with high water pressure and poor runoff conditions. Therefore, water in the karst caves is also a threat to the safety of coal mining. In summary, three water bodies threaten coal mining activities: water from the fourth aquifer of the LA, water in the sandstone fractures and water in the limestone karst caves. This situation is similar to that of the Renlou coal mine, which lies in a similar hydrogeological unit (Sun 2013).

Sampling and analysis
Forty-six groundwater samples were collected from three representative aquifer systems in the Qingdong coal mine. Forty-two of them have known sources: 7 from the LA, 16 from the CA and 19 from the TA. The LA samples were pumped from wells, the CA samples were collected directly in the coal mining tunnels and the TA samples were collected from boreholes. The other four samples were treated as samples of unknown source to verify the performance of the water source identification: UN1 and UN2 came from boreholes in the CA, and UN3 and UN4 came from boreholes in the TA. The physico-chemical parameters and major ions of the water samples were measured as follows. pH values were measured with a portable water analyzer with an accuracy of 0.01. TDS values (mg/L) were calculated by subtracting 0.5 times the bicarbonate concentration from the sum of all ion concentrations. Eight major ions (Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, HCO₃⁻ and CO₃²⁻) were analyzed. Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ were determined by ion chromatography, and HCO₃⁻ and CO₃²⁻ were determined by acid–base titration. Because K⁺ concentrations were low, Na⁺ and K⁺ were merged and reported as Na⁺ + K⁺. All analyses were conducted at the Engineering and Technological Research Center of Coal Exploration, Anhui Province, China.
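The TDS convention above and the charge-balance error (AE) used to screen the analyses can be sketched in a few lines of Python. This is a minimal illustration, not the laboratory's procedure. The molar masses, the dictionary keys and the helper names (`to_meq`, `tds`, `charge_balance_error`, `passes_check`) are assumptions. The merged Na⁺ + K⁺ value is converted to meq/L as if it were all Na⁺.

```python
# Minimal sketch (assumed helper names and molar masses, not from the paper):
# TDS and charge-balance error from major-ion concentrations in mg/L.

# Molar mass (g/mol) and absolute charge of each ion.
IONS = {
    "Na+K": (22.99, 1),   # assumption: merged Na+ + K+ treated as Na+
    "Ca": (40.08, 2),
    "Mg": (24.305, 2),
    "Cl": (35.45, 1),
    "SO4": (96.06, 2),
    "HCO3": (61.02, 1),
    "CO3": (60.01, 2),
}
CATIONS = ("Na+K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3", "CO3")


def to_meq(ion: str, mg_per_l: float) -> float:
    """Convert mg/L to meq/L: concentration * charge / molar mass."""
    molar_mass, charge = IONS[ion]
    return mg_per_l * charge / molar_mass


def tds(sample: dict) -> float:
    """TDS (mg/L) = sum of all ion concentrations - 0.5 * HCO3-.

    The sample dict must contain only the ion keys listed in IONS.
    """
    return sum(sample.values()) - 0.5 * sample["HCO3"]


def charge_balance_error(sample: dict) -> float:
    """AE (%) = 100 * (cations - anions) / (cations + anions), sums in meq/L."""
    cations = sum(to_meq(ion, sample[ion]) for ion in CATIONS)
    anions = sum(to_meq(ion, sample[ion]) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def passes_check(ae: float, limit: float = 5.0) -> bool:
    """True if the analytical error lies within +/- limit percent."""
    return abs(ae) <= limit
```

For a sample dictionary with all eight ion entries in mg/L, `passes_check(charge_balance_error(sample))` applies the ±5% acceptance rule described in the text.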
After all ion concentrations had been measured, the analytical error (AE) of each sample was checked with the charge balance:

$$\mathrm{AE} = \frac{\sum \mathrm{cations} - \sum \mathrm{anions}}{\sum \mathrm{cations} + \sum \mathrm{anions}} \times 100\%$$

where the cation and anion sums are expressed in meq/L. One water sample had an AE of 12%. The AEs of all other water samples were within −5% to +5%, indicating that the analytical results are reasonable and credible (Nordstrom et al. 1989).

Statistical analyses
In this study, a series of statistical methods was applied to the hydrochemical data of the groundwater: (1) To obtain information about water–rock interaction (mainly the sources of the major ions), principal component (factor) analysis was used for qualitative information, whereas the US EPA (US Environmental Protection Agency) Unmix model was used for quantitative information.
(2) For water source identification, cluster and discriminant analyses were applied together with the source contributions generated by the Unmix model. The methods are described below.
1. Correlation analysis: a statistical method that evaluates the strength of the association between two variables and its statistical significance. Previous studies have shown that significantly correlated elements usually share similar origins or are controlled by the same processes (Sunkari and Abu 2019; Zango et al. 2019).
2. Principal component analysis: an important statistical method that is used not only to simplify and classify data but also to identify the most important variables in the data. In geochemical studies, factor analysis has long been used to trace the sources of elements (Yidana et al. 2018; Wang and Shi 2019; Gubran et al. 2019; Ramaroson et al. 2020).
3. Cluster analysis: a multivariate method used to find natural groups among samples or stations. Objects are grouped so that similar objects fall into the same class. The method has long been applied in hydrochemical studies (Tay et al. 2015; Bencer et al. 2016; Chen et al. 2020).
4. Discriminant analysis: a method that constructs linear or quadratic functions of the variables that best separate two or more predefined groups, and then uses these functions to assign individual cases to a group. It is a simple and effective tool for establishing inrush water source identification models (Sun and Gui 2015).
5. Unmix model analysis: a mathematical receptor model used to quantify the sources of contaminants (Lang and Yang 2014; Liu and Sun 2015; Jain et al. 2018; Zhang et al. 2019). Source types are identified by comparing the derived source profiles with measured profiles, whereas the source contributions show how much each source contributes to a sample. The basic idea of the model is

$$C_m = \sum_{n=1}^{N} X_n S_n$$

where $C_m$ is the measured concentration of the sample, $S_n$ is the profile of source $n$, $X_n$ is the portion of $S_n$, the product $X_n S_n$ is the contribution of source $n$ to $C_m$, and $N$ is the number of sources. The hydrochemistry of groundwater is also a multi-source problem. The different types of water–rock interaction can be considered as sources, whereas the degrees of water–rock interaction can be considered as contributions.
Unmix model analysis was performed with the EPA Unmix software (version 6.0). The correlation, principal component, cluster and discriminant analyses were conducted in R via RStudio (R Core Team 2013).

Hydrochemistry
Major ion concentrations of the groundwater samples from the different aquifer systems are shown in Table 1 and Fig. 2. As shown in Table 1, the mean concentrations of the major ions follow these orders:
LA: HCO₃⁻ > SO₄²⁻ > Na⁺ + K⁺ > Cl⁻ > Ca²⁺ > Mg²⁺ > CO₃²⁻.
CA: SO₄²⁻ > Na⁺ + K⁺ > HCO₃⁻ > Cl⁻ > Ca²⁺ > Mg²⁺ > CO₃²⁻.
TA: SO₄²⁻ > Na⁺ + K⁺ > Ca²⁺ > HCO₃⁻ > Cl⁻ > Mg²⁺ > CO₃²⁻.
In general, the major ion concentrations in groundwater are closely related to the types of surrounding rock. Accordingly, the high Na⁺ + K⁺ contents of the CA samples are associated with the weathering of silicates, while the high Ca²⁺ and Mg²⁺ contents of the TA samples are associated with the dissolution of carbonates and sulfates. Moreover, from shallow to deep aquifers, the HCO₃⁻ concentration in groundwater decreases, while the SO₄²⁻ and Cl⁻ concentrations increase. This trend implies poor runoff conditions in the CA and TA.
As shown in Fig. 2g and h, the average TDS content of the groundwater increases from 884 mg/L in the LA to 2081 mg/L in the CA and 2751 mg/L in the TA. This increase means that the degree of water–rock interaction intensifies with aquifer depth. The pH values range from 7.1 to 8.9, indicating a weakly alkaline environment. As groundwater flows from the recharge zone to the discharge zone, the dominant anion typically evolves from HCO₃⁻ to SO₄²⁻ and then to Cl⁻ (Tóth 1999). Classifying hydrochemical types is therefore important for understanding the evolution of the groundwater system. In this study, the groundwater was classified by its cation and anion concentrations with a Piper diagram, and the results are shown in Fig. 3. The Na-SO₄ type is the most common, accounting for 71% of the LA samples, 100% of the CA samples and 42% of the TA samples. The remaining LA samples are of the Na-HCO₃ type (29%). Of the TA samples, 47% are of the Ca-SO₄ type and 11% are of the Mg-SO₄ type. The UN1 and UN2 samples are of the Na-SO₄ type, whereas the UN3 and UN4 samples are of the Mg-SO₄ and Na-SO₄ types, respectively. Fig. 3 also shows that the UN1 and UN2 samples resemble the CA samples, whereas the UN3 and UN4 samples resemble the TA samples.
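The Piper diagram is a graphical tool. Even so, the type labels used above (Na-SO₄, Ca-SO₄, Mg-SO₄ and Na-HCO₃) can be approximated by naming each sample after its dominant cation and dominant anion in meq%. The Python sketch below relies on that simplification, which is an assumption: real Piper classification uses the diagram's fields and may combine ions. The molar masses, keys and function names are also illustrative assumptions.

```python
# Simplified water-type naming: dominant cation + dominant anion (in meq%).
# Assumption: this approximates, but is not identical to, Piper-diagram typing.

# Molar mass (g/mol) and charge; "Na" holds the merged Na+ + K+ value.
CATIONS = {"Na": (22.99, 1), "Ca": (40.08, 2), "Mg": (24.305, 2)}
ANIONS = {"Cl": (35.45, 1), "SO4": (96.06, 2), "HCO3": (61.02, 1), "CO3": (60.01, 2)}


def meq_percent(sample: dict, ions: dict) -> dict:
    """Percentage of each ion in the total meq/L of its group (cations or anions)."""
    meq = {ion: sample[ion] * z / m for ion, (m, z) in ions.items()}
    total = sum(meq.values())
    return {ion: 100.0 * v / total for ion, v in meq.items()}


def water_type(sample: dict) -> str:
    """Label such as 'Na-SO4' from concentrations in mg/L."""
    cat = meq_percent(sample, CATIONS)
    an = meq_percent(sample, ANIONS)
    return f"{max(cat, key=cat.get)}-{max(an, key=an.get)}"


# Hypothetical sample (mg/L), not a measured value from this study.
example = {"Na": 460, "Ca": 60, "Mg": 20, "Cl": 150, "SO4": 900, "HCO3": 300, "CO3": 0}
print(water_type(example))  # Na-SO4
```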

Mechanism controlling groundwater chemistry
The Gibbs diagram (Gibbs 1970) can be used to determine the relationship between the chemical components of groundwater and the lithology of the host aquifer (Sunkari et al. 2020; Yu et al. 2020). The diagram distinguishes three mechanisms controlling groundwater chemistry: precipitation dominance, evaporation dominance and water–rock interaction dominance. The Gibbs ratios are Gibbs I = Cl⁻/(Cl⁻ + HCO₃⁻) and Gibbs II = (Na⁺ + K⁺)/(Na⁺ + K⁺ + Ca²⁺), with concentrations in meq/L. In this study, Gibbs I values range from 0.05 to 0.97 and Gibbs II values range from 0.38 to 0.99. As can be seen from Fig. 4, all samples fall in the water–rock interaction and evaporation dominance fields. This distribution indicates that water–rock interaction and evaporation have played important roles in controlling the groundwater chemistry. Meanwhile, the TDS content increases from the LA to the TA, indicating that the degree of water–rock interaction increases with aquifer depth.
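The two Gibbs ratios are simple to compute once the concentrations are converted to meq/L. The minimal Python sketch below assumes inputs in mg/L. The molar masses and the function name `gibbs_ratios` are assumptions. Placing the resulting points in the precipitation, evaporation or water–rock interaction fields still requires the Gibbs diagram itself.

```python
# Sketch: Gibbs I and Gibbs II ratios from mg/L concentrations.
# Molar masses and charges are standard values; the merged Na+ + K+
# value is converted as if it were Na+ (assumption).
MOLAR = {"Na+K": (22.99, 1), "Ca": (40.08, 2), "Cl": (35.45, 1), "HCO3": (61.02, 1)}


def gibbs_ratios(sample_mg_l: dict) -> tuple:
    """Return (Gibbs I, Gibbs II), both computed from meq/L values."""
    meq = {ion: sample_mg_l[ion] * z / m for ion, (m, z) in MOLAR.items()}
    gibbs_i = meq["Cl"] / (meq["Cl"] + meq["HCO3"])        # Cl / (Cl + HCO3)
    gibbs_ii = meq["Na+K"] / (meq["Na+K"] + meq["Ca"])     # (Na+K) / (Na+K + Ca)
    return gibbs_i, gibbs_ii


# Hypothetical sample chosen so that Cl:HCO3 = 1:3 and (Na+K):Ca = 3:1 in meq/L.
g1, g2 = gibbs_ratios({"Na+K": 68.97, "Ca": 20.04, "Cl": 35.45, "HCO3": 183.06})
print(round(g1, 2), round(g2, 2))  # 0.25 0.75
```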
Weathering of silicate minerals (feldspars), dissolution of halite and cation exchange are generally considered to be the major sources of Na⁺ in groundwater (Zango et al. 2019). As can be seen in Fig. 5a and b, most of the water samples from the three aquifers plot above the 1:1 line. This pattern may be related to intense cation exchange, because the replacement of Ca²⁺ and Mg²⁺ by Na⁺ leads to Na⁺ enrichment in groundwater. Ca²⁺ and Mg²⁺ in groundwater are associated with the dissolution of carbonates and sulfates and with reverse ion exchange. As shown in Fig. 5c, the (Ca²⁺ + Mg²⁺)/(HCO₃⁻ + SO₄²⁻) ratios differ among the three aquifers. The TA samples plot close to the 1:1 line, and their (Ca²⁺ + Mg²⁺) and (HCO₃⁻ + SO₄²⁻) concentrations are significantly higher than those of the LA and CA samples, indicating the influence of carbonate and sulfate dissolution in the TA. Moreover, the ratio (Ca²⁺ + Mg²⁺)/(HCO₃⁻ + SO₄²⁻) is below 1 in all samples. This indicates that cation exchange is widespread in the three aquifers, especially in the CA, which is consistent with Fig. 5a and b. SO₄²⁻ in groundwater is usually derived from the dissolution of sulfate minerals and the oxidation of sulfides. The high SO₄²⁻ content of the CA samples relative to Ca²⁺ and Mg²⁺ is attributed to the combined effects of ion exchange and sulfide oxidation (Fig. 5c and d).

Fig. 2 Box plots of major ion concentrations and physico-chemical parameters of groundwater samples from the different aquifers
The CA in the study area contains abundant pyrite, and coal mining has exposed the CA to air for long periods. These conditions favor the oxidation of pyrite and thus lead to SO₄²⁻ enrichment in the groundwater (4FeS₂ + 13O₂ + 2H₂O → 4FeSO₄ + 2H₂SO₄ + 2SO₂↑).
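Whether the pyrite-oxidation reaction above is balanced can be checked by counting atoms on each side. The Python sketch below checks element balance only, not charge balance or thermodynamic feasibility. Its small formula parser handles only formulas without parentheses and was written for this example; it is an assumption, not a standard tool.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a formula without parentheses, e.g. 'H2SO4'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_totals(side) -> Counter:
    """Total atoms on one side of a reaction given (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


# 4FeS2 + 13O2 + 2H2O -> 4FeSO4 + 2H2SO4 + 2SO2
reactants = [(4, "FeS2"), (13, "O2"), (2, "H2O")]
products = [(4, "FeSO4"), (2, "H2SO4"), (2, "SO2")]
print(side_totals(reactants) == side_totals(products))  # True
```

Both sides contain 4 Fe, 8 S, 28 O and 4 H atoms.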

Correlation analysis
In this study, the Spearman rank correlation was chosen because it is less sensitive to outliers, and the results are shown in Table 2. Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ are significantly and positively correlated with each other. These correlations can be explained by the dissolution of chloride and sulfate minerals, which release Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ into the water. Furthermore, the significant negative correlation between Na⁺ + K⁺ and Mg²⁺ is usually related to ion exchange (Table 2).
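Spearman's coefficient is Pearson's correlation computed on ranks. For this reason, a single extreme concentration has limited influence on the result. The dependency-free Python sketch below gives tied values their average rank. It does not include significance testing, on which the "significant" labels in Table 2 rely. The function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but strongly nonlinear relation still gives rho = 1.
print(round(spearman([1, 2, 3, 4], [1, 8, 27, 1000]), 6))  # 1.0
```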

Principal component analysis
In this study, two principal components with initial eigenvalues greater than one were extracted, and the results are shown in Table 3 and Fig. 6. As can be seen from Table 3, PC1 accounts for 52.6% of the total variance. It has high positive loadings of Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ and a high negative loading of HCO₃⁻. PC2 accounts for 23.6% of the total variance and has a high positive loading of Na⁺ + K⁺ and a negative loading of Mg²⁺. According to previous studies, the main processes controlling the groundwater chemistry in the study area include weathering of silicate minerals, ion exchange, and dissolution of carbonate, sulfate and chloride minerals (Sun and Gui 2015). PC2 can therefore be interpreted as the weathering of silicate minerals and ion exchange, which lead to high Na⁺ + K⁺ concentrations in the water. PC1 is related to the dissolution of chloride and sulfate minerals, which release Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ into the water. Fig. 6 also shows that the groundwater from the three aquifers has significantly different PC1 and PC2 scores (p < 0.05). The LA samples have the lowest PC1 and PC2 scores, the CA samples have the highest PC2 scores and the TA samples have the highest PC1 scores. This pattern indicates that groundwater from the different aquifer systems has undergone different types and degrees of water–rock interaction. The CA is dominated by silicate weathering and ion exchange, and the TA is dominated by the dissolution of chlorides and sulfates. The LA shows a lower degree of water–rock interaction than the CA and TA.
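The extraction rule above keeps components whose eigenvalue exceeds one, which is often called the Kaiser criterion. It can be sketched with NumPy as an eigen-decomposition of the correlation matrix of the standardized ion data. This is a hedged illustration: it returns unrotated loadings, and the text does not state whether the original analysis applied a rotation or used this exact scaling. The function name `pca_kaiser` is hypothetical.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1.

    X: (n_samples, n_variables) array of ion concentrations.
    Returns (kept eigenvalues, % variance explained, loadings, scores).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each ion
    R = np.corrcoef(Z, rowvar=False)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # ascending order
    idx = np.argsort(eigvals)[::-1]                     # sort descending
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    keep = eigvals > 1.0                                # Kaiser criterion
    explained = 100.0 * eigvals / eigvals.sum()
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated loadings
    scores = Z @ eigvecs[:, keep]                       # sample scores
    return eigvals[keep], explained[keep], loadings, scores
```

Each sign of an eigenvector is arbitrary, so a component may appear with all loadings and scores negated relative to other software. The interpretation (which ions load together) is unaffected.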

The Unmix model approach
The Unmix model identified two sources, and the results are listed in Table 4 and shown in Fig. 7. As can be seen in Table 4, the two-source solution has Min Rsq = 0.98 and Min Sig/Noise = 3.72. Both values exceed the minimum requirements of the model (Min Rsq > 0.8 and Min Sig/Noise > 2), indicating that the model is valid (Ai et al. 2014). As noted above, weathering of silicate minerals and dissolution of carbonate and evaporite minerals are the main processes controlling the groundwater chemistry in the coal-mining strata (Sun and Gui 2015). Therefore, Source 1, dominated by Na⁺ + K⁺, HCO₃⁻ and Cl⁻, can be interpreted as the weathering of silicate minerals and the dissolution of chlorides. Source 2, dominated by Ca²⁺, Mg²⁺ and SO₄²⁻, can be interpreted as the dissolution of evaporite minerals (sulfates). The source contributions (Fig. 7) support this interpretation. The LA and CA samples receive the largest contributions from Source 1 (> 60%), consistent with these aquifer systems being richer in clastic rocks, in which silicate minerals are the main phases. In contrast, the TA samples are mainly contributed by Source 2 (> 50%). Among the samples with unknown sources, UN1 and UN2 receive high contributions from Source 1, similar to the LA and CA samples. UN3 and UN4 receive high contributions from Source 2, similar to the TA samples.
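EPA Unmix estimates both the number of sources and their profiles from the data, which is beyond a short example. What can be shown compactly is the mixing relation $C_m = \sum_n X_n S_n$ from the method description. Given fixed source profiles, the portions $X_n$ for one sample follow from least squares, and each source's percentage contribution follows from $X_n S_n$. The NumPy sketch below assumes the profiles are already known and uses ordinary (not non-negative) least squares. It is not the Unmix algorithm. The function name and the profile values in the test are hypothetical.

```python
import numpy as np


def source_contributions(profiles, sample):
    """Least-squares fit of C_m ~ sum_n X_n * S_n for one sample.

    profiles: (n_sources, n_ions) array; row n is the source profile S_n.
    sample:   (n_ions,) measured concentrations C_m (same units as profiles).
    Returns (X, percent), where percent[n] is source n's share of the
    reconstructed total concentration.
    """
    S = np.asarray(profiles, dtype=float)
    C = np.asarray(sample, dtype=float)
    X, *_ = np.linalg.lstsq(S.T, C, rcond=None)  # solve S^T X ~ C
    parts = X[:, None] * S                        # X_n * S_n for every ion
    totals = parts.sum(axis=1)                    # total contributed by each source
    percent = 100.0 * totals / totals.sum()
    return X, percent
```

With a non-negativity constraint (for example a non-negative least-squares solver), negative portions could not occur. The plain solver was kept here only to avoid extra dependencies.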

Cluster analysis
In this study, all samples were analyzed by Q-mode cluster analysis using Ward linkage and Euclidean distance, following Fovell and Fovell (1993) and Chen et al. (2020). As can be seen from Fig. 8, three groups can be identified. Group 1 contains all LA samples and two CA samples (CA4 and CA11), Group 2 contains the remaining CA samples and Group 3 contains all TA samples. Thus, only two samples (CA4 and CA11) were assigned to the wrong category, giving an accuracy of 95.7%. The figure also shows that UN1 and UN2 fall into Group 2 and that UN3 and UN4 fall into Group 3. These assignments are consistent with the principal component scores (Fig. 6) and the source contributions (Fig. 7).
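The Q-mode clustering step (Ward linkage on Euclidean distance, cut into three groups) and an accuracy count can be sketched in plain Python. At each step, the sketch merges the two clusters whose union adds the least within-cluster sum of squares, which is Ward's criterion. It is O(n³) overall and is meant only for small data sets such as this one. The text does not state whether the ion data were standardized before clustering, so the sketch clusters whatever vectors it receives. The helper names and the majority-label accuracy rule are assumptions.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerative clustering with Ward linkage on Euclidean distance.

    Repeatedly merges the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares until k clusters remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [[float(v) for v in p] for p in points]

    def merge_cost(a, b):
        # Ward increase in error sum of squares: na*nb/(na+nb) * |ca - cb|^2
        na, nb = len(clusters[a]), len(clusters[b])
        d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: merge_cost(*pair))
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return clusters


def majority_accuracy(clusters, labels):
    """Share of samples whose label equals the majority label of their cluster."""
    correct = 0
    for members in clusters:
        groups = [labels[i] for i in members]
        correct += max(groups.count(g) for g in set(groups))
    return correct / len(labels)
```

Under this counting rule, the two CA samples that joined the LA-dominated group would count as misclassified, which matches how the text reports the result.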

Discriminant analysis
In this study, the LA, CA and TA groundwater samples were assigned to groups 1, 2 and 3, respectively, whereas the UN samples were left unassigned. The classical method was used for the calculation, and the results are shown in Table 5 and Fig. 9. As can be seen from Table 5, all samples with known sources except one from the CA (sample 18) were assigned to the correct category with high probability (92–100%). In total, 97.6% (41/42) of the known-source samples were correctly identified. The discriminant functions (unit in mg/L) are as follows:
Based on the discriminant model established with the known samples, UN1 and UN2 are assigned to the CA with probabilities of 78% and 100%, respectively. UN3 and UN4 are assigned to the TA with probabilities of 100% and 87%, respectively. These assignments are consistent with the results of the cluster analysis.
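A linear discriminant model of the kind described above assigns each sample to the group with the highest posterior probability. The NumPy sketch below fits group means and a pooled covariance matrix and returns posterior probabilities. It assumes equal covariance across groups and priors proportional to group sizes, which may differ from the settings of the classical method used in the original analysis. The function names are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Fit linear discriminant analysis with a pooled within-group covariance.

    X: (n_samples, n_variables) array; y: group label of each sample.
    Priors are taken proportional to group sizes (an assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = np.vstack([X[y == c] - m for c, m in zip(classes, means)])
    pooled_cov = resid.T @ resid / (len(X) - len(classes))
    return classes, means, priors, np.linalg.inv(pooled_cov)


def predict_proba(model, x):
    """Posterior probability of each group for one sample x."""
    classes, means, priors, inv_cov = model
    x = np.asarray(x, dtype=float)
    # Linear score per group: m' S^-1 x - 0.5 m' S^-1 m + ln(prior)
    quad = np.einsum("ij,jk,ik->i", means, inv_cov, means)
    scores = means @ inv_cov @ x - 0.5 * quad + np.log(priors)
    p = np.exp(scores - scores.max())  # subtract max for numerical stability
    p /= p.sum()
    return dict(zip(classes.tolist(), p.tolist()))
```

A sample would then be assigned to `max(probs, key=probs.get)`, and the winning probability is the kind of value reported for UN1 to UN4.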

Conclusions
Based on the statistical analyses of major ion concentrations of groundwater samples collected from three representative aquifer systems in the Qingdong coal mine, northern Anhui province, China, the following conclusions can be drawn:
(1) The major ion concentrations of the groundwater samples differ among the aquifer systems, which suggests that the types and degrees of water–rock interaction differ among the aquifers.
(2) Principal component analysis identified two sources responsible for the chemical variations of the groundwater: weathering of silicate minerals and dissolution of evaporites. Their contribution ratios differ among the aquifer systems and were quantified with the Unmix model.
(3) Q-mode cluster analysis and discriminant analysis correctly classified 95.7% and 97.6% of the samples with known sources, respectively, and assigned the samples with unknown sources with high probability (78–100%).

Table 5 Discriminant analysis results (excerpt)
No.  Actual  Predicted  Probability (%)    No.  Actual  Predicted  Probability (%)
1    LA      LA         100                24   TA      TA         100
2    LA      LA         100                25   TA      TA         100
3    LA      LA         99                 26   TA      TA         100
4    LA      LA         100                27   TA      TA         100
5    LA      LA         100                28   TA      TA         100
6    LA      LA         100                29   TA      TA         100
7    LA      LA         100                30   TA      TA         100
8    CA      CA         99                 31   TA      TA         100
9    CA      CA         100                32   TA      TA         100
10   CA      CA         100                33   TA      TA         100