Natural Radioactivity and Metals in Pit Lakes in Sweden Analyzed by Principal Component and Cluster Analysis

The aim of this work was to determine which parameters are sufficient to measure in order to describe the water quality of a pit lake and to identify patterns in the data among different kinds of pit lakes. The data consisted of ambient dose equivalent rate, elemental and radionuclide concentrations, pH, and specific conductance in surface water and sediment samples collected from different types of mines. Data were tested for normality and log-normality and used in principal component analysis (PCA) and hierarchical cluster analysis (HCA). The normality tests indicated that only 40K was normally distributed, while only the 234,238U isotopes were log-normally distributed. HCA performed on parameters measured in surface water provided clusters that in most cases separated the elements according to their chemical groups. However, when HCA was performed on pit lakes, the clustering seemed to indicate that surface water might not be the preferred sample to differentiate between different types of pit lakes. PCA of surface water data resulted in three components that explained 72% of the variance when pH, SC, concentrations of the elements Mg, K, Ca, Cu, Zn, Sr, and Pb, activity concentrations of 234,238U and 210Po, and ambient dose equivalent rate were included. For surface sediment data, the PCA resulted in three components explaining 83% of the variance when the concentrations of Na, Mg, Al, P, K, Ca, Rb, Sr, Y, and Tl, the activity concentrations of 234Th, 226Ra, 210Pb, 232Th (series average), and 40K, and ambient dose equivalent rate were included.


Introduction
When a site has been found to contain valuable ores, an environmental risk assessment is required before mining can proceed. One of the many aspects that must be considered is site restoration, which for open cast mining usually involves assuring that the resulting pit lake will not contaminate the surrounding environment and groundwater (Geological Survey of Sweden 2016). Many bio- and geochemical models and tools for aqueous systems exist that can aid in the risk assessment, such as PHREEQC (Parkhurst et al. 1980), MAGIC (Cosby et al. 1985) and Bio-met (Bio-met 2013). Some are based on empirical models, equilibrium principles, and process-oriented kinetics, but a common feature for most of them is that they are only valid for a certain kind of aqueous system, for a certain pH range, or concentration of dissolved elements. The use of models is further complicated in lakes that are exposed to man-made influences such as liming and for lakes in Scandinavia where the surface can be covered by ice several months per year.
Pit lakes differ in many ways from natural lakes, which have been studied for much longer time periods and where expertise, models, and tools are readily available. One major difference with pit lakes is the underlying bedrock, which could contain a higher concentration of ores or minerals that can significantly alter the chemical properties in the lake, such as oxidation of sulfide-bearing minerals leading to acidic water (Reddy et al. 1995; Santofimia and López-Pamo 2013; Vaughan and Lennie 1991). Furthermore, the final environmental impact cannot be seen until decades after mine closure.
Regarding naturally occurring radionuclides in pit lakes, former uranium mines have been studied extensively (Ferrari et al. 2015; IAEA technical meeting 2004; Rollings et al. 1999; Strømman et al. 2013). However, there are few such studies on non-uranium mine pit lakes. Mines and quarries containing pit lakes can be found in many places in Sweden and these were the focus in this study, along with their naturally occurring radionuclide content.
The data used in this study consist partly of previously published data from 51 pit lakes at 39 sites spread across Sweden, from Malmö in the south to Kiruna in the north (Mantero et al. 2020; Thomas et al. 2022). Only a few had been previously studied (Lu 2004) and most were approached as unknown systems without any prior knowledge with regard to expected concentrations. The aim of this work was therefore to determine which water parameters are sufficient to measure in order to describe the water quality of a pit lake, and which are redundant. Raw water and surface sediments were analyzed by spectrometric techniques to obtain a large number of parameters to build baseline data on these pit lakes. Unpublished data together with supplementary data given in previous publications (Mantero et al. 2020; Thomas et al. 2022) were analyzed by principal component analysis (PCA) and hierarchical cluster analysis (HCA) to identify patterns and clusters in the data and classify Swedish pit lakes with similar distributions of water quality parameters.

Material and Methods
Among the parameters that were measured in surface water and sediments in previous studies (Mantero et al. 2020; Thomas et al. 2022), those included here for surface water were pH, specific conductance (SC), concentrations of Na, Mg, K, Ca, Mn, Cu, Zn, Sr, Ba, and Pb, and activity concentrations of 234,238U and 210Po. For surface sediments, the parameters included were the concentrations of *Na, *Mg, Al, Si, *P, S, *K, *Ca, *Ti, Mn, Fe, *Ni, Zn, As, *Rb, Sr, *Y, *Zr, Ba, *Tl, and Pb, and the activity concentrations of 40K, 234Th, 226Ra (214Pb, 214Bi), 210Pb and 232Th (* indicates data not previously published in Mantero et al. 2020 and Thomas et al. 2022). The activity concentration of 232Th is the series average from the gamma emitters 228Ac, 212Pb, 212Bi and 208Tl, which were found to be in secular equilibrium. An average was not used for the 238U decay series, since a disequilibrium was seen in the decay chain for the sediment samples. The disequilibrium was most severe for 238U to 210Po ratios (up to 0.36 ± 0.02). For some parameters in a few lakes the concentration was below the detection limit and these entries were replaced by half the detection limit, which has been found to be a good substitute (Farnham et al. 2002). For ICP-MS, detection limits ranged from 30 ng/L to 1 µg/L, depending on the element; for alpha spectrometry, the detection limit was ≈ 1 mBq/kg for the water samples and 1 Bq/kg for sediments.
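The substitution of values below the detection limit can be illustrated with a minimal Python sketch. The function name, the use of `None` to mark a censored entry, and the example concentrations are assumptions made for illustration only; they are not taken from the data set.

```python
def substitute_below_detection(values, detection_limit):
    """Replace entries marked as below detection (None) by half the detection limit."""
    return [detection_limit / 2 if v is None else v for v in values]


# Hypothetical Pb concentrations in µg/L; None marks "below detection limit".
# A detection limit of 0.03 µg/L corresponds to the 30 ng/L end of the ICP-MS range.
pb_ug_per_l = [0.8, None, 2.4, None]
print(substitute_below_detection(pb_ug_per_l, detection_limit=0.03))
```

Each element would need its own detection limit, since the limits depend on the element and on the technique.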
Apart from the concentration of stable elements and radionuclides in the surface water and sediments, measurements of ambient dose equivalent rate were also included in the statistical analyses. This quantity was measured with an energy compensated GM-tube, SRV-2000 (RADOS, Finland) at a height of 1 m above ground near each pit lake (n = 50). Only one value was missing (site 1) and it was substituted by the value from a nearby sampled pit lake (site 2).
The pit lakes were divided according to their location in Sweden, south and north, represented by S or N in the sample code, following the border outlined by limes norrlandicus (Horn after Rantzien 1948) and the Illies ecoregion 14 (Illies 1978). The type of mine or quarry was specified if known, followed by a site number and ending with a letter A, B, or C, which indicates the presence of several pit lakes located within the same mining site (a radius of 100 m). The studied pit lakes in northern Sweden (n = 21) included 16 Cu and Zn mines, two Pb mines, one Au mine, and two sites where the extracted ore was unknown. The pit lakes in southern Sweden (n = 30) included one silver and five iron mines, six limestone, five marble, three feldspar, one granite, and four stone quarries. The latter included different kinds of rocks for various purposes. At five sites, the type of extracted ore was unknown. 27 of these 51 pit lakes were sampled for sediments: 16 in northern Sweden and 11 in southern Sweden (Fig. 1).
The HCA and PCA statistical analyses were performed using IBM® SPSS Statistics (version 26) software. HCA is an unsupervised statistical tool that categorizes objects into homogeneous groups by applying a similarity criterion between objects. In this work, the squared Euclidean distance was used as the criterion. The analysis starts by considering all objects (pit lakes) as individual clusters and calculates the squared Euclidean distance between them; the minimum variance method (Ward 1963) was used to cluster the pit lakes according to their similarities. A threshold of 50% (linkage distance 12.5) was used to separate clusters. The results of HCA were plotted as a rescaled distance dendrogram by the software, where the dissimilarities of the objects are represented by the connected branches.
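The agglomeration procedure can be sketched in plain Python as follows. This is not the SPSS implementation: the function names are hypothetical, the example coordinates are invented, and the 50% threshold is interpreted here as half of the largest merge height, which is an assumption about how the rescaled dendrogram distance maps onto the merge costs.

```python
from itertools import combinations


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if index lists a and b are merged."""
    dim = len(points[0])
    ca = [sum(points[i][k] for i in a) / len(a) for k in range(dim)]
    cb = [sum(points[i][k] for i in b) / len(b) for k in range(dim)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_partition(points, fraction=0.5):
    """Merge clusters by minimum variance and cut at fraction * largest merge height."""
    clusters = [[i] for i in range(len(points))]  # every object starts as its own cluster
    steps = []  # (merge height, partition that existed just before that merge)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(points, clusters[ij[0]], clusters[ij[1]]),
        )
        height = ward_cost(points, clusters[i], clusters[j])
        steps.append((height, [list(c) for c in clusters]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    if not steps:
        return clusters
    threshold = fraction * max(h for h, _ in steps)
    for height, partition in steps:
        if height > threshold:
            return partition  # stop before the first merge above the threshold
    return clusters


# Hypothetical two-parameter data for six objects forming two obvious groups.
lakes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(sorted(c) for c in ward_partition(lakes)))
```

With real data the parameters would normally be brought to a common scale first, since squared Euclidean distances are otherwise dominated by the parameters with the largest numerical range.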
PCA is a multivariate statistical analysis that can compress data as well as identify tendencies by reducing initial parameters to a few uncorrelated components. These components are linear combinations of the original parameters in such a way that each component explains as much variance as possible among the data set (Miller and Miller 2005; Piñero-García et al. 2012). The amount of variance explained by each component is quantified by an eigenvalue and, according to the Guttman-Kaiser criterion (Guttman 1954; Kaiser 2011), components with eigenvalues above 1 should be retained as they explain sufficient variance without information loss. This criterion was used to select components.
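As a minimal sketch of the selection rule, the following Python function keeps the components whose eigenvalues exceed 1 and reports the fraction of variance each explains. The eigenvalues are hypothetical and the function name is an assumption; for a PCA on standardized parameters the eigenvalues sum to the number of parameters, which is why the sum is used as the total variance.

```python
def kaiser_select(eigenvalues):
    """Return (component number, eigenvalue, explained fraction) for eigenvalues > 1."""
    total = sum(eigenvalues)  # equals the number of parameters for standardized data
    return [
        (number, ev, ev / total)
        for number, ev in enumerate(eigenvalues, start=1)
        if ev > 1.0
    ]


# Hypothetical eigenvalues for six standardized parameters (they sum to 6).
eigenvalues = [2.7, 1.5, 1.2, 0.3, 0.2, 0.1]
for number, ev, fraction in kaiser_select(eigenvalues):
    print(f"C{number}: eigenvalue {ev:.1f}, explains {fraction:.0%} of the variance")
```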
Also, a Varimax orthogonal rotation was used to identify the parameters influencing each component (Miller and Miller 2005; Piñero-García et al. 2012). The parameters used in PCA were standardized (mean 0 and standard deviation 1) to avoid the first component being dominated by the parameters with higher variance. The parameters are listed in each rotated component with loadings (weights) that can be interpreted as correlations to each component; a high loading (> 0.6) would thus indicate that the specific parameter is important in explaining the variance of that particular component.

Sample Characteristics
Table 1 summarizes the descriptive statistics of the activity concentration of radionuclides in surface water (n = 51) and sediment (n = 27). The distributions of activity concentration for all radionuclides except 40K were asymmetric (skewness > 0), with heavier tails than a normal distribution (kurtosis > 0). Comparison between the percentiles and the average activity concentration indicates that some pit lakes have high activity concentrations in both surface water and sediments, e.g. the averages of 234,238U, 234Th, 226Ra and 210Pb exceeded the third quartile.
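Standardization and the shape statistics reported in Table 1 are closely related, which the following Python sketch illustrates: skewness and excess kurtosis are the averages of the third and fourth powers of the standardized values. The sketch uses the simple population formulas; statistical packages typically apply small-sample corrections, so the numbers would not match Table 1 exactly. The function names and the example values are assumptions.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def skewness(values):
    """Population skewness; positive for a tail towards high values."""
    return fmean([z ** 3 for z in standardize(values)])


def excess_kurtosis(values):
    """Population kurtosis minus 3; positive for tails heavier than normal."""
    return fmean([z ** 4 for z in standardize(values)]) - 3.0


# Hypothetical activity concentrations with one high value, as in a few pit lakes.
activity = [1.0, 1.0, 1.0, 2.0, 10.0]
print(skewness(activity), excess_kurtosis(activity))
```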
A test of normality was performed using both Kolmogorov-Smirnov and Shapiro-Wilk tests. Both tests gave p-values reported as 0 for the activity concentrations of all radionuclides except 40K (p = 0.055), indicating that 234,238U, 232,234Th, 226Ra, 210Pb and 210Po were not normally distributed (normal distribution if p > 0.05). Analysis of the log10 of the activity concentration values (testing for log-normality) showed no or relatively weak log-normal distribution for all radionuclides except 234U (p = 0.154). Similar results were seen for the stable elements analyzed by ICP-MS (in water) and XRF (in sediments), but Na, K, and Sr in sediments and the pH in surface water were found to be normally distributed (data not shown).
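The idea behind testing both the raw and the log10-transformed values can be sketched with the Kolmogorov-Smirnov statistic, i.e. the largest distance between the empirical distribution and a normal distribution fitted to the data. The sketch below computes only this distance and no p-value, because the p-value additionally requires a reference distribution that accounts for the mean and standard deviation being estimated from the sample. The function name and the example values are assumptions.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the values."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical activity concentrations spanning almost two orders of magnitude.
activity = [1, 2, 4, 8, 16, 32, 64]
d_raw = ks_statistic(activity)
d_log = ks_statistic([math.log10(v) for v in activity])
print(f"raw: D = {d_raw:.3f}, log10: D = {d_log:.3f}")
```

A smaller distance after the transformation suggests that a log-normal description fits better than a normal one, although a formal conclusion still needs the corresponding p-value.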
The ratio of 234U to 238U activity concentrations is also included in Table 1 and reflects to some extent a set of chemical and physical processes that cause and enhance this disequilibrium. The latter can be caused by alpha particles ejected from 238U, resulting in 234Th and 234U recoiling into the water (Gascoyne et al. 2002; Kigoshi 1971; Maher et al. 2006). Also, high carbonate concentrations in the water are known to increase the solubility of U (Lehto and Hou 2010). Thus, high concentrations of carbonates in limestone quarries, for example, would promote the disequilibrium, since the readily available and ejected 234U atom can be bound to carbonates and further solubilized, in contrast to 238U bound to the lattice. Lastly, the disequilibrium may also reflect the source of the water input since the ratio of 234U to 238U in groundwater usually exceeds unity (Suksi et al. 2006), and has been reported to be as high as 10 (Cowart 1981). Test results indicate that this ratio is both normally and log-normally distributed.
Fig. 1 Overview of the locations of the mining sites included in this study. Map data ©2020 Google

Hierarchical Cluster Analysis
Figure 2 shows the dendrograms from HCA for the parameters measured in surface water (pH, SC, Mg, K, Ca, Cu, Zn, Sr, Pb, 234,238U and 210Po) and sediments (Na, Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, Rb, Sr, Y, Zr, Ba, Tl, Pb, 40K, 234Th, 226Ra, 210Pb and 232Th). Ambient dose equivalent rate was included in both analyses.
In the water, three clusters can be seen reflecting similar chemical behavior of the elements. The first cluster contains the radionuclides and ambient dose equivalent rate, the second the transition and post-transition metals, and the third includes pH, SC, and the alkali and alkaline earth metals. In sediments, the clusters of parameters are more difficult to interpret since they do not necessarily reflect any chemical behavior or process; they most likely represent the chemical composition of the sediments (bedrock) of these pit lakes. For example, the cluster with Ca and Mg could reflect the presence of the mineral dolomite (CaMg(CO3)2), which can be found in limestone quarries, Fe and S can occur as the mineral pyrite (FeS2), and the cluster of Y (rare earth element) with the radionuclides is a known co-occurrence in minerals (Barakos et al. 2015; Jaireth et al. 2014). Figure 3 shows the dendrogram for 51 pit lakes where the parameters used are those presented in Table 2 (pH, SC, Mg, K, Ca, Cu, Zn, Sr, Pb, 234,238U, 210Po and ambient dose equivalent rate). A threshold of 50% (distance 12.5) was used to distinguish between clusters, resulting in four clusters. Cluster one contains 38 pit lakes including very different types of mines. The smaller clusters are more homogeneous in terms of the type of mine included. Cluster two contains Ag and Zn mines, cluster three contains mainly feldspar, granite, and stone quarries, and cluster four contains only Zn and Cu mines.
The Ag and Zn mines had a relatively low pH (4.9 and 5.7) and Zn concentrations of 5900 and 3900 µg/L, which were the two highest concentrations measured in all of the surface water samples. The Zn concentration in the Ag mine was likely due to the presence of sulfide-bearing minerals (Cox 1970), as galena was one of the ores identified at the site, which was further confirmed since that pit lake had the highest measured concentration of Pb in surface water (690 µg/L). Thus, it is likely that the Pb, Zn, and Ag exist as galena (PbS), sphalerite (ZnS), and argentite (Ag2S) at this site.
The pit lakes in cluster three differ from the rest mainly due to relatively high concentrations of U. The cluster average of 238U was 490 mBq/kg whereas the average of all lakes was 110 mBq/kg (Table 1). One stone quarry, in cluster one rather than cluster three, had a surface water 238U activity concentration of 1.4 mBq/kg. Otherwise, all feldspar, granite, and stone quarries are included in cluster three.
Cluster four includes only three pit lakes, all at former Cu and Zn mines. These lakes had a pH of 7-9 and the highest specific conductance, 1600-2000 µS/cm, whereas the average SC for all lakes was 400 µS/cm. They also had the highest surface water concentrations of Ca; the cluster averaged 1000 mg/L while the limestone quarries (n = 6) averaged 200 mg/L. The S concentrations were also the highest; this combination of high S and Ca concentrations can be explained by the liming activities that are conducted at these sites to maintain good water quality. The sulfide-bearing minerals would most likely cause acidic water and high metal concentrations in the surface water if liming were not done.
Table 1 Descriptive statistics of radionuclide concentration in surface water (n = 51) and sediment (n = 27) from previously published results (Mantero et al. 2020; Thomas et al. 2022). Activity concentration for radionuclides in surface water is given in mBq/kg, and for surface sediments in Bq/kg. P values from Kolmogorov-Smirnov (pKS) and Shapiro-Wilk (pSW)
Fig. 2 Results from hierarchical cluster analysis (HCA) for parameters measured in a surface water and in b sediments in pit lakes in Sweden. The clusters in a seem to separate the different parameters according to their origin (radionuclides) and chemical groups (transition and post-transition metals, alkali and alkaline earth metals), whereas in b the clusters appear to reflect the composition of the sediments
Fig. 3 Results from hierarchical cluster analysis for 51 surface water samples from pit lakes in Sweden including the following parameters: pH, SC, Mg, K, Ca, Cu, Zn, Sr, Pb, 234,238U, 210Po and dose rate. The dendrogram shows four clusters when applying a threshold of 50% or distance 12.5. Cluster one includes most of the pit lakes, while the smaller clusters are more homogeneous in terms of the type of mine included

Surface Water
The results from PCA on the 51 surface water samples are shown in Table 2, where three components explain 72% of the total variance. C1 explains the largest share of the total variance, with positive correlations with SC, Mg, K, Ca, and Sr. C2 explains 21% of the total variance with positive correlations with 234,238U and 210Po and also with the ambient dose equivalent rate. C3 explains 19% of the total variance with a negative correlation with pH and a positive correlation with Cu, Zn, and Pb. Thus, C1 is largely influenced by the alkali and alkaline earth metals, C2 primarily by the naturally occurring radionuclides, and C3 by the transition and post-transition metals. The relationship between the negative loading of pH and the positive loadings of metal concentrations in C3 is assumed to be due to the fact that a low pH increases metal solubility. In Fig. 4, the PCA scores and PCA loadings are plotted with respect to the three rotated components.
In the C1-C2 plot, most of the pit lakes are centered close to the origin. However, one group of pit lakes extends to the right, which corresponds to the feldspar, granite, and stone quarries, which are known to have higher uranium and thorium concentrations (Maurice 1980; Wilson and Åkerblom 1980). These lakes are all in cluster three (Fig. 3), and the stone quarry that had a low 238U activity concentration is grouped together with most of the pit lakes, similar to cluster one (Fig. 3). The second pit lake group, with a distinct upward projection, includes the Cu and Zn mines (also seen in cluster four in Fig. 3), which align with the loadings of SC, Ca, and Sr. The high SC and Ca values arise because these pit lakes are being limed to maintain a neutral pH. The highest sulfur concentration in surface water was also found among these pit lakes, which also contributes to the SC reading.
The results seen in the C2-C3 plot are similar to those in C1-C2 for the pit lakes related to C2 (radionuclide loadings). The Cu and Zn mines are, however, more closely grouped, since the SC loading is now centered around the origin and the Cu and Zn mines are influenced by neither C2 nor C3. One Zn mine (pit lake) is positioned to the right together with an Ag mine, and these are the two lakes that had the highest Zn concentration in surface water. Since the loading of Zn in C3 is positive and to the right, so also are these two lakes.
The relative positions of the parameters in the loading plots in Fig. 4 are very similar to the cluster results obtained by HCA. The radionuclides and ambient dose equivalent rate are grouped together, as are SC and the alkali and alkaline earth metals, and the transition and post-transition metals; as shown by C3, pH separates from the rest, forming a fourth group.

Surface Sediment
The results from PCA on the 27 surface sediment samples are shown in Table 3, where three components with eigenvalues greater than 1 explain 83% of the total variance. The plot of C1-C2 is found in Fig. 5. C1 explains 51% of the variance and is highly correlated with the activity concentration of radionuclides, ambient dose equivalent rates, and concentrations of Rb and Y. C1 is also negatively correlated with Sr concentration, which seems to indicate that sediments in this data set with higher U and Th content tend to have lower Sr concentrations. Also, the positive loadings of Y and Rb would imply that higher concentrations of these elements correlate with higher activity concentrations of the naturally occurring radionuclides. The correlation between the concentration of Rb and the activity concentration of the radionuclides is less apparent, except at the feldspar mine. This will be investigated further in another work. However, the correlation of the concentration of Y with the activity concentration of the radionuclides is more apparent (R² = 0.97) and the fitted equation is:
$$^{234}\mathrm{Th} = 0.57\,Y^{2} - 11.7\,Y - 18 \qquad (1)$$
This equation is valid for 234Th activity concentrations from 6 to 1500 Bq/kg and for concentrations of Y from 17 to 63 ppm. Similar fits, though with lower R² values, were achieved using 226Ra or 210Pb instead of 234Th.
C2 explains 23% of the total variance and is negatively correlated with Mg and Ca and positively correlated with the rest of the parameters. Since C3 only explains an additional 9% of the total variance, it was considered of minor importance, and the C1-C3 and C2-C3 plots are not shown.
Table 2 Results from PCA after a Varimax rotation for 51 surface water samples from pit lakes in Sweden with three components explaining 72% of the total variance. C1 is highly connected to alkali and alkaline earth metals, C2 to naturally occurring radionuclides and ambient dose equivalent rate (dose rate), and C3 to the transition and post-transition metals
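Equation (1) can be evaluated with a short Python function, sketched here with a guard for the stated range of Y. The function name and the choice to raise an error outside the range are assumptions made for illustration.

```python
def th234_from_y(y_ppm):
    """Estimate the 234Th activity concentration (Bq/kg) from Y (ppm) with Eq. (1)."""
    if not 17 <= y_ppm <= 63:
        raise ValueError("Y is outside the 17-63 ppm range covered by the fit")
    return 0.57 * y_ppm ** 2 - 11.7 * y_ppm - 18


# Upper end of the fitted range; the result is close to the 1500 Bq/kg limit.
print(round(th234_from_y(63), 1))
```

Near the lower end of the Y range the polynomial returns negative values, so estimates there should be treated with caution.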

40K analyzed by gamma spectrometry and 39,41K analyzed by XRF are positioned closely in the PCA loading plots. The ratio between the 40K activity concentration and the K concentration analyzed by XRF gives 252 Bq/kg of 40K per % of K. Assuming a natural abundance of 40K of 0.012% of natural K, this would correspond to ≈ 320 Bq/kg 40K per % of K, 26% higher than the value quoted here. However, this difference is of the same magnitude as the associated uncertainties of both the XRF and gamma spectrometry analyses.
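The expected 40K activity per % of K can be checked from first principles, as in the following Python sketch. The half-life (1.248 × 10^9 years), the molar mass of K and the length of a year are assumed reference values that are not given in the text; the abundance of 0.012% is the one quoted above. With these inputs the result lands within a few percent of the ≈ 320 Bq/kg quoted.

```python
import math

AVOGADRO = 6.02214076e23          # atoms per mole
MOLAR_MASS_K = 39.0983            # g/mol, natural potassium (assumed value)
HALF_LIFE_K40_YEARS = 1.248e9     # assumed half-life of 40K
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def k40_activity_per_percent_k(abundance=1.2e-4):
    """40K activity (Bq/kg sample) for a sample containing 1 % K by mass."""
    grams_k = 10.0  # 1 % of 1 kg
    atoms_k40 = grams_k / MOLAR_MASS_K * AVOGADRO * abundance
    decay_constant = math.log(2) / (HALF_LIFE_K40_YEARS * SECONDS_PER_YEAR)
    return atoms_k40 * decay_constant  # decays per second


expected = k40_activity_per_percent_k()
measured = 252.0  # Bq/kg per % K, from the ratio of gamma spectrometry to XRF results
print(f"expected {expected:.0f} Bq/kg per % K, ratio to measured {expected / measured:.2f}")
```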
Unfortunately, Cu was only detected in a few sediment samples (mainly Cu mines) and had to be excluded due to too many missing values. This is one of the reasons why the Cu and Zn mines are grouped around the origin, since there is no parameter (or loading) that extends them in a marked direction in the score plot. Since these pit lakes contain low concentrations of radionuclides, focusing on other elements and excluding the radionuclides in the PCA would better explain the characteristics of these lakes. However, that is outside the scope of this study, which was focused on the naturally occurring radionuclides.

Fig. 4 PCA scores and loadings for the three rotated components in Table 2. a shows the relation between C1 and C2, b between C2 and C3 and c between C1 and C3. Two trends can be seen in a and b. First, the feldspar, granite and stone quarries are aligned with the loadings of the radionuclides and ambient dose equivalent rate, which is reasonable since they contain higher activity concentrations of radionuclides and consequently feature higher dose rates. The second trend is seen in a, where the Cu and Zn mines are mainly aligned with the loadings of specific conductance (SC), Ca and Sr

Conclusions
Results from the descriptive statistics showed that most naturally occurring radionuclides in surface water and sediments in pit lakes are not normally or log-normally distributed. Care should therefore be taken when calculating standard deviations or performing other standardization procedures that assume a normal distribution.
Hierarchical cluster analysis performed on parameters measured in surface water provided clusters that agreed with assumed relations among elements in the chemical groups. By including only one or two parameters from each cluster, one could potentially reduce the number of parameters that need to be measured. This can reduce analysis costs while maintaining the parameters needed to explain the different chemical characteristics of a pit lake. The clustering of pit lakes with respect to the type of mine seems more plausible for the smaller clusters, which included similar mines and quarries. However, the largest cluster included many different kinds of mines and quarries, indicating that surface water might not be the best sample to differentiate between different types of pit lakes when using hierarchical cluster analysis.
Principal component analysis with varimax rotation showed that when including a large variety of pit lakes (limestone quarries to gold mines), the following parameters (apart from the radionuclides and ambient dose equivalent rate) explained most of the variance: pH, SC, and concentrations of Mg, K, Ca, Cu, Zn, Sr, and Pb. For surface sediments, the corresponding parameters were concentrations of Na, Mg, Al, P, K, Ca, Rb, Sr, Y, and Tl.
A correlation in the PCA between the concentration of Y and the activity concentration of 238U (234Th) in sediments and ambient dose equivalent rate was found. Therefore, we recommend including analysis of Y when studying naturally occurring radionuclides in the environment to follow this relationship further. Ambient dose equivalent rate was also closely positioned with the radionuclides in the PCA loading plots for surface sediment, indicating that ambient dose equivalent rate could be used as a substitute in PCA if radionuclide concentrations are not measured. The substitution is expected to be less suitable for surface water since the loading for ambient dose equivalent rate lies farther from those of the radionuclides. Since ambient dose equivalent rate is relatively simple to measure, it is recommended to be included in general studies on lakes.
Table 3 Results from PCA after a Varimax rotation for 27 surface sediment samples from pit lakes in Sweden. The three components explain 83% of the total variance; however, C1 alone explains 51%, which is due to the several radionuclides included in the analysis being correlated with each other
Fig. 5 Plot of C1-C2 from the PCA in Table 3. Apart from the cluster centered around the origin, the pit lake to the far left aligns with Ca and Mg, probably because the data are from a limestone quarry. As seen in Fig. 4 for surface water, the lakes aligned with radionuclides and ambient dose equivalent rate are the feldspar and stone quarries; the granite quarry is not included here since no sediments were sampled from that lake
Funding Open access funding provided by University of Gothenburg.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.