Many water quality variables are subject to large fluctuations in space and time. Understanding these fluctuations in the physical environment and determining whether such changes are natural or a result of anthropogenic influences can be a challenging task. An ideal variable provides unambiguous information about the condition of the environment in relation to reference conditions and is relatively easy and inexpensive to measure.

The chemical composition of surface water and groundwater is controlled by many factors, including the composition of precipitation, the mineralogy of the watershed and aquifers, climate, and topography. These factors combine to create diverse water types that change spatially and temporally (Guler et al. 2002). A typical water quality assessment program measures multiple parameters at different monitoring sites over a specific period, so a complex data matrix must be assessed to evaluate water quality (Chapman 1992). Furthermore, in river monitoring it is often necessary to determine whether a change in a variable should be attributed to man-made (spatial, anthropogenic) pollution or to natural (temporal, climatic) changes in the river hydrology (Razmkhah et al. 2010). Several authors have reported the use of water quality indices (WQI) to verify the influence of waste discharges on the water quality of rivers. Although a WQI provides a useful way to predict changes in water quality from multiple parameters, it does not provide evidence of the contamination sources, especially because it is calculated after normalization of the analytical values (Pesce and Wunderlin 2000). The usual technique for interpreting river water quality is a univariate procedure, which does not adequately characterize simultaneous similarities and differences between samples or variables (Dixon and Chiswell 1996). The problem of environmental data reduction and interpretation of multiconstituent physicochemical and biochemical measurements can be handled more easily through robust multivariate methods and exploratory data analysis (Brown 1988; Wenning and Erickson 1994). Factor analysis (FA) is widely used as a data reduction technique and is capable of detecting similarities among samples and/or variables (Wenning and Erickson 1994; Battegazzore and Renoldi 1995; Voutsa et al. 1995; Wang et al. 2006; Mendiguchía et al. 2007). In summary, the goal of FA is to explain a portion of the variance in the set of input variables by identifying underlying common dimensions called factors.

The main objective of this study is to identify the sources of surface water contamination in the Karoon River basin using FA as an effective multivariate statistical technique. The water quality data were subjected to FA to better interpret, understand and define the mechanisms, (natural) processes and specific sources of water quality deterioration and contamination in the basin. This river was selected because the Karoon River is the largest river in Iran and irrigates more than 280,000 hectares of the surrounding plain. Consequently, a better understanding of the river water quality and its dominant controlling factors is crucial for comprehending river basin processes and is essential for developing future agricultural, urban and industrial projects as well as for water resource planning and management.

Study area

The Karoon River basin is located in southwestern Iran; the Karoon is the largest river in Iran, with a basin area of 42,574 km² (Fig. 1). The Karoon River system supplies the water demands of sixteen cities, several villages, thousands of hectares of agricultural land, and several hydropower plants. Increasing water demands at the development stage, including agricultural networks, fish hatchery projects, and inter-basin water transfers, could result in a gloomy future for the water quality of the Karoon River (Karamouz et al. 2004).

Fig. 1 Sketch map of the water quality monitoring stations of the Karoon River basin in Iran

Limestone formations are the most widespread outcrops in the basin (more than 70 % of the basin area). Another important formation (<10 % of the basin area) is the Gachsaran Formation, whose lithology consists mainly of marl, anhydrite, gypsum, halite and some interbeds of marly limestone.

Materials and methods

Monitored parameters

The data were obtained from the hydrological database of the Khuzestan Water and Power Authority, Ahvaz, Iran, for the period 1976–2005. The Khuzestan Water and Power Authority is the most important organization concerned with water resources in Iran, since most of the country's surface water resources are located in this province. The data quality is generally very good because of the high quality of the instruments used at the stations, and a dedicated team within the organization controls the quality of the collected data. Discharge and the water quality parameters have been monitored at the Gotvand station on a monthly basis for 30 years. This station was selected for this study because it is located upstream of the Khuzestan plain, where the Karoon River enters the plain, and is thus less affected by agricultural activities in the region. The selected water quality parameters include calcium, magnesium, bicarbonate, sulfate, chloride, electrical conductivity, total dissolved solids, pH and discharge. These parameters were chosen because they have long-term records of measured data. Based on the results of previous studies, the logarithm of the flow discharge was used instead of the measured discharge to obtain more effective results (Evansa et al. 1996).

Because the hydrometric station was not sampled regularly in every month of the year, and to remove the potential effects of seasonal changes, the input data were transformed into seasonal data. The water quality parameters, their units and the basic statistics of the river water quality data set for the four seasons are summarized in Table 1. A well-defined temporal heterogeneity in the distribution of the water quality parameters was observed at the studied station. For example, the box plot of electrical conductivity is shown in Fig. 2. The box plot was constructed to evaluate the different patterns associated with temporal variations in river water quality. The box plot of electrical conductivity shows that certain water quality constituents differ seasonally.
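
The conversion from monthly to seasonal data can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the season boundaries are those stated in the text (autumn October–December, winter January–March, spring April–June, summer July–September), but averaging the available months is an assumption, and the function name and the example readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Season definition taken from the text: autumn = Oct-Dec, winter = Jan-Mar,
# spring = Apr-Jun, summer = Jul-Sep.
SEASON_BY_MONTH = {
    10: "autumn", 11: "autumn", 12: "autumn",
    1: "winter", 2: "winter", 3: "winter",
    4: "spring", 5: "spring", 6: "spring",
    7: "summer", 8: "summer", 9: "summer",
}


def seasonal_means(records):
    """Average monthly values into one value per (year, season).

    `records` is an iterable of (year, month, value) tuples. Months without
    a sample are simply absent, so gaps only reduce the number of values
    that enter a seasonal average (averaging is an assumption here).
    """
    groups = defaultdict(list)
    for year, month, value in records:
        groups[(year, SEASON_BY_MONTH[month])].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical electrical conductivity readings (not data from the study).
example = [(1990, 4, 600.0), (1990, 6, 700.0), (1990, 7, 1100.0)]
result = seasonal_means(example)
```

In this example the two spring readings collapse into a single spring value, while the lone July reading becomes the summer value, so irregular monthly sampling still yields one case per season.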

Table 1 Descriptive statistics of water quality parameters in Gotvand station for different seasons
Fig. 2 Seasonal box plot of water quality parameters at Gotvand station

The monitoring data for the water quality parameters (10 variables) and discharge were organized in a matrix with the variables as columns and the samples (289 cases) as rows. SPSS software was used to analyze the statistical relationships in the data after autoscaling the parameters. To examine the suitability of the data for FA, the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test were computed. KMO is a measure of sampling adequacy that indicates the proportion of variance that is common variance, i.e., variance that might be caused by underlying factors (Shrestha et al. 2008). High values (close to 1) generally indicate that FA may be useful, which is the case in this study (Table 2).
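
Autoscaling and the overall KMO measure can be sketched as below. The study used SPSS; this NumPy version is only an illustration of the standard definitions (z-score scaling, and KMO as the ratio of squared correlations to squared correlations plus squared partial correlations). The function names and the synthetic data are assumptions and do not reproduce the Gotvand record.

```python
import numpy as np


def autoscale(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations follow from the inverse correlation matrix.
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)  # mask of off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


# Synthetic demonstration data (not the study data): four variables that
# share one common signal plus independent noise, 289 cases as rows.
rng = np.random.default_rng(0)
common = rng.normal(size=(289, 1))
X = common + 0.5 * rng.normal(size=(289, 4))
Z = autoscale(X)
kmo_value = kmo(Z)
```

Because the synthetic variables share a strong common signal, their partial correlations are small relative to their correlations and the KMO value lies well above 0.5, which is the situation in which FA is considered useful.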

Table 2 Kaiser-Meyer-Olkin and Bartlett’s test

Bartlett’s test of sphericity indicates whether the correlation matrix is an identity matrix, which would mean that the variables are unrelated and therefore unsuitable for FA. The significance level, which is 0.00 in this study (<0.05), indicates that there are significant relationships among the variables. The water quality parameters were grouped into four seasons: autumn (October–December), winter (January–March), spring (April–June) and summer (July–September).
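
The statistic behind Bartlett’s test can be sketched as follows, using the standard chi-square approximation based on the determinant of the correlation matrix. This is an illustrative NumPy version, not the SPSS routine used in the study; the function name and the synthetic data are assumptions. The p-value is obtained by comparing the statistic with a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np


def bartlett_sphericity(X):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test.

    X holds the cases as rows and the variables as columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log of det(R), numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


# Synthetic demonstration data (not the study data).
rng = np.random.default_rng(0)
independent = rng.normal(size=(289, 4))
correlated = rng.normal(size=(289, 1)) + 0.5 * rng.normal(size=(289, 4))

stat_indep, dof_indep = bartlett_sphericity(independent)
stat_corr, dof_corr = bartlett_sphericity(correlated)
```

For uncorrelated variables the determinant is close to 1 and the statistic is small; for strongly correlated variables the determinant approaches 0 and the statistic becomes very large, which corresponds to a significance level near zero.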

Factor analysis (FA)

Multivariate statistical techniques, such as FA or principal component analysis (PCA), provide more insight into the underlying structure of a data set, although the use of these techniques might require further analyses to identify distinct groups (Belkhiri and Mouni 2012). FA is related to PCA, but the two are not identical. As a data reduction technique, FA is performed in two steps: extraction and rotation of factors. The main purpose of the first step is to extract the factors through PCA. The application of PCA to water quality assessment has been verified by many scientists (Gotz et al. 1998; Da Silva and Sacomani 2001; Simeonov et al. 2003; Parinet et al. 2004; Ouyang 2005; Kuppusamy and Giridhar 2006; Nakano et al. 2008). For example, Belkhiri et al. (2011) adopted R-mode FA, and their results indicate that the parameters responsible for groundwater quality variations are mainly related to the presence and dissolution of some carbonate, silicate, and evaporite minerals in the aquifer.

In the second step, rotation of the factors achieves a simpler and more meaningful representation of the underlying factors by decreasing the contribution of variables with minor significance to the principal components and increasing the contribution of the more significant ones (Razmkhah et al. 2010).
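
The effect of rotation can be illustrated with a short varimax sketch. This is a commonly used SVD-based iteration written in NumPy, offered as an assumption about how such a rotation may be implemented rather than as the SPSS algorithm used in the study; the function name, tolerances and example loadings are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix.

    Returns the rotated loadings and the rotation matrix T.
    """
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the varimax criterion with respect to the rotation.
        G = A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt  # nearest orthogonal matrix to the gradient
        new_objective = np.sum(s)
        if new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return A @ T, T


# Hypothetical loadings: a clean two-factor pattern deliberately mixed by a
# 30 degree rotation, so that every variable loads on both factors.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix
rotated, T = varimax(unrotated)
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes, which is what makes the rotated factors easier to interpret.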

The basic purposes of FA are: to determine how many factors are required to represent a set of variables, to specify the level of correlation between the variables and the associated statistical factors, to investigate and analyze the common factors, and to quantify how much of each factor is possessed by each observation (identified by the factor scores).

In fact, the main objective of FA is to analyze the structure present in the variance–covariance matrix of the multivariate data. The Eigenvalues and Eigenvectors are extracted from the correlation or covariance matrix. FA is a multivariate statistical method that is mainly used for analyzing the relations within a set of variables (Reghunath et al. 2002). The method assumes that the relations among a number of variables are reflected in a smaller number of underlying dimensions called factors (Davis 1973). Thus, if the original data are replaced by the factors, the complexity of the data is reduced, and the factors can then be assigned to the actual chemical processes affecting the water flow of the basin (Evansa et al. 1996). It should be noted that, for a better interpretation, the initial factors should be modified by varimax rotation (Grande et al. 1996). As a result, a small number of factors will usually account for approximately the same amount of information as the much larger set of original observations. The FA can be expressed as:

$$z_{ji} = a_{f1} f_{1i} + a_{f2} f_{2i} + a_{f3} f_{3i} + \cdots + a_{fm} f_{mi} + e_{fi}$$

where z is the measured value of a variable, a is the factor loading, f is the factor score, e is the residual term accounting for errors or other sources of variation, i is the sample number and m is the total number of factors.
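
The terms of this expression can be made concrete with a small sketch in which the loadings, scores and residual are obtained by PCA extraction from autoscaled data. It is an illustration under stated assumptions, not the computation performed in SPSS: the function name is hypothetical, the data are synthetic, and the loadings are stored as a variables-by-factors matrix.

```python
import numpy as np


def pca_factor_model(Z, m):
    """Split autoscaled data Z (samples x variables) into loadings A,
    scores F and residual E, so that Z = F @ A.T + E."""
    Z = np.asarray(Z, dtype=float)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:m]  # indices of the m largest eigenvalues
    lam, V = eigvals[order], eigvecs[:, order]
    A = V * np.sqrt(lam)       # loadings a (variables x m)
    F = Z @ V / np.sqrt(lam)   # scores f (samples x m), unit variance
    E = Z - F @ A.T            # residual e
    return A, F, E


# Synthetic demonstration data (not the study data): three variables.
rng = np.random.default_rng(1)
raw = rng.normal(size=(50, 1)) + 0.5 * rng.normal(size=(50, 3))
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

A, F, E = pca_factor_model(Z, m=1)                 # one retained factor
A_full, F_full, E_full = pca_factor_model(Z, m=3)  # all factors retained
```

With all factors retained the residual vanishes and the data are reproduced exactly; retaining fewer factors leaves a nonzero residual, which is the part of the variance that the retained factors do not explain.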

FA has been used less often for temporal analysis in hydrochemical studies than for spatial analysis of hydrochemical data, such as the identification of aeration sources using different dissolved materials in different places (Reeder et al. 1972; Puckett and Bricker 1992) and the determination of ion sources across a dump area (Hooper and Peters 1989).

FA was performed separately on the normalized data set of each season, to compare the compositional patterns of the analyzed water samples and to identify the factors influencing each one. The input data matrix (variables × cases) for FA was 11 × 289 for the Gotvand hydrometric station.

Results and discussion

An examination of the Kaiser–Meyer–Olkin measure of sampling adequacy suggested that the sample was factorable (Table 2). PCA was carried out to extract the factors. Varimax rotation was then performed to obtain components with clearer chemical and environmental significance. The Eigenvalues of the different factors, the percentage of variance accounted for, the cumulative percentage of variance and the component loadings (varimax rotated) are given in Table 2. Eigenvalues >1 were taken as the criterion for extracting the principal components required to explain the sources of variance in the data. This analysis explained 71.53–79.26 % of the variance in the data.
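
The Eigenvalue >1 criterion and the percentage of variance explained can be sketched as follows. The correlation matrix below is hypothetical and chosen only because its Eigenvalues are easy to verify by hand; the function name is likewise an assumption.

```python
import numpy as np


def kaiser_retained(R):
    """Return the eigenvalues > 1, the percentage of variance explained by
    each of them, and their cumulative percentage."""
    eigvals = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    percent = 100.0 * eigvals / eigvals.sum()
    keep = eigvals > 1.0
    return eigvals[keep], percent[keep], percent[keep].sum()


# Hypothetical correlation matrix of three variables with all pairwise
# correlations equal to 0.5; its eigenvalues are 2.0, 0.5 and 0.5.
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
retained, percent, cumulative = kaiser_retained(R)
```

Here only the first Eigenvalue (2.0) exceeds 1, so a single component is retained and it explains two thirds of the total variance. For a correlation matrix the Eigenvalues sum to the number of variables, so an Eigenvalue >1 marks a component that explains more variance than any single original variable.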

The parameter loadings for the three components from the PCA of the data set are given in Tables 3, 4, 5 and 6. To interpret the results, high factor loadings (in excess of 0.75) and medium factor loadings (between 0.4 and 0.75) were considered; these thresholds are arbitrary choices (Childs 1970). Similar values have been used in studies by different researchers (Miller and Drever 1977; Puckett and Bricker 1992; Evansa et al. 1996).

Table 3 Variables and factor loadings after varimax rotation for autumn season
Table 4 Variables and factor loadings after varimax rotation for winter season
Table 5 Variables and factor loadings after varimax rotation for spring season
Table 6 Variables and factor loadings after varimax rotation for summer season

The investigation of the factors with Eigenvalues >1 for the four seasons demonstrated that three factors affected the chemical composition of the Karoon River water. The scree plot for the spring season is shown in Fig. 3, which also includes the percentage of variance explained by each component and gives an idea of how the different principal components were extracted. The figure shows a pronounced change of slope after the third Eigenvalue. Therefore, three components were retained, which have Eigenvalues >1 and explain 79.26 % of the variance.

Fig. 3 Scree plot of Eigenvalues vs. components along with percentage variances vs. components for Gotvand water quality station

An Eigenvalue gives a measure of the significance of the factor: the factors with the highest Eigenvalues are the most significant. Eigenvalues of 1.0 or greater are considered significant. Liu et al. (2003) classified the factor loadings as ‘strong’, ‘moderate’ and ‘weak’, corresponding to absolute loading values of >0.75, 0.75–0.50 and 0.50–0.30, respectively. Therefore, we also classified these results in Tables 3, 4, 5 and 6.
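
The classification of Liu et al. (2003) can be written as a small helper. The thresholds are those quoted above; the handling of values that fall exactly on a boundary, the label "negligible" for loadings below 0.30, and the function name are assumptions made for this sketch.

```python
def classify_loading(value):
    """Label a factor loading by its absolute value.

    Thresholds follow Liu et al. (2003) as quoted in the text:
    > 0.75 strong, 0.75-0.50 moderate, 0.50-0.30 weak.
    Boundary handling and the "negligible" label are assumptions.
    """
    magnitude = abs(value)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"


# Hypothetical loadings (not values from Tables 3-6).
labels = [classify_loading(v) for v in (0.91, -0.62, 0.35, 0.10)]
```

The absolute value is used so that a large negative loading, which indicates a strong inverse relation with the factor, is classified in the same way as a large positive one.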

The three factors are examined separately for each season as follows:

Autumn season: The analysis revealed that three factors account for 74.66 % of the total variance (Table 3). Based on the distribution of the Eigenvalues, factor 1 alone explains 38.87 % of the variance. TDS, conductivity, chloride and sodium were strongly correlated, and sulfate, calcium and potassium were moderately correlated, with factor 1. Calcium, sulfate and discharge were strongly correlated with factor 2, and bicarbonate and pH with factor 3.

Winter season: The analysis revealed that three factors account for 76.74 % of the total variance (Table 4). The Eigenvalues also showed that factor 1 alone accounts for 42.54 % of the variance. TDS, conductivity, chloride and sodium were strongly correlated, and sulfate, magnesium and discharge were moderately correlated, with factor 1. Calcium was strongly correlated, and sulfate, pH and discharge were moderately correlated, with factor 2; bicarbonate and potassium were correlated with factor 3.

Spring season: The analysis revealed that three factors account for 79.26 % of the total variance (Table 5). The Eigenvalues also showed that factor 1 alone accounts for 33.61 % of the variance. Sulfate, calcium and discharge were strongly correlated, and TDS, conductivity, chloride, sodium, magnesium and potassium were moderately correlated, with factor 1. Discharge, chloride and sodium were strongly correlated, and TDS and conductivity were moderately correlated, with factor 2; bicarbonate and pH were correlated with factor 3.

Summer season: The analysis showed that three factors account for 71.53 % of the total variance (Table 6). The Eigenvalues also showed that factor 1 alone accounts for 36.20 % of the variance. TDS, conductivity, chloride and sodium were strongly correlated, and discharge was moderately correlated, with factor 1. Sulfate was strongly correlated and magnesium was moderately correlated with factor 2; calcium, bicarbonate and pH were correlated with factor 3.

Factor 1 has high loadings of the ions Na+ and Cl−, and factor 2 has high loadings of the ions Ca2+ and SO42−, for all seasons except spring. The concentrations of Na, Cl, Ca and SO4 in runoff generated from Gachsaran Formation outcrops are much greater than those in runoff from other formations. Factor 3 shows the influence of the limestone formations on Karoon River water quality.

The seasonal trends of selected parameters are plotted in Fig. 4. The average discharge (Fig. 4) is higher in spring than in autumn, winter and summer. Over the study period, this may have been due to frequent snowmelt continuously discharging meltwater into the river network of the basin. River water quality in the Karoon River basin is therefore expected to improve as discharge increases. The factors obtained from FA indicate that the parameters responsible for water quality variations are mainly related to discharge and to the geology of the Gachsaran Formation. EC is much lower in the spring season, when discharge increases, and considerably higher during the rest of the year. This pollution is largely natural and point source, as nonpoint sources such as agriculture and orchard plantations are not involved in this basin.

Fig. 4 Discharge and electrical conductivity variations during a year

Figure 4 shows that although the lowest flow discharge was observed during the summer season, the highest salinity occurred during the autumn season. In other words, the salinity of the river water increases in the wet season (autumn and winter).

The Gachsaran Formation has a high erosion potential, so the runoff generated in the wet season dissolves gypsum and halite as it passes through this formation; it can thereby contaminate surface and subsurface water resources and increase the salinity of the river water. In flowing water, the dissolution rate of gypsum is about 100 times the rate of limestone dissolution but only about 1/1,000 the rate of salt dissolution (Milanovic 2004). Dissolution of halite from the Gachsaran Formation is therefore the most influential factor, and dissolution of gypsum the second most influential factor, on Karoon River water quality.
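
The ranking of the three rock types follows directly from the two ratios quoted from Milanovic (2004). Writing r for the dissolution rate in flowing water (the symbols are introduced here only for illustration), the relations combine as:

```latex
r_{\text{gypsum}} \approx 100\, r_{\text{limestone}}, \qquad
r_{\text{gypsum}} \approx \frac{r_{\text{halite}}}{1000}
\;\;\Longrightarrow\;\;
r_{\text{halite}} \approx 1000\, r_{\text{gypsum}} \approx 10^{5}\, r_{\text{limestone}}
```

Under these approximate ratios halite dissolves about five orders of magnitude faster than limestone, which is consistent with a formation covering less than 10 % of the basin dominating the first factor.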

Conclusions and future work

In this case study, FA was used to evaluate temporal variations in the water quality of the Karoon River using 30-year data sets of 10 water quality variables and discharge at the Gotvand station, which is located upstream of the Khuzestan plain and is less affected by agricultural activities. Our analysis grouped the data from one sampling site into four seasons to understand the temporal variation in water quality characteristics. Based on the information obtained, it is possible to design a future optimal sampling strategy, which could increase the number of sampling stations and associated costs, particularly in the agricultural and industrial areas. Although FA/PCA would not result in a significant data reduction, it will help to identify the factors/sources responsible for variations in river water quality at different stations with different pollution sources and sample sizes.

The results obtained from FA indicate that the parameters responsible for water quality variations are mainly related to discharge and geological formations (natural sources) and not to any man-made pollution sources. Future research should address man-made pollution sources and the highly polluted areas of the basin.

Because the Gachsaran Formation is the only formation in this basin with saline units, it can be concluded that the most influential factor on the chemical composition of the water is the dissolution of saline units in the Gachsaran Formation. The second factor affecting the chemical composition of the Karoon River in all seasons except spring is characterized by high-to-moderate loadings of Ca and SO4. These ions increase as a result of the dissolution of gypsum; however, because gypsum is less soluble, this is the second most influential factor. Thus, the major contamination threats over the year are geological and are defined as a nonpoint pollution source. Because the other formations in the basin are less soluble than the Gachsaran Formation, the greatest impact on water quality is attributed to the Gachsaran Formation. Although the Gachsaran Formation crops out in only a minor part of the basin, its specific lithological characteristics allow it to affect Karoon River water quality significantly (Zarei et al. 2005). This formation also has a high erosion potential, so runoff passing through it dissolves gypsum and halite and can contaminate surface and subsurface water resources. The formation can affect water quality in two ways: first through runoff that is in contact with this lithology, and second through brine springs that flow through the formation. These results may depend partly on the study period as well as on land use change, urbanization, etc.

Since water quality assessment is an important part of river watershed management, this study aimed to illustrate the usefulness of a statistical technique, factor analysis. This technique helps in the analysis and interpretation of such data sets and, in water quality assessment, in the identification of pollution sources/factors and the understanding of seasonal variations in water quality for effective river water quality management.