Introduction

Extreme events such as floods and droughts are complex natural hazards that affect parts of the world every year and have significant impacts on water quantity (high and low flows). Beyond altering river discharge, the low and high flows that result from droughts and floods can affect water quality and aquatic biology through various physical, chemical and biological processes (Caruso 2002; Hrdinka et al. 2012). They can also aggravate water pollution and, through the resulting deterioration in water quality, affect human health and aquatic ecosystems.

The effects of extreme weather conditions, a possible consequence of climate change, on stream flow quantity (high and low flows) have been extensively investigated in recent years (e.g. Arnell 1999; Hanson and Weltzin 2000). More recently, the effects of climate change on water quality have also been studied, mostly with a focus on droughts (Elsdon et al. 2009; Van Vliet and Zwolsman 2008; Zwolsman and Van Bokhoven 2007). Mimikou et al. (2000) showed that water quality simulations under future climatic conditions predict significant water quality impairment as a result of decreased stream flows. In contrast, Wilbers et al. (2009) demonstrated that the 2003 drought did not significantly affect water quality in the Dommel River, a tributary of the Meuse River in the Netherlands.

In Iran, the availability of water resources is critical during certain periods. River flows are strongly seasonal, with low natural flows in summer and high natural flows in winter. The high frequency of droughts in the area makes it necessary to improve management strategies for water quality and quantity during dry periods. Surface water is not only a major source of drinking water in Iran but also supplies public water utilities and accounts for almost all of the water supply to rural households. Knowledge of the quality and quantity of low and high flows in streams is therefore important for maintaining water resources. Although low flow hydrology and the impacts of climate change on low flow have been studied at regional scales in Iran (e.g. Eslamian et al. 2010; Modarres 2008; Nosrati et al. 2004, 2015; Nosrati and Shahbazi 2008), little is known about the impacts of low flow on surface water quality (Nosrati 2011). It is therefore important to determine the impacts of climate, and of drought in particular, on surface water quality.

Multivariate statistical techniques are widely applied to evaluate water quality through data reduction, classification and the analysis of relationships between variables (e.g. Machender et al. 2014; Nosrati and Van Den Eeckhaut 2012). Although techniques such as principal components analysis, factor analysis, cluster analysis and discriminant analysis are widely used to evaluate surface water and groundwater quality, to our knowledge there have been few attempts to incorporate surface water quality parameters into general linear mixed models to assess the relationship between low and high flow hydrology and physico-chemical surface water quality. The aim of this study is to evaluate the effects of low and high flows on surface water quality in the Sari-Neka Basin, northern Iran, using discriminant analysis and general linear mixed models.

Materials and methods

Study area

The present study investigates the effects of low and high flows on water quality at 15 stations in the Sari-Neka Basin in Mazandaran Province, northern Iran (Fig. 1; Table 1). The study area lies in a karstic region in the eastern part of the province (35°56′–36°52′ N and 52°56′–54°45′ E) and is geographically divided into two parts: the coastal plains and the mountainous areas. Intensive, large-scale irrigated agriculture is practiced in the plains. The Alborz Mountains, south of the study area, consist of forested hills with perennial springs. The region has a semitropical climate, with average temperatures of 25 °C in summer and 6 °C in winter. Mean annual rainfall ranges from more than 1000 mm in the west of the province to 300 mm in the east. Around 33 % of all precipitation falls as snow in the mountainous areas, and summer rainfall is also important in the region. The snowmelt period usually begins in mid-April and ends in late April or early May. The population of the study area has grown steadily over the last 50 years; 53.2 % of the population lives in urban areas and 46.8 % in rural areas (SCI 2007).

Fig. 1

Location map of the study area and monitoring stations

Table 1 The details of selected monitoring stations in Mazandaran Province

Mazandaran Province is an important agricultural region whose production depends directly on river water resources. Low flow frequency analysis provides essential information for assessing the risks posed by industrial development and for managing water quality during low flow periods, i.e. the summer season, when water pollution by pesticides and industrial waste is a concern. Such pollutants can also harm fisheries downstream in the Caspian Sea and the agricultural activities that make the region a main source of rice production in the country. All rivers in the region originate in the Alborz Mountains.

Data collection and treatment

Hydrological data series for 1991–2006 were obtained for 15 gauging stations in the region from the archives of Water Resources Researches Organization, Iran. Two data sets were considered in the analysis:

  1. natural daily river discharges with no data gaps for the 1991–2006 period;

  2. eleven water quality parameters: sodium adsorption ratio (SAR), electrical conductivity (EC), total dissolved solids (TDS), pH, bicarbonate (HCO3−), chloride (Cl−), sulfate (SO4 2−), calcium (Ca2+), magnesium (Mg2+), sodium (Na+) and potassium (K+).

Low flow has been characterized as the reduction in stream flow that may occur over 1 year or over several consecutive years (Smakhtin 2001) and can be assessed using low flow indices, i.e. the lowest annual flow for a given duration (e.g. 7 days), particularly for low flows that occur in the same season each year (Tallaksen et al. 1997). To evaluate surface water quality during low flow periods, the annual 7-day minimum discharge series was first computed for each gauge. The 7-day average of each water quality parameter associated with the annual 7-day minimum discharge was then calculated; the 7-day low flow periods coincided with periods for which water quality data were available. In addition, the highest annual stream flow and the associated water quality parameters for 15 water years (1991–2006) at each gauge were selected for comparison with the low flow periods.
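
The low flow extraction described above can be sketched as follows. This is an illustrative standard-library implementation, not the authors' code: it assumes gap-free daily series for one gauge, a water year starting in October (a placeholder, since the source does not state the start date), rolling 7-day windows kept within each water year, and hypothetical function names and data.

```python
from collections import defaultdict
from datetime import date, timedelta


def water_year(d, start_month=10):
    """Water-year label for a date; start_month=10 is a placeholder assumption."""
    return d.year + 1 if d.month >= start_month else d.year


def annual_7day_minimum(dates, discharge, quality, start_month=10):
    """Per water year: (lowest 7-day mean discharge, mean quality value in that window).

    The three series are equal-length, gap-free daily records for one gauge;
    quality may hold None on days without a water quality measurement.
    Windows are kept inside each water year.
    """
    by_year = defaultdict(list)
    for i, d in enumerate(dates):
        by_year[water_year(d, start_month)].append(i)

    result = {}
    for year, idx in sorted(by_year.items()):
        best = None
        for k in range(len(idx) - 6):             # every run of 7 consecutive days
            window = idx[k:k + 7]
            q7 = sum(discharge[i] for i in window) / 7.0
            if best is None or q7 < best[0]:
                best = (q7, window)
        if best is None:                           # fewer than 7 days in this year
            continue
        q7min, window = best
        samples = [quality[i] for i in window if quality[i] is not None]
        result[year] = (q7min, sum(samples) / len(samples) if samples else None)
    return result


# Hypothetical example: one water year with a 7-day dry spell starting on day 100.
dates = [date(2000, 10, 1) + timedelta(days=k) for k in range(365)]
discharge = [2.0 if 100 <= k < 107 else 10.0 for k in range(365)]
ec = [300.0 if 100 <= k < 107 else 100.0 for k in range(365)]
result = annual_7day_minimum(dates, discharge, ec)
print(result)  # prints {2001: (2.0, 300.0)}
```

The same function applied with `max` instead of `min` logic would give the high flow counterpart; keeping windows inside a water year avoids assigning one dry spell to two years.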

Basic stream flow characteristics, namely the discharge and the specific discharge q (mean daily discharge divided by basin area), were determined from the low and high stream flow series (1991–2006). This allows significant relationships between the water quality parameters and the hydrological characteristics of the monitoring stations to be identified for low and high stream flow.
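
Specific discharge normalizes discharge by drainage area so that stations draining basins of different size can be compared. As a worked example, assuming discharge in m³ s−1 and basin area in km² (the source does not state the units) and using hypothetical values Q̄ = 5 m³ s−1 and A = 500 km²:

```latex
q = \frac{\bar{Q}}{A}
  = \frac{5\ \mathrm{m^{3}\,s^{-1}}}{500\ \mathrm{km^{2}}}
  = 0.01\ \mathrm{m^{3}\,s^{-1}\,km^{-2}}
  = 10\ \mathrm{L\,s^{-1}\,km^{-2}}
```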

Statistical analysis

The normality and homogeneity of variance of the associated water quality parameter values were tested with two-tailed Kolmogorov–Smirnov and Levene tests, respectively. A t test was then used to identify significant differences between the low and high stream flow periods. Only variables for which the t test indicated a significant difference (p < 0.05) between the low and high stream flow categories were retained for further analysis.
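
The two test statistics used above can be written in a few lines of standard-library Python. The sketch computes the pooled-variance Student t statistic and the classic (mean-centred) Levene W statistic; p-values would then come from the t distribution with n1 + n2 − 2 degrees of freedom and the F distribution with (k − 1, N − k) degrees of freedom. Library equivalents include `scipy.stats.ttest_ind(a, b, equal_var=True)` and `scipy.stats.levene(a, b, center='mean')`. The function names and data values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean (classic form)."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    zbar = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (mean(zi) - zbar) ** 2 for zi in z)
    within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    return (n - k) / (k - 1) * between / within


low = [4.0, 5.0, 6.0]    # hypothetical values at low flow
high = [1.0, 2.0, 3.0]   # hypothetical values at high flow
t = pooled_t(low, high)  # compare with t(df = 4)
w = levene_w(low, high)  # compare with F(1, 4)
print(t, w)
```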

Principal components analysis (PCA) was used as the method of factor extraction because it requires no prior estimates of the amount of variation in each surface water quality variable explained by the factors. PCA was performed on standardized variables to eliminate the effect of different measurement units on the factor loadings. Factor loadings are the simple correlations between the water quality variables and each factor. Principal components (main factors) with eigenvalues >1 were selected and subsequently subjected to varimax rotation to minimize the number of variables with high loadings on each factor. In addition, the communality of each variable was calculated to estimate the portion of its variance explained by the rotated principal components. A high communality indicates that a high proportion of a variable's variance is explained by the factors, whereas a low communality indicates that much of its variance remains unexplained. Less importance should therefore be ascribed to variables with low communalities when interpreting the factors (Nosrati et al. 2015).
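
The PCA steps described above can be sketched with NumPy. This is an illustration under assumptions, not the study's STATISTICA procedure: it uses raw varimax (a Kaiser-normalized variant is also common) and hypothetical synthetic data. Communalities are the row sums of squared loadings, so they are unchanged by an orthogonal rotation such as varimax.

```python
import numpy as np


def pca_loadings(X):
    """Eigenvalues of the correlation matrix and loadings of components with eigenvalue > 1.

    PCA on standardized variables is equivalent to an eigen-decomposition
    of the correlation matrix R.
    """
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigval)[::-1]                 # largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0                              # Kaiser criterion
    return eigval, eigvec[:, keep] * np.sqrt(eigval[keep])


def varimax(L, max_iter=100, tol=1e-6):
    """Raw varimax rotation (no Kaiser row normalization) of a loading matrix."""
    p, k = L.shape
    rot = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ rot
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt                                 # nearest orthogonal matrix
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ rot


def communalities(L):
    """Proportion of each variable's variance explained by the retained components."""
    return (L ** 2).sum(axis=1)


# Hypothetical data: 3 variables driven by one latent factor, 2 by another, 1 pure noise.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=200) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=200) for _ in range(2)]
    + [rng.normal(size=200)]
)
eigval, L = pca_loadings(X)
L_rot = varimax(L)
print(np.round(L_rot, 2), np.round(communalities(L_rot), 2))
```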

Standard, forward stepwise and backward stepwise discriminant analysis (DA) were performed on the retained surface water quality parameters to select the water quality indicators that best discriminate between the low and high stream flow periods. In standard mode, all variables enter the model simultaneously. In forward stepwise mode, variables enter the model in successive steps; at each step the variable that makes the largest significant contribution to discrimination is included, and the stepping terminates when no remaining variable makes a significant contribution. In backward stepwise mode, all variables are initially included in the model and are then removed one at a time, starting with the variable that makes the smallest contribution, until every variable remaining in the model contributes significantly. A successful discriminant analysis therefore retains only the important variables, i.e. those that contribute most to the discrimination between groups. DA was performed on standardized variables to eliminate the effect of different measurement units on the discriminant function coefficients (Hill and Lewicki 2007).
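
Backward stepwise selection can be sketched for two groups using a common criterion, Wilks' lambda with a partial F-to-remove. The threshold `f_remove=3.84` is a hypothetical stand-in for the 5 % level (the exact STATISTICA settings are not given in the source), and the function names and data are illustrative. Because Wilks' lambda is unchanged by shifting or rescaling variables, standardizing does not alter which variables are selected.

```python
import numpy as np


def wilks_lambda(X, y, cols):
    """Wilks' lambda det(W)/det(T) for the variables in cols (1.0 for an empty set)."""
    if not cols:
        return 1.0
    Xs = X[:, cols]
    C = Xs - Xs.mean(axis=0)
    T = C.T @ C                                   # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Cg = Xs[y == g] - Xs[y == g].mean(axis=0)
        W += Cg.T @ Cg                            # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)


def backward_stepwise(X, y, f_remove=3.84):
    """Drop, one at a time, the variable with the smallest partial F-to-remove."""
    n, g = len(y), len(np.unique(y))
    cols = list(range(X.shape[1]))
    while cols:
        p = len(cols)
        lam = wilks_lambda(X, y, cols)
        f = {}
        for j in cols:
            partial = lam / wilks_lambda(X, y, [c for c in cols if c != j])
            f[j] = (n - g - p + 1) / (g - 1) * (1 - partial) / partial
        worst = min(f, key=f.get)
        if f[worst] >= f_remove:
            break                                 # all remaining variables contribute
        cols.remove(worst)
    return cols


rng = np.random.default_rng(1)
y = np.repeat([0, 1], 60)           # hypothetical: 0 = high flow, 1 = low flow
X = rng.normal(size=(120, 4))
X[:, 0] += 3.0 * y                  # only variable 0 separates the groups
selected = backward_stepwise(X, y)
print(selected)
```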

Pairwise comparisons such as those described above cannot fully quantify the interactions between the independent variables. The effects of the low and high stream flow periods and of river basin hydrological characteristics (the specific discharge and the discharge) on water quality parameters were therefore examined with mixed model analysis. Designs that contain random effects for one or more categorical predictor variables are called mixed-model designs. Random effects are classification effects whose levels are assumed to be randomly selected from an infinite population of possible levels. The solution of the normal equations in mixed-model designs is identical to the solution for fixed-effect designs. The fixed effects in the statistical model consisted of a categorical variable (stream flow period: dummy = 1 for low flow and dummy = 0 for high flow) and covariates (the specific discharge and the discharge). The mixed analysis accounts for repeated sampling at the same observation point at different times and allows a monitoring station effect to be identified. This station effect captures drainage basin characteristics of the monitoring stations that were not directly measured and is therefore treated as a random effect in the model. To identify the optimal set of variables explaining variation in the water quality parameters, we used a backward stepwise general regression model with a significance level of 5 % for model entry. All variables except the categorical variable were natural-log transformed to improve homoscedasticity and linearity between the dependent and explanatory variables. Statistical analyses were carried out using STATISTICA V. 8.0 (StatSoft 2008).
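
Written out, a random-intercept version of the model described above might look as follows. This is a sketch under assumptions not stated in the source: main effects only, normally distributed station effects and residuals, and hypothetical symbols (C_ij for a water quality parameter in observation i at station j, D_ij for the stream flow dummy, q_ij and Q_ij for specific discharge and discharge, u_j for the station effect).

```latex
\ln C_{ij} = \beta_0 + \beta_1 D_{ij} + \beta_2 \ln q_{ij} + \beta_3 \ln Q_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^{2})
```

The dummy D_ij enters untransformed, matching the exclusion of categorical data from the log transformation; σ_u² and σ_ε² are the variance components from which an intraclass correlation can be formed.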

Results and discussion

Low and high flow effect on water quality

According to the t test, stream flow differed significantly (p < 0.05) between the low and high stream flow periods during 1991–2006 at the 15 selected monitoring stations (Table 2). Statistically significant differences (p < 0.05) were also found for the physico-chemical parameters, except for pH and K+ concentration. However, the pattern and magnitude of the response varied among stations (Table 2). No differences in water quality parameters between the low and high flow periods were detected at the Nahre Abloo and Pajim stations (Table 2). SAR, Na+ and TDS differed significantly at most monitoring stations. Although the differences were not always statistically significant, the general pattern was that physico-chemical concentrations were lower during the high flow period at most monitoring stations. Similar results have been reported in previous studies. Zwolsman and Van Bokhoven (2007) and Van Vliet and Zwolsman (2008) demonstrated that droughts negatively affected water quality with respect to water temperature, eutrophication, major ions and heavy metals, and that the impact of droughts will be greater where water quality is already poor. Prathumratana et al. (2008) proposed TSS, alkalinity and conductivity as sensitive water quality parameters for monitoring the impacts of climate change in the lower Mekong River; in their study, significant negative correlations (from 0.2 to 0.9) were generally found between discharge and dissolved oxygen (DO), pH and conductivity. Worrall and Burt (2008) observed decreasing dissolved organic carbon (DOC) fluxes and concentrations in British rivers in areas that had experienced severe droughts. Österholm and Åström (2008) showed that the severity of individual summer droughts in the Pajuluoma acid sulphate area of Finland had little or no impact on water quality during the subsequent autumn and spring.

Table 2 Mean values and significance of surface water quality variables in the low and high stream flow periods, 1991–2006 (significant values in italics)

EC reflects the amount of material dissolved in water. According to the WHO guidelines (WHO 1983), the maximum admissible EC for drinking water is 250 μS cm−1. Conductivity values at all monitoring stations in the study area exceeded this limit for potable water, and at 66.7 % (n = 10) of the stations EC was higher during low flow periods than during high flow periods. Average TDS was also higher during low flow periods than during high flow periods. The recommended (most desirable) TDS value for potable water is 500 mg L−1 (WHO 1983), and 33.3 % (n = 5) of the monitoring stations had TDS above 500 mg L−1 during low stream flow. Variation in TDS may be related to land use and pollution (Gaillardet et al. 1999; Nosrati and Van Den Eeckhaut 2012) and can be used to indicate the influence of human activities on water chemistry (Han and Liu 2004).

Determination of water quality factors

For the low and high flow periods separately, PCA was performed on the standardized data sets to identify factors that can replace the original variables. Factors with eigenvalues of 1.0 or greater are considered significant, and factors with the highest eigenvalues are the most significant. For the low flow period, the first three principal components (PCs), each with an eigenvalue >1, accounted for >77 % of the variability in water quality (Table 3). The communalities indicate that these three factors explained >90 % of the variance in SAR, Na+, EC and TDS; >80 % in Mg2+, Ca2+ and Cl−; >60 % in SO4 2−, HCO3− and pH; and <35 % in K+ (Table 3). A high communality indicates that a high portion of a variable's variance is explained by the factors, so such variables deserve more weight than variables with low communalities. K+ was thus the least important attribute in the low flow period because of its low communality.

Table 3 Proportion of variance using varimax rotation and communality estimates of water quality parameters for low and high flows

For the low flow period, PC1 explained the largest proportion (51.96 %) of the total variance. PC1 had strong positive loadings (>0.75) on SAR, Na+, Cl−, EC and TDS and a moderate positive loading (0.50–0.75) on HCO3− (Table 3). Factor 1 represents water salinity, which can be explained by natural and anthropogenic processes. Leaching of soil material, mixing with salts already present in the soil, and high evaporation and evapotranspiration rates produce high ion concentrations, which increase TDS and further degrade water quality. Salinity is the total amount of inorganic solid material dissolved in natural water, and water salinization refers to an increase in TDS and in the overall chemical content of the water. Many natural sources, such as atmospheric deposition, interactions between soil or rock and water, and saltwater intrusion, can contribute to sodium and chloride concentrations. Chloride can also be enriched in natural waters through the weathering of granites and magmatic rocks. HCO3− has moderate positive loadings on both PC1 and PC2, which means that its variability in the study area is affected by two distinct processes. To interpret PC1 further, factor score coefficients were calculated for the variables. These coefficients are the weights used when computing factor scores from the variables. HCO3− had a lower factor score coefficient for PC1 (0.04) than for PC2 (0.18), so HCO3− is a more important parameter in PC2.
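
Factor score coefficients can be computed in several ways; a common one is the regression method, B = R⁻¹L, where R is the correlation matrix and L the loading matrix. The sketch assumes that method (the source does not say which one produced the values 0.04 and 0.18) and uses a hypothetical function name and data.

```python
import numpy as np


def factor_scores(X, loadings):
    """Regression-method factor score coefficients B = R^-1 L and scores Z B."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
    R = np.corrcoef(X, rowvar=False)
    B = np.linalg.solve(R, loadings)                     # solves R B = L
    return B, Z @ B


# Hypothetical data with two correlated variables; loadings of the two largest components.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]
R = np.corrcoef(X, rowvar=False)
w, V = np.linalg.eigh(R)                 # ascending eigenvalues
L = V[:, -2:] * np.sqrt(w[-2:])          # unrotated loadings, 2 components
B, scores = factor_scores(X, L)
print(np.round(B, 2))
```

For unrotated principal component loadings, B reduces to each eigenvector divided by the square root of its eigenvalue, so the component scores have unit variance.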

PC2 explained a significant proportion (15.96 %) of the total variance, with strong positive loadings on Mg2+ and Ca2+ and moderate positive loadings on SO4 2− and HCO3− (Table 3). Factor 2 represents the natural hydrogeochemical evolution of water through groundwater–rock interaction, which can be explained by the chemical weathering and dissolution of rocks and of minerals in sediments. This factor also reflects erosion from upland areas during rainfall events. The dissolution of limestone and dolomite is a possible source of Ca2+ and Mg2+. Sources of SO4 2− in surface waters include (1) atmospheric deposition (Wayland et al. 2003), (2) sulfate-bearing fertilizers and (3) bacterial oxidation of sulfur compounds (Sidle et al. 2000). Natural processes such as the dissolution of carbonate minerals and of atmospheric and soil CO2 could supply HCO3− to the groundwater that sustains low flow.
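
The carbonate weathering interpretation of PC2 can be made concrete with the standard dissolution reactions of calcite and dolomite in CO2-charged water (textbook stoichiometry, added here for illustration rather than taken from the study's data):

```latex
\begin{aligned}
\mathrm{CaCO_3 + CO_2 + H_2O} &\longrightarrow \mathrm{Ca^{2+} + 2\,HCO_3^{-}} && \text{(calcite)}\\
\mathrm{CaMg(CO_3)_2 + 2\,CO_2 + 2\,H_2O} &\longrightarrow \mathrm{Ca^{2+} + Mg^{2+} + 4\,HCO_3^{-}} && \text{(dolomite)}
\end{aligned}
```

Both reactions release Ca2+ (and, for dolomite, Mg2+) together with HCO3−, which is consistent with these ions loading on the same factor.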

PC3, explaining 9.85 % of the total variance, had a strong positive loading on pH and a moderate negative loading on K+ (Table 3). The negative loading of K+ on PC3 suggests that this parameter may be related to anthropogenic pollution sources such as domestic effluents, septic tanks, and fertilizer and pesticide application in agriculture.

For the high flow period, the first three PCs, each with an eigenvalue >1, accounted for >67 % of the variability in water quality. The communalities indicate that these three factors explained >70 % of the variance for seven variables, the exceptions being K+, HCO3−, Cl− and EC (Table 3); these four variables were therefore the least important attributes because of their low communalities. For the high flow period, PC1 explained the largest proportion (40.1 %) of the total variance and had strong positive loadings on SAR, Na+, Cl− and TDS and a moderate positive loading on SO4 2− (Table 3). Dissolution of gypsum and sodium sulfate minerals could increase SO4 2− concentrations in water. However, there is no relationship between SO4 2− and Ca2+ or Na+, which indicates that the excess SO4 2− in this period results mostly from the leaching of fertilizers and pesticides and from increasing air pollution. Chloride may be derived from pollution sources such as industrial and domestic effluents, fertilizers and septic tanks, indicating anthropogenic pollution sources (Ritzi et al. 1993).

PC2 explained a significant proportion (16.33 %) of the total variance, with strong positive loadings on Mg2+ and Ca2+ and a moderate positive loading on SO4 2− (Table 3). PC2 represents hydrochemical processes that lead to high Mg2+ and Ca2+ concentrations. The association between Mg2+ and Ca2+ suggests dissolution of calcite and dolomite, affected by erosion and deposition from upland areas. SO4 2− has moderate positive loadings on both PC1 and PC2; its factor score coefficients on PC1 and PC2 are 0.11 and 0.24, respectively, so SO4 2− is a more important parameter in PC2.

PC3, explaining 10.96 % of the total variance, had a strong negative loading on pH and a moderate positive loading on K+ (Table 3), and represents hydrochemical processes that lead to high K+ concentrations. The negative loading on pH suggests that the input of H+ to the water is not due to natural dissolution from soil or rock and may be related to atmospheric pollution.

Overall, the intensive agriculture practiced in the study area affects all parameters included in the analysis in both the low and high flow periods. Irrigation with local surface water and groundwater induces a groundwater cycle that increases salinity in the upper aquifer through irrigation return flow, thereby increasing the concentrations of all ions in solution. The PCA results indicate that the dominant factors in the study area reflect soil–groundwater interactions and agricultural, industrial and atmospheric pollution.

Identification of water quality indicators

Water quality parameters that did not differ significantly between the low and high stream flow periods in the study area (pH and K+) were excluded from further consideration as candidate surface water quality indicators. DA was performed with the stream flow period (low or high) as the grouping variable and the nine retained water quality parameters as independent variables, in order to remove redundant variables. The result was significant regardless of the DA mode (standard, forward or backward stepwise) used (Table 4); the set of water quality parameters therefore clearly discriminates between the low and high stream flow periods.

Table 4 Chi-square tests with successive roots removed for discriminant analysis applied to the two low and high stream flow periods of monitoring stations

The Mahalanobis distance between the two categories is significant. However, the F value of the backward mode (F = 12.0, p < 0.0001) is considerably higher than those of the standard (F = 4.4, p < 0.0001) and forward (F = 6.5, p < 0.0001) modes.
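
The link between Mahalanobis distance and the F values quoted above can be made explicit. For two groups, the squared Mahalanobis distance D² between the group centroids (using the pooled within-group covariance) converts to Hotelling's T² and then to an F statistic with (p, n1 + n2 − p − 1) degrees of freedom. The sketch below is a standard textbook computation, not the authors' code; the function name and data are hypothetical.

```python
import numpy as np


def mahalanobis_f(X, y):
    """Squared Mahalanobis distance between two group centroids and its F statistic."""
    X1, X2 = X[y == 1], X[y == 0]
    n1, n2, p = len(X1), len(X2), X.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    C1, C2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    S = (C1.T @ C1 + C2.T @ C2) / (n1 + n2 - 2)        # pooled within-group covariance
    D2 = float(d @ np.linalg.solve(S, d))
    T2 = n1 * n2 / (n1 + n2) * D2                       # Hotelling's T^2
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2    # F with (p, n1+n2-p-1) df
    return D2, F


# Hypothetical single-variable example (1 = low flow, 0 = high flow).
X = np.array([[4.0], [5.0], [6.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, 1, 0, 0, 0])
D2, F = mahalanobis_f(X, y)
print(D2, F)
```

Because F is divided by the number of variables p, a model with three strongly discriminating variables can have a larger F than one with nine, which is one plausible reading of why the backward mode scores highest. With a single variable, F equals the square of the pooled two-sample t statistic.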

The standard and forward stepwise DA modes, with 9 and 6 parameters respectively, yielded classification matrices that assigned ca. 63 % of the cases correctly (Tables 5, 6). In the backward stepwise mode, 60 % of the cases were correctly assigned to the low and high stream flow periods by a model using only three discriminant parameters: Mg2+, Ca2+ and HCO3− (Tables 5, 6). The DA results therefore suggest that Mg2+, Ca2+ and HCO3− are the most significant parameters for discriminating between the low and high stream flow periods.
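
Classification matrices of this kind are produced from classification functions, one per group. A minimal sketch, assuming linear functions built from the pooled within-group covariance and priors proportional to group sizes (an assumption; the priors used in the study are not stated), with hypothetical function names and data:

```python
import numpy as np


def classification_functions(X, y):
    """Linear classification functions c_g(x) = b_g . x + a_g, one per group."""
    groups = np.unique(y)
    n, p = X.shape
    S = np.zeros((p, p))
    for g in groups:
        C = X[y == g] - X[y == g].mean(axis=0)
        S += C.T @ C
    S /= n - len(groups)                              # pooled within-group covariance
    S_inv = np.linalg.inv(S)
    funcs = {}
    for g in groups:
        m = X[y == g].mean(axis=0)
        b = S_inv @ m
        a = -0.5 * m @ b + np.log(np.mean(y == g))    # priors proportional to group size
        funcs[g] = (b, a)
    return funcs


def classification_matrix(X, y, funcs):
    """Observed (rows) x predicted (columns) counts and percent correctly classified."""
    groups = sorted(funcs)
    scores = np.column_stack([X @ funcs[g][0] + funcs[g][1] for g in groups])
    pred = np.array(groups)[scores.argmax(axis=1)]
    M = np.array([[np.sum((y == o) & (pred == c)) for c in groups] for o in groups])
    return M, 100.0 * np.mean(pred == y)


rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)           # hypothetical: 0 = high flow, 1 = low flow
X = rng.normal(size=(100, 3))
X[:, 0] += 4.0 * y
funcs = classification_functions(X, y)
M, pct = classification_matrix(X, y, funcs)
print(M, round(pct, 1))
```

The percentage is a resubstitution estimate (the same cases are used to build and to test the functions), so it tends to be optimistic compared with cross-validation.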

Table 5 Classification functions resulting from discriminant analysis applied to the two low and high stream flow periods of monitoring stations
Table 6 Classification matrix resulting from discriminant analysis applied to the two low and high stream flow periods of monitoring stations

Water quality in relation to hydrological characteristics

The mixed models contain a fixed part and a random part. The random part contains a site effect: we expect that part of the variance not explained by the independent variables and their interactions arises because differences in drainage basin characteristics between monitoring stations are not entirely accounted for by the properties included in the statistical analysis. It is therefore possible that changes in land and resource use have a comparable or greater effect on water quality than changes in hydrological characteristics. The intraclass correlation coefficient for the station (random) effect was computed as the ratio of the estimated station variance component to the total unexplained (station plus residual) variance, indicating that the station effect accounts for 21, 18 and 24 % of the unexplained variation in Ca2+, HCO3− and Mg2+, respectively. The root mean squared errors (RMSE) of the models for Ca2+, HCO3− and Mg2+ were 18.9, 52 and 10.2 mg L−1, respectively. Observed and predicted values of the natural logarithm of Ca2+, HCO3− and Mg2+ are plotted in Fig. 2.
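
The variance components in the study come from the mixed model fitted in STATISTICA, and the estimation method is not stated. As a simpler, swapped-in illustration of what the intraclass correlation measures, the sketch below uses the classic one-way ANOVA (method-of-moments) estimator on values grouped by station; for an ICC conditional on the fixed effects, these values would be residuals after removing the fixed part. The function name and data are hypothetical, and a REML fit (e.g. statsmodels `MixedLM`) would generally give somewhat different estimates.

```python
from statistics import mean


def icc_oneway(groups):
    """ANOVA (method-of-moments) estimate of the station intraclass correlation.

    groups: list of lists, one list of (log-transformed) values per station.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (k - 1)   # effective group size
    var_station = max((msb - msw) / n0, 0.0)                    # negative estimates truncated
    return var_station / (var_station + msw)


icc = icc_oneway([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(round(icc, 3))
```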

Fig. 2

Scatterplot of observed versus predicted data points (fixed + random effects) for Ca2+, HCO3 and Mg2+. MAE mean absolute error; r Pearson correlation coefficient (p < 0.05)

Table 7 shows the significance of the effects and the performance of the models predicting the natural logarithm of each water quality parameter when no random (station) effect is considered. For all water quality parameters, backward stepwise general regression retained the 7-day low flow period (Dstreamflow = 1 for low flow and Dstreamflow = 0 for high flow) and the specific discharge (q) as significant predictors (Table 7).

Table 7 Backward stepwise general regression analysis for water quality parameters with hydrological characteristics (n = 480)

Predicted values of ln[Ca2+], ln[HCO3−] and ln[Mg2+] are plotted against the observed values in Fig. 3. The RMSEs of these models for Ca2+, HCO3− and Mg2+ were 31.1, 57.1 and 15.7 mg L−1, respectively, higher than those of the mixed models that include the station effect. These regression models identified the low flow (drought) period and specific discharge as the main factors, and the t statistics show that specific discharge is the most important variable in the models (Table 7). Water quality parameters may therefore be influenced by factors controlling the seasonal supply of physico-chemical constituents. Dakova et al. (2000) found the strongest statistically significant correlation between hydrobiological indices and discharge during the low flow period. Elsdon et al. (2009) detected minimal differences in water quality between land uses during a period of extensive drought.
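
The RMSE values are reported in mg L−1 although the models are fitted to ln-transformed data, which suggests that predictions were back-transformed before computing RMSE. A minimal sketch of that step, assuming a plain exponential back-transform (which, with normally distributed errors on the log scale, estimates the median rather than the mean) and hypothetical function names and values:

```python
import math


def back_transformed_metrics(ln_obs, ln_pred):
    """RMSE and MAE in original units, and Pearson r, after exp() back-transformation."""
    obs = [math.exp(v) for v in ln_obs]
    pred = [math.exp(v) for v in ln_pred]   # plain exp(): no log-bias correction
    n = len(obs)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    return rmse, mae, r


# Hypothetical Ca2+ concentrations (mg/L) on the ln scale.
ln_obs = [math.log(10.0), math.log(20.0), math.log(30.0)]
ln_pred = [math.log(12.0), math.log(18.0), math.log(30.0)]
rmse, mae, r = back_transformed_metrics(ln_obs, ln_pred)
print(rmse, mae, r)
```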

Fig. 3

Scatterplot of observed versus predicted data points (fixed effects only) for Ca2+, HCO3 and Mg2+. MAE mean absolute error, r Pearson correlation coefficient (p < 0.05)

Global climate change projections indicate changes in rainfall that could increase the frequency and severity of low flows in some regions (Sheffield and Wood 2008). Low flow conditions are determined by a suite of natural and anthropogenic factors and are an integral part of every river regime (Smakhtin 2001). Reliable estimation of low flows is therefore crucial for protecting water quality. Decreases in water levels due to drought can affect catchment functioning (including the partitioning, storage and release of water) throughout the following year, or even for several years if the drought affects a large area. A 1-year drought not only lowers water levels but also causes many other changes. As the volume of water in rivers decreases, solutes become more concentrated. This pattern is consistent with the results of the present study, which show increased Mg2+, Ca2+, HCO3−, SO4 2−, Cl−, EC, TDS and Na+ during low flow (Table 2). The increase in concentrations is hypothesized to be associated with evaporation from rivers and the ground surface, as well as with longer residence and contact times of water with soils during recharge and during groundwater discharge into rivers (Caruso 2002; Murdoch et al. 2000). These results indicate that water quality deteriorates under low flow conditions; if climate change increases the frequency of drought, this could lead to more situations posing potential health risks (Delpla et al. 2009).
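
The concentration effect described above follows from a simple mass balance. Assuming a conservative solute delivered at a constant load L, concentration is inversely proportional to discharge, C = L/Q. This is a deliberate simplification (it ignores evaporation and the changing groundwater contribution discussed in the text), and the function name and numbers are hypothetical.

```python
def concentration_mg_per_l(load_kg_per_day, discharge_m3_per_s):
    """Concentration of a conservative solute delivered at a constant load.

    1 kg/day = 1e6 mg/day and 1 m3/s = 8.64e7 L/day, so C = load / (86.4 * Q).
    """
    return load_kg_per_day / (86.4 * discharge_m3_per_s)


# Hypothetical: the same 864 kg/day of dissolved salts at high and at low flow.
high_flow_c = concentration_mg_per_l(864.0, 10.0)   # about 1 mg/L
low_flow_c = concentration_mg_per_l(864.0, 5.0)     # about 2 mg/L: halving Q doubles C
print(high_flow_c, low_flow_c)
```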

Conclusions

Differences in water quality were detected between low and high stream flow periods, but the differences are not the same for each constituent. Climate change resulting in more intense and frequent droughts could cause considerably lower stream flows and consequently affect surface water quality, mainly by increasing constituent concentrations. The chemical characteristics of groundwater could also play a principal role in determining low flow chemistry. The mixed models containing a site effect showed that part of the variance not explained by the independent variables and their interactions arises because differences in drainage basin characteristics between monitoring stations are not entirely accounted for by the properties included in the statistical analysis. This suggests that part of the differences in water quality can be explained by anthropogenic influences such as land-use changes. The intensive agriculture practiced in the study area affects all parameters included in the analysis, and irrigation with local surface water and groundwater increases the concentrations of all ions in solution. Urban, municipal and industrial land uses can also affect surface water quality. The quality of surface water is therefore affected by many factors, including precipitation and catchment characteristics. This study confirms that multivariate statistical techniques can be used to analyze and interpret complex surface water quality data sets and to identify important factors. Further research should investigate the response of water quality parameters to drought conditions under land-use change and future climate change scenarios.