Introduction

A river is a well-defined geomorphological structure comprising a main stream and several tributaries, with a unidirectional flow that carries substantial loads of dissolved and suspended matter from both natural and human-induced sources (Shrestha and Kazama 2007). However, intense demographic growth and urban expansion worldwide have degraded water resources, calling for prompt remedial action grounded in a precise study of their physical characteristics and water quality. Studies on the physicochemical characteristics of rivers help identify the processes responsible for their degradation (Bricker and Jones 1995; Bengraine and Marthaba 2003). Factors that determine river water quality include precipitation inputs, erosion and weathering of crustal materials, and human-induced activities such as urban development and expansion and industrial and agricultural practices (Carpenter et al. 1998; Muangthong and Shrestha 2015), whereas processes such as synergies among pollution loads, hydrological characteristics, sediment and metabolic activity in the water are considered to regulate water quality gradients (Jung et al. 2016).

In recent decades, the assessment of river water quality has gained scientific interest because of the high demand for river water for human consumption and its importance to aquatic health. Water quality datasets are challenging to interpret because river systems involve complex processes, with many uncertainties and potential relationships among their properties and monitoring sites (Gaume et al. 1998). Several studies have used multivariate statistical techniques to understand the spatial and temporal patterns of river water quality, which is useful for the interpretation and effective management of these ecosystems (Perkinns and Underwood 2000; Voutsa et al. 2001; Ouyang 2005; Ouyang et al. 2006; Yang et al. 2010; Dobsa et al. 2014; Li et al. 2015). River water quality is interpreted using various approaches such as computational models and mathematical formulas, the most efficient being the integration of water quality parameters into a single index (Boyacioglu 2007; Sowlat 2011; Bakan et al. 2010; Wanda et al. 2016).

In Mexico, the Atoyac River in Puebla is reported to be one of the most contaminated rivers in the country (CONAGUA 2010), and it is heavily impacted by wastewater discharges from urban, agricultural and industrial sources. The states of Puebla and Tlaxcala, ~120 km from the megalopolis of Mexico City in the center of the country, host the fourth largest metropolitan area of Mexico and form a major industrial corridor. These states are a major global center for automobile manufacturing and the foremost site for the origin of the textile industry in Latin America, which favored demographic growth of 154.6% during the last two decades (INEGI 2010). The region is also notable for the presence of both active (Popocatepetl) and extinct (Iztaccihuatl and La Malinche) volcanoes, which lead to increased turbidity, alkalinity and concentrations of other soluble elements, thereby altering the physicochemical properties of the water (Stewart et al. 2006). The strong anthropogenic and natural influences on this region, and the resulting imbalance of its ecosystems, motivated the present study.

The main objective was to evaluate the seasonal variability (dry, rainy and winter) of the physicochemical parameters at 22 sampling sites (66 samples in total) in the Atoyac River basin, Central Mexico, using multivariate statistical methods. Correlation matrices, factor analysis (FA) and cluster analysis (CA) were used to characterize and assess the variations in surface water quality and to estimate the spatial and seasonal differences caused by natural and anthropogenic factors. An attempt was also made to integrate the measured water quality parameters into algorithms for easy interpretation of water quality in the Atoyac River basin. The database generated from the present study served as a background for implementing real-time monitoring stations along the Atoyac River, the first of their kind in Latin America.

Materials and methods

Study area

The Atoyac River basin in Central Mexico comprises the Zahuapan and Atoyac Rivers, which flow through the rural, urban, agricultural and industrial regions of Tlaxcala and Puebla states and finally drain into the Manuel Ávila Camacho (Valsequillo) dam (Fig. 1). The basin covers 4395 km2 and is flanked by the Iztaccihuatl and Popocatepetl (active) volcanoes on the western side and the Malinche volcano on the eastern side. The Zahuapan River is fed by runoff from the Sierra de Tlaxco in the north, from inland waters and from the Malinche volcano. The Atoyac River originates from the snowmelt of the Iztaccihuatl volcano, at an altitude of 3250 m at the source; it lies at 2250 m in the confluence zone and finally descends to 2000 m (above mean sea level) at the Valsequillo dam. The cross section of the Atoyac River basin varies across the three zones (upstream, confluence and downstream). The width ranges from 20 to 60 m in the Atoyac River, is 25 m in the Zahuapan River and ranges from 15 to 30 m in the confluence zone of the two rivers. The depth of the water column is approximately 1–2 m in the Atoyac River, 1.5 m in the Zahuapan River and 1–3 m in the confluence zone (Rodríguez-Espinosa et al. 2014). The Valsequillo dam is 2–7 km wide and 40 m deep. The river basin has a sub-humid climate with an average annual precipitation of 800 mm and an average temperature of 22 °C. The dry season corresponds to March–May, the rainy season to June–September and winter to October–February (National Meteorological Service 2010).

Fig. 1 Study area map illustrating sampling locations from Zahuapan River (sample nos. 1–3), Atoyac River (sample nos. 4–6), confluence zone (sample nos. 7–15) and Valsequillo dam (sample nos. 16–22), México

The basement of this river basin constitutes Paleozoic metamorphic rocks of the Acatlan complex and Mesozoic terrigenous and calcareous rocks (Von Erffa et al. 1977; Ortega-Gutierrez 1978, 1993; Mooser et al. 1996; González-Mancera et al. 2009). The most striking feature of this river basin is the presence of Neogene–Quaternary stratovolcanoes and mountain ridges of Upper Cretaceous limestone filled with volcanic tuffs, lahars, lava flows, cinder cones, lacustrine fluvial deposits and reworked glacial-fluvial materials (Morales-Ramírez et al. 2003).

The Atoyac River is well known for its extensive use in residential, agricultural and industrial activities in Puebla city, with 3,303,679 inhabitants residing in the river basin (INEGI 2010). Agricultural activity is characterized by the production of corn, vegetables, alfalfa, fodder and beans (IMTA 2005), whereas the industries in this zone include metallurgy (43%), machinery, heavy equipment, food sector (25%), textiles (14%), clothing, leather, chemicals (10%), oil, rubber, plastics, timber products (3%) and other industries (1%) (INEGI 2010).

Sampling and analytical techniques

A total of 66 river water samples were collected from 22 monitoring sites in three seasons, dry (April), rainy (September) and winter (February), at Zahuapan River (Station nos. 1–3), Atoyac River (Station nos. 4–6), the confluence zone (Station nos. 7–15) and Valsequillo dam (Station nos. 16–22) during 2013–2014. The sampling stations were chosen to cover a wide range of influences, such as residential drains, industrial discharges, natural soil erosion and volcanic input, in all three sections of the Atoyac River basin. Samples were collected in sterilized 5-L polyethylene bottles, stored in an insulated ice-cooled container and transferred to the laboratory on the same day. Sampling, preservation and transportation of the water samples to the laboratory followed the standard methods prescribed by APHA (1992).
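The sampling design described above can be written down as a small lookup structure. The following is only an illustrative sketch: the names `ZONES`, `SAMPLING_MONTH` and `zone_of` are hypothetical and do not come from the study; the station ranges and sampling months are those stated in the text.

```python
from itertools import product

# Station ranges per river section, as listed in the sampling description.
ZONES = {
    "Zahuapan River": range(1, 4),     # stations 1-3
    "Atoyac River": range(4, 7),       # stations 4-6
    "confluence zone": range(7, 16),   # stations 7-15
    "Valsequillo dam": range(16, 23),  # stations 16-22
}

# Sampling month (1 = January) for each season, as reported in the text.
SAMPLING_MONTH = {"dry": 4, "rainy": 9, "winter": 2}


def zone_of(station: int) -> str:
    """Return the river section that a station number belongs to."""
    for zone, station_range in ZONES.items():
        if station in station_range:
            return zone
    raise ValueError(f"unknown station number: {station}")


# One sample per station per season: 22 stations x 3 seasons.
stations = [s for station_range in ZONES.values() for s in station_range]
samples = list(product(stations, SAMPLING_MONTH))

print(len(stations), len(samples))  # 22 66
print(zone_of(8))                   # confluence zone
```

Keeping the station-to-zone mapping in one place makes it straightforward to group any later per-station result by river section.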

The physicochemical parameters temperature (T), hydrogen potential (pH), conductivity (λ), dissolved oxygen (DO), spectral absorption coefficient (SAC), oxidation reduction potential (ORP) and turbidity (TURB) were measured in situ, whereas 5-day biochemical oxygen demand (BOD5), chemical oxygen demand (COD) and total suspended solids (TSS) were analyzed in the laboratory. Temperature (°C), pH and conductivity (µS/cm) were measured using a Conductronic/PC18 instrument calibrated at 25 °C, and dissolved oxygen (mg/L) with a YSI/51B. SAC (A/m at 254 nm), which measures dissolved organic substances, was estimated using a UV/Vis spectrophotometer (PerkinElmer Lambda 20), and oxidation reduction potential (mV) and turbidity (NTU) were measured using HACH instruments (model nos. 2100Q and HQ40D). The accuracy was ±0.3 °C for temperature, ±0.02 for pH, ±0.03 mg/L for DO and ±1.5 µS/cm for conductivity. BOD5, COD and TSS were determined in the laboratory following the standard protocols of APHA (1992). Analytical data quality was ensured through careful standardization, procedural blank measurements, and spiked and duplicate samples.

Statistical analysis

The entire data set was analyzed with multivariate statistical techniques (correlation matrix, factor analysis and cluster analysis) using Statistica software (version 8). To evaluate the relationships between water quality parameters at the sampling sites, correlation analysis was conducted for ten parameters at 22 sampling stations for three seasons (dry, rainy and winter) at significance levels of p < 0.05, 0.01 and 0.001. Factor analysis (FA) with varimax normalized rotation generated two distinct factors for the dry season and three distinct factors for the rainy and winter seasons, with loadings >0.7 assisting in understanding the processes and associations. Cluster analysis (CA) was performed to detect interrelationships between the sampling sites based on the physicochemical parameters.
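For readers who want to reproduce the first of these steps without Statistica, a correlation matrix can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Pearson's product-moment coefficient (the text does not name the coefficient used); the function names and the five-station data are hypothetical and are not values from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every (parameter, parameter) pair to its Pearson r."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical values for five stations (NOT data from the study).
data = {
    "TURB": [12.0, 40.0, 95.0, 180.0, 3.0],
    "TSS": [20.0, 65.0, 150.0, 310.0, 9.0],
    "DO": [6.5, 4.0, 2.1, 0.6, 7.8],
}
matrix = correlation_matrix(data)
print(round(matrix[("TURB", "TSS")], 3))  # positive: solids rise with turbidity
print(round(matrix[("TURB", "DO")], 3))   # negative: oxygen falls with turbidity
```

In the study itself, each season would contribute one such matrix over ten parameters and 22 stations.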

Results and discussion

The results for the physicochemical parameters of three different seasons (dry, rainy and winter) are presented in Fig. 2a–j.

Fig. 2 a–j Variations in physicochemical parameters of three different seasons (dry, rainy and winter) at 22 sampling sites in Atoyac River basin, Mexico

Temperature and pH

Mean temperature values ranged from 16.1 to 23.60 °C along the stream. The high temperatures (all values in °C) observed at Stations 8 (27.3), 22 (28.5), 13 (22.7) and 14 (22.7) during the dry season are attributed to effluents from the surrounding tannery and leather, metal-based and textile industries, together with domestic sewage from the urban regions, which add heat directly to the river system (El Morhit and Mouhir 2014). Seasonally, temperature varied between 13 and 27.3 °C in the dry season, 16.9 and 28.5 °C during the rains and 13.8 and 22.7 °C in winter. The lowest temperatures (13 °C in the dry, 16.9 °C in the rainy and 13.8 °C in the winter season) were recorded at the stations closest to the origins of the Zahuapan and Atoyac Rivers in the highlands.

The pH values were similar at all stations except Station nos. 8, 19, 20 and 22. The sharp increase in pH (10.24 in the dry season) at Station no. 8 is due to the occasional discharge of large amounts of wastewater from the textile industries, which is strongly affected by dyeing, scouring, bleaching and washing processes. Similarly high pH values (10–11.5) (Chhonkar et al. 2000) have been observed in some rivers of the Asian continent (River Pali in India), which are affected by different types of dyes and other materials (e.g., cotton, synthetic, etc.) (Nordin et al. 2013; Sharma et al. 2013). The high pH values at the dam stations (Station no. 20: 8.08, Station no. 19: 8.13, Station no. 22: 8.28) are linked to the presence of warm waters and intense photosynthetic activity of macrophytes, resulting in high production of free CO2 and therefore making the waters alkaline (Saxsena and Saksena 2012). Temporally, the average pH of the waters of the Atoyac River basin was higher in the winter season (8.01) than in the dry period (7.81) and the rainy season (7.67).

Conductivity and DO

The confluence zone (Station nos. 6–15) exhibited a high average conductivity (1030.5 µS/cm) compared to the upstream (766.54 µS/cm) and dam (733.94 µS/cm) sections, attesting to the impact of discharges from industrial activities. The extremely high conductivity of 1870 µS/cm observed at Station no. 4 during the dry season is due to ionic pollutants from industrial (textile, petrochemical and automobile) and agricultural activities. Conductivity also depends on the bedrock of the area: places underlain by granites have lower conductivities, as observed at the stations of the Zahuapan River underlain by ignimbrites [avg values in µS/cm: (874) dry; (321.67) rainy; (800.67) winter], whereas regions of limestones, chlorides, clay minerals, phosphates and nitrates show higher values, as observed at the stations of the Atoyac River and its confluence [avg values in µS/cm: (1375) dry; (502.67) rainy; (1322.89) winter]. In the dam section, conductivity values [avg values in µS/cm: (775) dry; (693) rainy; (734) winter] were lower than in the upstream section because of the absence of industrial inputs, which raise conductivity through the dissolved ions liberated from organic and inorganic wastes (Wright 1982; Singh et al. 2013). In general, conductivity values (all values in µS/cm) were higher during the dry season (avg. 1094.32) than during the rainy (avg. 524.82) and winter (avg. 1009.36) seasons.

Dissolved oxygen ranged from anoxic conditions (0 mg/L at Station no. 9 in the dry period) to a high of 21 mg/L (Station no. 22 in winter). In all three seasons, the highest values (all values in mg/L) of 3.8 (dry), 7.76 (rainy) and 21 (winter) were observed at the dam sites, which is attributed to high photosynthetic production by macrophytes, resulting in supersaturated DO levels (Ahmed 2014). Anoxic and low DO values (all values in mg/L; dry: 0; winter: 0.58; rainy: 1.8) in the confluence zone of the Atoyac and Zahuapan Rivers are due to the high oxygen demand of organic matter in the sewage wastes from domestic and industrial activities (Paerl et al. 1998).

The upstream section showed turbidity values of 58.24 NTU, with Station no. 4 presenting a high value of 179 NTU due to waste discharges from the adjoining industries (textile, plastic, autoparts and petrochemical), whereas lower turbidities (all values in NTU) were observed in the reservoir in all three seasons (dry: 1.5; rainy: 0.41; winter: 3) because suspended sediment particles settle as the water velocity decreases (Skalak et al. 2013). The dry and winter seasons experienced maximum turbidities of 381.7 NTU and 346 NTU, respectively, when the dilution effect from rains and erosion of soils was minimal, increasing the presence of individual particles.

SAC and ORP

SAC measurements are used to identify the aquatic spectral reflectance and validate the photochemical and photobiological characteristics of natural waters (Mitchell et al. 2002). The spatial variation of SAC values along the river showed peak values at the Stations 14, 13, 4, 12, 8, experiencing high influential impact from the industrial and municipal sewages. Seasonally, they were detected to be high during winter (avg. 51.53 A/m) and similar values were observed in dry (avg. 31.37 A/m) and rainy seasons (avg. 38.38 A/m). The higher absorption coefficients in the dry season exhibited by the sampling stations closer to the industrial discharge points are attributed to the impact of raw settled waste water, whereas during rainy seasons, high amounts of attenuation of the waters as a consequence of dilution effect resulted in lower values (Reynolds and Ahmad 1997).

Negative ORP value observed in Station no. 15 (−370.1 mV) during dry season is due to the influence of effluents from textile and automobile industries containing high amounts of metal waste that result in reducing conditions of the substratum and decreasing the ORP value (Horne and Goldman 1994; Kiran Kumar et al. 2016). High positive values in the dam (Station no. 16: 134.70 mV) are attributed to the presence of large number of macrophytes and high DO levels resulting in an oxidizing environment. Temporal variations in the redox potential conditions of the studied river basin were deduced to be (all values in mV) winter: 165.49; rainy: 76.85; dry: −119.25, where high values in winter are due to high amounts of dissolved oxygen.

COD, BOD5 and TSS

COD and BOD5 measurements are routinely used as proxy tests to estimate the load of organic carbon in river systems. In the present study, COD values (127.82 mg/L) were higher than BOD5 values (80.43 mg/L), as BOD5 measures only the biodegradable constituents. Spatially, elevated levels of COD (194.87–815 mg/L) were recorded at the stations in the confluence zone, which is dominated by industries. The high COD in the dry period (avg. 250.36 mg/L) and the low values in the rainy season (avg. 144.73 mg/L) point to the large amounts of oxygen required to oxidize all organic material in untreated waste effluent (Cieszynska et al. 2012). BOD5 followed a pattern similar to that of COD at Stations 8, 4, 11, 15 and 14. The raised levels of BOD5 in the dry season (avg. 124.41 mg/L) indicate the extent of organic pollution in the aquatic system and of the labile organic matter that undergoes biotic decomposition (Jonnalagadda and Mhere 2001).

Average TSS levels varied spatially from 13 mg/L in the dam to 188.6 mg/L upstream and 456.3 mg/L after the confluence zone. The lower TSS values observed in the dam section arise because suspended solids tend to settle at lower velocities (Davidson and Summerfelt 2004). Seasonally, TSS was highest in the rainy season (avg. 644.41 mg/L), with the highest values (all values in mg/L) at Station nos. 11 (2996), 8 (2009), 14 (1730) and 9 (1234), due to intense precipitation, soil erosion and increased generation of industrial (food processing) and domestic runoff (Nasrabadi et al. 2016). By comparison, much lower TSS values were observed in the dry (avg. 67.5 mg/L) and winter (avg. 51.73 mg/L) seasons because erosional processes, and the channels bringing in residues from human-induced activities, had less influence (Pimentel 2006).

Statistical analysis

Correlation analysis

The relationship between water quality parameters for three different seasons (dry, rainy and winter) was evaluated using correlation matrix analysis (Table 1). Dissolved oxygen and pH exhibited a positive correlation [r = 0.44 (dry), 0.78 (rainy) and 0.83 (winter)], which is attributed to the increased decomposition activity in the waters resulting in high carbon dioxide content accounting for a well-defined oxygen status (Araoye 2009). Turbidity and total suspended solids were positively correlated in all three seasons (r = 0.75, 0.96, 0.46); the rainy-season value (r = 0.96) is higher than the dry-season value (r = 0.75), which is attributed to intense precipitation and consequent soil erosion as well as the generation of urban and agricultural runoff (Bakar et al. 2007). Turbidity and dissolved oxygen were negatively correlated [r = −0.35 (dry), −0.08 (rainy), −0.42 (winter)], because an increase in turbidity might decrease the transparency of the water, leading to a depletion of DO that affects the river water quality and its aquatic life (Sun et al. 2016). The strong positive correlations of SAC with BOD5 (r = 0.64, 0.82), COD (r = 0.68, 0.95) and TSS (r = 0.57, 0.81) in the dry and winter seasons, respectively, are due to the high loads of biodegradable and non-biodegradable pollutants (Reynolds and Ahmad 1997). The weak correlations of SAC with BOD5 (r = −0.16), COD (r = 0.24) and TSS (r = 0.42) during the rainy season are due to the dilution effect (Turk et al. 2010). The positive correlations of turbidity with COD (r = 0.80, 0.94, 0.72), BOD5 (r = 0.83, 0.34, 0.88) and TSS (r = 0.75, 0.96, 0.46) in all seasons validated the presence of increased loads of pollutants in the river basin.
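Whether a given coefficient is statistically distinguishable from zero depends on the number of stations. The sketch below assumes a standard two-tailed Student t test on Pearson's r with n = 22 stations per season (df = 20, critical t = 2.086 at the 0.05 level); the text does not describe how Statistica's significance flags were computed, so this is an illustration rather than the authors' procedure. The function names are hypothetical; the coefficients are those quoted in the paragraph above.

```python
from math import sqrt

N_STATIONS = 22         # sampling sites per season, from the text
T_CRIT_05_DF20 = 2.086  # two-tailed Student t, alpha = 0.05, df = 20


def t_statistic(r, n):
    """t statistic for H0: no correlation, given Pearson r and sample size n."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


def is_significant(r, n=N_STATIONS, t_crit=T_CRIT_05_DF20):
    """True if |t| exceeds the critical value (assumed test, see lead-in)."""
    return abs(t_statistic(r, n)) > t_crit


# Coefficients quoted in the text for DO vs pH and turbidity vs DO.
reported = {
    "DO-pH dry": 0.44,
    "DO-pH rainy": 0.78,
    "DO-pH winter": 0.83,
    "TURB-DO dry": -0.35,
    "TURB-DO rainy": -0.08,
    "TURB-DO winter": -0.42,
}
for pair, r in reported.items():
    print(f"{pair}: t = {t_statistic(r, N_STATIONS):.2f}, significant = {is_significant(r)}")
```

Under these assumptions the DO and pH correlations pass the 0.05 threshold in every season, while the turbidity and DO correlations fall just short of it, which is worth keeping in mind when interpreting the weaker coefficients.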

Table 1 Correlation matrix of physicochemical parameters for three different seasons (dry, rainy & winter) in Atoyac river basin, Mexico

Factor analysis

Factor analysis (FA) was carried out for a better inference of the possible influences and associations of the physicochemical characteristics of the water samples. To ease interpretation, the analysis was executed independently for the three seasons, with cumulative variance of 75.04% (dry), 76.22% (rainy) and 79.96% (winter) (Table 2). The factor loadings were categorized as strong, moderate and weak, corresponding to absolute loading values of >0.75, 0.75–0.50 and 0.50–0.30, respectively (Liu et al. 2003). Varifactor 1 (VF1) explained 55.78% of the total variance during the dry period, with strong negative loadings on λ (−0.83), TURB (−0.77), SAC (−0.77), BOD5 (−0.95), COD (−0.96) and TSS (−0.93), moderate negative loadings on pH and ORP and a weak negative loading on temperature. This factor mainly signifies the contribution of pollutants from the untreated sewage flows generated by domestic, agricultural and industrial activities (Juahir et al. 2011). In the rainy season, VF1 explained 42.01% of the total variance, with strong positive loadings on temperature and conductivity (0.80; 0.80) and strong negative loadings on turbidity (−0.90), COD (−0.77) and TSS (−0.85); the grouping of TURB, COD and TSS is associated with the erosion of soils and the increase in total suspended solids in the river system during the rainy season (Pejman et al. 2009). The negative relationship between temperature and conductivity on the one hand and TURB, COD and TSS on the other is due to the presence of large quantities of suspended solids, which absorb and scatter sunlight and so determine the extent of solar radiation (Paaijmans et al. 2008). VF1 during the winter season explained 49.85% of the total variance, with strong negative loadings on λ (−0.84), TURB (−0.94), SAC (−0.79), BOD5 (−0.88), COD (−0.94) and TSS (−0.79), a moderate positive loading on DO and a weak loading on ORP. The inverse association of DO with these parameters is due to the high BOD5 levels and the presence of organic pollutants that consume the available oxygen for decomposition (Waziri and Ogugbuaja 2010). VF2 explained 19.26% (dry), 23.13% (rainy) and 19.64% (winter) of the total variance; in all three seasons it showed a strong negative loading on DO, and on pH in the rainy and winter seasons. As pH decreases, the activity of H+ increases, resulting in the reaction of hydrogen ions with oxygen and leading to a depletion of DO (Zang et al. 2011). VF3 explained 11.08 and 10.46% of the total variance in the rainy and winter seasons, respectively. In the rainy season, COD presented a moderate negative loading, whereas the weak positive loadings on TURB and SAC are due to the variability in spectral absorbance caused by interference from large loads of biodegradable and non-biodegradable pollutants. Thus, the factor analysis supports the view that the parameters with strong negative loadings (absolute value >0.75), namely λ, COD, BOD5, SAC, TURB and TSS, affect the health of the river system and contribute significantly to the evaluation of the water quality (Ajorlo et al. 2013).
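The loading categories cited from Liu et al. (2003) amount to a simple threshold rule, sketched below. The function name is hypothetical, and two details are assumptions not stated in the text: a loading of exactly 0.50 is treated as moderate, and loadings below 0.30 in absolute value are labelled "negligible". The example values are the dry-season VF1 loadings quoted above.

```python
def classify_loading(loading):
    """Label a factor loading by its absolute value (thresholds as cited in the text)."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:   # boundary handling at 0.50 is an assumption
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"     # label for |loading| < 0.30 is an assumption


# Dry-season VF1 loadings reported in the text ("cond" stands for conductivity).
vf1_dry = {
    "cond": -0.83,
    "TURB": -0.77,
    "SAC": -0.77,
    "BOD5": -0.95,
    "COD": -0.96,
    "TSS": -0.93,
}
labels = {name: classify_loading(value) for name, value in vf1_dry.items()}
print(labels)  # every entry is "strong"
```

The sign of a loading is ignored for the category and matters only for the direction of the association.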

Cluster analysis

Dendrograms were generated (Fig. 3a–c) by grouping the 22 sampling stations into three statistically significant clusters for each of the three seasons (dry, rainy and winter). In the dry season (Fig. 3a), three clusters (C1, C2, C3) were obtained. Cluster 1 (C1) grouped the most contaminated stations (4, 8, 11, 13, 14), which had similar characteristics of λ, TURB, SAC, COD, BOD5 and TSS, suggesting the influence of direct discharges from residential and industrial units that affect the water chemistry and aquatic life (Jingsheng et al. 2006). Cluster 2 (C2) grouped the stations with negative redox potential, consistent with the fact that reducing environments are the best sites for the accumulation of contaminants (Chan et al. 2003). Cluster 3 (C3) grouped the sampling stations of the Valsequillo dam, where toxins have less influence on the water column. During the rainy season (Fig. 3b), the majority of sampling sites (13 stations) were linked together in C3 because they presented low values of TSS, whereas the stations in the urbanized and industrial zones (8, 11, 13, 14) showed high turbidities, especially during the rainy season when runoff and flushing of soil salts have a strong impact (Garizi et al. 2011). The dendrogram for winter presented three clusters, where C2 included all the stations, as the water holds high amounts of dissolved oxygen (5.91 mg/L) during the winter season (Sutherland and O’Neill 2016), resulting in better water quality conditions than in the dry and rainy seasons.
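The grouping step behind such a dendrogram can be illustrated with a small agglomerative clustering routine. The text does not state the distance measure or linkage rule used in Statistica, so the sketch below assumes Euclidean distance on z-scored parameters with average linkage; all function names and the six-station data set are hypothetical and are not values from the study.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit standard deviation."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    return mean(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical stations: columns are conductivity, turbidity, TSS.
rows = [
    [1400.0, 180.0, 300.0],  # heavily loaded
    [1350.0, 170.0, 280.0],  # heavily loaded
    [800.0, 60.0, 90.0],     # intermediate
    [820.0, 55.0, 100.0],    # intermediate
    [700.0, 2.0, 12.0],      # reservoir-like
    [720.0, 3.0, 14.0],      # reservoir-like
]
clusters = agglomerate(zscore_columns(rows), n_clusters=3)
print(clusters)
```

Standardizing first keeps a parameter with large numeric values, such as conductivity, from dominating the distance; with a different linkage rule the groupings could differ.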

Fig. 3 a–c Dendrogram of the 22 sampling stations using hierarchical cluster analysis based on the water quality of the Atoyac River basin, Mexico

Algorithms for evaluation of water quality in Atoyac River basin

Algorithms were proposed to characterize the water quality of Atoyac River basin based on in situ (T, pH, λ, DO, SAC, ORP, TURB) and laboratory measurements (BOD5, COD and TSS) which assists in easy interpretation of water quality data. They are expressed as follows:

Algorithm for field recorded physicochemical parameters (AFRPP):

$${\text{AFRPP}} = \mathop \sum \limits_{i = 1}^{n} \left( {\left| {\frac{{{\text{pH}}_{0} - {\text{pH}}_{n} }}{{{\text{pH}}_{0} }}} \right|} \right) + \left( {\left| {\frac{{\lambda_{0} - \lambda_{n} }}{{\lambda_{0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{SAC}}_{0} - {\text{SAC}}_{n} }}{{{\text{SAC}}_{0} }}} \right|} \right) + \left( {\left| {\frac{{T_{0} - T_{n} }}{{T_{0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{ORP}}_{0} - {\text{ORP}}_{n} }}{{{\text{ORP}}_{0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{DO}}_{0} - {\text{DO}}_{n} }}{{{\text{DO}}_{0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{TURB}}_{0} - {\text{TURB}}_{n} }}{{{\text{TURB}}_{0} }}} \right|} \right)$$

Algorithm for laboratory-based physicochemical parameters (ALPP)

$${\text{ALPP}} = \mathop \sum \limits_{i = 1}^{n} \left( {\left| {\frac{{{{({\text{BOD}_5})}_0} - {{({\text{BOD}_5})}_{\text{n}}} }}{{{{({\text{BOD}_5})}_0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{COD}}_{0} - {\text{COD}}_{n} }}{{{\text{COD}}_{0} }}} \right|} \right) + \left( {\left| {\frac{{{\text{TSS}}_{0} - {\text{TSS}}_{n} }}{{{\text{TSS}}_{0} }}} \right|} \right)$$

where pH0, λ0, SAC0, T0, ORP0, DO0, TURB0, (BOD5)0, COD0 and TSS0 are the values of pristine waters collected in the upper basin of the Atoyac and Zahuapan Rivers (Fig. 1) and pHn, λn, SACn, Tn, ORPn, DOn, TURBn, (BOD5)n, CODn and TSSn are the values corresponding to the evaluated stations.

AFRPP and ALPP values close to 0 indicate better water quality. The differences between the parameters defined for pristine waters and those of the measured sample are close to 0 only if the measured sample has not been altered by human-induced activities. Therefore, a summation close to 0 indicates better water quality, and values greater than 0 indicate deteriorated water quality.
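Both indices are sums of absolute relative deviations from the pristine reference, which can be sketched as follows. This is an illustrative reading, assuming one index value per station obtained by summing over the listed parameters; the function and variable names are hypothetical. The reference values are invented for illustration and are not the study's pristine-water measurements. In the station dictionary, pH, T, BOD5 and COD are the Station no. 8 dry-season values quoted in the text, and the remaining entries are hypothetical.

```python
FIELD_PARAMS = ("pH", "cond", "SAC", "T", "ORP", "DO", "TURB")  # AFRPP terms
LAB_PARAMS = ("BOD5", "COD", "TSS")                              # ALPP terms


def deviation_index(reference, station, params):
    """Sum of |(x0 - xn) / x0| over the given parameters."""
    total = 0.0
    for name in params:
        x0 = reference[name]
        if x0 == 0:
            raise ValueError(f"reference value for {name} must be non-zero")
        total += abs((x0 - station[name]) / x0)
    return total


# Hypothetical pristine reference (NOT measured values from the study).
reference = {
    "pH": 7.5, "cond": 300.0, "SAC": 10.0, "T": 14.0, "ORP": 150.0,
    "DO": 7.0, "TURB": 5.0, "BOD5": 5.0, "COD": 40.0, "TSS": 20.0,
}
# pH, T, BOD5 and COD as quoted for Station no. 8 (dry); the rest is hypothetical.
station = {
    "pH": 10.24, "cond": 1200.0, "SAC": 60.0, "T": 27.3, "ORP": -100.0,
    "DO": 1.0, "TURB": 150.0, "BOD5": 480.0, "COD": 1055.0, "TSS": 20.0,
}

afrpp = deviation_index(reference, station, FIELD_PARAMS)
alpp = deviation_index(reference, station, LAB_PARAMS)
print(round(afrpp, 2), round(alpp, 2))
print(deviation_index(reference, reference, FIELD_PARAMS))  # 0.0 for pristine water
```

Because every term is divided by its reference value, a parameter whose pristine value is near zero (ORP, for example) can dominate the sum, so the choice of reference waters matters.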

In the present study, these algorithms were applied and the resulting values are shown in Fig. 4a, b, respectively. Regions I and IV, the upstream and Valsequillo dam sections of the river basin, have characteristics similar to pristine or low-contamination sites (COD 40–60 mg/L), whereas moderate contamination (COD 170–400 mg/L) was identified in region III (confluence with Zahuapan River) and at a few sites in region II (Atoyac River). During the dry season, Station no. 8 presented markedly high values: an alkaline pH of 10.25, BOD5 of 480 mg/L and COD of 1055 mg/L (Fig. 4a, b). A strong positive correlation was observed between AFRPP and ALPP for the 22 sampling stations, with r² = 0.84, 0.89 and 0.96 for the dry, rainy and winter seasons, respectively (Fig. 4c). Thus, these field- and laboratory-based algorithms for the Atoyac River basin aid the interpretation of the water quality data and can be implemented in real-time monitoring stations for observing natural as well as external discharges.

Fig. 4 a Algorithm proposed for field recorded physicochemical parameters (AFRPP). b Algorithm for laboratory-based physicochemical parameters (ALPP). c Correlation of AFRPP and ALPP

Table 2 Factor loadings for 10 parameters at 22 sampling sites for three different seasons (dry, rainy & winter) in Atoyac river basin, Mexico

Comparison studies

The water quality characteristics of the Atoyac River basin were compared with those of other rivers worldwide (Table 3). The location of the basin in a volcanic summit zone also influences its water chemistry: the deposition of airborne volcanic particulate matter (tephra) in aqueous environments initiates the dissolution of soluble accumulations, altering the physicochemical characteristics of the water (Martin-Del Pozzo et al. 2002). The values of pH (10.24), turbidity (381.7 NTU), BOD5 (480 mg/L), COD (1055 mg/L) and TSS (2996 mg/L) observed in the present study, which are higher than those of the other rivers, are consistent with the strong influence of natural processes, volcanic activity, industries and domestic discharges. Distinctly high values of SAC and ORP in the Atoyac River basin confirmed its degraded state. Based on the water quality standards and the comparison with other rivers, the Atoyac River in Mexico is unequivocally unfit for domestic usage and aquatic life. The results suggest that a regular surface water quality monitoring program is needed to improve the river ecosystem and to identify the possible sources of pollutants.

Table 3 Worldwide comparison of river water parameters

Conclusion

Given the increasing concern over degrading river water quality worldwide in recent years, it is essential to develop a wide range of multivariate statistical techniques to analyze and interpret the datasets used in river quality management. In this study, correlation matrices, factor analysis and cluster analysis were integrated for the evaluation, restoration and protection of the regional water quality. The physicochemical characteristics of the river basin revealed the influence of small-scale changes associated with the active Popocatepetl volcano, supplemented by intense human activities. Hierarchical cluster analysis grouped the 22 sampling sites based on their similar water quality characteristics, which offers a reliable classification for designing an optimal future monitoring strategy. In addition, factor analysis assisted in the identification of the pollution sources, with strong relationships between λ, TURB, SAC, BOD5, COD and TSS. The results of the factor analysis also indicated that the factors governing the deterioration of the water quality include pollutants from soluble salts (natural erosion and runoff), organic matter, heavy metals (point sources: industries), toxic pollution (pharmaceuticals, textiles) and domestic and agricultural activities. The algorithms derived from field and laboratory measurements proved to be a tool for fast and economical evaluation of the river water quality. The results also showed that the sites with a high degree of contamination are those that pass through the urban sector or are directly affected by raw settled sewage. The comparison with other rivers indicated that the high fluctuations in the water quality of this river basin can be attributed to its industrial and volcanic setting. Thus, the present study illustrated the efficacy of multivariate statistical techniques in validating the status of the Atoyac River basin, establishing a baseline for the installation of real-time monitoring stations along its course.