Introduction

The Indus River and its tributaries form one of Asia’s largest river systems. The river originates in Tibet, in the northwestern foothills of the Himalayas. It is 3500 km long, of which 1500 km flows through the Indian state of Jammu and Kashmir, and it finally joins the Arabian Sea. It flows between the Ladakh range and the Zanskar range, a high-altitude region of India. This river system is the lifeline of the local population for drinking water, agriculture, and other uses, and assessment of its water quality has therefore gained pace in recent years (State of Environment Report 2013). The seasonal and annual river flows are highly variable (Ahmad and Qadi 2011; Asianics Agro-Dev. International 2000). Annual peak flow occurs between June and late September, during the southwest monsoon. The high flows of the summer monsoon are augmented by snowmelt in the north, which also transports a large volume of sediment from the mountains.

In the study area, the local population depends entirely on the Indus River for drinking water, domestic use, and irrigation. Multiple factors affect the water quality, and identifying the important ones is necessary to monitor its potability as a matter of public health. Earlier studies analyzed the river water quality (Charan 2013; Bharti et al. 2017) and indicated that it was deteriorating, but the possible factors were not determined. Earlier studies have also indicated that river water chemistry is characterized by complex correlations among a range of physicochemical and biological variables. Therefore, the present study was undertaken to identify, within a large data set, the important factors that significantly affect the quality of the river water. Multivariate statistical techniques such as principal component analysis (PCA), factor analysis (FA), and cluster analysis (CA) reveal such relationships. Applying multivariate statistical analysis to a large amount of data provides a reliable alternative approach for understanding and interpreting the complex system of water quality.

Materials and methods

Ethics statement

No specific permits were required for the field study, as it did not involve any endangered or protected species.

Materials

Analytical grade chemicals (Merck, India) were used to analyze various parameters of the Indus River water. All glassware and other sample containers were rinsed with double-distilled water and sterilized prior to use.

Study area

Leh is the main district of the Ladakh region, situated between 32°–36° north latitude and 75°–80° east longitude at a height of 2300–5000 m above mean sea level (msl). Climatically, the region is characterized by both arctic and desert conditions, and Ladakh is therefore often called a “cold desert.”

In this region, the rocks are igneous, metamorphic, and sedimentary in nature. Lithologically, the soils of the study area are mainly of the sandy type, followed by silt and clay. Analysis of soil characteristics was carried out with a soil hydrometer (Model No 2151H Soil Hydrometer) according to the method described by Singh et al. (2005). The analyzed data showed that sand, silt, and clay constituted 80.73%, 12.83%, and 6.44% of the soil, respectively.

Sample collection

Twenty-eight water samples were collected from the Indus River in each of the two seasons (summer and winter) between 10.00 and 12.00 h. All the sampling sites were located near villages and farmland. The water samples were collected at a depth of 10 cm and placed into 500-mL polypropylene bottles (Tarson Company). Samples were stored in the laboratory at 0–4 °C for subsequent chemical analysis (Water Quality-Sampling—Part 11, 1992). The chemical measurements were made in the laboratory within 24 h of sample collection. GPS readings were taken with a Garmin GPS 72H to identify the sampling locations, which were mapped with QGIS Desktop 3.0 (Fig. 1).

Fig. 1 Sample collection sites mapped with QGIS software

Sample treatment and sample analysis

Samples were collected in two bottles. The sampling bottles were washed, rinsed with distilled water, and dried before use. For physicochemical analysis, water samples were preserved with toluene.

In situ parameters, such as temperature (TEMP), pH, electrical conductivity (EC), salinity, total dissolved solids (TDS), and dissolved oxygen (DO), were analyzed by using a Hach SensION 156 (APHA 1998). Turbidity (TUR) was measured by using a Hach portable turbidity meter (2100Q01) (APHA 1998). These parameters were recorded on the spot during sample collection. Major anions such as carbonate (CO₃²⁻) and bicarbonate (HCO₃⁻) were analyzed immediately with the titrimetric method (Singh et al. 2005). Hardness was measured with the titrimetric method (APHA 2012). The level of chloride (Cl⁻) was determined by using Mohr’s method (APHA 2012). Other anions, such as sulfate (SO₄²⁻), nitrate (NO₃⁻), and orthophosphate (PO₄³⁻), were analyzed following the protocol described by the American Public Health Association (APHA 2012). All the parameters, units, analytical methods, instruments, and references are listed in Table 1.

Table 1 Physicochemical parameters determined and analytical techniques used

E. coli in the water samples was identified with the pour plate method as described in the Medical Laboratory Manual for Tropical Countries. Water samples were aliquoted onto sterile MacConkey agar plates, spread uniformly over the entire surface of the agar, and incubated at 44 °C for 48 h. The total number of E. coli colonies was counted, and the mean value of three replicates was calculated (MacConkey 1905).

For mineral analysis, water samples were digested on a mass-to-weight basis, using metal grade 69% nitric acid (HNO3), 60% perchloric acid (HClO4), and 35.40% hydrochloric acid (HCl). Samples were digested on 42 blocks of an Automated Hot Block digestion system (Questron Technologies Inc., Canada). All the minerals were estimated in the digested water samples by inductively coupled plasma optical emission spectroscopy (ICP-OES) (Perkin-Elmer Analyst, Optima 7000 DV) (Charan et al. 2013). During the sample analysis by ICP-OES, the plasma conditions were as follows: plasma flow 15 L/min, auxiliary gas flow 0.2 L/min, nebulizer gas flow 0.8 L/min, RF power 1300 W, and pump flow rate 1.5 mL/min.

Data treatment and statistical analysis

All the mathematical and statistical computations were made using Microsoft Office Excel 2007, Statistical Package for Social Sciences (SPSS) version 22, and Minitab 17. The data were standardized by using standard statistical procedures. The data were then subjected to PCA, which reduces the dimensionality of the data by explaining the correlations among a large number of variables in terms of a smaller number of underlying factors (principal components or PCs). After the FA, R- and Q-mode varimax rotation was applied to obtain more clearly defined factors, called varifactors (VFs), which facilitate interpretation of the data (Helena et al. 2000; Reghunath et al. 2002). Finally, Q-mode CA was carried out to identify the similarity among all the samples (Reghunath et al. 2002).
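The standardization and PCA steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS or Minitab procedure used in the study: the function names `standardize` and `pca_kaiser` and the synthetic 28 × 6 data matrix are assumptions introduced here, and the retention rule shown is the eigenvalue-greater-than-1 criterion.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract the mean, divide by the sample SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_kaiser(X):
    """PCA via eigenanalysis of the correlation matrix.

    Returns all eigenvalues (descending), the loadings of the components
    whose eigenvalue exceeds 1, and the percentage of variance explained
    by every component.
    """
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)           # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                       # eigenvalue > 1 criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, loadings, explained

# Synthetic stand-in for a samples-by-variables matrix (NOT the study data):
# 28 samples, 6 variables driven by two hypothetical latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(28, 2))
noise = 0.1 * rng.normal(size=(28, 6))
X_demo = np.repeat(latent, 3, axis=1) + noise

eigvals, loadings, explained = pca_kaiser(X_demo)
```

Because the analysis is run on the correlation matrix, the eigenvalues sum to the number of variables, which is why an eigenvalue of 1 corresponds to the variance contributed by a single standardized variable.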

Statistical procedures

In PCA, eigenanalysis of the experimental data was performed to extract principal components (PCs) using two selection criteria: the scree plot test and the corrected average eigenvalue. For hierarchical CA, the squared Euclidean distance between normalized data was used to measure the similarity between samples. Both average linkage between groups and Ward’s method were applied to the standardized data, and the results were represented in a dendrogram.
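The hierarchical clustering step can be illustrated with a small, NumPy-only sketch of agglomerative clustering that uses squared Euclidean distances and average linkage between groups. Ward’s method is not shown. The function name `average_linkage` and the four two-dimensional points are hypothetical and serve only to show the mechanics; for R-mode clustering of variables, the transposed data matrix would be passed instead.

```python
import numpy as np

def average_linkage(Z):
    """Agglomerative clustering with average linkage between groups.

    Z is an objects-by-features array of already standardized data.
    Returns the merge history as (members_a, members_b, distance),
    where distance is the mean squared Euclidean distance between
    the two merged groups.
    """
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    D = (diff ** 2).sum(axis=2)                  # squared Euclidean distances
    clusters = {i: [i] for i in range(len(Z))}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # average of all pairwise distances between the two groups
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical points forming two well-separated groups (not study data).
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
merges = average_linkage(points)
```

The merge history carries the same information that a dendrogram displays: the smaller the distance at which two groups join, the more similar they are.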

Data standardization

Kaiser–Meyer–Olkin (KMO) and Bartlett’s tests were used to determine whether the data were suitable for PCA (Child 2006). KMO is a measure of sampling adequacy; PCA is considered appropriate only if the KMO value is greater than 0.5. Bartlett’s test of sphericity checks, at a chosen significance level, whether the variables are sufficiently correlated with one another. In our study, the KMO value of the summer season was 0.676 and that of the winter season was 0.655 (Tables 2 and 3, respectively).

Table 2 Kaiser–Meyer–Olkin value (0.676) of summer season data, which is greater than 0.5, giving the suitability for PCA of the observation
Table 3 Kaiser–Meyer–Olkin value (0.655) of winter-season data, which is greater than 0.5, giving the suitability for PCA of the observation
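For readers who wish to reproduce such a suitability check outside SPSS, the following sketch computes the overall KMO statistic and Bartlett’s chi-square statistic from a data matrix using their standard textbook formulas. The function name `kmo_and_bartlett` and the synthetic data are assumptions for illustration; the p-value for Bartlett’s test is left out because it requires a chi-square distribution function that NumPy does not provide.

```python
import numpy as np

def kmo_and_bartlett(X):
    """Return (overall KMO, Bartlett chi-square, degrees of freedom)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)

    # Partial correlations from the inverse correlation matrix.
    d = np.diag(R_inv)
    partial = -R_inv / np.sqrt(np.outer(d, d))

    off_diag = ~np.eye(p, dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    kmo = r2 / (r2 + q2)

    # Bartlett's test of sphericity: H0 is that R is an identity matrix.
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

# Synthetic example (not study data): 28 samples, 5 variables that share
# one hypothetical common factor.
rng = np.random.default_rng(1)
common = rng.normal(size=(28, 1))
X_demo = common + 0.8 * rng.normal(size=(28, 5))
kmo, chi2, dof = kmo_and_bartlett(X_demo)
```

A KMO value close to 1 means that partial correlations are small relative to ordinary correlations, which is the situation in which a few common factors can summarize the variables.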

Results and discussion

The present study also evaluated the status of various water quality parameters and their seasonal variation. General descriptive statistics of all the parameters for the summer and winter seasons are shown in Tables 4 and 5, respectively. Table 6 shows the seasonal variation of all the parameters; TDS, turbidity, chloride, alkalinity, and calcium hardness were significantly lower in the winter season.

Table 4 General descriptive statistics of the physicochemical and mineral parameters of river water in the summer season
Table 5 General descriptive statistics of the physicochemical and mineral parameters of river water in the winter season
Table 6 Results of the t test on the parameters of river water
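The text does not state which form of the t test was applied to the seasonal comparison. As one possible illustration, the sketch below computes Welch’s unequal-variance t statistic and its approximate degrees of freedom for two groups of values. The choice of Welch’s test, the function name `welch_t`, and the numbers are all assumptions made for this example and are not measurements from the study.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite df."""
    va = variance(a) / len(a)      # squared standard error of mean(a)
    vb = variance(b) / len(b)      # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up values standing in for one parameter in two seasons.
summer_demo = [2.0, 4.0, 6.0, 8.0]
winter_demo = [1.0, 2.0, 3.0, 4.0]
t_stat, dof = welch_t(summer_demo, winter_demo)
```

The resulting statistic would then be compared with the t distribution at the computed degrees of freedom to obtain a significance level.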

All 25 variables were run through the PCA, which extracted 7 and 8 components with eigenvalues greater than 1 in the summer and winter seasons, respectively. The extracted and non-extracted components of both seasons are listed in Tables 7 and 8, respectively. The scree plot is shown in Fig. 2.

Table 7 Extracted components based on eigenvalue after PCA of summer season data
Table 8 Extracted components based on eigenvalue after PCA of winter season
Fig. 2 Scree plot of eigenvalues of the physicochemical variables of surface water in the summer season in Leh, Jammu and Kashmir, India

R-mode factor analysis of all the parameters/variables of water samples was carried out for both seasons and is given in Tables 9 and 10. The analysis of the summer season data matrix generated seven factors that together account for 77.05% of the variance, whereas the analysis of the winter-season data matrix generated eight factors that account for 76.90% of the variance. The rotated loadings, eigenvalues, percentage of variance, and cumulative percentage of variance of all the factors of summer and winter seasons are given in Tables 7 and 8, respectively.

Table 9 Varimax-rotated R-mode factor loading matrix of summer season data
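To make the rotation step more concrete, the following is a minimal sketch of a widely used iterative SVD-based varimax algorithm. It is not the SPSS implementation used in the study; the function name `varimax`, its parameters, and the small hypothetical loading matrix are assumptions for illustration only.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix L (variables x factors).

    Returns the rotated loadings and the rotation matrix R.
    """
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Matrix whose SVD gives the next orthogonal rotation.
        target = Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1.0 + tol):   # stop when improvement stalls
            break
        d_old = d_new
    return L @ R, R

# Hypothetical loadings: a clean two-factor structure deliberately
# blurred by a 30-degree rotation (not the study's loading matrix).
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
unrotated = simple @ rot
rotated, R = varimax(unrotated)
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors is altered, which is what makes the varifactors easier to interpret.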

The first eigenvalue of the summer season factor analysis is 6.14, which accounts for 24.56% of the total variance and constitutes the first and main factor. The second and third varifactors have eigenvalues of 5.24 and 2.84, which account for 20.95% and 11.36% of the total variance, respectively. The remaining four varifactors each account for less than 10% of the total variance. In the winter-season factor analysis, by contrast, the first and second varifactors have eigenvalues of 4.68 and 3.43, which account for 18.72% and 13.71% of the total variance, respectively. Except for the third varifactor (10.86%), the remaining five varifactors each account for less than 10% of the total variance (Table 9).
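The link between the eigenvalues and the percentages quoted above follows from the fact that, with 25 standardized variables, the total variance equals 25, so each percentage is the eigenvalue divided by 25 and multiplied by 100. The short sketch below reproduces this arithmetic; the function name `percent_variance` is introduced here for illustration.

```python
N_VARIABLES = 25  # number of variables entered into the PCA

def percent_variance(eigenvalue, n_variables=N_VARIABLES):
    """Share of total variance explained by one factor, in percent."""
    return 100.0 * eigenvalue / n_variables

summer_eigenvalues = [6.14, 5.24, 2.84]
winter_eigenvalues = [4.68, 3.43]

summer_pct = [round(percent_variance(e), 2) for e in summer_eigenvalues]
winter_pct = [round(percent_variance(e), 2) for e in winter_eigenvalues]
```

This gives 24.56%, 20.96%, and 11.36% for the summer season and 18.72% and 13.72% for the winter season. The second-decimal differences from the reported 20.95% and 13.71% are consistent with the eigenvalues having been rounded to two decimals before being reported.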

In the summer factor analysis, the first factor (which accounts for 24.56% of the total variance) is characterized by high loadings of calcium (Ca), magnesium (Mg), iron (Fe), sodium (Na), potassium (K), and manganese (Mn), with a moderate loading of phosphate. This may be due to the influence of non-point sources, such as agricultural runoff or atmospheric deposition by natural weathering (Huang et al. 2013; Boutron et al. 1991; Bohlke et al. 2007). In the study area, farmers use chemical fertilizers in the summer season (Mann 2002; Acharya et al. 2012), which probably explains the moderate loading of phosphate. Agricultural runoff and weathering enhance ion exchange and oxidation–reduction conditions, which together increase nutrient solubility (Bohlke et al. 2007; Seiler et al. 2003). Nutrient loading in the study area through agricultural runoff or atmospheric deposition is therefore plausible (Huang et al. 2013).

The second factor (which accounts for 20.95% of the total variance) is characterized by very high loadings of calcium hardness (CaHard) and total hardness (ToHard), followed by high loadings of chloride (Chl) and alkalinity (Alk). It also shows a high negative loading of temperature (Temp) and a moderate loading of E. coli. One of our previous studies showed that the total hardness of the river water is high because of the high levels of calcium and magnesium entering the water, which might be related to weather, as suggested by the high negative loading of temperature (Bharti et al. 2017; Nelson 2002; Grift et al. 2016). High loadings of Ca and Mg were seen in the first factor, which is consistent with our previous study (Bharti et al. 2017). The high loading of chloride might result from the dissolution of salts through weathering or oxidation–reduction reactions (Sarin et al. 1989; Datta and Tyagi 1996; Liang et al. 2016). A previous study of the soil in this area estimated that the soil alkalinity is high (Charan 2013). The present study has shown moderate loading of alkalinity in the groundwater (Bharti et al. 2017). The moderate loading of E. coli found in this study area may be attributed to the poor sanitation system (Anonymous 2009; Affum et al. 2015). In addition, the Water Stewardship Information Series (2007) documented that infiltration of domestic or wild animal fecal matter may act as a source of E. coli, and the river sites in this area are highly affected by the presence of wild and domestic animals. None of the factors of the winter-season data matrix (Table 10) shows any loading of E. coli. This strongly indicates that the river sites are moderately affected by E. coli in the summer season.

Factors 4–7 are each dominated by a single variable: dissolved oxygen (factor 4), nitrate (factor 5), pH (factor 6), and carbonate (factor 7). Factor 3, in contrast, showed high loadings of conductivity (COND) and TDS (Table 8). The high loadings of TDS and COND point to physicochemical sources of variability (Varrol and Sen 2009). The moderate negative loading of pH might indicate an increase in dissolved organic carbon (DOC) from runoff (Dinka 2010).

The output of the R-mode cluster analysis of the summer season is given as a dendrogram (Fig. 3). The dendrogram contains two major clusters as shown in Fig. 3. Clusters 1 and 2 show the interrelationship among the variables. Cluster 1 shows the interrelationship among the minerals that are found in Factor 1 of the R-mode factor analysis. This dendrogram validates the interpretation in the R-mode factor analysis (Reghunath et al. 2002).

Fig. 3 Scree plot of eigenvalues of the physicochemical variables of surface water in the summer season in Leh, Jammu and Kashmir, India

Varimax rotation of the winter-season data yielded eight varifactors. The first factor (which accounts for 18.72% of the total variance) is characterized by high loadings of iron, sodium, and potassium, followed by a high negative loading of TDS and low negative loadings of chloride and phosphate (Table 10). Fewer nutrients loaded on this factor than on the first varifactor of the summer season, which might be due to reduced weathering and agricultural runoff. The temperature in this area is very low in winter, and no cultivation takes place in the study area during this season. For these reasons, fewer nutrients show loadings (5, 29–31, 45). The moderate negative loading of phosphate might be due to the absence of agriculture in this area. The high negative loading of TDS might be due to the few physicochemical sources of variability. In the second varifactor, high loadings of bicarbonate and manganese were found, which might be due to the weathering of rocks (Kumar et al. 2009).

Factors 3–8 are each dominated by a single variable: Ca (factor 3), calcium hardness (factor 4), alkalinity (factor 5), dissolved oxygen (factor 6), pH (factor 7), and salinity (factor 8) (Table 10). Together, these factors account for 44.47% of the total variance. The dominance of a single variable in each factor indicates non-mixing or partial mixing of different types of water (Arshad and Gopalakrishna 2009). The present results are in accordance with the study of Reghunath et al. (2002), in which factors 3–8 were likewise each dominated by one variable (Mg in factor 3, K in factor 4, NO3 in factor 5, CO3 in factor 6, pH in factor 7, and SO4 in factor 8), and these six factors together accounted for 34.7% of the total variance. The results of that study agree closely with ours. Therefore, there is a strong indication that non-mixing or partial mixing of different types of water occurs in the study area (Tables 9 and 10).

Table 10 Varimax-rotated R-mode factor loading matrix of winter season

The output of the R-mode cluster analysis of the winter season is given as a dendrogram (Fig. 4). The dendrogram contains two major clusters. Clusters 1 and 2 show the interrelationship among the variables. Cluster 1 contains the variables that have high loadings in most of the varifactors. This dendrogram confirms the interpretation made in the R-mode factor analysis (Reghunath et al. 2002).

Fig. 4 Dendrogram of the Q-mode cluster analysis of winter season. (The axis shown below indicates the relative similarity of different cluster groups. The lesser the distance, the greater the similarity between objects)

Conclusions

This study provides new information on the seasonal variation of physical, chemical, and microbiological water quality parameters of the Indus River in the Trans-Himalayan high-altitude region. Principal component analysis followed by factor analysis extracted seven varifactors in the summer season and eight varifactors in the winter season. These findings indicate that anthropogenic activities and nutrient loading are the main factors affecting the river water quality in the summer season. In the winter season, the factor analysis suggests that the river water is less affected by anthropogenic activities. On the basis of the multivariate statistical analyses, it can be concluded that agricultural, domestic, and hydrogeochemical sources significantly affect the water quality of the Indus River.