Introduction

The availability of good-quality water is indispensable for drinking, agricultural, industrial and irrigation purposes (Nagaraju et al. 2014) as well as for preventing diseases and improving the quality of life (Nabila et al. 2014). Water quality is controlled by many factors, including climate, soil topography and water-rock interaction (Love et al. 2004; Li et al. 2016; Nagaraju et al. 2017). The analysis of freshwater sources is an important and sensitive issue in water quality monitoring, which aims to control and reduce the incidence of contamination (Akoto and Abankua 2014). Water, the most essential element for the existence of life on Earth, is easily exposed to pollution by rapid industrialization and population growth, which create an unhealthy environment (John Mohammad et al. 2015). Water quality assessment provides clear information about the subsurface geologic environments in which the water bodies are present (Raju et al. 2011). Conventional techniques such as trilinear plots and statistical techniques are widely accepted methods to determine the quality of water sources (Kumar et al. 2015; Qishlaqi et al. 2017; Nagaraju et al. 2017; Shigut et al. 2017). However, the use of these graphical methods to interpret water chemistry is limited to two dimensions, and they deal with only a limited number of the variables responsible for the water chemistry, which can produce biased results (Güler et al. 2002). To overcome these limitations, multivariate statistical techniques such as cluster analysis (CA) and principal component analysis (PCA) are widely used to interpret complex data matrices, to evaluate water quality and ecological status, and to identify the possible factors that influence water systems; they offer a valuable tool for reliable management of water resources as well as rapid solutions to pollution problems (Farnham et al. 2003; Tanasković et al. 2012; Matiatos et al. 2014; Kamtchueng et al. 2016; Kumar et al. 2017). These methods also help in the interpretation of natural associations between different variables and thus highlight information not available at first glance. Hence, this multivariate treatment of environmental data could be successfully used to interpret relationships among the water quality variables for better management of the environmental system (Liu et al. 2003; Rakotondrabe et al. 2017).

The Jia-Bharali river basin is one of the most developed regions in the north Brahmaputra plain of India. In recent years, rapid deterioration of the water quality of almost all water bodies has been observed due to intensive agriculture, urbanization and the development of small industries in the basin. Earlier publications (Khound et al. 2012, 2013) presented the water quality of the basin using only average values of the parameters in both seasons and lacked an elaborate interpretation of the water chemistry as well as statistical analysis. In this context, the major objectives of this study were to (1) investigate the spatial and temporal variation of water quality parameters of the ground water and surface water sources and (2) demonstrate the usefulness of multivariate statistical analysis in interpreting the water quality parameters of the water sources in and around the Jia-Bharali river basin of India.

Materials and methods

Study area

The Jia-Bharali river catchment is bounded by longitudes 92°00′–93°25′E and latitudes 26°39′–28°00′N in the North Brahmaputra Plain of north-eastern India. The Jia-Bharali, one of the major tributaries of the river Brahmaputra, flows down from the lower Himalayas in Arunachal Pradesh in north-eastern India and enters the North Brahmaputra Plain at Bhalukpung (92°65′E: 27°01′N), where it takes the name Jia-Bharali (Jia meaning alive in the local language). The Jia-Bharali catchment is made up of two tectonic blocks separated by the river itself. The western block is tectonically active with continued release of strain, and the eastern block is a zone of strain accumulation. The structural features of the Jia-Bharali basin include a system of faults dividing the basin into a number of blocks within the Brahmaputra valley. A zone of weakness, or graben, is noted between the Rangapara block to the west and the Charali block to the east, along which the Jia-Bharali River flows southward to meet the Brahmaputra (Viswanathan and Chakrabarti 1977). The basin is dotted with numerous meander scars, remnant channels, misfit streams, inactive floodplains and natural levees. The Jia-Bharali catchment shows a number of river terraces at different topographic levels, with the present Jia-Bharali channel system occupying the lowest level. The course of this palaeo river system is known as Mara Bharali (Mara meaning dead) and is well discernible on the ground. Subsequently, Mara Bharali has attained a graded condition with respect to the local base level, the Brahmaputra river at Tezpur (92°53′E: 26°39′N), and developed a wide meander belt (Khound et al. 2013). The region has extensive tea plantations on the higher topographic surfaces, with paddy fields generally occupying the lower topographic planes. The northern portion along the foothills of the Arunachal Himalaya is made up of reserve forests (e.g., Chariduar, Balipara reserve forests) and sparsely populated forest villages. 
The region abounds in biodiversity with evergreen and deciduous trees of many types. The climate of the study area is sub-tropical in nature with hot and humid summer (average temperature 29 °C), heavy monsoon rain (May–September) followed by inundation of almost the entire area, dry autumn and cold winter (November–February, average temperature 16 °C). The Jia-Bharali river basin experiences 4–5 major floods annually during the monsoon periods (Jain et al. 2007).

Hydrogeology

The Jia-Bharali basin occupies the central part of the North Brahmaputra Plain (NBP) at the foothills of the Eastern Himalaya. Active tectonics together with high monsoon rainfall result in a very high sediment load in the trunk channel and its tributaries, which is deposited in the mountain-front areas and further south in the floodplains. The present study area is a typical alluvial terrane with ubiquitous fluvial features dotting the landscape. The geological horizon appears to consist of alluvium and alluvial plain. The aquifer material is sand, medium sand, coarse sand or clayey sand. The depth of the shallow wells (ground water sampling points) was in the range of 3.66–11.00 m, with diameters of 0.60–1.40 m (Khound et al. 2013). Surface materials near the wells are generally composed of clay, sandy clay or silty clay.

Method

Ground water (GW) samples were collected from 50 shallow hand dug wells, while surface water (SW) samples were collected from 35 different sources consisting of streams, rivers and ponds spread over the entire area of the Jia-Bharali river basin twice a year (wet and dry) for a three years period from 2009 to 2012 (Fig. 1). The factors that were considered for the sampling program included (1) the type of sample to be collected, (2) the locations from where the samples will be collected, (3) the frequency of sample collection, (4) total number of samples to be collected, (5) size of the samples, etc. The factors were decided by the type of data to be collected and the hypotheses to be tested (EPA 2004). For the present work, a detailed time schedule was prepared for collection, transportation, pre-treatment, storage and analysis of the water samples. SW samples were manually collected at a depth < 1 m in the center of the sources, preferentially where the flow velocity was high enough to allow for good homogenization of the solid particles and dissolved material (Li et al. 2014; Ndam Ngoupayou et al. 2016). GW samples were directly collected from shallow dug wells. All the water samples were collected in polythene containers of 1 L capacity. Before collecting sample, the containers were thoroughly cleaned by washing with 8 M nitric acid, followed by repeated washing with distilled water. They were rinsed thrice with the sample water before collection (APHA 1998). Samples were collected from the same sources during each set of sample collection. Determination of some parameters is affected by sample storage and these need to be estimated immediately after collection. 
Water samples collected in the polyethylene bottles were divided into two parts: one part was acidified with nitric acid to pH < 2 and stored at 4 °C for metal analysis, and the other part was preserved at 4 °C by adding the appropriate reagent (APHA 1998) and used for the analysis of the other physicochemical parameters. Standard methods (APHA 1998) were followed in the collection, storage and analysis of the water samples. Na+ and K+ were determined with a flame photometer (Elico CL 361); the anions SO42−, PO43− and NO3− with a UV–visible spectrophotometer (Hitachi 3210); total dissolved solids (TDS) by the evaporation method; and total hardness, Ca2+, Mg2+, Cl− and HCO3− by titrimetric methods. Iron (Fe) was estimated using an atomic absorption spectrophotometer (Varian SpectrAA 220) following the standard acid digestion technique.

Fig. 1 The selected ground and surface water sampling points of the study area (Khound et al. 2012)

Ion balance studies are usually carried out to validate experimentally obtained data (Domenico and Schwartz 1990). For this purpose, the concentrations of the principal cations and anions were calculated in milliequivalents per liter. The two totals should be equal, i.e., their ratio should be 1.0, since the water as a whole is electrically neutral (Hem 1985). The ionic balance (BI) of the water samples can be calculated as BI = 100 × (Σ+ − Σ−)/(Σ+ + Σ−), where Σ+ is the total milliequivalents/L of principal cations and Σ− is the total milliequivalents/L of principal anions. Statistical analysis was carried out using the Statistical Package for the Social Sciences (SPSS, Version 16). Water quality monitoring and analysis programs generate large, complex datasets that need multivariate statistical methods for interpretation of the underlying information. Multivariate statistical analysis explains the correlation among a large number of variables in terms of a small number of underlying factors and helps to simplify and organize large datasets to provide meaningful insight (Kumar et al. 2017). This multivariate method was used here to obtain information about the most relevant characteristics of the physicochemical variables with a minimal loss of original data (Altun et al. 2008; Kazi et al. 2009). It creates an entirely new set of factors, much smaller in number than the original set of variables, and reduces the contribution of the less significant variables to simplify further the data structure coming from the principal component analysis (Iscen et al. 2008). For principal component analysis, the entire dataset was first standardized and arranged in a correlation coefficient matrix with normal distribution in all variables (Okiongbo and Douglas 2015). Eigenvalues and eigenvectors of the correlation matrix were extracted, and the less important of these were discarded (Davis 1986). 
Eigenvalues greater than unity indicate a significant contribution of a factor to the total variance. The factor loadings were calculated by a varimax rotation technique in such a way that they are closer to + 1, 0 or − 1, representing positive contribution, no contribution and negative contribution, respectively. Hierarchical cluster analysis comprises a series of multivariate methods which are used to find true groups of data. In clustering, similar objects are grouped into the same class and similar variables are merged to construct a dendrogram (Cloutier et al. 2008; Güler et al. 2012; Moya et al. 2015; Kumar et al. 2017).
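The PCA steps described above (standardization, correlation matrix, eigenvalue extraction, retention of eigenvalues greater than unity, varimax rotation) can be illustrated with a short Python sketch. This is not the SPSS procedure used in the study: it relies on NumPy, the function names `pca_kaiser` and `varimax` are hypothetical, the input matrix is synthetic, and the signs and ordering of the loadings may differ from those reported by a statistical package.

```python
import numpy as np

def pca_kaiser(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    # z-score standardization of each variable (column)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)       # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                          # eigenvalue-greater-than-unity rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - rotated @ col_ss / p))
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic example (assumption): 50 samples, 6 variables driven by 2 latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(50, 2))
mixing = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = factors @ mixing + 0.3 * rng.normal(size=(50, 6))

eigvals, loadings = pca_kaiser(data)
rotated = varimax(loadings)
print("eigenvalues:", np.round(eigvals, 3))
print("rotated loadings:\n", np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged by it; only the distribution of the loadings across factors changes, which is what pushes them toward + 1, 0 or − 1.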

Results and discussion

Spatial and temporal variation of physicochemical parameters

Basic statistics of the water quality data are given in Tables 1 and 2. The variations of the physicochemical parameters in the different water sources from the Jia-Bharali river basin and its surrounding areas are presented in Fig. 2. The wet season values (Wet) and dry season values (Dry) of all the parameters are presented by taking the averages of three wet seasons and three dry seasons values, respectively.

Table 1 Descriptive physicochemical characteristics of the shallow aquifers (BDL: below detection limit) in both the wet and dry seasons
Table 2 Descriptive physicochemical characteristics of surface water sources in both the wet and dry seasons (BDL: below detection limit)
Fig. 2 Bar diagram of the hydrochemical parameters of GW and SW sources in both the wet and dry seasons

In the present work, the GW samples had pH from 5.6 to 7.6 (mean value 6.7) in the wet seasons and from 5.9 to 7.4 (mean value 6.7) in the dry seasons (Khound et al. 2012). The pH difference between wet and dry season was insignificant. 18% of the wells in wet seasons and 24% of the wells in the dry seasons were found to have acidic pH, while majority of the wells in both seasons had pH in guideline range (6.5–8.5) proposed by WHO (2011) for drinking purposes. The SW samples were found slightly alkaline in nature during both the seasons with the averages being in the range of 6.1–7.4 (mean value: 6.7) in the wet seasons and from 6.3 to 8.0 (mean value 6.8) in the dry seasons (Khound et al. 2012). The water sources were thus found suitable for irrigation purposes with respect to pH, i.e., there was no alkalinity hazard. Occurrences of acidic pH in many GW and SW samples of the study area may be due to the dissolved carbon dioxide and organic acids (fulvic and humic acids), originated from the decay and subsequent leaching of plant materials (Garcia et al. 2001). Presence of lateritic soil in the studied river basin may also contribute to the acidic pH of the water sources (CESS 1984). Moreover, as the study area experiences extensive cultivation (e.g., paddy fields and tea estates), the pH could be lowered due to the use of acid producing fertilizers like ammonium sulfates and super phosphates of lime (Raghunath et al. 2001). Based on the pH measurement, the water sources of the basin could be classified as belonging to four pH-zones, (1) < 6.0, (2) 6.0–6.5, (3) 6.5–7.0 and (4) > 7.0 as shown in Table 3. The alluvial dug wells presented a large variation in TDS from 63 to 349 mg/L with a mean value of 168 mg/L during the wet seasons and from 79 to 285 mg/L with a mean value of 121 mg/L during the dry seasons (Khound et al. 2012). A few of the TDS values were very large during the wet seasons indicating considerable input through the surface runoff. 
The TDS values of the SW sources varied over a wide range of 55–130 mg/L (mean value: 86 mg/L) in the wet seasons and 70–170 mg/L (mean value: 103 mg/L) in the dry seasons (Khound et al. 2012). In 86% of the sources, the dry season values were higher than those of the wet season, mainly due to the reduction in water volume. When TDS > 1000 mg/L (WHO 2011), the water is likely to have an objectionable taste. However, all the water sources (both ground and surface) had TDS content well below the WHO (1984) limit (1000 mg/L) in both the seasons, indicating their suitability for drinking as well as irrigation purposes. On the basis of the WHO (1993) classification, most of the water sources could be placed in the ‘excellent’ category for drinking purposes (Table 4). Davis and Dewiest (1966) proposed a threefold classification of water sources based on TDS levels as (1) domestic (< 500 mg/L), (2) irrigation (500–1000 mg/L) and (3) industry (> 1000 mg/L). On the basis of this classification, all the sources in the present work fall under the domestic category. Water hardness is generally caused by the entry of sewage, detergents and other domestic and industrial wastes (Jain and Sharma 2002). However, in the present study, all the dug wells in both the seasons had hardness values (52–198 mg/L in the wet season and 37–167 mg/L in the dry season) well below the WHO (1984) permissible limit of 500 mg/L, which was based on taste and household use considerations (Khound et al. 2012). The average total hardness (as CaCO3) values of the SW samples varied from 29 to 112 mg/L with a mean value of 56 mg/L in the wet seasons and from 35 to 153 mg/L with a mean value of 77 mg/L in the dry seasons (Khound et al. 2012), well below the WHO (1984) permissible limit of 500 mg/L. 79.9% of the wet season water samples fell in the soft category (0–60 mg/L), while 65.7% of the dry season samples fell in the moderately hard category (61–120 mg/L) (Durfor and Becker 1964). 
The sources of hardness in these water sources could be ascribed to domestic activities as well as small-scale industrial effluents. However, it was observed that majority of the GW sources fall under ‘moderately hard’ category, while most of the SW sources fall under ‘soft’ category in the Durfor and Becker (1964) classification (Table 5). SW sources had considerable COD load, the ranges being from 58 to 221 mg/L with a mean value of 92 mg/L in the wet seasons and 49–139 mg/L with a mean value of 70 mg/L in the dry seasons. Low to modest values of COD may be attributed to organic load resulting from storm water runoff from houses, roads, failing septic systems, kitchen waste, street waste, etc. In the wet season, 85.8% of the SW sources showed more COD content compared to those during the dry season. 22.9% sources in the wet season and 8.6% sources in the dry season experienced COD values more than 100 mg/L. This might indicate entry of surface runoff to the water system during the rainy seasons.
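The TDS and hardness classifications quoted above can be written as simple threshold functions. The sketch below is illustrative only: the function names are hypothetical, the TDS classes follow the Davis and Dewiest (1966) limits as quoted in the text, and the ‘hard’ (121–180 mg/L) and ‘very hard’ (> 180 mg/L) classes are the usual upper classes of the Durfor and Becker (1964) scheme, which the text does not quote and which are therefore an assumption here.

```python
def classify_tds(tds_mg_l):
    """Use class by TDS (mg/L), after Davis and Dewiest (1966) as quoted in the text."""
    if tds_mg_l < 500:
        return "domestic"
    if tds_mg_l <= 1000:
        return "irrigation"
    return "industry"

def classify_hardness(th_mg_l):
    """Hardness class by total hardness (mg/L as CaCO3), after Durfor and Becker (1964).

    The 'hard' and 'very hard' limits are assumed from the standard scheme;
    only the first two classes are quoted in the text.
    """
    if th_mg_l <= 60:
        return "soft"
    if th_mg_l <= 120:
        return "moderately hard"
    if th_mg_l <= 180:
        return "hard"
    return "very hard"

# Mean values reported in the text
print(classify_tds(168))        # GW mean TDS, wet seasons
print(classify_hardness(56))    # SW mean total hardness, wet seasons
print(classify_hardness(77))    # SW mean total hardness, dry seasons
```

Applied to the reported means, the functions reproduce the categories stated in the text: domestic for TDS, soft for the wet season SW hardness and moderately hard for the dry season SW hardness.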

Table 3 Classification of water sources based on pH
Table 4 Classification of water sources based on TDS (mg/L)
Table 5 Classification of water sources based on TH (mg/L)

Major cation chemistry

Ca2+ and Mg2+ content of the studied shallow aquifers were recorded in a wide range of variations as Ca2+: 9.3–43.9 mg/L and 7.1–39.9 mg/L, Mg2+: 2.1–25.9 mg/L and 2.6–17.3 mg/L in the wet and dry seasons, respectively (Khound et al. 2012). 72% GW samples from the shallow wells showed higher values of Ca2+ in the wet season, and the rest 28% aquifers showed higher values in the dry seasons. Similarly, out of 50 shallow well water samples, 64% of the water samples had higher Mg2+ content in the wet season and the rest 36% of the water samples showed higher values in the dry season. Ca2+ and Mg2+ of SW sources also showed wide variations in values: Ca2+: 2.2–26.3 mg/L and 3.6–28.9 mg/L; Mg2+: 0.8–9.1 mg/L and 1.6–11.2 mg/L in the wet and the dry seasons, respectively (Khound et al. 2012). Except two ponds (30 and 33), all the other SW sources had higher values of Ca2+ in the dry season than those in the wet season. 5.7% of the SW sources had higher Mg2+ content in the wet season and the remaining 94.3% of the water samples showed higher values in the dry season. However, it was observed that all the water samples in the wet and the dry season had Ca2+ and Mg2+ concentration below the desirable limit of Ca2+: 75 mg/L, Mg2+: 30 mg/L of BIS (2004) for drinking water. In the present work, the Na+ contents of the aquifers were found in the range of 6.5–40.8 mg/L in the wet season and 3.8–30.4 mg/L in the dry season, respectively (Khound et al. 2012) and thus water could be considered as suitable for irrigation and domestic purposes. Corresponding K+ values were observed as 2.6–22.7 mg/L in the wet seasons and 2.0–16.7 mg/L in the dry seasons. GW samples had higher values of Na+ (78%) and K+ (76%) in the wet season than in the dry season. SW Na+ contents were found in the range of 3.5–11.2 mg/L in the wet seasons and 4.3–13.8 mg/L in the dry seasons, while K+ contents were observed from 0.8–5.2 mg/L in the wet seasons and 1.5–6.2 mg/L in the dry seasons (Khound et al. 
2012), respectively. Except for two sources, all the SW sources were found to have Na+ content below 10 mg/L. 77.3% of the SW bodies showed more K+ in the dry season than in the wet season. Only 2.8% (one) of the sources in the wet season and 5.7% of the sources in the dry season were found to have K+ content above 5 mg/L. However, all the water samples (both ground and surface) showed Na+ and K+ content well below the recommended values (Na+: 200 mg/L, K+: 12 mg/L) of WHO (2011) and BIS (2004), indicating their suitability for drinking as well as irrigation purposes. In high rainfall zones of India such as Assam, Orissa and Kerala, the total iron content of water sources generally varies from 6.8 to 55.0 mg/L (Singhal and Gupta 1999). Similarly, in the present work, most of the water samples showed Fe concentrations in the range of 0.12–4.45 mg/L in the wet seasons and 0.16–7.80 mg/L in the dry seasons (Khound et al. 2013), values that were mostly much higher than the WHO (1984) permissible limit for drinking water (0.3 mg/L). For SW sources, iron contents varied from 0.16 to 1.11 mg/L in the wet seasons and from 0.24 to 2.90 mg/L in the dry seasons, with most sources exceeding the WHO (2004) limit (0.3 mg/L) for drinking water. In the wet season, 54.4% of the SW sources had Fe content below the maximum permissible limit, while in the dry season the Fe content rose above this limit in 94.4% of the SW sources, mainly due to the reduction in water volume. 85.8% of the sources in the dry seasons and 45.8% of the sources in the wet seasons showed Fe content in the range of 0.3–1.5 mg/L. The iron content varies across the studied basin depending on the soil characteristics and is summarized in Table 6. 
It may be noted that the solubility of iron at pH 6 is about 10^5 times greater than at pH 8.5 (Mason and Moore 1985), and since the water sources of the present study area were generally acidic (pH < 7.0), it is likely that the water dissolved large amounts of iron from the soil as it percolated down. Overall, the hydrochemistry showed that the major cation contents of all the studied water sources of the Jia-Bharali river basin followed the trend Ca2+ > Na+ > Mg2+ > K+ > Fe in both the wet and the dry seasons.
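Many of the seasonal comparisons in this section are expressed as the percentage of sources whose wet season value exceeds the dry season value (or the reverse). A minimal sketch of that calculation is given below; the function name `percent_higher` and the four paired concentrations are hypothetical and are not taken from the study data.

```python
def percent_higher(first, second):
    """Percentage of paired sites where the value in `first` exceeds that in `second`."""
    if len(first) != len(second) or not first:
        raise ValueError("both seasons must list the same, non-empty set of sites")
    higher = sum(1 for a, b in zip(first, second) if a > b)
    return 100.0 * higher / len(first)

# Hypothetical Ca2+ concentrations (mg/L) at four wells, paired by site
wet = [43.9, 20.1, 9.3, 30.0]
dry = [39.9, 22.5, 7.1, 25.0]
print(percent_higher(wet, dry))
```

Sites with equal values in both seasons count toward neither direction, so the wet-higher and dry-higher percentages sum to 100 only when there are no ties.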

Table 6 Classification based on iron content (mg/L)

Major anion chemistry

Carbonate is present in water only at pH above 8.3; hence, in the present study, no carbonate (CO32−) alkalinity could be expected in any of the water sources (Narain and Chauhan 2000), as they all had pH lower than 8.3. The total alkalinity of the water sources could therefore be considered almost entirely due to the presence of bicarbonates (Pawar 1993). The bicarbonate content (as CaCO3) of the GW samples was in the range of 34–119 mg/L (mean 70 mg/L) in the wet seasons and 18–68 mg/L (mean 43 mg/L) in the dry seasons (Khound et al. 2012). Wet season HCO3− was found at slightly higher levels, indicating some contribution from carbonate weathering due to heavy rainfall in the river basin. The HCO3− values of the SW samples varied from 18 to 39 mg/L in the wet seasons and from 28 to 54 mg/L in the dry seasons (Khound et al. 2012). All the water sources showed bicarbonate alkalinity values within the WHO (1984) permissible limit (120 mg/L) for drinking water. In the present study, GW Cl− showed a wide range of values, from 5.6 to 110.7 mg/L with a mean value of 39.1 mg/L in the wet seasons and from 8.8 to 90.7 mg/L with a mean value of 36.8 mg/L in the dry seasons (Khound et al. 2012). Similarly, SW Cl− content ranged from 5.9 to 25.3 mg/L with a mean value of 10.9 mg/L in the wet seasons and from 9.8 to 28.7 mg/L with a mean value of 15.5 mg/L in the dry seasons (Khound et al. 2012). 54% of the aquifers and 92% of the SW sources presented higher Cl− values in the dry seasons than in the wet seasons. The low chloride content of the water sources in both the seasons could be due to (1) the absence of industrial activities as well as a low rate of percolation of agricultural and domestic wastes to the water bodies and (2) insignificant geogenic contributions from the area (Mariappan et al. 2000). 
However, all the water sources showed Cl− content below the permissible limit of 600 mg/L (WHO 2011) for drinking water, and therefore the water was free from excessive chloride in both the wet and dry seasons. The aquifers in the present study showed SO42− content over a wide range, from 7.1 to 83.1 mg/L with a mean value of 21.4 mg/L in the wet seasons and from 3.1 to 38.0 mg/L with a mean value of 18.1 mg/L in the dry seasons (Khound et al. 2012). Similarly, the SO42− contents of the SW sources ranged from 1.8 to 14.2 mg/L in the wet seasons and from 3.4 to 28.4 mg/L in the dry seasons (Khound et al. 2012). The variation of SO42− concentration in both the seasons was thus very wide. 58% of the dug wells and 85.8% of the SW sources showed higher SO42− content in the wet season than in the dry season. However, all the water sources in both the wet and dry seasons were found to have sulfate contents much below the permissible limit (200 mg/L, WHO 2004) for drinking and other household purposes. The aquifers of the basin showed only small amounts of NO3−, from BDL to 0.72 mg/L with a mean value of 0.11 mg/L in the wet seasons and from BDL to 0.23 mg/L with a mean value of 0.04 mg/L in the dry seasons (Khound et al. 2012). 46% of the GW samples had a higher NO3− concentration in the dry season than in the wet season. 38% of the aquifers in the dry season and 18% of the aquifers in the wet season showed NO3− concentrations below the detection limit. The SW sources were also found to have only small amounts of NO3−, from BDL to 1.23 mg/L in the wet seasons and from BDL to 0.43 mg/L in the dry seasons. 80% of the SW samples showed a higher NO3− concentration in the wet season, while the remaining 20% showed higher values in the dry season. 27% of the SW sources in the dry season and 13% of the sources in the wet season showed NO3− concentrations below the detection limit. 
However, all the values were well below the WHO (2004) recommended value of 50 mg/L for drinking water in both the seasons. The low nitrate values of the study area indicated that the nitrifying bacteria were not very active owing to the anaerobic conditions prevailing for a large part of the year (the area being under water cover most of the time). The presence of extensive paddy cultivation in the study area suggested that agricultural runoff was the probable source of this nitrate (Kumar et al. 2011). High concentrations of nitrate cause methemoglobinemia, or blue baby syndrome, and have been cited as a risk factor in developing gastric and intestinal cancer (Chapman 1996). However, the contents in the present work were too low to arouse such concern. GW PO43− values were recorded in the range from 0.01 to 1.27 mg/L with a mean value of 0.16 mg/L in the wet seasons and from BDL to 0.98 mg/L with a mean value of 0.07 mg/L in the dry seasons (Khound et al. 2012). 82% of the GW samples showed a higher PO43− concentration in the wet season than in the dry season. In the dry season, 36% of the aquifers showed PO43− content below the detection limit. The SW sources showed similar PO43− content, ranging from BDL to 1.48 mg/L in the wet seasons and from BDL to 1.14 mg/L in the dry seasons. 5.7% of the SW sources in the wet seasons and 37.2% in the dry seasons showed PO43− content below the detection limit, while 88.5% of the SW samples had a higher PO43− concentration in the wet season than in the dry season. PO43− enters SW through several routes, including weathering of phosphate-containing rocks, agricultural runoff carrying unused fertilizers, and percolation of sewage and industrial wastes (Anda et al. 2001). The presence of vast paddy rice cultivation in the study area pointed to agricultural runoff being the main source. 
The high PO43− concentration during the months of June and July, i.e., the monsoon period, could be attributed to agricultural runoff and the discharge of water containing detergents etc. from the surface (Kumar et al. 2011). In the present study, the fluoride concentration of the dug wells varied in the range of BDL–0.49 mg/L (mean value: 0.06 mg/L) in the wet seasons and BDL–0.70 mg/L (mean value: 0.08 mg/L) in the dry seasons (Khound et al. 2012). 36% of the aquifers showed higher F− content in the wet season than in the dry season. The SW sources were also found to have F− content in the range of BDL–0.10 mg/L (mean value: 0.01 mg/L) in the wet seasons and BDL–0.14 mg/L (mean value: 0.01 mg/L) in the dry seasons. In 22.9% of the SW samples, the F− content was higher in the dry seasons, while in the wet seasons most of the sources had fluoride below the detection limit, mainly due to dilution. However, all the water samples had F− content well below the acceptable limit (1.5 mg/L) of WHO (1984) for drinking purposes in both the wet and dry seasons. The sources of F− in the studied water sources could be attributed to mineral and inorganic nutrients and to agricultural and domestic sewage runoff (Rao et al. 2015). Fluoride enters SW sources mainly from weathering of rocks, phosphatic fertilizer usage and sewage sludge. Hydrochemical analysis showed that the anion composition of all the ground and surface water sources was dominated by bicarbonate (as CaCO3), chloride and sulfate, with an almost insignificant contribution from phosphate and nitrate, the order being HCO3− > Cl− > SO42− ≫ PO43− > NO3− > F− in both the wet and the dry seasons.

Ion balance study

The analytical precision for the chemical variables was determined by computing the ionic balance between the cations (Ca2+, Mg2+, Na+ and K+) and anions (HCO3−, Cl−, SO42− and NO3−) (Rao et al. 2015). The ion balance study with respect to average cation and anion concentrations in meq/L and their ratios for the water sources of the studied basin is shown in Tables 7 and 8. The ratio of total cation equivalents to total anion equivalents of the GW sources varied from 0.68 to 1.37 with a mean of 1.12 in the wet season, and from 0.73 to 1.36 with a mean of 1.07 in the dry season. 70% of the aquifers in the wet season and 84% of the aquifers in the dry season showed an ion balance ratio in the range 0.68–1.20. The ratio of total cation equivalents to total anion equivalents of the studied SW sources (except source 12, a river water source) varied from 0.69 to 1.10 in both sets of seasons and may be considered as ≈ 1.0. Such discrepancies are not uncommon considering that different methodologies are followed for the estimation of the cations (Ca2+ and Mg2+ by EDTA titration, Na+ and K+ by flame photometry, total Fe by atomic absorption spectrophotometry) and the anions (Cl− and HCO3− by volumetric method, SO42−, NO3− and PO43− by spectrophotometric method). Thus, the ion balance studies in the present case may be considered as indicating reliability of the measured data to a large extent. When averages are taken over different batches for the same site, the differences arising from methodology variations are likely to be averaged out and a better ion balance is obtained. This again points to the general validity of the measured data.
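The ion balance calculation can be sketched in a few lines of Python. Concentrations in mg/L are converted to meq/L by dividing by the equivalent weight of each ion (molar mass divided by charge), and the cation/anion ratio and the percentage balance BI defined in the methods are then computed. The sample values and the names `EQUIVALENT_WEIGHT`, `total_meq` and `ion_balance` are hypothetical. The sketch also assumes that bicarbonate is expressed as mg/L of HCO3−; where it is reported as CaCO3, as in this study, an equivalent weight of 50.04 would apply instead.

```python
# Equivalent weights in mg per meq (molar mass / charge)
EQUIVALENT_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "HCO3": 61.02,   # assumes HCO3- reported as HCO3-; use 50.04 if reported as CaCO3
    "Cl": 35.45, "SO4": 48.03, "NO3": 62.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4", "NO3")

def total_meq(sample_mg_l, ions):
    """Sum of concentrations in meq/L for the listed ions."""
    return sum(sample_mg_l[ion] / EQUIVALENT_WEIGHT[ion] for ion in ions)

def ion_balance(sample_mg_l):
    """Return (cation/anion ratio, percentage ionic balance BI)."""
    cations = total_meq(sample_mg_l, CATIONS)
    anions = total_meq(sample_mg_l, ANIONS)
    ratio = cations / anions
    bi = 100.0 * (cations - anions) / (cations + anions)
    return ratio, bi

# Hypothetical sample (mg/L), not taken from the study data
sample = {"Ca": 20.0, "Mg": 6.0, "Na": 12.0, "K": 4.0,
          "HCO3": 85.0, "Cl": 20.0, "SO4": 15.0, "NO3": 0.1}
ratio, bi = ion_balance(sample)
print(f"cation/anion ratio = {ratio:.2f}, BI = {bi:.1f}%")
```

The two measures carry the same information, since BI = 100 × (ratio − 1)/(ratio + 1); a ratio of 1.0 corresponds to BI = 0.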

Table 7 Ion balance study for 50 aquifers of the basin
Table 8 Ion balance study for 35 surface water sources of the basin

Multivariate statistical analysis

Raw data treatment

The normal distribution of each variable required for multivariate statistical analysis could be confirmed by analyzing kurtosis and skewness statistical tests (Lattin et al. 2003). The original database were found to generate a wide range of skewness values as − 0.556 to 4.291 for GW and 0.35–4.17 for SW with the respective kurtosis values ranged from − 0.918 to 12.693 and − 1.18–19.1. It indicated that the database was far from normal distribution. Since for most of the values kurtosis and skewness were > 0, the raw data of all variables were transformed in the form x′ = log 10(x). After transformation, GW skewness values ranged from − 0.854 to 0.810 in the wet season and − 1.087 to 1.192 to 2.011 in the dry season, while kurtosis values ranged from − 1.345 to 2.644 in the wet season and –1.621 to 3.117 in the dry season, respectively. Similarly, SW skewness values ranged from − 0.070 to 1.430 in the wet season and − 0.128 to 2.011 in the dry season with the kurtosis values − 0.893 to 1.619 and –0.952 to 4.897 in the wet and dry season, respectively. These values indicated that the dataset were under normal distribution or close to normal distribution for statistical analysis. In order to avoid miss classification caused by wide differences in data dimensionality, z-scale transformation was applied to the raw water quality dataset (Liu et al. 2003) to eliminate the influence of different units of measurements, and to render the data dimensionless before the multivariate analysis. KMO is a measure of sampling adequacy for the proportion of common variance caused by underlying factors. The value of KMO close to 1.0 generally indicates that principal component analysis/factor analysis may be useful, which was the case in this study: KMO: 0.74 (GW), 0.52 (SW) in the wet season and 0.77 (GW), 0.51 (SW) in the dry season. Bartlett’s test of sphericity indicates whether correlation matrix is an identity matrix, i.e., the variables are unrelated. 
The significance level, which was 0 in this study (less than 0.05), indicated the presence of significant relationships among the water variables in both the wet and the dry seasons (Shrestha and Kazama 2007). The results of the KMO and Bartlett’s tests are presented in Table 9.
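The data treatment described above (normality screening, log10 transformation and z-scaling) can be illustrated with a minimal Python sketch. It makes several assumptions: it uses simple moment-based skewness and excess kurtosis, which may differ slightly from the bias-corrected values reported by statistical packages; it z-scales the log-transformed values using the sample standard deviation; and the concentrations are hypothetical, not data from this study.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m4 = mean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0

def log10_transform(xs):
    """x' = log10(x); requires strictly positive values."""
    return [math.log10(x) for x in xs]

def z_scale(xs):
    """(x - mean) / sample standard deviation, giving dimensionless values."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

# Hypothetical right-skewed concentrations (mg/L); not data from the study.
tds = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = log10_transform(tds)   # skewness drops to about zero
scaled = z_scale(logged)        # mean 0, sample standard deviation 1
```

In this toy series the raw values are strongly right-skewed, whereas the log-transformed values are symmetric, which mirrors the reduction in skewness reported after transformation.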

Table 9 KMO and Bartlett’s test of sphericity
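For readers who wish to reproduce statistics of the kind summarized in Table 9, the following Python sketch computes the overall KMO value and the Bartlett chi-square statistic from a correlation matrix, using the standard textbook formulas. The helper names, the 3 × 3 matrix and the sample size are hypothetical, and the sketch stops at the chi-square statistic and its degrees of freedom rather than computing a significance level.

```python
import math

def invert_with_det(a):
    """Gauss-Jordan inverse and determinant of a small square matrix."""
    n = len(a)
    # Augment the matrix with the identity on the right.
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m], det

def kmo(r):
    """Overall KMO: sum r_ij^2 / (sum r_ij^2 + sum partial_ij^2), i != j."""
    inv, _ = invert_with_det(r)
    n = len(r)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Partial correlation from the inverse correlation matrix.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)

def bartlett_sphericity(r, n_samples):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = len(r)
    _, det = invert_with_det(r)
    chi2 = -(n_samples - 1 - (2 * p + 5) / 6.0) * math.log(det)
    dof = p * (p - 1) // 2
    return chi2, dof

# Hypothetical correlation matrix and sample size; not values from the study.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
kmo_value = kmo(R)
chi2, dof = bartlett_sphericity(R, n_samples=50)
```

A large chi-square value relative to its degrees of freedom leads to rejection of the hypothesis that the correlation matrix is an identity matrix.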

Correlation matrix analysis

The degree of linear association between the important physicochemical parameters and the major cations and anions, taken in pairs, is presented in two correlation matrices for each season (Tables 10, 11, 12, 13). The correlation coefficients were computed for the average values of three wet seasons and three dry seasons. A correlation coefficient (r) of + 1 indicates that two variables are perfectly related in a positive linear sense, whereas r = − 1 indicates a perfect negative linear correlation. A positive correlation between two variables suggests that they may share a common source, while a negative correlation suggests different sources. The correlation can be considered as strong when r > 0.50, good when r = 0.50 and poor when r < 0.50 (Kumar et al. 2017). Thus, correlation analysis reveals the nature of the relationship between the water quality parameters of the studied river basin. TDS bears a good linear correlation with TH [r = 0.60 (wet) and 0.50 (dry)], Na+ [r = 0.59 (wet) and 0.46 (dry)], Ca2+ [r = 0.56 (wet) and 0.83 (dry)], Mg2+ [r = 0.54 (wet) and 0.67 (dry)], HCO3 [r = 0.57 (wet) and 0.59 (dry)], Cl [r = 0.59 (wet) and 0.58 (dry)] and SO42− [r = 0.59 (wet) and 0.51 (dry)] in all the GW sources; hence, the major constituents of TDS come from the ionic contributions of Na+, Ca2+, Mg2+, HCO3, Cl and SO42−. Similarly, SO42− bears a good positive correlation with Ca2+ [r = 0.61 (wet) and 0.62 (dry)], Mg2+ [r = 0.79 (wet) and 0.63 (dry)], Na+ [r = 0.49 (wet) and 0.55 (dry)], K+ [r = 0.50 (wet) and 0.56 (dry)] and Cl [r = 0.64 (wet) and 0.60 (dry)] in the studied GW sources. Ca2+ has a good positive correlation with Mg2+ [r = 0.78 (wet) and 0.82 (dry) in the GW sources and r = 0.70 (wet) and 0.81 (dry) in the SW sources]. 
In addition, the correlation matrices showed significant positive correlations between other physicochemical parameters of the studied aquifers, such as TH–Ca2+ [r = 0.71 (wet) and 0.83 (dry)], TH–Mg2+ [r = 0.69 (wet) and 0.67 (dry)], TH–Cl [r = 0.51 (wet) and 0.58 (dry)], TH–HCO3 [r = 0.64 (wet) and 0.59 (dry)] and TH–SO42− [r = 0.59 (wet) and 0.51 (dry)]. This showed the dependence of GW hardness on Ca2+, Mg2+, HCO3, Cl and SO42−. Chloride content was significantly and positively correlated with Ca2+ [r = 0.62 (wet) and 0.70 (dry)], Mg2+ [r = 0.70 (wet) and 0.68 (dry)], Na+ [r = 0.67 (wet) and 0.62 (dry)] and SO42− [r = 0.63 (wet) and 0.60 (dry)]. Good correlations were also found for HCO3–Ca2+ [r = 0.71 (wet) and 0.55 (dry)], TH–Ca2+ [r = 0.71 (wet) and 0.83 (dry)] and TH–Mg2+ [r = 0.69 (wet) and 0.67 (dry)], which indicated similar sources and/or geochemical behavior during ionic mobilization (Tiwari et al. 2015). The positive correlation between Cl and Na+ [r = 0.67 (wet) and 0.62 (dry)] indicated that Cl and part of the Na originated from anthropogenic sources (Tiwari and Singh 2014). The good correlation between Ca2+ and SO42− [r = 0.61 (wet) and 0.62 (dry)] indicated that gypsum dissolution is a major contributor to the dissolved ions in the GW sources of the study area. However, no such significant relationships were observed between the other constituents of the GW and SW samples.
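The pairwise coefficients discussed above are Pearson correlation coefficients. As a minimal sketch of how such a correlation matrix is assembled, the following Python code computes r for every pair of variables; the parameter values are hypothetical and are not the measurements behind Tables 10, 11, 12 and 13.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict of r values for every pair of named variables."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical seasonal averages at five sites; not data from the study.
data = {
    "TDS": [120.0, 150.0, 180.0, 210.0, 260.0],
    "Ca": [18.0, 22.0, 27.0, 30.0, 39.0],
    "pH": [7.4, 7.1, 7.3, 6.9, 7.0],
}
r = correlation_matrix(data)
```

In this toy dataset, r["TDS"]["Ca"] is strongly positive, while r["TDS"]["pH"] is negative.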

Table 10 Correlation matrix of the physicochemical parameters of the ground water samples in the wet season (taking average values)
Table 11 Correlation matrix of the physicochemical parameters of the ground water samples in the dry season (taking average values)
Table 12 Correlation matrix of the physicochemical parameters of the surface water samples in the wet season (taking average values)
Table 13 Correlation matrix of the physicochemical parameters of the surface water samples in the dry season (taking average values)

Principal component analysis

Shallow aquifers

Using varimax normalization (Kaiser 1960), four principal components (PCs) with eigenvalues greater than one were extracted for both the wet season and the dry season; they are presented in Figs. 3 and 4. Together, these PCs accounted for 68.1% of the total variance in the wet season and 70% in the dry season. The analysis could therefore be used to identify the main sources of variation in the GW chemistry of the study area in both seasons. PC 1, with 36.2% of the total variance in the wet season and 38.6% in the dry season, showed positive values for all variables, with strong positive loadings (> 0.50) for hardness, TDS, Ca2+, Mg2+, Na+, HCO3, Cl and SO42−. Thus, PC 1 could be interpreted as the contribution of surface runoff from agricultural fields to the GW sources (Fukasawa 2005). As agriculture is the mainstay of a large majority of the population of the basin, the extensive use of chemical fertilizers may be a potential source of these metallic and non-metallic constituents in the studied aquifers. PC 2 accounted for 12.5% of the total variance in the wet season and 11.5% in the dry season and presented high positive scores for hardness, Ca2+, Mg2+, HCO3 and F. Thus, PC 2 could be attributed to the influence of increased urban activities in the basin on the water sources, as municipal and domestic disposals are also important sources of Ca2+ and Mg2+ in the aquatic environment (Jacob et al. 1999). PC 3, with 11.2% of the total variance in the wet season and 9.3% in the dry season, had positive scores for Ca2+, Mg2+, Fe, HCO3, SO42−, NO3, Cl, PO43− and F, which can be attributed to the particular geology of the basin. During the infiltration of recharge, the water absorbs a large amount of CO2 released from the soil, mainly from the decay of organic matter and root respiration. In weathering reactions, this CO2 is converted to HCO3 salts (Berner and Berner 1987). 
The high positive loading of Ca2+ in both seasons suggested the importance of carbonate rock dissolution in the catchment area (Nesrine et al. 2015). Alkaline water mobilizes fluoride from soils and rocks and also releases it from fluoride-bearing minerals such as apatite, biotite and clay (Madhnure et al. 2007). PC 4, with 8.3% of the total variance in the wet season and 7.7% in the dry season, showed positive loadings for Mg2+, K+, Fe, SO42−, PO43−, NO3, F and HCO3. This component represented the effect of soil erosion during cultivation and the associated organic matter in the study area. Therefore, the contamination of the shallow aquifers during the investigation period originated mostly from anthropogenic and geogenic sources (Unmesh et al. 2006).
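The retention rule used above (keep components with eigenvalues greater than one and report their share of the total variance) can be sketched in Python as follows. This is an illustration under stated assumptions: the eigenvalues are obtained with a simple cyclic Jacobi iteration on a hypothetical 3 × 3 correlation matrix, and neither the loadings nor the varimax rotation used in the study are reproduced.

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[p][q].
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def retained_components(eigenvalues):
    """(eigenvalue, % of total variance) for components with eigenvalue > 1."""
    total = sum(eigenvalues)
    return [(ev, 100.0 * ev / total) for ev in eigenvalues if ev > 1.0]

# Hypothetical correlation matrix; not values from the study.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
eigs = jacobi_eigenvalues(R)
kept = retained_components(eigs)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that number gives the fraction of total variance explained by the corresponding component.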

Fig. 3 Principal component analysis (PCA) for the selected aquifers in the wet season

Fig. 4 Principal component analysis (PCA) for the selected aquifers in the dry season

Surface water sources

Six principal components (PCs) were extracted from the SW database using varimax normalization (Kaiser 1960) for both the wet and dry seasons; they are presented in Figs. 5 and 6. The six PCs accounted for more than 70% of the total variance (73.85% in the dry season and 72.68% in the wet season) and could therefore be used to identify the main sources of variation in the SW chemistry of the study area in both seasons. PC 1 had strong positive loadings (> 0.50) for hardness, Ca2+ and Mg2+ and accounted for 22.6% of the variance in the dry season and 19.02% in the wet season. However, given the absence of industrial activities, the high positive loadings of pH, hardness, Ca2+, Mg2+, Na+, K+, Fe, SO42−, PO43−, NO3 and COD in the wet season suggested that PC 1 represented anthropogenic sources in the study area. Ca2+ had a high positive loading in both seasons, indicating dissolution of carbonate rocks in the catchment area (Singhal and Gupta 1999). PC 2, which accounted for 15.6% of the total variance in the wet season and 13.1% in the dry season, presented positive scores for hardness, Ca2+, Mg2+ and HCO3. Thus, PC 2 could be ascribed to a geogenic factor. PC 3, with ~ 12% of the total variance in both seasons, had positive scores for pH, hardness and chloride, reflecting the influence of urban activities on the SW sources. PC 4, which accounted for 9.8% of the total variance in the wet season and 10.6% in the dry season, showed positive loadings for the anions Cl, SO42−, PO43− and NO3 and for COD, suggesting the contribution of surface runoff from agricultural fields to the water sources (Narain and Chauhan 2000; Howari and Banat 2002). PC 5 (~ 8.5% of total variance) and PC 6 (~ 7.5% of total variance) showed almost similar loadings in both seasons and represented the effect of soil erosion during cultivation and the associated organic matter in the study area. 
Therefore, anthropogenic and geogenic sources (Pawar 1993) were considered the major sources of SW contamination during the study period.

Fig. 5 Principal component analysis (PCA) for the selected SW sources in the wet season

Fig. 6 Principal component analysis (PCA) for the selected SW sources in the dry season

Hierarchical cluster analysis

Hierarchical cluster analysis is one of the most widely applied data classification techniques, in which objects are grouped so that similar objects fall into the same class. Hierarchical clustering first joins the most similar observations and then successively joins the next most similar ones. The levels of similarity at which observations are merged are used to construct a dendrogram (Güler et al. 2012; Moya et al. 2015). In this study, hierarchical cluster analysis was applied to detect similar and dissimilar groups among 50 shallow dug wells and 35 SW sources with 15 variables (Shrestha and Kazama 2007). The datasets were treated by Ward’s method of linkage, with squared Euclidean distance as the measure of similarity.
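The agglomerative procedure described above can be sketched in a few lines of Python. This is a naive illustration, not the software routine used in the study: at each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares (Ward's criterion, computed from squared Euclidean distances between centroids). The function name and the two-variable site coordinates are hypothetical, and statistical packages may report merge heights on a different scale.

```python
def ward_clusters(points, k):
    """Merge clusters by Ward's criterion until k clusters remain.

    Returns the member indices of each cluster and the list of merge costs
    (the heights at which a dendrogram would join the clusters).
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    heights = []
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[x], clusters[y]
                d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
                # Increase in within-cluster sum of squares if x and y merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, x, y)
        cost, x, y = best
        (ma, ca), (mb, cb) = clusters[x], clusters[y]
        na, nb = len(ma), len(mb)
        centroid = [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)]
        # Delete the higher index first so that index x stays valid.
        del clusters[y]
        del clusters[x]
        clusters.append((ma + mb, centroid))
        heights.append(cost)
    return [sorted(m) for m, _ in clusters], heights

# Hypothetical standardized values of two variables at six sites.
sites = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [10.0, 0.0], [10.1, 0.3]]
groups, heights = ward_clusters(sites, 3)
```

For these toy sites the three resulting groups pair neighbouring sites, and the merge costs increase from one step to the next, which is what a dendrogram displays on its distance axis.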

Hierarchical cluster of shallow aquifers

Hierarchical cluster analysis classified the studied shallow wells of the basin into three groups of similar characteristics with respect to the variables for both seasons. The water quality differences among the clusters reflected differences in morphology and anthropogenic pollution. On the basis of the dendrograms, the 14 variables of the GW sources could be grouped into three main groups for both the wet and dry seasons. In the wet season, the first group included pH, Na+, K+, Ca2+, Mg2+, Cl and SO42−. This group was associated with the second group, which comprised total hardness, TDS and HCO3, while the third group was composed of Fe, F, PO43− and NO3. This finding corroborated the results of the correlation and cluster analysis. Similarly, in the dry season, cluster analysis generated three main groups with close association. Cluster 1 comprised Na+, K+, Mg2+ and pH, while cluster 2 included total hardness, TDS, Ca2+, HCO3, Cl and SO42−. These two clusters were associated with the third cluster, which contained Fe, F, PO43− and NO3, as in the wet season. The enrichment of Na+ and Cl ions in GW was attributed firstly to interaction with rocks and secondly to the association of TDS with higher concentrations of Na+ and Cl ions. This indicated anthropogenic activities, such as the discharge of domestic and agricultural runoff, which contribute to the contamination of groundwater (Yidana et al. 2008). The dendrograms of the GW sites with respect to variables, obtained by Ward’s method in the Jia-Bharali river basin, are shown in Figs. 7 and 8.

Fig. 7 Dendrogram of hierarchical cluster analysis of the selected aquifers in the wet season

Fig. 8 Dendrogram of hierarchical cluster analysis of the selected aquifers in the dry season

Hierarchical cluster of surface water sources

SW samples of the basin in both seasons were clustered into three groups of similar characteristics based on their sources. Cluster 1, cluster 2 and cluster 3 corresponded to relatively high pollution, moderate pollution and low pollution sources, respectively, reflecting differences in morphology and anthropogenic activities. In the wet season, cluster 1 included 7 sources: 2, 7, 8, 9, 10, 14 and 34. Cluster 2 included 19 sources: 1, 4, 5, 6, 11, 13, 15, 16, 17, 18, 21, 23, 24, 25, 29, 31, 32, 33 and 35. Cluster 3 grouped the remaining sources: 3, 12, 19, 20, 22, 26, 27, 28 and 30. The sampling sites in cluster 1 are located in commercial areas, where the water sources are extensively used and polluted by human activities. Most of the SW sources in cluster 2 are located in residential areas and near agricultural fields, suggesting deterioration of water quality caused by pollution from domestic wastewater, agricultural runoff and urban activities. However, the sources in cluster 3 are located in remote and less populated areas of the basin and hence suffer from less pollution. The dendrograms of the SW sites with respect to variables, obtained by Ward’s method in the Jia-Bharali river basin, are shown in Figs. 9 and 10.

Fig. 9 Dendrogram of hierarchical cluster analysis of the SW sources in the wet season

Fig. 10 Dendrogram of hierarchical cluster analysis of the SW sources in the dry season

Piper classification

A Piper trilinear diagram (Piper 1944) helps to classify the quality of water sources. In the present study, the Piper trilinear diagrams constructed from the experimental results are shown in Figs. 11 and 12. The Piper diagrams revealed that the majority of GW samples [86% (wet) and 92% (dry)] and SW samples [80% (wet) and 86% (dry)] fall in area 5 of the diagram [carbonate hardness (secondary alkalinity) exceeds 50%], which suggests that the chemical properties of these water samples are dominated by the alkaline earths (Ca2+ and Mg2+) over the alkalis (Na+ and K+) and by the weak acids (represented by HCO3) over the strong acids (SO42− and Cl). The remaining water samples form a second group that falls in area 9 of the diagram, indicating no dominant water type (no cation–anion pair exceeds 50%). Thus, Piper analysis of the water samples shows a preponderance of alkaline earths (Ca2+, Mg2+) and weak acids (HCO3) over alkalis (Na+, K+) and strong acids (SO42−, Cl) at the majority of sites in the study area in both the wet and dry seasons, thereby indicating calcium and magnesium bicarbonate type water. The Piper classification of the water samples in the wet and dry seasons is presented in Table 14.
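The dominance statements above rest on ion concentrations expressed as percentages of total cation or anion milliequivalents. The following Python sketch shows that conversion under stated assumptions: the sample concentrations are hypothetical, carbonate is omitted so that HCO3 alone represents the weak acids, and the function only checks the two dominance conditions rather than locating a sample in a numbered area of the Piper diamond.

```python
# Equivalent weights (g/eq) = molar mass / ionic charge.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "HCO3": 61.02, "Cl": 35.45, "SO4": 48.03,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4")

def meq_percent(sample_mg_l):
    """Convert mg/L to meq/L, then to % of total cations or total anions."""
    meq = {ion: sample_mg_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}
    cation_total = sum(meq[i] for i in CATIONS)
    anion_total = sum(meq[i] for i in ANIONS)
    pct = {i: 100.0 * meq[i] / cation_total for i in CATIONS}
    pct.update({i: 100.0 * meq[i] / anion_total for i in ANIONS})
    return pct

def dominance(pct):
    """Check the two dominance conditions discussed for the Piper diagram."""
    return {
        "alkaline_earths_exceed_alkalis": pct["Ca"] + pct["Mg"] > 50.0,
        "weak_acids_exceed_strong_acids": pct["HCO3"] > 50.0,
    }

# Hypothetical sample (mg/L); not a measurement from the study.
sample = {"Ca": 40.0, "Mg": 12.0, "Na": 10.0, "K": 2.0,
          "HCO3": 180.0, "Cl": 10.0, "SO4": 15.0}
pct = meq_percent(sample)
flags = dominance(pct)
```

For this hypothetical sample both conditions hold, which corresponds to the calcium and magnesium bicarbonate water type described in the text.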

Fig. 11 Piper diagram of GW samples

Fig. 12 Piper diagram of SW samples

Table 14 Piper classification of water sources

Conclusions

This study shows that multivariate analysis is a useful method that can help in determining the sources and extent of pollution.

Hydrochemical and multivariate analysis of the water samples, in combination with conventional graphical methods, revealed the present status of water quality in and around the Jia-Bharali river basin of the North Brahmaputra Plain, India. The study showed that the majority of the water sources had a preponderance of alkaline earths (Ca, Mg) and weak acids (HCO3) over alkalis (Na, K) and strong acids (SO4, Cl), thereby indicating calcium and magnesium bicarbonate type water. The water regime was acidic to alkaline and could be classified in the ‘soft’ to ‘moderately hard’ category. Major anions and cations, except iron, were found to be well within the maximum permissible limits for drinking and irrigation purposes. The water regime was also found to be free from alkalinity, chloride and nitrate-nitrogen hazards. Principal component analysis demonstrated the geogenic and anthropogenic influences on the water quality of the study area. Hierarchical cluster analysis grouped the water sources into statistically significant clusters based on the similarity of their water quality characteristics. Overall, this study is a first assessment that contributes to establishing the water quality status of the Jia-Bharali river basin and its adjoining areas. It lays the groundwork for the development of baseline water quality data that will be essential for the sustainable development of the important ecosystems of the Jia-Bharali river basin.