Introduction

Trace metals are common pollutants that are widely distributed in river catchments. They originate from natural sources and processes such as chemical weathering, soil erosion, and fallout of aerosols from marine, volcanic or arid soil sources. However, as a result of human inputs and activities (Merian 1991), the level of these metals in the environment has increased tremendously. Because of its simplicity, univariate statistical analysis has generally been used to treat trace element data in groundwater (Helena et al. 2000). However, multivariate analysis such as principal component analysis (PCA) and cluster analysis is widely used to explain the correlation amongst a large number of variables in terms of a small number of underlying factors without losing much information (Meglen 1992; Ogwoeleka 2015; Pazand 2016; Qian et al. 2016). These methods can also help in measuring natural associations between samples and/or variables (Wenning and Erickson 1994) and thus highlight information that is not available at first glance.

For this study, the lower Jia Bharali catchment and adjoining areas in the central part of the North Brahmaputra Plain (NBP) were selected. The area is characterized by more than 800 m thick older and younger Alluvium deposited by the west-flowing Brahmaputra river and the south-flowing trans-Himalayan rivers (Khound Nayan et al. 2013). The river regime is highly dynamic, with frequent channel changes and copious sand deposition. The sediment load carried by these rivers is, on average, coarse, facilitating easy percolation and recharge of the groundwater regime. Published reports (Chakrapani 2005; Singh et al. 2005; Jameel and Hussain 2007) reveal that most Indian rivers carry untreated sewage, industrial effluent and runoff from agricultural and urban land to the surface water bodies present in their basins. Owing to the absence of industrial zones and large-scale irrigation projects, the surface and ground water regimes of the study area are expected to be free from such conditions and to bear a pristine signature of the natural environment. The people in the Jia Bharali river basin seldom use the surface water for drinking as well as for various household purposes including irrigation of crops, rearing of poultry and fish, etc. The population of the basin mainly consists of farmers and fishermen who depend on the surface water sources for their livelihood. In this context, the major objectives of this study were to (1) determine natural associations between surface water samples and metallic variables; (2) investigate the spatial and temporal variation of the trace metal composition of the surface water sources; and (3) demonstrate the usefulness of statistical analysis for interpreting the trace element composition of the surface water sources of the Jia Bharali river basin.

Materials and methods

Study area

The Jia Bharali catchment is bounded by longitudes 92°00′–93°25′E and latitudes 26°39′–28°00′N. The drainage system of the north Brahmaputra plain, NE India, is made up of a large number of river systems flowing from the Arunachal Himalaya in the north and debouching into the Brahmaputra in the south. It is an actively subsiding foreland basin with a river regime affected by neotectonic changes and catchment area tectonics (Phukon and Machahary 2011). The precursor trunk channel of the Jia Bharali, known as the Kameng, flows orthogonal to the Himalayan thrust pattern and deflects along the Tipi Thrust in the north and the foothills fault in the south, respectively, before debouching into the foreland at Bhalukpung (92°65′E, 27°01′N). Further downstream, the river is known as the Bharali, with its catchment localized within the Brahmaputra alluvium. The major tributaries of the Jia Bharali are the Diju, Namiri, Upar Dikrai, Khari Dikrai, and Bor Dikrai coming from the foothills on the left bank and the Mansiri with numerous feeders from the Balipara hills on the right bank. The Jia Bharali river has a water yield of 85.8 L/s/km² and a sediment yield of 4721 tons/km²/year (Goswami 1985; Viswanathan and Chakrabarti 1977). Geomorphic mapping of the alluvial catchment has revealed a number of river terraces at different topographic levels, with the present Jia Bharali (Jia meaning alive in the local language) channel system occupying the lowest level. Older Alluvium, composed of partially indurated and oxidized sand, silt and clay, forms the higher topographic levels in the study area, which dot the landscape within an overall younger alluvial terrane of present-day river deposits. The higher topographic surfaces are used for extensive tea plantations, while the lower ones are mostly used for paddy cultivation.
The climate of the study area is sub-tropical, with a hot and humid summer (average temperature 29 °C), heavy monsoon rain (May–September) followed by inundation of almost the entire area, a dry autumn and a cold winter (November–February, average temperature 16 °C) (Jain et al. 2007). Rainfall during July and August is the highest and amounts to more than 33 % of the annual rainfall. Depending on the prevailing climate and the influence of different winds, mainly the monsoon, the basin experiences its highest rainfall during June–September, with an average rainfall of ~1500–2000 mm (Jain et al. 2007). The rock types encountered in the Jia Bharali river basin are successively the Siwaliks (Tertiaries), Gondwanas, and the Precambrian Bichom Group, Tenga Formation, Bomdila Group and Sela Group (Nandy et al. 1971). The drainage network of the Jia Bharali river is rectangular in the western part of the basin, where the main streams and their tributaries display right-angled bends (Thornbury 1989). The eastern part of the catchment is made up of dendritic drainage on tectonically inactive terrain (Ahmed 2001). All the major streams and rivers in this part of the catchment follow the structural trends of the Himalayas. The area is almost free from industrial activities, and agriculture is the main economic activity within the basin. Thus, the studied surface water sources receive a large amount of agricultural runoff and untreated domestic waste water from the basin.

Water sampling and chemical analysis

Water samples from 35 surface water sources, consisting of small streams, rivers, and ponds spread over the entire area of the Jia Bharali river basin, were collected from pre-selected locations twice a year (wet and dry seasons) over a two-year period from 2009 to 2011. The sampling programme was conducted as follows: (1) W1: wet season (July 2009); (2) W2: wet season (July 2010); (3) D1: dry season (February 2010); (4) D2: dry season (February 2011). Thus, the sampling and analysis process covers 35 surface water sources and four seasons (two wet and two dry) for the entire study period. Standard methods (APHA and AWWA 1998) were followed in the collection, storage and analysis of the water samples. The metals were estimated with an Atomic Absorption Spectrometer (Varian SpectrAA 220) following a standard acid digestion technique, and the results were further verified for accuracy by analyzing a few random samples by ICP-MS. Statistical analysis was carried out using the Statistical Package for the Social Sciences (SPSS version 16) (Pazand 2016; Qian et al. 2016).

Results and discussion

Spatial and temporal variations of the trace metals

In this study, the arsenic concentration of the surface water samples is in the range of BDL to 0.003 mg/L (mean 0.001 mg/L) in the wet seasons and from BDL to 0.008 mg/L (mean 0.001 mg/L) in the dry seasons. All of the values are below the recommended maximum permissible limit of drinking water standards (0.01 mg/L, WHO 1984). The Cd content of the surface water sources varies from BDL to 0.015 mg/L (mean 0.003 mg/L) in the wet seasons and from BDL to 0.041 mg/L (mean 0.008 mg/L) in the dry seasons. However, 20.1 % of the samples in the rainy season and 34.4 % of the samples in the dry season have cadmium content higher than the desirable limit of 0.005 mg/L (WHO 1984). The distribution of Cd in the surface water is relatively consistent throughout the study area, implying that it could have been derived from non-point sources such as agricultural runoff. The cobalt content of the surface water sources in the study area ranges from BDL to 0.061 mg/L with a mean value of 0.018 mg/L in the wet seasons and from BDL to 0.106 mg/L with a mean value of 0.04 mg/L in the dry seasons. WHO (2004) has not proposed any drinking water guideline value for Co, but the values in the present work appear considerable. The copper concentration of the water samples is far below the desirable value of 2 mg/L (WHO 1984): it ranges from BDL to 0.20 mg/L (mean 0.04 mg/L) in the wet seasons and from BDL to 0.21 mg/L (mean 0.10 mg/L) in the dry seasons. The total Cr concentration [both Cr(III) and Cr(VI)] varies from low to sufficiently high values, i.e., from 0.02 to 0.43 mg/L (mean 0.13 mg/L) in the dry seasons and from BDL to 0.19 mg/L (mean 0.07 mg/L) in the wet seasons. Cr concentrations are higher in the dry season in 94.4 % of the surface water sources.
WHO (2004) has suggested a maximum permissible value of 0.05 mg/L for Cr in drinking water, and most of the sources in the study area have Cr exceeding this value. Thus, most of the surface water sources of the study area have high Cr content in both the wet and the dry seasons. Iron content varies from 0.10 to 1.39 mg/L (mean 0.43 mg/L) in the wet seasons and from 0.12 to 3.61 mg/L (mean 0.81 mg/L) in the dry seasons, with most sources exceeding the WHO (2004) limit (0.30 mg/L) for drinking water. In the wet season, 54.4 % of the surface water sources have Fe content below the maximum permissible limit, while in the dry season Fe content rises well above this limit in 94.4 % of the surface water sources, mainly due to the reduction in water volume. Fe content lies in the range of 0.3–1.5 mg/L in 85.8 % of the sources in the dry seasons and in 45.8 % of the sources in the wet seasons. Manganese commonly coexists with iron in water. However, where this occurs, the concentrations of iron are generally greater because iron has a greater crustal abundance. The surface water samples of the study area show Mn content in the range of BDL to 0.14 mg/L (mean 0.04 mg/L) in the wet seasons and BDL to 0.20 mg/L (mean 0.08 mg/L) in the dry seasons. Only 8.6 % of the samples in the wet seasons and 34.3 % of the samples in the dry seasons have Mn above the permissible limit of 0.10 mg/L (WHO 1984). Ni content ranges from BDL to 0.12 mg/L (mean 0.04 mg/L) in the wet seasons and from BDL to 0.20 mg/L (mean 0.08 mg/L) in the dry seasons. Ni content is above the permissible limit of 0.02 mg/L in 60.1 % of the water samples during the wet season and in 80.1 % of the water samples during the dry season. Surface water sources have been found to have Ni content in the range of BDL to 0.12 mg/L in the wet seasons and BDL to 0.21 mg/L in the dry seasons. Nickel concentrations are higher in the dry season than in the wet season in 91.5 % of the sources.
The concentration of Pb is in the range of BDL to 0.30 mg/L (mean 0.11 mg/L) in the dry seasons and BDL to 0.17 mg/L (mean 0.06 mg/L) in the wet seasons. Most of the values exceed the maximum permissible limit of 0.05 mg/L for drinking water (WHO 2004). Almost all the surface water sources have higher Pb content in the dry season than in the wet season. During the rainy season, 54.4 % of the surface water samples have Pb above the permissible limit of 0.05 mg/L (WHO 1984), while in the dry season 85.8 % of the surface water sources show Pb above the limit, corresponding to the low pH of the sources. Zinc content in the surface waters ranges from BDL to 0.10 mg/L (mean 0.02 mg/L) in the wet seasons and from BDL to 0.11 mg/L (mean 0.04 mg/L) in the dry seasons. However, even the highest values of Zn measured in this work are far below the guideline value of 3 mg/L (WHO 2004). River water samples have lower Zn content than pond water sources, ranging from BDL to 0.05 mg/L in both the wet and the dry seasons.
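The exceedance percentages quoted in this section follow from a simple count of samples above a guideline value. The following is a minimal sketch of that arithmetic, not the study's actual workflow. The readings are hypothetical (they are not the measured data), the function name `percent_exceeding` is illustrative, and treating below-detection-limit (BDL) readings as not exceeding the limit is an assumption of this sketch.

```python
def percent_exceeding(concentrations, limit):
    """Return the percentage of samples whose concentration exceeds `limit`.

    `None` stands for a below-detection-limit (BDL) reading and is counted
    as not exceeding the limit (an assumption of this sketch).
    """
    if not concentrations:
        raise ValueError("no samples supplied")
    above = sum(1 for c in concentrations if c is not None and c > limit)
    return 100.0 * above / len(concentrations)


# Hypothetical Pb readings in mg/L (not the study's data); None marks BDL.
pb_wet = [None, 0.02, 0.06, 0.17, 0.04, 0.09, None, 0.05]
PB_LIMIT = 0.05  # mg/L, the guideline value cited in the text

pb_wet_exceedance = percent_exceeding(pb_wet, PB_LIMIT)
print(f"Pb above limit: {pb_wet_exceedance:.1f} % of samples")
```

A reading equal to the limit is not counted as an exceedance here; whether the limit itself counts is a convention that would need to be fixed before comparing seasons.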

Pearson-correlation matrix

The Pearson correlation matrix is a simple statistical tool that shows the degree of dependency of one variable on another (Belkhiri et al. 2010). The correlation values obtained are compiled in Tables 1 and 2. The poor correlation between As and Fe in both seasons may be caused by the removal of Fe as FeCO3 solids (Lee et al. 2010). As could be released into water through reductive dissolution of MnO(OH) or FeO(OH) as bacteria oxidize organic matter to gain energy (Ohno et al. 2005; Shamsudduha et al. 2008). The positive correlation between iron and manganese in both seasons suggests a natural occurrence of these two metals from the dissolution of soils, rocks, and minerals. A significant positive correlation is found between (1) Cu and Fe (r = 0.47), (2) Co and Pb (r = 0.44) and (3) Fe and Zn (r = 0.48), signifying their similar geogenic origin and mobility. Pearson correlation analysis also shows that most of the trace metals are weakly to moderately correlated with each other at the p < 0.05 level.
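For readers who want to reproduce this kind of matrix outside SPSS, the following is a minimal sketch of the Pearson coefficient and a pairwise correlation matrix using only the Python standard library. The concentration values are hypothetical (not the study's data), and the names `pearson_r` and `correlation_matrix` are illustrative helpers introduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Build a nested dict of pairwise r values from {name: values}."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical concentrations in mg/L for five sources (not the study's data).
metals = {
    "Fe": [0.10, 0.43, 0.81, 1.39, 0.30],
    "Mn": [0.02, 0.04, 0.08, 0.14, 0.05],
    "Zn": [0.04, 0.01, 0.03, 0.02, 0.10],
}
matrix = correlation_matrix(metals)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The sketch reports r only; judging significance, as done in Tables 1 and 2, additionally requires a test on r for the given number of samples.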

Table 1 Correlation matrix of the trace metals in the wet season
Table 2 Correlation matrix of the trace metals in the dry season

Multivariate statistical analysis

Raw data treatment

The normality of each variable, which is required for multivariate statistical analysis, was checked using skewness and kurtosis statistics (Lattin et al. 2003). The original data show a wide range of skewness values, indicating that the data are far from normally distributed. Since most of the kurtosis and skewness values are >0, the raw data of all variables were transformed as x′ = log10(x). After transformation, the skewness values range from −1.459 to 1.191 in the wet season and from −0.826 to 3.346 in the dry season, while the kurtosis values range from −1.777 to 2.155 in the wet season and from −1.205 to 2.838 in the dry season, indicating that the data are normally or close to normally distributed. The Kaiser-Meyer-Olkin (KMO) statistic is a measure of sampling adequacy that indicates the proportion of common variance that may be caused by underlying factors. A KMO value close to 1.0 generally indicates that principal component analysis or factor analysis may be useful. In this study, KMO is 0.44 in the wet season and 0.46 in the dry season; these values fall below the 0.5 level commonly taken as the minimum for factor analysis, so the extracted components should be interpreted with caution. Bartlett’s test of sphericity tests whether the correlation matrix is an identity matrix, i.e., whether the variables are unrelated (Shrestha and Kazama 2007); a significance level below 0.05 indicates relationships among the variables. The significance levels obtained in this study are 0.39 in the wet season and 0.62 in the dry season. The results of the KMO and Bartlett’s tests are presented in Table 3. The positive skewness and kurtosis values demonstrate an asymmetric distribution of almost all the heavy elements in the studied basin (Fig. 1).
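The effect of the log transformation on skewness and kurtosis can be illustrated with a short sketch. It uses the simple moment-based definitions (third and fourth central moments divided by powers of the variance), which is an assumption here: SPSS reports sample-adjusted versions, so its values differ slightly for small samples. The data are hypothetical, and BDL readings are assumed to have been replaced by positive values beforehand, since log10 is undefined at zero.

```python
import math


def central_moment(values, k):
    """k-th central moment of a sample (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def log10_transform(values):
    """Apply x' = log10(x); all values must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 requires strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical right-skewed concentrations in mg/L (not the study's data).
raw = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.40]
transformed = log10_transform(raw)

print("skewness raw / log10:", skewness(raw), skewness(transformed))
print("kurtosis raw / log10:", excess_kurtosis(raw), excess_kurtosis(transformed))
```

For this hypothetical sample the transformation reduces the skewness without removing it entirely, which is the typical behaviour for concentration data dominated by a few high readings.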

Table 3 KMO and Bartlett’s test of sphericity
Fig. 1 Locations of surface water sampling points in the Jia Bharali lower catchment and adjoining region

Principal component analysis

Principal component analysis (PCA) is a multivariate statistical technique used for data reduction and for deciphering patterns within large sets of data (Farnham et al. 2003). Because the concentrations vary widely between metals, PCA is applied to the correlation matrix of the 10 trace metal variables. The eigenvectors of the correlation matrix are the principal components, and each original observation is converted to a principal component score by projecting it onto the principal axes (Chen et al. 2007). The elements of the eigenvectors that are used to compute the scores of the observations are called principal component loadings. Typically, the raw data matrix can be reduced to two or more principal components that account for the majority of the variance. The first principal component explains the most variance and each subsequent component explains progressively less. As a result, a small number of factors usually accounts for approximately the same amount of information as the much larger set of original observations. In this context, using varimax rotation with Kaiser normalization (Kaiser 1960), five principal components (PC) were extracted for the wet season and four for the dry season; they are presented in Tables 4 and 5. The results show that the PCs with eigenvalues greater than 1 account for 71.45 % of the total variance in the wet season and 61.90 % of the total variance in the dry season, which is useful for identifying the main sources of variation in the hydrochemistry of trace metals in all the seasons (Figs. 2, 3 and 4). PC 1 explains 21.1 % of the variance in the wet season, with strong positive loadings (>0.50) of Co and Ni, while in the dry season it accounts for 19.79 % of the total variance, with strong positive loadings of As, Mn and Zn. Thus, PC 1 can be attributed to increasing urban activities as well as increasing household and industrial wastes in the catchment area.
In the wet season, trace elements with positive PC 1 loadings typically occur as soluble oxyanions in oxidizing waters, whereas Mn and Cr, with negative PC 1 loadings, are generally more soluble in oxygen-depleted groundwater. The solubility of Mn as Mn²⁺ is very high in low pH (reducing) waters and much lower in oxidizing waters, because manganese precipitates as Mn(IV) oxide, scavenging other trace elements such as Co, Pb, Zn, Cu and Ni from solution in more oxidizing waters (Farnham et al. 2003). Arsenic, a redox-sensitive element, is commonly more soluble in oxidized groundwater, occurring as the oxyanion HAsO₄²⁻ or H₂AsO₄⁻. However, in reducing waters, arsenic tends to be incorporated in insoluble minerals (Langmuir 1997; Welch and Lico 1998). PC 2, which explains 15 % of the total variance in both seasons, has positive loadings of Cd, Co, Cr, Pb and Zn in the wet season and of Cd, Co, Cu, Fe and Mn in the dry season. The enrichment of these trace metals in water may be due to soil leaching and chemical weathering linked to the particular geology of the plain (Simeonova et al. 2003; Zorer et al. 2008). These elements are most often found in crustal components (Zelenka et al. 1994; Atgin 2000; Kumar et al. 2001), and hence PC 2 may be described as a geogenic factor influencing the trace metal distribution in the surface water sources. The negative loadings of Fe and Mn in the monsoon season have been attributed to the dilution effect of the high rainfall in the basin and the consequent increased flow of water into the surface water sources (Sharma and Subramanian 2010). PC 3, which explains ~14 % of the total variance in both seasons, has high positive loadings of Cu, Ni, Pb and Zn in the wet season and of As, Cd, Fe, Pb and Zn in the dry season. As agriculture is the mainstay of a large majority of the population of the study area, extensive cultivation and wide use of chemical fertilizer may have added these metal ions to the surface water sources through surface runoff (Chatterjee et al. 2009; Toor et al. 2010) in both seasons. The positive loadings of the micronutrient Zn in both seasons indicate Zn enrichment of the basin and thus suggest the suitability of the surface water sources for irrigation purposes.
PC 4, which accounts for 11 % of the total variance in both the dry and the wet seasons, shows positive loadings of all metals except a few (Fe and Pb in the wet season, Cu and Mn in the dry season), representing the erosion of soil and associated organic matter during cultivation in the study area (Kazama and Yoneyama 2002; Fukasawa 2005). PC 5 is extracted only in the wet season, where it accounts for 10 % of the total variance and shows positive loadings for As, Cd, Cu, and Cr. Thus, PC 5 in the wet season is attributed to road traffic runoff due to increased urbanization and industrialization of the studied river basin.
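The extraction step described in this section (eigen-decomposition of the correlation matrix, retention of components with eigenvalues greater than 1) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the SPSS procedure used in the study: the data are synthetic, the function name `pca_on_correlation` is hypothetical, the loadings are scaled by the square root of the eigenvalue (a common convention), and no varimax rotation is applied.

```python
import numpy as np


def pca_on_correlation(data):
    """PCA of the correlation matrix of `data` (rows: samples, columns: variables).

    Returns the eigenvalues in descending order, the percentage of variance
    each explains, and the unrotated loadings of the components whose
    eigenvalue exceeds 1.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # re-sort descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = 100.0 * eigenvalues / eigenvalues.sum()
    keep = eigenvalues > 1.0                          # eigenvalue > 1 rule
    # Loadings: eigenvector elements scaled by sqrt(eigenvalue).
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, explained, loadings


# Synthetic log-transformed data: 35 samples, 4 variables driven by two
# hidden factors plus noise (purely illustrative, not the study's data).
rng = np.random.default_rng(0)
factors = rng.normal(size=(35, 2))
noise = rng.normal(scale=0.3, size=(35, 4))
data = np.column_stack([
    factors[:, 0] + noise[:, 0],
    factors[:, 0] + noise[:, 1],
    factors[:, 1] + noise[:, 2],
    factors[:, 1] + noise[:, 3],
])

eigenvalues, explained, loadings = pca_on_correlation(data)
print("eigenvalues:", np.round(eigenvalues, 2))
print("variance explained (%):", np.round(explained, 2))
print("retained components:", loadings.shape[1])
```

Because the analysis is run on the correlation matrix, the eigenvalues sum to the number of variables, so an eigenvalue of 1 corresponds to the variance of one standardized variable; this is the rationale for the eigenvalue > 1 cut-off.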

Table 4 Principal component analysis in the wet season
Table 5 Principal component analysis in the dry season
Fig. 2 Seasonal distribution of As, Fe, Mn and Co (mg/L) in the selected surface water sources

Fig. 3 Seasonal distribution of Cd, Cr, Pb and Ni (mg/L) in the surface water sources

Cluster analysis

Cluster analysis comprises a series of multivariate methods that are used to find true groups in data. In clustering, similar objects are grouped into the same class (Danielsson et al. 1999). Hierarchical cluster analysis is the most widely applied technique for detecting similar and dissimilar groups between sampling sites (Shrestha and Kazama 2007). Hierarchical clustering joins the most similar observations, and then successively the next most similar observations. The levels of similarity at which observations are merged are used to construct a dendrogram. The datasets generated in this study were treated by Ward’s method of linkage with squared Euclidean distance as the measure of similarity. The dendrograms obtained by Ward’s method for the Jia Bharali river basin are shown in Figs. 5 and 6. On the basis of the dendrograms, the 10 variables can be grouped into two main clusters in both seasons. The first group includes Cu, Cd, Pb, Co, Cr, Ni, Zn, Mn and Fe, while the second group includes only one element, arsenic. These differences in spatial distribution could be attributed to differences in the behaviour and reactions of these metals in water that affect their mobility. For example, Cd is highly mobile compared to Pb (Alloway 1995), which can affect the concentration and dispersion of Cd in surface water. Chemical leaching and precipitation have been found to be the major contributors to heavy metal contamination in soils and sediments (Avil et al. 2005) and consequently in surface water. The study area is a rainforest area with very high annual rainfall (Jain et al. 2007). Therefore, it can be deduced that chemical leaching, weathering and rainfall may be the major contributory factors to heavy metal contamination of the sampled water in the study area.
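The agglomeration procedure described here can be sketched in plain Python. The sketch starts from squared Euclidean distances between variables and updates the distances after each merge with the Lance-Williams recurrence for Ward's method. The metal profiles are hypothetical (not the study's data), the function name `ward_linkage` is illustrative, and no standardization of the variables is applied, which is an assumption since the text does not state how the data were scaled.

```python
def ward_linkage(points):
    """Agglomerative clustering with Ward's method on squared Euclidean distances.

    `points` is a list of equal-length numeric sequences. Returns the merge
    history as a list of (members_a, members_b, dissimilarity) tuples, where
    members are indices into `points`.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}
    # Initial dissimilarities: squared Euclidean distance between each pair.
    dist = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            n_k = len(clusters[k])
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward's method.
            dist[(k, next_id)] = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                                  - n_k * d_ij) / (n_i + n_j + n_k)
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d_ij))
        clusters[next_id] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
        # Drop every distance that refers to one of the merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        next_id += 1
    return merges


# Hypothetical concentration profiles (mg/L) of four metals at three sources.
names = ["Fe", "Mn", "Cu", "As"]
profiles = [
    [0.43, 0.81, 0.50],
    [0.40, 0.78, 0.52],
    [0.45, 0.80, 0.48],
    [0.001, 0.001, 0.002],
]
merges = ward_linkage(profiles)
for members_a, members_b, d in merges:
    print([names[m] for m in members_a], "+", [names[m] for m in members_b], round(d, 4))
```

The order of the merges and the dissimilarity at each merge are the information a dendrogram displays; variables that join only at the final, largest dissimilarity form a cluster of their own.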

Fig. 4 Seasonal distribution of Cu and Zn (mg/L) in the surface water sources

Fig. 5 Dendrogram showing the relationship among the variables in the wet season

Fig. 6 Dendrogram showing the relationship among the variables in the dry season

Conclusion

The hydrochemical and multivariate analyses of surface water samples reveal the present status of surface water quality with respect to trace metals in and around the Jia Bharali river basin of the North Brahmaputra Plain, India. The study shows that the analysis of hydrochemical data using multivariate statistical techniques such as principal component analysis and cluster analysis gives information that is not available at first observation. Different statistical estimations, viz. standard deviation, variance, skewness, and kurtosis, performed for each metallic constituent reveal their normal and asymmetric distribution in the river catchment. Most of the water samples analyzed in the present investigation are contaminated with Fe, Pb and Cr, and partially with Mn, Ni and Cd. Apart from geogenic sources, anthropogenic sources linked to extensive tea cultivation and other activities may contribute; this, however, needs to be ascertained through further closely spaced sample analysis.