Introduction

Groundwater is among the most important natural resources on earth and a primary source of drinking water for humankind. Because surface water alone cannot meet the ever-growing demand for clean drinking water, the use of groundwater cannot be avoided. Further, surface water is more vulnerable to contamination than groundwater. Owing to rapid industrialization and population growth worldwide, groundwater utilization has increased very rapidly in the past decades. Groundwater is used extensively in industry, agriculture, drinking water supply, and routine daily human activities. Overuse of groundwater resources has not only deteriorated water quality but also made groundwater susceptible to various contaminants. In some areas the water table is falling rapidly and will soon reach an alarming level. Therefore, the present generation needs to use the available groundwater resources sustainably.

Today, groundwater pollution has emerged as an environmental challenge for the new generation (Vodela et al. 1997). Clean potable water is essential to almost all living organisms. Consumption of contaminated water has already claimed millions of human lives. Waterborne diseases are spreading very rapidly because of the leaching of undesirable substances into groundwater and improper sanitary conditions. In view of the diseases associated with the consumption of polluted groundwater and with insufficient cleanliness, the United Nations has declared clean water and sanitation to be basic human rights (UN 2006).

Both natural processes and human activities are responsible for the deterioration of groundwater quality (Andrade et al. 2008; Kouras et al. 2007; Gu et al. 2017). Anthropogenic activities such as rapid industrialization, excessive use of phosphate fertilizers, pesticides, and herbicides, discharge of domestic effluents, and overutilization of groundwater have introduced large quantities of unwanted contaminants into groundwater and surface water (Singh et al. 2004; Devic et al. 2014; Selvakumar et al. 2017). Thus, anthropogenic activities are important drivers of both surface water and groundwater pollution (Niemi et al. 1990; Ayotte et al. 2011). Among the contaminants, heavy metals and metalloids pose a serious threat to human health. The concentration of the metalloid arsenic has risen above the prescribed limits in the groundwater of many Indian states, including Punjab (Sharma et al. 2016), Jharkhand (Alam et al. 2016; Chakraborty 2015), Manipur (Chandrashekhar et al. 2016), Mizoram (Blick et al. 2016), Arunachal Pradesh (Shah 2015), Andhra Pradesh (Hussain and Rao 2014), Assam (Das et al. 2017), Himachal Pradesh (Rana et al. 2016), Telangana (Purushotham et al. 2017), Chhattisgarh (Patel et al. 2017; Singhal et al. 2018), Uttar Pradesh (Shah 2017; Kumar et al. 2017a, b; Olea et al. 2018), Bihar (Chakraborti et al. 2016), and West Bengal (Smedley and Kinniburgh 2002; Rahman et al. 2009; Shrivastava et al. 2017; Bhowmick et al. 2018). Thus, the assessment of groundwater quality with respect to the concentrations of NO3−, PO43−, SO42−, NH4+, and Cl− ions and of heavy metals and semimetals such as Cu, Cr, Co, Ni, and As is very important and necessary. Groundwater and surface water quality can be assessed very effectively by employing statistical tools such as univariate (mean, minimum, maximum, and standard deviation), bivariate (correlation), and multivariate analysis (principal component analysis, cluster analysis, and factor analysis) (Omo-Irabor et al. 2008; Kazi et al. 2009; Zhao et al. 2011; Ravikumar and Somashekar 2017; Yidana et al. 2018). Generally, the fitness of data for principal component analysis and factor analysis is checked by means of the KMO (Kaiser–Meyer–Olkin) and Bartlett’s sphericity tests (Zhao et al. 2011; Singh et al. 2013). Ward’s method of cluster analysis using squared Euclidean distance (Fovell and Fovell 1993) was used for clustering. The Piper diagram has also been applied widely to characterize water systems.

The objective of the present paper is to analyze the groundwater quality of two blocks (Reoti and Belahari) of the Ballia district, Uttar Pradesh, India, with emphasis on arsenic distribution and contamination. So far, very few reports on the groundwater quality of this district based on multivariate statistical tools are available.

Sampling site

The Ballia district is situated in the eastern part of the state of Uttar Pradesh, India, and is part of the central Ganga plain. It lies between 25°23′ and 26°11′ north latitude and between 83°38′ and 84°39′ east longitude. Its total geographical area is 3168 km2, supporting a population of 2.75 million. The district is divided into six tehsils and seventeen blocks. Chhoti Saraju, Ghaghra, and Ganga are the rivers that flow through the district (Ali et al. 2012; Chauhan et al. 2009). According to the reports of Tripathi (2007–2008), old and younger alluvium constitute the major physiography of the district. The climate of the district is sub-humid, supporting grassland vegetation. The maximum temperature is reached in May (32.25 °C), followed by June (30.75 °C). The minimum recorded temperature is 12.15 °C in December, followed by 15.9 °C in January. The highest and lowest humidity have been recorded in August (82.5%) and September (80%), respectively. The average rainfall in the district is 983 mm. The Reoti and Belahari blocks (area 140 km2 and population 0.1 million) are the study sites for the groundwater quality characteristics.

Materials and methods

Sample collection and analysis

A total of 72 samples from hand pumps were collected into pre-washed and acidified polyethylene bottles of 300 ml capacity. Before sampling, the hand pumps were operated for 5–10 min to flush out stored water. Samples were collected in two sets from each site. One set contained no preservative and was used for the analysis of anions. The other set of samples was treated with HCl (1 ml per 500 ml of sample) and used for the analysis of heavy metals such as iron, manganese, copper, chromium, and cobalt and of the metalloid arsenic. Before analysis, all samples were stored at 4 °C in a refrigerator. Parameters such as temperature, pH, electrical conductivity (EC), oxidation–reduction potential (ORP), and total dissolved solids (TDS) were determined on site using a portable water analysis kit (Decibel Dynamics Ltd., New Delhi, India).

Water samples were digested with a diacid mixture of nitric and perchloric acids in the ratio 10:1. Equal volumes of sample and diacid mixture were mixed thoroughly and evaporated on a hot plate until no turbidity or color remained in the solution. Finally, the volume was adjusted to the original sample volume by adding Milli-Q water, and the solution was analyzed for the desired metal. The metal and metalloid (arsenic) concentrations were determined using a Perkin-Elmer AAnalyst 800 Atomic Absorption Spectrophotometer (AAS). The flame method of AAS was applied to determine the concentrations of metals.

The concentration of total arsenic in the samples was determined by hydride generation atomic absorption spectrophotometry (HG-AAS) using the hydride generation system fitted to the instrument. Before analysis, all samples were treated with potassium iodide and ascorbic acid to reduce any arsenate (As(V)) to arsenite (As(III)). The detection limit for arsenic was 1 µg/L.

The concentrations of phosphate, nitrate, nitrite, ammonium, chloride, sulfate and bicarbonate were determined by standard methods.

Sodium adsorption ratio (SAR) and percent sodium were calculated according to a formula used by Subba Rao (2006), Singh et al. (2013), Sharma et al. (2018), and RamyaPriya and Elango (2018). All cation concentrations are expressed as meq L−1.

$$\begin{aligned} {\text{SAR}} & = {\text{Na}}^{ + } /\left[ {\left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } } \right)/2} \right]^{0.5} \\ \% {\text{Na}} & = \left( {{\text{Na}}^{ + } + {\text{K}}^{ + } } \right)100/\left( {{\text{Ca}}^{2 + } + {\text{Mg}}^{2 + } + {\text{Na}}^{ + } + {\text{K}}^{ + } } \right) \\ \end{aligned}$$
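The two formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample concentrations are hypothetical and are not taken from the study; all inputs are assumed to be in meq/L, as stated above.

```python
import math

def sar(na, ca, mg):
    """Sodium adsorption ratio; all concentrations in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)

def percent_sodium(na, k, ca, mg):
    """Percent sodium; all concentrations in meq/L."""
    return (na + k) * 100.0 / (ca + mg + na + k)

# Hypothetical sample (not a measured value from the study), in meq/L
sample = {"na": 2.0, "k": 0.5, "ca": 3.0, "mg": 1.0}

sar_value = sar(sample["na"], sample["ca"], sample["mg"])
pct_na = percent_sodium(sample["na"], sample["k"], sample["ca"], sample["mg"])
print(f"SAR = {sar_value:.3f}, %Na = {pct_na:.2f}")
```

Concentrations reported in mg/L would first have to be divided by the equivalent weight of each ion before being passed to these functions.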

Pearson’s correlation coefficient was calculated using Microsoft Office Excel 2007, while principal component analysis, factor analysis, and cluster analysis were computed using SPSS 16. A Piper diagram was generated using GW chart software (Tables 1, 2 and 3).

Table 1 Pearson’s correlation coefficient for pre-monsoon groundwater characteristics
Table 2 Pearson’s correlation coefficient for monsoon groundwater characteristics
Table 3 Major water quality characteristics, abbreviations, units and determination methods

Multivariate statistical analysis

Prior to factor analysis, data appropriateness was checked by KMO and Bartlett’s sphericity test. Multivariate analysis is an important statistical method which can easily be used to identify the factors governing the quality of a water system and help us to manage (regulate) those factors very strictly to minimize contamination (Reghunath et al. 2002; Simeonov et al. 2004). Multivariate statistics such as principal component analysis, factor analysis, and cluster analysis have been used for the assessment of surface water quality (Yidana 2010; Noori et al. 2010; Shrestha and Kazama 2007; Zhao et al. 2011; Singh et al. 2018; Kashyap et al. 2018). Kazi et al. (2009) applied this tool for the quality determination of a polluted lake ecosystem.
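For reference, the two adequacy checks mentioned above are commonly written in the following standard textbook forms. The symbols are introduced here for illustration and do not appear in the original text: n is the number of samples, p the number of variables, R the p × p correlation matrix, r_ij the pairwise correlation coefficients, and a_ij the corresponding partial correlation coefficients.

$$\begin{aligned} \chi^{2} & = - \left( n - 1 - \frac{2p + 5}{6} \right)\ln \left| R \right|,\quad {\text{df}} = \frac{p(p - 1)}{2} \\ {\text{KMO}} & = \frac{\sum\nolimits_{i \ne j} r_{ij}^{2} }{\sum\nolimits_{i \ne j} r_{ij}^{2} + \sum\nolimits_{i \ne j} a_{ij}^{2} } \\ \end{aligned}$$

A significant Bartlett statistic indicates that R differs from an identity matrix, and a KMO value closer to 1 indicates that the partial correlations are small relative to the ordinary correlations; both point to data that are suitable for factor extraction.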

Principal component analysis (PCA) is a method of dimension reduction. A large number of variables is reduced in such a way that the resulting smaller set of components represents the maximum variance of the data. In general, PCA transforms a large number of highly correlated variables into a small number of uncorrelated variables, i.e., principal components (PCs), which represent most of the variation in the data (Singh et al. 2005; Shrestha and Kazama 2007; Kouras et al. 2007). After extraction, minor PCs that contribute very little to the data variation are eliminated (Yeung 1999), so that the original data can be represented with a minimum loss of information (Helena et al. 2000; Vega et al. 1998).
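The extraction step can be sketched as an eigen-decomposition of the correlation matrix, retaining only components whose eigenvalue exceeds one. The sketch below assumes NumPy is available and uses synthetic data; the function name `pca_kaiser` and the data are hypothetical. It produces unrotated loadings only and does not reproduce the rotation step of a statistics package such as SPSS.

```python
import numpy as np

def pca_kaiser(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    data: 2-D array with samples in rows and variables in columns.
    Returns all eigenvalues (descending), percent variance explained by
    each component, and unrotated loadings of the retained components.
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)       # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()
    keep = eigvals > 1.0                         # eigenvalue-greater-than-one rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, explained, loadings

# Synthetic data: 72 samples, 6 variables driven by 2 hidden sources
rng = np.random.default_rng(0)
base = rng.normal(size=(72, 2))
noise = rng.normal(scale=0.3, size=(72, 6))
data = np.hstack([base[:, [0]] + noise[:, :3], base[:, [1]] + noise[:, 3:]])

eigvals, explained, loadings = pca_kaiser(data)
print(np.round(explained, 2))
```

Because the decomposition is done on the correlation matrix, the eigenvalues sum to the number of variables, which is why an eigenvalue of one marks a component that explains as much variance as a single standardized variable.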

Hierarchical agglomerative cluster analysis (HACA), the most common clustering method, groups the samples according to their level of similarity or dissimilarity. The most similar objects are grouped first followed by higher clustering at a consecutive stage. The result is represented in the form of a dendrogram. The purpose of cluster analysis lies in the determination of distinct patterns within multivariate data (McKenna 2003; Kumar et al. 2018). Ward’s method using squared Euclidean distance is considered to be the most appropriate method for dendrogram preparation (Kotti et al. 2005; Gulgundi and Shetty 2018).
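The agglomeration idea can be illustrated with a deliberately naive pure-Python sketch of Ward’s criterion, in which each step merges the two clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances. The function name and the four test points are hypothetical; a statistics package would additionally rescale the merge distances for the dendrogram, which is not reproduced here.

```python
def ward_linkage(points):
    """Naive Ward agglomeration; returns the merge history.

    Each step merges the pair of clusters with the smallest increase in
    within-cluster sum of squares:
        delta = n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
    where c_a, c_b are centroids and n_a, n_b cluster sizes.
    """
    # cluster id -> (member indices, centroid)
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                sq = sum((u - v) ** 2 for u, v in zip(ca, cb))
                delta = na * nb / (na + nb) * sq
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        delta, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        na, nb = len(ma), len(mb)
        centroid = [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        merges.append((sorted(ma), sorted(mb), delta))
        next_id += 1
    return merges

# Four hypothetical points forming two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
merges = ward_linkage(points)
for left, right, cost in merges:
    print(left, right, round(cost, 2))
```

The two tight pairs are joined first at a small cost, and the final merge between the two groups carries a much larger cost, which is the jump a dendrogram makes visible.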

Range, mean, maximum, minimum, and standard deviation are the tools of univariate descriptive statistics (Omo-Irabor et al. 2008; Thapa et al. 2018).

Results and discussions

Physicochemical characteristics

The results of the water quality analyses of the 72 groundwater samples collected during the pre-monsoon and monsoon seasons are presented in Table 4. The mean groundwater temperature varied little, being 26.31 and 26.34 °C during the pre-monsoon and monsoon seasons, respectively. The pH of the groundwater samples ranged from mildly acidic to near neutral; the mean pH was 6.58 and 6.78 in the pre-monsoon and monsoon samples, respectively (Figs. 1, 2, 3, 4, 5, 6 and 7).

Fig. 1 Scree plots of the eigenvalues from PCA during the pre-monsoon season
Fig. 2 Scree plots of the eigenvalues from PCA during the monsoon season
Fig. 3 Hierarchical dendrogram of pre-monsoon data
Fig. 4 Hierarchical dendrogram of monsoon data using Ward’s method
Fig. 5 Rotated loading plots of first three PCs in a pre-monsoon and b monsoon seasons
Fig. 6 Piper diagram of groundwater samples during pre-monsoon and monsoon season
Fig. 7 Map showing sampling locations for groundwater

Electrical conductivity varied from 0.32 to 1.06 mS/cm (mean 0.56) in the pre-monsoon samples and from 0.35 to 1.32 mS/cm (mean 0.71) in the monsoon samples. The higher conductivity during the monsoon season is probably due to the leaching of minerals. The ORP ranged from −147 to 144 mV (mean −45.08) in the pre-monsoon samples and from −142 to −42 mV (mean −76.81) in the monsoon samples. The negative mean ORP indicates reducing groundwater conditions, which are responsible for the dissolution of arsenic-bearing minerals. All water samples are within the prescribed limits for total dissolved solids; the US-EPA has set the permissible limit of TDS in drinking water at 500 mg/L. The sulfate concentration ranged from 1.06 to 37.54 mg/L (mean 7.98) and from 1.06 to 10.76 mg/L (mean 4.38) in the pre-monsoon and monsoon samples, respectively. The phosphate concentration varied from 1.19 to 2.55 mg/L (mean 1.75) and from 0.83 to 5.71 mg/L (mean 2.60) in the pre-monsoon and monsoon samples, respectively. The concentrations of sulfate and phosphate are within the permissible limits prescribed by WHO, which are 500 and 5 mg/L, respectively. The mean chloride concentration is below the WHO recommended limit of 250 mg/L in both the pre-monsoon and monsoon samples. Nickel, manganese, and chromium concentrations were above the permissible limits in nearly all samples; the only exception was chromium in a few pre-monsoon samples. Copper concentrations varied from 1.41 to 5.87 mg/L and from 5.32 to 7.35 mg/L during the pre-monsoon and monsoon seasons, respectively, whereas WHO recommends a copper concentration below 1.00 mg/L in drinking water. The arsenic concentrations in the groundwater samples ranged from 4.18 to 75.60 µg/L (mean 24.67) and from 0.34 to 74.46 µg/L (mean 27.81) during the pre-monsoon and monsoon seasons, respectively. Sixty-nine samples showed arsenic concentrations above the permissible limit defined by WHO. The maximum arsenic concentration was 75.62 µg/L, and the maximum was approximately the same in both the pre-monsoon and monsoon seasons.

Groundwater hydrochemical facies

The trilinear Piper diagram was prepared using the software GW chart. The diagram reveals very clearly the relative concentrations of the major ions present in the groundwater samples. It combines two triangles with a single diamond placed above them and is plotted in terms of anions such as Cl−, SO42−, HCO3−, and CO32− and cations such as Na+, K+, Ca2+, and Mg2+. The left triangle shows the major cation concentrations and the right one the major anion concentrations. The collected groundwater samples show the following major compositions: Ca2+–Mg2+–Cl−–SO42− type, calcium type, no dominant type, calcium chloride type, and chloride type. A Ca2+–Mg2+–Cl−–SO42−-type composition of groundwater has been reported previously (Laluraj et al. 2006; Ravikumar et al. 2010; Dar et al. 2011; Jasmin and Mallikarjuna 2006; Yadav et al. 2018; Aher 2017). Calcium chloride-type water may be produced either by reverse ion exchange between sodium and calcium (Adams et al. 2001; Sappa et al. 2014; Kumar et al. 2017a, b) or by mixing of freshwater and older saline water (Adams et al. 2001). According to Chebotarev’s sequence, the chloride concentration of water increases along the groundwater flow path from the recharge zone to the discharge zone (Yakubo et al. 2009). The presence of chloride-type water indicates its withdrawal from very deep strata, i.e., a discharge zone. Chloride-dominated water has recently been reported (Chitradevi and Sridhar 2011; Kshetrimayum and Bajpai 2012). The presence of chloride in groundwater results from weathering of rock materials, industrial effluents, domestic effluents (Karanth 1987; Srinivas et al. 2017), and leaching of chloride-based pesticides applied to agro-ecosystems. The Piper diagram shown here is similar to that presented by Saleh et al. (1999). Analysis of the Piper diagram reveals that groundwater samples from the Reoti and Belhari blocks are very similar in origin.
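The position of a sample in the cation triangle of a Piper diagram follows from the percentage contribution of each cation group to the total cations in meq/L. The following Python sketch illustrates that calculation under stated assumptions: the equivalent weights are rounded standard values, the 50% dominance rule is the usual convention for reading the triangle, and the function names and the sample are hypothetical and do not come from the study.

```python
# Equivalent weights (g/eq) = molar mass / ionic charge, rounded
EQ_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10}

def cation_percentages(mg_per_l):
    """Convert mg/L to meq/L and return the percent of total cations
    on the three Piper cation axes: Ca, Mg, and Na+K."""
    meq = {ion: mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}
    total = sum(meq.values())
    return {
        "Ca": 100.0 * meq["Ca"] / total,
        "Mg": 100.0 * meq["Mg"] / total,
        "Na+K": 100.0 * (meq["Na"] + meq["K"]) / total,
    }

def cation_type(pct):
    """Label by the axis exceeding 50%, otherwise 'no dominant type'."""
    labels = (
        ("Ca", "calcium type"),
        ("Mg", "magnesium type"),
        ("Na+K", "sodium or potassium type"),
    )
    for axis, label in labels:
        if pct[axis] > 50.0:
            return label
    return "no dominant type"

# Hypothetical sample in mg/L (not a measured value from the study)
sample = {"Ca": 80.0, "Mg": 12.0, "Na": 23.0, "K": 4.0}
pct = cation_percentages(sample)
label = cation_type(pct)
print({k: round(v, 1) for k, v in pct.items()}, label)
```

The anion triangle can be handled in the same way with the equivalent weights of chloride, sulfate, bicarbonate, and carbonate.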

Groundwater quality criteria for irrigation purposes

All groundwater samples fall in the satisfactory water class with respect to total dissolved solids. With respect to salinity hazard, more than 85% of the samples are good for irrigation during the pre-monsoon season, whereas approximately 64% of the samples are safe for irrigation in the monsoon season. Similarly, 83.33% of the pre-monsoon samples but only 58.33% of the monsoon samples have a sodium concentration within the safe range (20–40 mg/L). The remaining 16.67% of the pre-monsoon samples and 41.67% of the monsoon samples have sodium concentrations in the range 40–60 mg/L. Owing to precipitation, groundwater quality declines in terms of both salinity hazard class and sodium level. It is interesting to note that all groundwater samples are excellent with respect to sodium hazard level.

Sodium adsorption ratio (SAR) is defined as the ratio of sodium ion concentration to the square root of the average calcium and magnesium ion concentrations. Increased sodium adsorption ratio (SAR) of water not only affects the physical and chemical characteristics of soil but also negatively alters the useful activity (biological organic matter decomposition) associated with native soil microorganisms. Biochemical properties of soil are also disturbed to a greater extent. The ultimate results of groundwater irrigation with a high SAR value can be described in terms of soil degradation and low productivity (Rietz and Haynes 2003; Ahada and Suthar 2017; Sharma et al. 2018). Irrigation with water having higher SAR value increases the soil sodium concentration which leads to the destruction of soil structure and aggregates (Mavi et al. 2012; Srinivas et al. 2017; Selvaganapathi et al. 2017). A positive correlation between SAR and clay dispersion (Nelson et al. 1997) has already been reported.

In general, water with a high SAR value poses a great hazard to soil. All samples have an SAR in the range 0–10, which shows that the groundwater is suitable for irrigation. Overall, the pre-monsoon groundwater samples are more suitable for agricultural irrigation than the monsoon samples. However, because 95.83% of the groundwater samples have an arsenic concentration above the WHO safe limit of 10 µg/L, the possibility of arsenic poisoning in humans and animals through vegetables, cereal grains, and fodder (Das et al. 2004; Huq et al. 2006; Zhao et al. 2010; Chakraborti et al. 2014; Sharifi et al. 2017; Chandra et al. 2018) cannot be ruled out.
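The two screening steps in this paragraph, assigning an SAR class and counting the share of samples above the arsenic limit, can be sketched as follows. The class boundaries above 10 are an assumption taken from the widely used US Salinity Laboratory scheme and are not stated in the text; the function names and the short arsenic list are hypothetical. Only the 10 µg/L limit, the 0–10 SAR range, and the figure of 69 of 72 samples come from the paper.

```python
WHO_ARSENIC_LIMIT = 10.0  # ug/L, limit quoted in the text

def percent_exceeding(values, limit):
    """Share (%) of values strictly above the limit."""
    return 100.0 * sum(v > limit for v in values) / len(values)

def sar_class(sar):
    """Sodium hazard class for irrigation water.

    Boundaries above 10 are assumed (US Salinity Laboratory scheme);
    the text itself only states that SAR 0-10 is suitable.
    """
    if sar < 10:
        return "excellent"
    if sar < 18:
        return "good"
    if sar <= 26:
        return "doubtful"
    return "unsuitable"

# Check of the reported share: 69 of 72 samples above the limit
reported_share = round(100.0 * 69 / 72, 2)

# Hypothetical subset of arsenic values in ug/L, for illustration only
arsenic = [4.18, 12.0, 24.67, 75.60]
share = percent_exceeding(arsenic, WHO_ARSENIC_LIMIT)

print(reported_share, share, sar_class(1.4))
```

The first check reproduces the 95.83% quoted above from the count of sixty-nine samples out of 72.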

Pearson’s correlation statistics

Arsenic concentration is negatively correlated with oxidation–reduction potential (ORP) in both the pre-monsoon and monsoon samples, indicating reducing groundwater conditions (Guo et al. 2010). A negative correlation of arsenic concentration with total dissolved solids concentration implies that arsenic may bind to surfaces available on solid substances. A negative correlation of arsenic concentration with sulfate supports the idea that arsenic is mobilized under reducing groundwater conditions (Smedley and Kinniburgh 2002; Ohno et al. 2005). A weak negative correlation of arsenic concentration with sulfate concentration has also been reported by Singh et al. (2013). Arsenic concentration is positively correlated with iron, manganese, copper, chromium, ammonium (Winkel et al. 2011), bicarbonate, and phosphate concentrations (Winkel et al. 2011). A positive correlation of arsenic with iron, manganese, and bicarbonate in both pre-monsoon and monsoon samples indicates a geological origin for arsenic in groundwater (Kouras et al. 2007). Sulfate concentration is positively correlated with ORP, electrical conductivity, total dissolved solids, bicarbonate, chloride, and nitrate. Nitrate concentration is positively correlated with arsenic concentration in the pre-monsoon samples, while it is negatively correlated (Kanel et al. 2013) with arsenic concentration in the monsoon samples. A positive correlation of arsenic concentration with iron, phosphate, and manganese concentrations supports the idea of reductive dissolution of arsenic-bearing minerals and thus arsenic enrichment in groundwater (Naidu et al. 2006; Sathe et al. 2018; Kumarathilaka et al. 2018). A positive correlation of arsenic concentration with bicarbonate further strengthens the idea that reducing conditions exist in the groundwater (McArthur et al. 2001). A positive correlation of arsenic concentration with bicarbonate and phosphate concentrations suggests a possible competitive displacement of bound arsenic by phosphate and bicarbonate, thus favouring arsenic mobilization (Wang et al. 2009; Gao et al. 2013). Competitive binding of arsenate and phosphate ions onto an iron-based compound such as goethite has already been reported (Gao and Mucci 2001). This further suggests that the application of phosphate-based fertilizers may also contribute to arsenic enrichment in groundwater (Acharyya et al. 1999, 2000; Chidambaram et al. 2017; Khanikar et al. 2017). However, it has been suggested that the application of fertilizers may not be sufficient to be a primary source of phosphate (McArthur et al. 2001). Arsenic concentration is positively correlated with chloride concentration in the pre-monsoon samples but negatively correlated with chloride concentration in the monsoon samples. Chloride concentration is positively correlated with electrical conductivity, total dissolved solids, and bicarbonate concentrations. Iron concentration is positively correlated with ammonium and phosphate concentrations. Electrical conductivity is positively correlated with major ion concentrations such as sodium, potassium, calcium, magnesium, nitrate, and sulfate, indicating the possibility of groundwater contamination due to natural weathering of carbonate-bearing minerals (Yadav et al. 2014; Sheikh et al. 2017). A positive correlation of arsenic concentration with copper concentration indicates anthropogenic contamination of groundwater (Chatterjee and Mukherjee 1999). A positive correlation of chloride concentration with sulfate concentration suggests the mixing of water from different aquifer systems (Saleh et al. 1999).
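The sign and strength of such relationships are given by Pearson’s correlation coefficient, which can be computed from paired measurements as in the minimal Python sketch below. The function name and the paired ORP and arsenic values are hypothetical and chosen only to show a negative coefficient; they are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values: ORP in mV and arsenic in ug/L
orp = [-140.0, -90.0, -40.0, 10.0, 60.0]
arsenic = [60.0, 45.0, 30.0, 20.0, 5.0]

r = pearson_r(orp, arsenic)
print(f"r = {r:.3f}")
```

A coefficient close to −1, as in this constructed example, corresponds to arsenic rising as ORP becomes more negative; a correlation alone does not establish the mobilization mechanism.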

Principal component analysis (PCA)

Principal component analysis (PCA) was performed on a total of 22 different variables. Six and seven major principal components (PCs) were obtained in the pre-monsoon and monsoon seasons, respectively. The scree plots show that the slope of the eigenvalue curve changes after component six in the pre-monsoon season and after component seven in the monsoon season. Only components with eigenvalues greater than one were retained. Together they accounted for 76.25 and 78.52% of the total variation in the pre-monsoon and monsoon seasons, respectively. Rotated values for each component for the pre-monsoon and monsoon seasons are shown in Tables 5 and 6. PC1 explained 27.45% of the total variance observed in the pre-monsoon season. PC1 shows strong positive loadings for electrical conductivity, total dissolved solids, sulfate, sodium, potassium, calcium, and magnesium concentrations. The presence of these substances in groundwater may be due to the weathering of rock minerals.

Table 4 Characteristics of groundwater quality
Table 5 PCA of groundwater characteristics in pre-monsoon

Principal component 2 (PC2) explained 18.56% of the total variance, with positive loadings from chloride, cobalt, copper, and chromium concentrations. The sources of these ions may be industrial. PC3 contributed 9.46% of the total variance and showed positive loadings from pH, potassium, and nickel concentrations, whereas oxidation–reduction potential showed a negative loading. PC4 accounted for 8.26% of the total variation in groundwater quality and was represented by ammonium, phosphate, and arsenic concentrations. PC5 explained 7.11% of the variance and was contributed by temperature and nitrate concentration, the latter with a negative loading. PC6, which represented 5.41% of the total variance in groundwater hydrochemistry, showed a strong positive loading for bicarbonate concentration. PCA revealed little difference in groundwater chemistry between the pre-monsoon and monsoon samples, although arsenic, which loaded on PC4 in the pre-monsoon samples, shifted to PC2 in the monsoon samples (Tables 6, 7, 8 and 9).

Table 6 PCA of groundwater characteristics in monsoon
Table 7 Comparison of principal components during the pre-monsoon and monsoon seasons
Table 8 Sodium adsorption ratio (SAR) and percent sodium (% Na+) of the pre-monsoon and monsoon samples
Table 9 Status of groundwater characteristics in Ballia district, Uttar Pradesh, on the basis of TDS, EC, percent sodium (% Na) and sodium adsorption ratio (SAR) for agricultural irrigation purposes

In the pre-monsoon samples, 55.47% of the total variance is explained by the first three PCs, while the first three PCs explain 53.38% of the total variance in the monsoon samples.
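The cumulative figures for the pre-monsoon season can be checked directly from the per-component percentages reported above (27.45, 18.56, 9.46, 8.26, 7.11, and 5.41%). The short Python check below uses only those reported values; the variable names are illustrative.

```python
from itertools import accumulate

# Percent variance explained by PC1..PC6 in the pre-monsoon season,
# as reported in the text
pre_monsoon_pcs = [27.45, 18.56, 9.46, 8.26, 7.11, 5.41]

# Running total after each component, rounded to two decimals
cumulative = [round(v, 2) for v in accumulate(pre_monsoon_pcs)]

print(cumulative)
```

The running total after three components gives the 55.47% quoted for the first three PCs, and the total after six components gives the 76.25% quoted for all retained pre-monsoon components.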

Hierarchical cluster analysis (HCA)

For cluster analysis, Ward’s method using squared Euclidean distance was applied. Squared Euclidean distance has already been used for cluster analysis (Fovell and Fovell 1993; Shrestha and Kazama 2007; Yadav et al. 2014; Magesh et al. 2017; Devi and Yadav 2018; Behera and Das 2018). Cluster analysis using Ward’s method gives the most meaningful results (Vega et al. 1998; Sharma et al. 2017; Behera and Das 2018). The result of the cluster analysis is presented in the form of a dendrogram (tree-shaped structure). At distance 25, both the pre-monsoon and monsoon variables cluster into two major distinct groups. The first cluster in the pre-monsoon season is represented by Mg2+, K+, SO42−, and TDS concentrations and EC. These variables also show higher loadings in PC1, indicating their entry into the groundwater system through the natural process of rock weathering (Subyani and Ahmadi 2010; Ishaku et al. 2012; Yadav et al. 2014; Gopinath et al. 2018). This cluster also includes variables such as Ni, Cr, Cu, HCO3−, Co, Mn, PO43−, and NO3− concentrations, most of which may be industrial in origin. However, large agricultural inputs of fertilizers may also be possible sources of ions such as NO3− and PO43−. The second cluster is a group of three variables showing similarity between Na+, Ca2+, and Cl− concentrations. Interestingly, the variables of the first and second clusters in the monsoon season are almost the same as those of the first and second clusters in the pre-monsoon season, indicating a similar hydrogeochemistry of groundwater governed by similar types of variables.

Conclusions

Arsenic concentration, together with that of other metals, did not vary significantly between the pre-monsoon and monsoon seasons. The results of the study support the theory of reductive dissolution of arsenic, as indicated by the positive correlations of arsenic with iron and manganese concentrations.