Introduction

Water contamination has caused disease and death worldwide and is therefore considered a sensitive issue for humankind. In India in particular, the problem is aggravated by rapid industrialization, urbanization, and population growth, which lead to greater discharge of effluents and pollutants into the environment (Aravinthasamy et al. 2020; Bahita 2019; Swain et al. 2021; Wagh et al. 2018, 2019a, 2019b, 2020). Using water for a specific purpose requires the physical, chemical, and biological characteristics of its dissolved or suspended constituents to remain below certain thresholds, which are regarded as the permissible limits. The primary objective of setting these limits is to ensure that the water has no harmful effects on the user. Water pollution has become an alarming concern, mostly due to ever-increasing anthropogenic interventions, as evident from several prior studies (Adimalla 2019, 2020; Bahita et al. 2021a, 2021b; Jasrotia and Kumar 2014; Jasrotia et al. 2018; Karunanidhi et al. 2021a, 2021b; Li et al. 2014, 2017; Zhang et al. 2019, 2021a, 2021b, 2021c). The water resources in numerous countries are in critical condition due to changes in their physicochemical nature. These changes harm human beings, plants, and animals. Upon consumption, poor-quality water may cause diseases or toxic health effects in human beings and livestock. Similarly, deterioration in water quality may directly threaten the survival of aquatic animals. Irrigation water must also be of suitable quality to protect crops from disease and to sustain crop production. The persistent application of contaminated water for irrigation may also degrade the soil, making it less productive or even unfit for agriculture (Zhang et al. 2020a, 2020b). Therefore, to preserve and maintain the natural ecosystem, regular assessments of water quality are crucial (Bahita 2019; Xu et al. 2019a, 2019b, 2019c).

Groundwater plays a vital role in meeting the water demands of India's municipal, agricultural, and industrial sectors. India is the largest user of groundwater in the world, with an annual usage of 251 km³ (Sahoo et al. 2021; Swain et al. 2022). Because groundwater is less exposed to contaminants than surface water bodies, it is generally believed to be clean and is used directly for drinking at several places, especially in developing countries (Dhal and Swain 2022). However, due to numerous natural and anthropogenic factors, groundwater is under threat of deterioration in terms of both quality and quantity. The contamination of groundwater has been reported in several recent studies (Adimalla and Venkatayogi 2018; Adimalla et al. 2020; Kadam et al. 2021a, 2021b, 2021c; Li et al. 2016, 2021; Wu et al. 2015). Particularly in the urban areas of developing countries, water quality has deteriorated significantly due to an exponential increase in industrialization. The effluents discharged from these industries have very harmful impacts on surface water and groundwater. Further, rapid population growth and urbanization in such regions have led to a remarkable increase in water demands. Under such circumstances, the degradation in water quality becomes a grave issue. Therefore, from the viewpoint of environmental management, it is necessary to evaluate and monitor groundwater quality regularly. Further, of the 17 Sustainable Development Goals (SDGs) established by the United Nations in 2015, SDG 6, “Clean water and sanitation”, aims to achieve universal and equitable access to safe and affordable drinking water for all. It also emphasizes improving water quality by minimizing pollution and efficiently managing contaminants (UNDESA 2016).

The geographic information system (GIS) is used to create, manage, analyse, and map various types of data. In groundwater and water quality assessments, GIS has been used to visualize the spatial variations in water quality parameters, which helps in identifying critical regions and, in turn, in recognizing the primary sources of contamination (Jasrotia et al. 2013, 2016, 2018; Karunanidhi et al. 2013, 2019a, 2019b; Khan et al. 2020; Thilagavathi et al. 2015; Venkatesan et al. 2021). However, evaluating many parameters individually to discern the overall water quality is a difficult task. This problem can be resolved by the water quality index (WQI), which integrates multiple parameters along with their relative importance into a single value. In addition, statistical approaches, viz., correlation and principal component analysis (PCA), are extensively used to understand the inter-dependence and dominance of the water quality parameters (Li et al. 2019; Taloor et al. 2020; Wu et al. 2014). Both the geospatial and statistical approaches can help water resources managers in decision-making.

The Gurgaon and Faridabad districts of Haryana, located on the periphery of New Delhi, have witnessed rapid urbanization and industrialization in the last few years (Guptha et al. 2021, 2022; Rai and Saha 2015). This might have affected the groundwater quality of these two districts. Groundwater is the main source for meeting the drinking, agricultural, domestic, and industrial water demands; yet there is hardly any recent study on the assessment of groundwater quality in this area. As the groundwater of rapidly urbanizing centres is usually more vulnerable to pollution, its regular monitoring is necessary to minimize the associated human health risks. In this regard, this study aims to comprehensively evaluate groundwater quality using geospatial and statistical approaches. The specific objectives are (a) to analyse the groundwater quality parameters and prepare geospatial maps depicting their distribution across Gurgaon and Faridabad districts, (b) to ascertain the groundwater suitability using WQI and to map its spatial variation, and (c) to apply statistical approaches, viz., correlation and PCA, to understand the inter-dependence and dominance among different parameters. The subsequent sections of this paper describe the study area and data, the methodology, the results and discussion, and the key conclusions drawn from the study.

Materials and methods

Study area and data

In the present study, a growing industrial area in the Gurgaon and Faridabad districts of Haryana, India, is considered to investigate groundwater quality. The study area lies between longitudes 76.6° E and 77.5° E and latitudes 28.2° N and 28.5° N. Gurgaon and Faridabad lie on the periphery of Delhi and are prominent constituents of India’s National Capital Region (NCR). The location of the study area is presented in Fig. 1. Due to the rapid growth of urbanization, industrialization, and population, the groundwater of these districts is under a severe threat of deterioration in terms of quality and quantity.

Fig. 1 Location of the study area (Gurgaon and Faridabad districts, Haryana, India)

The study area is occupied by Quaternary alluvium and Pre-Cambrian meta-sediments of the Delhi Super Group (Kaur et al. 2019; Singh and Bhatia 2013). The major part of the area is underlain by Quaternary alluvium consisting of sand, clay, and silt, which forms the principal groundwater-bearing horizon. Thus, groundwater occurs in the alluvium and in the underlying weathered or fractured quartzites. The semi-consolidated sand beds resulting from weathering and fracturing processes form potential aquifer zones (Kaur et al. 2019; Singh and Bhatia 2013). The average annual rainfall over the Gurgaon and Faridabad districts is 596 mm and 542 mm, respectively, which is much lower than the average annual rainfall over the entire country (i.e., 1194 mm). Therefore, groundwater is a vital source for meeting the water demands for irrigation, drinking, domestic, and industrial purposes.

The data for the present study were procured from the Central Ground Water Board (CGWB), Government of India. The water quality parameters, viz., pH, electrical conductivity (EC), carbonate (CO₃²⁻), bicarbonate (HCO₃⁻), chloride (Cl⁻), sulphate (SO₄²⁻), nitrate (NO₃⁻), fluoride (F⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), silica (SiO₂), and total hardness as calcium carbonate equivalent (TH), were obtained for the pre-monsoon season of 2017 from 28 sites over the study area. Of these, 7 sites are located in Faridabad district and 21 in Gurgaon district. The details (i.e., district, block, location, longitude and latitude) of all the 28 sites are summarized in Table 1. The procedure adopted by CGWB for sampling and measuring the concentrations of the different parameters is provided in the Supplementary Information.

Table 1 Location details of the sites for groundwater quality investigation

Methodology

In the present study, the data collected at different sites over the study area are assessed for the concentrations of geochemical parameters and for their suitability for human drinking. In addition, commonly used statistical techniques are applied for the risk assessment of groundwater quality. These techniques include the water quality index (WQI), correlation and significance testing, and principal component analysis (PCA), which are briefly described as follows. The data are presented in Table S1 of the Supplementary Information.

Comparison with a reliable standard and geospatial mapping

The assessment of groundwater quality regarding its suitability for drinking purposes needs a comparison with respect to the permissible limits recommended by the Bureau of Indian Standards (IS 10500: 2012) as mentioned in previous studies (Bahita et al. 2021a; Prasanth et al. 2012). Therefore, the concentration of individual water quality parameters from each site is compared against the corresponding limit prescribed by BIS (2012) to detect the sites affected by pollution. Further, some of the quality parameters are geospatially mapped through GIS to understand the spatial variation of the groundwater contamination.
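The comparison described above can be sketched in a few lines of Python. The permissible limits are the values quoted in the Results section of this paper for pH, EC, chloride, sulphate, nitrate, fluoride, and TH; the two sample records, the dictionary keys, and the function names are hypothetical and only illustrate the check, not the authors' actual workflow.

```python
# Permissible limits as quoted in the Results section (BIS 2012).
# EC is in µS/cm, pH is dimensionless, all other values are in mg/l.
PERMISSIBLE_LIMITS = {
    "pH": 8.5, "EC": 2000.0, "Cl": 1000.0, "SO4": 400.0,
    "NO3": 45.0, "F": 1.5, "TH": 600.0,
}


def exceeded_parameters(sample, limits=PERMISSIBLE_LIMITS):
    """Return the sorted names of parameters whose value is above the limit."""
    return sorted(name for name, limit in limits.items()
                  if name in sample and sample[name] > limit)


def count_exceedances(samples, limits=PERMISSIBLE_LIMITS):
    """Count, for each parameter, how many sites exceed its limit."""
    counts = {name: 0 for name in limits}
    for sample in samples:
        for name in exceeded_parameters(sample, limits):
            counts[name] += 1
    return counts


# Hypothetical records for two sites (NOT taken from Table S1).
samples = [
    {"site": "A", "pH": 8.7, "EC": 2500.0, "Cl": 300.0, "SO4": 150.0,
     "NO3": 50.0, "F": 0.8, "TH": 450.0},
    {"site": "B", "pH": 7.6, "EC": 900.0, "Cl": 120.0, "SO4": 60.0,
     "NO3": 12.0, "F": 1.7, "TH": 700.0},
]

for sample in samples:
    print(sample["site"], exceeded_parameters(sample))
print(count_exceedances(samples))
```

Dividing each count by the number of sites and multiplying by 100 would give the percentage of samples beyond the limit for each parameter.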

Water quality index (WQI)

The WQI method is very commonly used to assess the overall quality of water from the concentrations of several parameters, which simplifies decision-making for water resource managers, especially on water supply for specified purposes (Adimalla and Qian 2019; Adimalla and Taloor 2020; Bahita et al. 2021a, 2021b; Gaikwad et al. 2020). The procedure to compute WQI is presented below.

The quality rating of the ith parameter (qi) is obtained as,

$$q_{i}=100\left[\frac{V_{i}-V_{id}}{S_{i}-V_{id}}\right] \tag{1}$$

where Vi is the actual (i.e., measured) value of the ith parameter at a given sampling site, Si is its permissible value recommended by BIS (2012), and Vid is its ideal value (i.e., in pure water). All Vid values are zero for drinking water parameters except pH, whose ideal value is 7 (Bahita et al. 2021a, 2021b). Note that for a water sample with pH above 7, Si is taken as 8.5, and for a sample with pH below 7, Si is taken as 6.5.
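As a worked illustration of Eq. (1), consider pH, the only parameter with a non-zero ideal value. Taking the maximum pH of 8.88 reported in this study, with an ideal value of 7 and a permissible value of 8.5 (because the sample pH is above 7), the quality rating would be:

$$q_{\mathrm{pH}}=100\left[\frac{8.88-7}{8.5-7}\right]=100\times\frac{1.88}{1.50}\approx 125.3$$

A rating above 100 indicates that the measured value exceeds the permissible limit. For any parameter whose ideal value is zero, Eq. (1) reduces to the measured value divided by the permissible value, multiplied by 100.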

Then the relative weight (Wi) for the ith parameter is computed from the following equation:

$$W_{i}=\frac{w_{i}}{\sum_{i=1}^{n}w_{i}} \tag{2}$$

where wi is the weight of each parameter and n is the total number of parameters. The wi values are taken from previous studies (Batabyal and Chakraborty 2015; Ramakrishnaiah et al. 2009). In this study, based on the availability of permissible limits, eleven parameters are considered for computing WQI. These parameters, along with their corresponding Si, wi, and Wi, are provided in Table 2.

Table 2 The parameters and their permissible limits, weights and relative weights taken for WQI calculation

Finally, the overall WQI was computed as follows,

$$\mathrm{WQI}=\sum_{i=1}^{n}q_{i}W_{i} \tag{3}$$

An example of the WQI calculation for site 1 (Pali) is provided in Table S2 of the Supplementary Information. The overall water quality rating based on WQI is presented in Table 3, which is adopted from previous studies (Bahita et al. 2021a, 2021b; Rao et al. 2020; Taloor et al. 2020).
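Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, only four parameters are used instead of the eleven in Table 2, and the weights and the sample values are assumed for demonstration. The permissible limits are those quoted in this paper.

```python
def quality_rating(value, permissible, ideal=0.0):
    """Eq. (1): quality rating q_i of a single parameter."""
    return 100.0 * (value - ideal) / (permissible - ideal)


def ph_quality_rating(ph):
    """pH has an ideal value of 7; the limit is 8.5 above 7 and 6.5 below 7."""
    permissible = 8.5 if ph >= 7.0 else 6.5
    return quality_rating(ph, permissible, ideal=7.0)


def relative_weights(weights):
    """Eq. (2): normalise the assigned weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def water_quality_index(sample, limits, weights):
    """Eq. (3): weighted sum of the quality ratings."""
    wqi = 0.0
    for name, rel_weight in relative_weights(weights).items():
        if name == "pH":
            q = ph_quality_rating(sample[name])
        else:
            q = quality_rating(sample[name], limits[name])
        wqi += q * rel_weight
    return wqi


# Permissible limits quoted in this paper (mg/l, except pH).
limits = {"pH": 8.5, "Cl": 1000.0, "NO3": 45.0, "F": 1.5}
# ASSUMED weights, for illustration only (not the values of Table 2).
weights = {"pH": 4, "Cl": 3, "NO3": 5, "F": 4}
# Hypothetical sample (NOT taken from Table S1).
sample = {"pH": 7.75, "Cl": 500.0, "NO3": 45.0, "F": 0.75}

wqi = water_quality_index(sample, limits, weights)
print(f"WQI = {wqi:.3f}", "(unsuitable for drinking)" if wqi > 100 else "")
```

Keeping the pH rule in its own function makes the one exception to the zero ideal value explicit. The threshold of 100 used in the last line follows the criterion stated in this paper for sites unsuitable for drinking.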

Table 3 WQI-based rating of water quality

Correlation

The correlation, a measure of the association between two variables, is evaluated in terms of the coefficient of correlation (r), which ranges from −1 to +1. The correlation between water quality parameters is generally computed to assess their inter-dependence, which helps in understanding the dominant parameters (Mohamed et al. 2019; Popugaeva et al. 2020). This ultimately helps in decision-making for water quality monitoring. Although a value of r close to +1 or −1 typically indicates a strong correlation, it is essential to assess the statistical significance of the correlation (Helena et al. 2000; Rocha et al. 2019). This can be achieved through the p value: the lower the p value, the more significant the correlation (Helena et al. 2000; Popugaeva et al. 2020; Rocha et al. 2019). In this study, the significance of the correlation is tested at a confidence level of 90% (i.e., p value < 0.1).
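The significance test can be sketched with the Python standard library alone. Instead of computing the p value directly, the sketch below uses the equivalent critical-value form of the test: for n = 28 sites (26 degrees of freedom), a two-tailed p value below 0.1 corresponds to a t statistic larger in magnitude than about 1.706, a value taken from standard t tables. The function names and the synthetic data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 d.o.f.)."""
    r = max(-1.0, min(1.0, r))  # guard against rounding slightly beyond ±1
    if abs(r) == 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-tailed critical t for alpha = 0.10 and 26 degrees of freedom (n = 28).
T_CRITICAL_DF26 = 1.706


def is_significant(r, n, t_critical=T_CRITICAL_DF26):
    """True when the correlation is significant at the 90% confidence level."""
    return abs(t_statistic(r, n)) > t_critical


# Synthetic data for 28 hypothetical sites (NOT the study data).
x = [float(i) for i in range(28)]
noise = [3.0 if i % 2 == 0 else -3.0 for i in range(28)]
related = [a + e for a, e in zip(x, noise)]

for label, y in (("related", related), ("unrelated", noise)):
    r = pearson_r(x, y)
    print(label, round(r, 3), is_significant(r, len(x)))
```

For a different number of sites, the critical value would have to be replaced by the one for the corresponding degrees of freedom.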

Principal component analysis (PCA)

PCA is a multivariate statistical technique commonly used for dimensionality reduction (Dutta et al. 2018; Xu et al. 2021). PCA extracts eigenvalues and eigenvectors from the covariance matrix of the originally correlated variables (Singh et al. 2004). Each eigenvector consists of a list of coefficients, which are also regarded as loadings. A principal component is obtained by multiplying the original variables by the corresponding loadings and summing the products (Andrade et al. 2020; Helena et al. 2000; Xu et al. 2019b, 2019c). PCA therefore forms a new set of orthogonal, uncorrelated variables through linear combinations of the originally correlated variables. The method was developed by Hotelling (1933) and is widely applied in water quality assessments (Andrade et al. 2020).
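The covariance-based procedure can be sketched for the first principal component only. The sketch below uses power iteration, a simple stand-in for a full eigen-decomposition, to obtain the dominant eigenvalue and its eigenvector (the loadings) of the covariance matrix. The function names and the four two-parameter records are hypothetical and are not the study data.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows` (list of lists)."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in rows)
             / (n - 1) for b in range(k)] for a in range(k)]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvalue and unit eigenvector of `cov` by power iteration."""
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k))
                     for a in range(k))
    return eigenvalue, v


def explained_variance_ratio(cov, eigenvalue):
    """Share of the total variance (trace of `cov`) explained by one PC."""
    return eigenvalue / sum(cov[j][j] for j in range(len(cov)))


# Hypothetical records: [EC in µS/cm, pH] (NOT taken from Table S1).
rows = [[500.0, 8.6], [1500.0, 8.2], [3000.0, 7.9], [7000.0, 7.4]]

cov = covariance_matrix(rows)
eigenvalue, loadings = first_principal_component(cov)
print("loadings of PC1:", [round(c, 4) for c in loadings])
print("variance explained by PC1:", explained_variance_ratio(cov, eigenvalue))
```

When PCA is applied to the covariance matrix of variables measured on very different scales, the variable with the largest variance tends to dominate the first component, as EC does in this toy example. Standardizing the variables beforehand (equivalently, using the correlation matrix) is a common alternative.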

Results and discussions

The measured values of the physicochemical water quality parameters are listed in Table S1 of the Supplementary Information. The statistics, viz., maximum, minimum, average, and standard deviation (SD), for all the water quality parameters are presented in Table 4. The acceptable pH range for drinking water is 6.5–8.5, based on the standards set by BIS (2012) and the World Health Organization (WHO 2011). The minimum and maximum values of pH over the 28 sites are 7.33 and 8.88, respectively. This shows the alkaline nature of water over the study area. There are seven sites with pH values exceeding the permissible limit of 8.5 (Table S1). Similarly, for EC, the minimum and maximum values are 240 and 7215 µS/cm, respectively. There are 15 sites with EC values exceeding the permissible limit of 2000 µS/cm. The concentration of carbonate is nil at 19 sites, whereas the maximum reaches 48 mg/l at Jhanirola (Table S1). The concentration of bicarbonate ranges from 85 to 659 mg/l with an average and SD of 394.4 and 133.5 mg/l, respectively. The minimum and maximum values of chloride are 21 and 2014 mg/l, respectively. There are five sites with chloride values exceeding the permissible limit of 1000 mg/l. The concentration of sulphate ranges from 5 to 1215 mg/l with an average of 276.8 mg/l and an SD of 301.1 mg/l. There are six sites with sulphate values exceeding the permissible limit of 400 mg/l. The concentration of nitrate ranges from 0 to 96 mg/l with an average and SD of 34.6 and 29 mg/l, respectively. There are eight sites with nitrate values exceeding the permissible limit of 45 mg/l. Similarly, the minimum and maximum values of fluoride over the 28 sites are 0.23 and 2.35 mg/l, respectively, with three sites exceeding the permissible limit of 1.5 mg/l. Nitrate and fluoride can be very harmful above the permissible limits, as evident from several prior studies (Adimalla and Li 2019; Adimalla et al. 2018a, 2018b; Narsimha and Sudarshan 2017; Wu et al. 2015). High concentrations of nitrate and fluoride have nevertheless been reported in recent studies over the Indian region, which can be attributed to both natural and anthropogenic causes (Adimalla and Li 2019; Karunanidhi et al. 2019a, 2019b; Narsimha and Rajitha 2018; Narsimha and Sudarshan 2017). The cations, viz., calcium, magnesium, sodium, and potassium, exceed their corresponding permissible limits at 5, 7, 18, and 3 sites, respectively. The concentration of silica ranges from 5 to 36 mg/l with an average of 21.6 mg/l and an SD of 6.6 mg/l. Similarly, the total hardness ranges from 100 to 1962 mg/l with an average and SD of 564.6 and 510.6 mg/l, respectively. There are eight sites with TH values exceeding the permissible limit of 600 mg/l. The TH values vary considerably across sites, as evident from the high SD.

Table 4 Statistical properties of the observed geochemical parameters

Overall, the concentrations of the water quality parameters at some sites exceeded the permissible limits for drinking purposes recommended by BIS (2012). This may be due to different natural factors, e.g., soil salinization, dissolution of minerals, the residence time of water–rock interactions, etc. However, anthropogenic activities play a major role in the deterioration of groundwater quality, especially in the urban and peri-urban regions. The percentage of samples beyond the permissible limits for each parameter is listed in Table 5.

Table 5 Percentage of samples beyond the permissible limits for drinking water recommended by BIS (2012)

The physicochemical parameters are also represented by the Piper diagram, as shown in Fig. 2. The Piper diagram (Piper 1944) is an effective way to represent ion concentrations and helps to recognize the hydrochemical types of groundwater. From the left-side triangle in Fig. 2, it is evident that the groundwater over the study area is mostly of sodium type. Similarly, the diamond-shaped plot of the Piper diagram shows that the groundwater of the study area is mostly of SO₄-Cl-Na type, followed by HCO₃-Na type. The right-side triangle reveals the samples to be mostly of bicarbonate or chloride type; however, water from some sites of Gurgaon district belongs to the ‘no dominant’ type.

Fig. 2 Piper plot showing the concentration of geochemical parameters of groundwater over the study area (the orange-colored quadrilateral-shaped and blue-colored star-shaped marks represent the sites from Faridabad and Gurgaon districts, respectively)

The geospatial mapping of various water quality parameters is carried out to visualize their variations, which also helps to identify the contaminated regions. The spatial variations of pH, EC, TH, fluoride, nitrate, and sulphate over the study area are presented in Fig. 3. It can be observed that the groundwater in the western portions of the Gurgaon district is highly alkaline. The EC values are beyond the permissible limit of 2000 µS/cm over a majority of the study area. Groundwater over a significant portion of the study area is affected by very high EC, i.e., more than 3500 µS/cm, reaching as high as 7215 µS/cm. The groundwater in these regions also has total hardness beyond the permissible limit for drinking purposes. Although fluoride concentrations are within the limit over a majority of the study area, some portions of the Gurgaon district are affected by fluoride concentrations above 1.5 mg/l. Similar inferences can also be drawn for nitrate, as evident from Fig. 3. The concentrations of fluoride and nitrate in groundwater are crucial, as elevated concentrations may have very harmful impacts on human health upon consumption. Therefore, groundwater in these regions should be handled with the utmost care, especially when considered for drinking purposes. The sulphate concentrations are beyond the permissible limit over a major portion of the Faridabad district.

Fig. 3 Geospatial mapping of the water quality parameters over the study area

The three principal natural factors controlling groundwater hydrochemistry are precipitation, evaporation, and rock weathering. To understand and visualize these influences, Gibbs (1970) created a simple and effective graphic that plots the total dissolved solids (TDS) concentration against the weight ratio Na⁺/(Na⁺ + Ca²⁺) or against the weight ratio Cl⁻/(Cl⁻ + HCO₃⁻). The Gibbs diagram for the samples used in this study is presented in Fig. 4. Most of the samples fall in the rock-weathering-dominant region, indicating that water–rock interactions are the dominant natural mechanism regulating groundwater chemistry. Evaporation also has an impact on the groundwater, while not a single sample shows precipitation dominance. The ratios of Cl⁻/(Cl⁻ + HCO₃⁻) range from 0.041 to 0.892. Similarly, the ratios of Na⁺/(Na⁺ + Ca²⁺) range from 0.378 to 0.968 with an average of 0.776, indicating strong cation exchange in the groundwater system (Gao et al. 2019).
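The two weight ratios plotted on the horizontal axes of the Gibbs diagram are straightforward to compute from the measured concentrations. The short sketch below assumes concentrations in mg/l; the function name and the sample values are hypothetical.

```python
def gibbs_ratios(na, ca, cl, hco3):
    """Return (Na/(Na + Ca), Cl/(Cl + HCO3)) from concentrations in mg/l."""
    cation_ratio = na / (na + ca)
    anion_ratio = cl / (cl + hco3)
    return cation_ratio, anion_ratio


# Hypothetical sample (NOT taken from Table S1), concentrations in mg/l.
cation_ratio, anion_ratio = gibbs_ratios(na=150.0, ca=50.0, cl=100.0, hco3=300.0)
print(cation_ratio, anion_ratio)
```

Each ratio lies between 0 and 1 and is paired with the TDS of the same sample to place the point on the diagram.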

Fig. 4 Gibbs diagram representing dominant factors of groundwater quality

The water quality index (WQI) is employed to assess the overall status of groundwater quality in the study area and its suitability for drinking purposes. The weighted-sum approach of WQI makes it convenient to communicate water quality information to diverse audiences, as it condenses all the parameters into a single numerical value. The WQI values of the individual sites are computed and presented in Table 6. The WQI values are spatially interpolated in ArcGIS to prepare the spatial distribution map presented in Fig. 5. It can be seen that 10 out of the 28 sites have a WQI above 100, and therefore groundwater at these sites is unsuitable for drinking purposes. The groundwater at Chandu has the highest WQI (= 212.1), which indicates the highest pollution amongst all the sites. On the other hand, Kasan is the least polluted site with a WQI of 34.36. The majority of the study area has groundwater of just acceptable quality. The north-western portions of the Gurgaon district and the south-western portions of the Faridabad district have very high WQI values, indicating very poor groundwater quality, and this water must be avoided for drinking.

Table 6 WQI values of the 28 sites (WQI > 100 are unsuitable for drinking and are marked in bold)

Fig. 5 Spatial distribution map of WQI over the study area

The inter-dependence amongst the 13 geochemical parameters over the 28 sites is assessed by determining their correlations, and the results are presented in Fig. 6. Typically, a higher positive or negative value of r indicates a strong correlation; however, it is essential to assess the statistical significance of the correlation through its p value. The statistically significant correlations (p value < 0.1) are marked with asterisks (Fig. 6). The exact values of r and the corresponding p values are presented in Tables S3 and S4 of the Supplementary Information, respectively. It can be seen that the parameters mostly exhibit positive correlations with one another. In particular, electrical conductivity, chloride, sulphate, calcium, magnesium, sodium, and total hardness show significantly positive correlations with one another, which suggests that these constituents may originate from the same source, since quality parameters from the same source are generally well correlated. However, these parameters show a significantly negative correlation with pH. On the other hand, bicarbonate, nitrate, fluoride, potassium, and silica do not show any significant correlations, indicating that their sources might differ.

Fig. 6 Correlation (r) amongst the geochemical parameters in groundwater over the study area (the asterisk represents the correlation to be significant at 90% confidence level)

The 13 water quality parameters are also assessed for PCA, whose results are presented in Fig. 7. The individual and cumulative variances explained by each principal component (PC) are shown in Fig. 7a and b, respectively. Their exact values are also mentioned in Table S5 of the Supplementary Information. It can be observed that 95.72% of the variance is explained by PC1, whereas the percentage of variance explained by other PCs is much lower (Table S5).

Fig. 7 The results of PCA in terms of (a) individual variance and (b) cumulative variance explained by the principal components

Typically, the PCs that cumulatively explain 90% of the total variance are retained, as this reduces dimensionality with minimum loss of information (Bahita et al. 2021b). In this case, PC1 alone captures almost the entire variance of the 13 parameters, indicating a significant reduction in dimensionality. Therefore, the loadings of the original parameters are presented for PC1 only in Table 7. It is evident that EC, chloride, and total hardness have relatively higher loadings for PC1. On the other hand, the loadings of pH, bicarbonate, nitrate, fluoride, potassium, and silica are very low.

Table 7 Loadings of the originally correlated variables for PC1

Conclusions

This study assessed the groundwater quality at 28 sites in the Faridabad and Gurgaon districts, which have been subjected to rapid urbanization and industrialization in recent years. The concentrations of the water quality parameters (pH, EC, CO₃²⁻, HCO₃⁻, Cl⁻, SO₄²⁻, NO₃⁻, F⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, SiO₂, and TH) are evaluated for hydrochemistry and drinking suitability considering the permissible limits recommended by BIS (2012). The geospatial mapping of various water quality parameters is carried out to visualize their variations, which also helps to identify the contaminated regions. The number of sites exceeding the permissible limits of pH, EC, Cl⁻, SO₄²⁻, NO₃⁻, F⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, and TH is 7, 15, 5, 6, 8, 3, 5, 7, 18, 3, and 8, respectively. The overall water quality of the sites was assessed by WQI, which revealed 10 out of the 28 sites to be unsuitable for drinking purposes. The analysis of correlation amongst the parameters and their significance revealed most of the parameters to be positively correlated, except pH, which showed a negative correlation with the other parameters. From PCA, the first principal component is found to explain more than 95% of the total variance. This study highlights the impacts of anthropogenic activities, i.e., rapid urbanization and industrialization, on the water quality of the two districts. In view of the human health risks indicated by these findings, this study recommends that groundwater used for municipal water supply be handled with the utmost care. Moreover, this study has assessed only the physicochemical parameters; groundwater quality may also deteriorate due to heavy metals, microbial pollutants, and other emerging contaminants. Hence, regular water quality monitoring is necessary to safeguard human health.