Multivariate and spatial methods-based water quality assessment of Chu Tran Valley, Gilgit Baltistan

The current study was performed to evaluate the physico-chemical, metal and microbial characteristics of the surface water available in Chu Tran Valley, located in sub-district Shigar of district Skardu, Gilgit Baltistan, Pakistan. A total of 24 water samples were collected and analysed to determine the water quality index (WQI). Multivariate analysis comprising principal component analysis (PCA) and spatial distribution using inverse distance weight (IDW) interpolation were also employed to assess the quality of the water available in the valley and the associated public health concerns. The WQI results showed that the physico-chemical characteristics of the water samples are excellent. However, the concentrations of metals in the water samples are higher than the recommended WHO standards and the public health quality of the water supply is not satisfactory; therefore, the water in the valley is unfit for human consumption. PCA identified the important water quality parameters and revealed that metal and microbial concentrations are major latent factors with a significant influence on the water quality. IDW-based spatial distribution indicates that water samples collected from the central part of the valley are highly contaminated with metals and microbial load. This is the region where the major human settlements are located, and agricultural activities, domestic discharges and erosion are the fundamental sources of water pollution. People have no choice except to consume the contaminated water, as no other water supply is available, and they rarely question the water quality. The study also demonstrated that the combination of WQI, PCA and IDW is an effective and promising tool for surface water quality assessment in other areas in order to get accurate results for public health monitoring.
It is recommended that the sources of contamination be further explored to reduce the pollution load of the surface water of Chu Tran Valley, which might help to promote sustainable ecotourism.


Introduction
Water quality monitoring and assessment is an important tool for safeguarding public health and conserving water resources (Yasin et al. 2020). However, drinking water quality in developing countries is generally poor, particularly in rural areas, and is a major source of multiple waterborne diseases (Li & Wu, 2019). Therefore, it is essential to assess and monitor the quality of the available water resources that are primarily used for drinking purposes (Şener et al. 2017).
The sixth of the 17 United Nations Sustainable Development Goals (UN-SDGs) clearly emphasizes the importance of access to clean water and sanitation. Unfortunately, as in many other developing countries, the public health quality of drinking water is still compromised in Pakistan (Azizullah et al. 2011). Across Pakistan, geospatial assessment of drinking water quality has been conducted in multiple geographic areas including Punjab (Bashir et al. 2020), Sindh (Bhatti et al. 2020), Balochistan and Khyber Pakhtunkhwa (Javed et al. 2019). Similarly, water quality in Gilgit Baltistan (GB) has been assessed in various valleys and cities (Ali et al. 2013) and in Skardu springs (Ahsan et al. 2021; Farhat et al. 2021); however, studies comprising water quality index (Ahsan et al. 2021), geospatial assessment and statistical analysis are rare in GB.
Water quality analysis, monitoring and assessment have been widely carried out by researchers using data-driven multivariate statistical tools including cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). All of these methods are easy to apply, extract valuable characteristics from water quality datasets and help to identify and monitor pollution (Duan et al. 2016). In the recent past, water quality assessment has advanced through the combination of multivariate statistical tools with environmental remote sensing (ERS) and geographic information systems (GIS) (Duan et al. 2021). Estimating spatial and temporal trends of pollution in water resources has supported frequent monitoring of data (Duan et al. 2013), with GIS applied through interpolation by inverse distance weight (IDW), kriging and spline (Yang et al. 2020). These methods are commonly applied in hydrology to predict target pollution variables in water resources; the assessment of the distribution of water quality parameters can therefore provide valuable information about large areas using limited data (Rostami et al. 2019). However, uncertainties in the data sources must be considered when using interpolation tools, and estimating observational uncertainties is critical for exploring the contribution of data and methodological errors (Duan et al. 2013). These limitations can be largely overcome by applying data transformation, rescaling tools, and skewness and kurtosis analysis (Weber et al. 2007) during statistical modelling, and by applying hybrid interpolation tools for better comparison (Shahbeik et al. 2014).
Therefore, this study focuses on the water quality assessment of Chu Tran Valley, which is located in sub-district Shigar of district Skardu in GB. The study employs advanced statistical and spatial tools with the objective of identifying the primary sources of contamination in the drinking water available in the valley. It further delivers outcomes that support SDG 6 (Clean Water and Sanitation) by calling for action on water quality improvement in the remote northern areas of Pakistan, where water resources are already scarce, poor in quality and a serious risk to public health.

Study area
Chu Tran Valley is located in District Shigar of the Baltistan region. The valley stretches from 35° 27' to 35° 45' N and 75° 15' to 75° 48' E along the Shigar River and comprises the villages of Zing Zing, Tisar, Ligup, Thandoro, Kashmal and Sildi on the eastern side of the Indus River. Multiple glaciers, including the mighty Baltoro Glacier that feeds the Indus and Shigar rivers, are located on either side of the valley. The valley owes its name to its hot water (thermal) springs (Zaigham et al. 2009), which have medicinal importance and therapeutic potential (Farhat et al. 2021): "Chu Tran" combines two native-language words, "Chu" meaning "water" and "Tran" meaning "hot". Locally, Chu Tran Valley is also known as Tisar and has a very small human population (450-500 inhabitants). The lofty and harsh mountains of the area belong to the Karakoram mountain range (Farhat et al. 2021). The valley is accessible by the Shigar valley road, which connects this region with the Gilgit-Skardu road and the Braldu Valley road. The winters are long and harsh and the summer season is very short, leaving very little time for agriculture and the growth of vegetation (Seong et al. 2007). The community of this region strongly follows traditional Balti culture, particularly in housing infrastructure, agricultural and farming practices, cooking, dressing and sports (Abbas et al. 2017).

Water sampling
From Chu Tran Valley, 24 water samples were collected using a random sampling approach during October 2020 (Fig. 1). Since this study aims to monitor water quality for public health, the samples were collected from drinking water sources that are commonly used by the human settlements in the valley, and on the basis of site accessibility along the entire Chu Tran Valley. The collected samples were stored in sterilized glass bottles, kept in an ice box and then transported to the laboratory of the Institute of Environmental Studies, University of Karachi. A comprehensive overview of the workflow is presented in Fig. 2.

Physico-chemical analysis
pH, turbidity, salinity and total dissolved solids (TDS) were determined on site. Turbidity was measured using a Eutech meter (Model No. TN-100), while a Hach Lange sensION 156 multiparameter device was used to measure pH and salinity. Gravimetric and argentometric methods were employed for TDS and chloride, respectively (APHA, 2005). Sulphate was determined by the gravimetric method, while hardness was determined by the EDTA titrimetric method. For nitrate and total phosphate, the brucine-reagent and ascorbic acid methods were employed, respectively. All of the above-mentioned parameters were analysed following the Standard Methods for the Examination of Water and Wastewater (APHA, 2005).

Microbial analysis
The microbiological parameters examined in the water samples were total coliform count (TCC), total faecal coliforms (TFC) and total faecal streptococci (TFS). Single- and double-strength lactose broth (Merck, Germany) was used for TCC, while EC medium (Merck, Germany) was used for the determination of TFC. TFS were estimated using sodium azide broth (Mallmann & Seligmann, 1950). The most probable number (MPN) technique was employed to determine the bacterial load in the water samples (APHA, 2005).

Descriptive statistics
Descriptive statistics of the parameters, including the arithmetic mean and mode, were computed. Skewness and kurtosis were also calculated to assess the normality of the data (Mustapha et al. 2012). The descriptive statistics were obtained using SPSS v22 (IBM Corp., 2013).
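As a rough illustration of the normality screening described above, the following Python sketch computes the mean, mode, skewness and excess kurtosis of one parameter. The readings are hypothetical, not the study data, and the simple moment-based formulas are an assumption: SPSS reports bias-corrected estimates, so its values would differ slightly for small samples.

```python
from statistics import mean, multimode


def central_moment(values, order):
    """Population central moment of the given order."""
    m = mean(values)
    return sum((v - m) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical pH readings, for illustration only.
ph = [7.0, 7.1, 7.1, 7.2, 7.2, 7.2, 7.3, 7.4]

print("mean:", round(mean(ph), 3))
print("mode(s):", multimode(ph))
print("skewness:", round(skewness(ph), 3))
print("excess kurtosis:", round(excess_kurtosis(ph), 3))
```

Values of skewness and excess kurtosis close to zero suggest that a parameter is approximately normally distributed; strongly skewed parameters are candidates for transformation before multivariate analysis.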

Water quality index (WQI)
Current research on water quality assessment widely applies water quality index (WQI)-based models to assess the nature of water based on the rating of several parameters. The objective of WQI is to reduce complex water quality data to a simplified non-dimensional index. WQI also allows aggregation function-based analysis across spatiotemporally distributed parameters, resulting in a single value (Uddin et al. 2021). WQI was calculated for the water samples as per WHO Guidelines (WHO, 2011) using a methodology adopted from various studies (Ketata et al. 2012; Sahu & Sikdar, 2008; Şener et al. 2017; Shabbir & Ahmad, 2015). Depending on the importance of each parameter for human health, all parameters, including physico-chemical, metal and microbial variables, were assigned a weight (w_i) from 1 to 5 for water quality index evaluation (Şener et al. 2017). In this study, the highest weight of 5 was assigned to As, Pb, F⁻ and the microbial parameters, since these parameters strongly influence human health (Table 1). WQI was calculated using a series of three equations (Eq. 1-3; Ketata et al. 2012), where W_i is the relative weight, w_i is the weight assigned to each parameter, n is the number of parameters observed, Q_i is the water quality rating scale for each of the observed water quality parameters, V_o is the observed level of each parameter and V_s is the WHO threshold level for each parameter. Based on these calculations, the resulting WQI values were classified into five classes (Ketata et al. 2012; Shabbir & Ahmad, 2015), as shown in Table 2.
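The calculation can be sketched in Python as follows. The sketch assumes the weighted arithmetic form commonly used with this notation, namely W_i = w_i / Σ w_j, Q_i = (V_o / V_s) × 100 and WQI = Σ W_i Q_i; these forms are an assumption inferred from the symbol definitions above, and the weights, observed values and thresholds in the example are hypothetical placeholders rather than entries from Table 1.

```python
def water_quality_index(parameters):
    """Weighted arithmetic WQI.

    parameters: list of (w_i, V_o, V_s) tuples, i.e. assigned weight,
    observed level and guideline threshold for each parameter.
    """
    total_weight = sum(w for w, _, _ in parameters)
    wqi = 0.0
    for w, v_obs, v_std in parameters:
        relative_weight = w / total_weight   # W_i (assumed form)
        rating = (v_obs / v_std) * 100.0     # Q_i (assumed form)
        wqi += relative_weight * rating      # contribution to WQI
    return wqi


# Hypothetical parameters: (weight, observed, threshold).
# Every observed value is half of its threshold, so WQI should be 50.
example = [
    (5, 0.005, 0.01),
    (3, 250.0, 500.0),
    (2, 2.5, 5.0),
]

print(round(water_quality_index(example), 2))
```

Under these assumed forms, a parameter sitting exactly at its threshold contributes a rating of 100, so index values well above 100 for a parameter group signal that high-weight parameters exceed their guideline levels.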

Multivariate analysis
Principal component analysis (PCA) is one of the most commonly used multivariate analysis methods. It examines the dynamics of all parameters observed for a system, with the aim of reducing the dimensionality of multivariate data by condensing the useful information into a small number of components. The resulting principal components contain the typical characteristics of the system (Abdi & Williams, 2010; Gorgoglione et al. 2019).
Therefore, to monitor the water quality in Chu Tran Valley and to identify the most influential water quality parameters and the relations between them, PCA was applied to the water quality results, followed by cluster analysis (CA) based on unweighted pair group ordination using the Euclidean distance (Alamgir et al. 2015; Yang et al. 2020). A series of steps was applied to obtain the original data matrix (Eq. 4), the standardized data (Eq. 5), the correlation coefficient matrix (Eq. 6), the eigenvalues and eigenvectors (Eq. 7) and finally the principal components (Eq. 8). In these equations, X is the data matrix; i and j index the sampling location and the variable, respectively; n and p are the numbers of sampling sites and water quality parameters, respectively; x_ij and x*_ij are the originally measured data and the standardized variable (Yang et al. 2020), respectively; x*_i is the standardized indicator variable; x̄_j is the average value of the jth indicator; s_j is the standard deviation of the jth indicator; R is the correlation coefficient; F_i is the principal component; and λ_i and u_i (i = 1, 2, …, n) are the eigenvalues and eigenvectors, respectively, with each eigenvalue corresponding to the variance of its principal component. All multivariate statistical analysis results were obtained using SPSS v22 (IBM Corp., 2013), the OriginPro 2020b software package (OriginLab Corporation, Northampton, USA) and the RStudio (RStudio Team, 2020) environment.
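The sequence of steps (data matrix, standardization, correlation matrix, eigen-decomposition, component scores) can be sketched in plain Python. This is an illustration only: the data are hypothetical, and power iteration is swapped in here to extract just the first component, whereas statistical packages such as SPSS compute the full eigen-decomposition.

```python
from statistics import mean, stdev


def standardize(data):
    """Column-wise z-scores: (x_ij - mean_j) / s_j."""
    columns = list(zip(*data))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def correlation_matrix(z):
    """Correlation matrix R of standardized data (n rows, p columns)."""
    n, p = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def mat_vec(matrix, vec):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    vec = [1.0 + 0.1 * k for k in range(len(matrix))]  # arbitrary start
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = sum(v * v for v in product) ** 0.5
        vec = [v / norm for v in product]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


# Hypothetical samples (rows) by parameters (salinity, TDS, pH).
data = [
    [0.10, 120.0, 7.0],
    [0.20, 210.0, 7.3],
    [0.30, 330.0, 7.1],
    [0.40, 390.0, 7.4],
    [0.65, 640.0, 7.2],
]

z = standardize(data)
R = correlation_matrix(z)
eigenvalue, loadings = leading_eigenpair(R)
scores = mat_vec(z, loadings)      # first principal component per sample
explained = eigenvalue / len(R)    # share of total variance

print("eigenvalue:", round(eigenvalue, 3))
print("explained share:", round(explained, 3))
```

Because the data are standardized, the diagonal of R is 1 and the total variance equals the number of parameters, which is why an eigenvalue divided by the parameter count gives the share of variance explained by that component.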

Spatial distribution by inverse distance weight (IDW)
Spatial distribution methods for water quality mapping use interpolation to determine the values at unknown (unsampled) points from weighted measures, under the proximity-centred assumption that closer points are more alike than points located farther away. The most commonly used interpolation methods based on geo-statistical techniques are kriging and inverse distance weight (IDW). Kriging can be further classified into simple, ordinary and universal kriging, and applies weights to the known or measured values on the basis of the spatial orientation of the measured locations (Elumalai et al. 2017). Unlike kriging, IDW depends only on the proximity of the known (sampled) points, on the principle that closer sample points have a greater influence on the unsampled location, and applies linear-weighted combinations (Haldar et al. 2020; Nistor et al. 2020). IDW was therefore performed for this study (Eq. 9 and 10), where z is the unknown value to be interpolated; z_i is the ith data value at a sampled location; n is the number of sampling points; x_i is the weight for IDW analysis; d_i is the horizontal distance between the observed and interpolation points (Yang et al. 2020); and p is the power of distance. The Interpolation tool from the Spatial Analyst ArcToolbox in ArcMap 10.8.1 (Esri, 2020) was used to perform IDW analysis for all parameters.
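A minimal Python sketch of the IDW principle is given below. It assumes the usual form in which each weight is proportional to d_i raised to the power -p and the weights are normalized to sum to one; the coordinates and values are hypothetical, planar distances are used, and the power of 2 is an assumption, since the value used in the study is not stated here.

```python
from math import hypot


def idw(known_points, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    known_points: list of (x, y, value) for sampled locations.
    target: (x, y) of the unsampled location.
    power: distance exponent p (2.0 is an assumed default).
    """
    weights, values = [], []
    for x, y, value in known_points:
        d = hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sampled location
        weights.append(d ** -power)
        values.append(value)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical sampled points: (x, y, measured value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]

print(idw(samples, (1.0, 0.0)))   # midway between the two samples
print(idw(samples, (0.5, 0.0)))   # closer to the first sample
```

A higher power makes the estimate follow the nearest sample more closely, while a lower power produces a smoother surface; comparing a few values of p against held-out samples is one simple way to choose it.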

Results and discussion
This study has performed the water quality assessment based on physico-chemical, metals and microbial parameters in Chu Tran Valley water samples using statistical and spatial tools including WQI, PCA and IDW interpolation.
The descriptive statistics of all parameters are presented in Table 1 along with WHO Guidelines (WHO, 2011) and parameters for WQI calculation.

Water quality status
The results for the physico-chemical parameters of the water samples were within the prescribed WHO guidelines (Table 1). The pH of the water samples ranged from 7.0 to 7.4 (mean 7.17 ± 0.1). Similarly, a water quality study of another area located near the Chu Tran Valley, Sultanabad Stream, reported a mean pH of 7.4, which indicates that the water in the region is slightly alkaline (Begum et al. 2014). Turbidity in the water samples of Chu Tran Valley was in the range of 0.18-0.63 nephelometric turbidity units (NTU) with a mean value of 0.39 ± 0.13 NTU, which is within the WHO guideline (< 5 NTU). By comparison, a previous study reported turbidity of about 17.1-96.0 NTU in Danyore village of Gilgit (Shedayi et al. 2015). In terms of salinity, the water samples had a salt content ranging from 0.1 to 0.65‰ with a mean value of 0.26 ± 0.14‰, which is within the WHO standard (1.2‰). However, this value is relatively high compared with the water samples of Gilgit city (0.015-0.025‰) (Shedayi et al. 2015). Phosphate was detected in the range of 0.86-1.76 mg/L with a mean value of 1.37 ± 0.22 mg/L, which is higher than the mean value of total phosphorus (0.0483 mg/L) reported in a recent study of the GB region (Islam et al. 2021). On the basis of the mean values obtained, the trend of metal concentrations recorded in this study was As > Ca > Mg > Mn > Zn > Fe > Cu > Pb > Mo > F. Arsenic, calcium, magnesium and fluoride were detected within the WHO guideline limits (Table 1). However, the results revealed that the other metals, including heavy metals, are present in much higher concentrations than the WHO guidelines (WHO, 2011). On the basis of the results of the current and previous studies (Baig et al. 2019; Lodhi et al. 2003), the continuous uptake of these heavy metals through drinking water is a potential threat to public health.
In terms of microbial parameters, all the samples are heavily contaminated (> 3 MPN/100 mL) with organisms of public health importance, as shown in Table 1. Bacteriological contamination of the water resources in the GB region has also been reported previously, which supports the view that faecal contamination from domestic sources is the most common source of water pollution in the region (Ali et al. 2013; Islam et al. 2021; Nafees et al. 2014). This may also be because a sewerage system is hardly available in the region, including the Chu Tran Valley, and no water treatment facility is available in or around the valley. The water resources are open and become contaminated by waste of both human and animal origin. This is one of the main reasons why people in the valley suffer from a number of water-borne diseases.

Water quality index
In this study, WQI was computed separately for three groups of parameters (physico-chemical, metal and microbial) in 24 water samples (n = 24). The weight of each parameter used in the computation of WQI is given in Table 1. WQI indicates that 91.67% (n = 22) of the water samples are excellent in terms of physico-chemical characteristics, while the remaining 8.33% (n = 2) are good. In contrast, all water samples (n = 24) are contaminated with metals and therefore unfit for drinking or human consumption. In terms of microbial characteristics, only 41.67% (n = 10) of the samples are good and the remaining 58.33% (n = 14) are unfit for human consumption (Tables 2 and 3). A previous study also reported that the WQI of the northern areas of Pakistan (District Gilgit and Ghizer) was not satisfactory, particularly due to anthropogenic sources of pollution (Sohail et al. 2019). On the basis of WQI, the drinking water available in Chu Tran Valley is only partially suitable for human consumption, with a risk of health impacts due to the presence of metals and a high bacterial load.

Principal component analysis (PCA)
After data standardization, the results were subjected to PCA in order to reduce the dimensionality of the data and to find the most influential factors. The PCA results indicate that the influence of the parameters can be described by eight principal components; however, the first three are the most significant, with eigenvalues of 5.835, 3.290 and 2.586, respectively. The first component explains almost 26.5% of the total variance, followed by 14.9% and 11.7% for the second and third components, respectively (Table 4).
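The link between the reported eigenvalues and variance shares can be checked with a short calculation. For PCA on standardized data, the total variance equals the number of variables, so each share is the eigenvalue divided by that number. The count of 22 variables is an inference from the parameters named in the text (9 physico-chemical, 10 metals, 3 microbial), not a figure stated explicitly.

```python
# Eigenvalues of the first three components, as reported in the text.
eigenvalues = [5.835, 3.290, 2.586]

# Assumption: 22 standardized variables entered the PCA
# (9 physico-chemical + 10 metals + 3 microbial parameters).
n_variables = 22

# Share of total variance explained by each component, in percent.
shares = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(shares)

for k, share in enumerate(shares, start=1):
    print(f"PC{k}: {share:.2f}%")
print(f"first three components together: {cumulative:.2f}%")
```

Under this assumption the computed shares agree with the reported 26.5%, 14.9% and 11.7% to within about 0.1 percentage point, and the first three components together account for a little over half of the total variance.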
The variable loadings show that the first component is principally accounted for by the physico-chemical factors, namely salinity, TDS, chloride, hardness, sulphate and phosphate. In contrast, the second principal component is governed by the metals arsenic, calcium and magnesium, whereas the third principal component predominantly reflects microbial parameters such as TCC and TFC. Zinc has the major influence on the fourth principal component, and pH and turbidity influence the fifth. The parameters involved in the fourth and fifth components contribute less than those of the first three. The results also indicate that the remaining parameters, including nitrate, copper, lead, iron, manganese, fluoride and molybdenum, have only a minor influence on the first three principal components (Fig. 3).

Cluster analysis (CA)
In contrast to WQI, which showed that the water quality of the Chu Tran Valley is satisfactory in terms of physico-chemical characteristics, the PCA results indicate that the physico-chemical parameters have the highest influence on the water quality of the valley. Therefore, cluster analysis was performed to examine the individual samples on the basis of their location. The sampling locations were divided into two groups (Group A and B): most fall within Group B, whereas Group A comprises only 4 sampling sites. B1 is the largest subgroup, comprising most of the sampling locations. This indicates that the sampling points located in the central part of the valley are closely grouped with each other (Fig. 4). WQI further confirms that most of the centrally located sampling points have similar metal and microbial characteristics, while physico-chemically the water quality is excellent (Table 2).

Correlation (R) analysis
Correlation analysis was also performed separately for the physico-chemical, metal and microbial parameters. Mean values of these parameters were used as input to the OriginPro 2020b software package (OriginLab Corporation, Northampton, USA). The correlation output for the physico-chemical parameters showed that salinity, TDS, chloride, hardness and sulphate are positively correlated with each other; however, nitrate and phosphate showed a negative trend with the other parameters as well as with each other. The highest positive correlation was observed between salinity and TDS (R = 0.96) (Fig. 5). Among metals, the highest positive correlation (R = 1) was observed between calcium and magnesium, while the rest of the metals do not show significant correlations among themselves (Fig. 6). All microbial parameters showed strong positive correlations among themselves (Fig. 7).
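For reference, the coefficient R reported here can be computed as in the following Python sketch of the Pearson correlation. The paired values are hypothetical and serve only to show how coefficients near +1, exactly +1 and exactly -1 arise; they are not the study data.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical paired measurements across five sites.
salinity = [0.10, 0.20, 0.30, 0.40, 0.65]
tds = [120.0, 210.0, 330.0, 390.0, 640.0]

print("salinity vs TDS:", round(pearson_r(salinity, tds), 3))
# An exact linear relation gives R = 1; an exact inverse relation gives R = -1.
print("exact linear:", round(pearson_r([1, 2, 3], [2, 4, 6]), 3))
print("exact inverse:", round(pearson_r([1, 2, 3], [3, 2, 1]), 3))
```

A coefficient of exactly 1 means that one series is an exact increasing linear function of the other across all sites, which is worth keeping in mind when interpreting a reported R = 1.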

Spatial distribution
IDW-based interpolation of the physico-chemical, metal and microbial parameters found in the Chu Tran Valley water samples showed a non-uniform spatial pattern throughout the valley. Higher pH values are detected in the central part of the Chu Tran Valley, whereas turbidity is high in the water samples collected from the northern areas of the valley. A similar pattern of spatial distribution is observed for salinity and TDS, whereas chloride, hardness and phosphate are found in elevated concentrations in different areas (Fig. 8). Among metals, copper, lead, iron, zinc, manganese and molybdenum are distributed across the northern and central parts of the valley in very high concentrations. By comparison, higher concentrations of arsenic, calcium, magnesium and fluoride are confined to fewer sampling sites (Fig. 9). All sites are microbially contaminated; however, the situation for TFC is the worst. Almost all the sampling sites are highly contaminated, which is an alarming situation for public health (Fig. 10).

Contamination sources and public health concern
To appraise the public health concern, the health impacts of each parameter should be determined in comparison with the WHO guidelines (WHO, 2011). As indicated by WQI, the water quality of the Chu Tran Valley is excellent in terms of physico-chemical characteristics, as the mean values of all parameters are within the WHO recommended limits. Therefore, no health concerns are anticipated due to physico-chemical parameters. However, this is the most critical group of parameters, since salinity, TDS, hardness and chloride have the largest influence on the water quality as indicated by the PCA results. Regular monitoring should therefore be carried out so that any contamination issue is addressed at the earliest.
In contrast, the mean values of copper (0.35 mg/L), lead (0.28 mg/L), iron (0.38 mg/L), zinc (1.27 mg/L), manganese (2.08 mg/L) and molybdenum (0.05 mg/L) in the water samples of the valley do not comply with the WHO recommended threshold levels (0.2 mg/L for copper, < 0.01 mg/L for lead, 0.3 mg/L for iron, 1.27 mg/L for zinc, < 0.05 mg/L for manganese and 0.01 mg/L for molybdenum, respectively). The elevated concentrations of these metals are likely to cause health problems, given their known health impacts upon exposure through food, drinking or dermal contact. Since the source of water is usually the surface and river water available in the area, including the Shigar and Braldu rivers, metal contamination can occur through natural processes such as the weathering of rocks (Huang et al. 2020) and through anthropogenic sources, particularly domestic activities, air pollution due to transport, and other sources (Baig et al. 2019). Therefore, consumption of drinking water contaminated by metals, including heavy metals, can be a risk to public health, with possible neurotoxic, carcinogenic and cardiovascular impacts (Alamgir et al. 2019). The human health impacts of consuming high concentrations of metals are well established, and multiple studies have reported adverse effects on humans. Prolonged exposure to copper through drinking water may cause anaemia, liver dysfunction and kidney issues (Mustafa et al. 2017), while exposure to lead may impair learning capacity and cause memory loss and nervous system dysfunction. Slight variation in exposure to trace and essential metals such as zinc and manganese may also result in psychological (Radwan & Salama, 2006) and neurological disorders (Iyare, 2019), respectively. Long-term exposure to high concentrations of molybdenum is associated with anorexia, joint pain, tremor and loss of appetite (Smedley et al. 2014).
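To put the figures quoted above on a common scale, the following Python sketch divides each mean concentration by its quoted threshold. The numbers are taken as stated in the text, with the "<" thresholds for lead and manganese treated as upper limits; the ratio itself is an illustrative device and not a metric used in the study.

```python
# (mean concentration, quoted threshold) in mg/L, as stated in the text.
metals = {
    "copper": (0.35, 0.2),
    "lead": (0.28, 0.01),
    "iron": (0.38, 0.3),
    "zinc": (1.27, 1.27),
    "manganese": (2.08, 0.05),
    "molybdenum": (0.05, 0.01),
}

# Ratio of mean concentration to threshold; values above 1 exceed it.
ratios = {name: conc / limit for name, (conc, limit) in metals.items()}

# Metals ordered from the largest to the smallest ratio.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratios[name]:.2f} x threshold")
```

On these figures, manganese and lead stand out at roughly 42 and 28 times their thresholds, while copper and iron exceed theirs only moderately. The quoted zinc threshold equals the zinc mean, so its ratio is exactly 1.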
All the water samples collected from the Chu Tran Valley are contaminated with organisms of public health importance. The major reason is the absence of a sewerage system in the valley. Moreover, no water treatment facility is available, and no mechanism is in place for monitoring the water supply. Essentially, domestic waste is a potential source of the heavy microbial load in the water supply available in the valley. It is generally treated through soak pits and is ultimately discharged indiscriminately to the open fields. The entire Gilgit Baltistan region is currently facing this issue. Furthermore, the current climate change scenario, with rising temperatures and intense heat waves, is also responsible for the proliferation of infectious diseases (Majeed et al. 2020).

Conclusion
In this study, a combination of WQI, PCA and IDW methods was used to determine the water quality of Chu Tran Valley. WQI characterized the water quality of the valley on three bases: (1) the water quality is excellent in terms of physico-chemical characteristics, (2) heavy metals are present in high concentrations, making the water unsuitable for drinking, and (3) the drinking water available in the area is highly unsuitable in terms of microbial load and can be a potential source of health risk for the public. The PCA results confirm that, although the physico-chemical characteristics of the water are satisfactory, physico-chemical parameters and heavy metals have the strongest influence on water quality, making them critical factors in determining the overall water quality of the valley. IDW-based spatial analysis further confirmed that public health in the central part of the valley is at risk due to contaminated drinking water. The study recommends regular and frequent water quality monitoring to investigate the pollution sources and to safeguard public health in the area. Moreover, water treatment services should be established in the area to ensure that a potable water supply is available to the people.