Assessment of water quality using multivariate statistics and geographical information systems (GIS) of Wadi Aldabab, Taiz, Yemen

The shortage of water resources in Yemen affects the availability and supply of safe water in the country. This study assessed the water quality in Wadi Al-Dabab, Taiz, Yemen. Water samples were collected from springs and different types of wells (tube, manual) with depths ranging from 9 to 500 m. Multivariate statistical analysis was performed on 15 water quality parameters (WQP) from 15 locations to identify the significant WQP and the possibility of data reduction. The water quality index (WQI) approach was used to assess the suitability of the water for drinking purposes. Four principal components were found to be significant, explaining 86.2% of the overall variance, while four varifactors (VF) explained 80% of the data variance. The findings showed the possibility of data reduction by 20%, which could assist in water quality monitoring at a reduced cost. The WQI map shows that the water quality is good in a limited area and poor to very poor in most of the study area. The findings are likely to assist in identifying the important WQP for the protection of drinking water sources, while the less important WQP can be excluded, which might reduce the cost of water quality monitoring. The proposed approach is likely to be cost-effective for economically weak and water-stressed countries and can contribute positively to sustainable water resource management in Yemen and other water-stressed regions.


Introduction
Over the past few decades, Yemen has experienced an increase in freshwater demand due to rapid population growth, leading to a widened disparity between the available water resources and demands (Hellegers et al. 2008). The groundwater resources in Yemen, particularly in the Sana'a basin, have shown declining water levels and increasing salinity due to the reductions in water reserves caused by urbanization, industrialization, and irrigation activities (Taher 2016). Protecting and monitoring these resources is crucial to ensure safe water supplies (Hellegers et al. 2008). Anthropogenic activities in arid countries, including industrial and municipal wastewater discharges and surface runoff, can cause constant or seasonal contamination of water sources, leading to a significant problem in many countries (Qadir et al. 2008; Singh et al. 2004; Alberto et al. 2001). Protecting water quality is essential to minimize water treatment costs and to keep water sources usable (WHO 2017; Canada 2017; EPA 2018). However, the complex nature and uncertainty of water quality parameters (WQP) can pose significant challenges in controlling water quality (Elhatip et al. 2008). Regular monitoring and assessment of WQP are necessary to address these limitations (Prathumratana et al. 2008).
Multivariate statistical analysis is commonly used in water quality interpretation because of its capability to explain and interpret complex water quality data matrices and to identify possible factors that impact water quality, offering reasonable solutions to pollution issues and water resource management (Reghunath et al. 2002; Lee et al. 2001). In addition, multivariate statistical approaches have successfully assessed the spatial variation of water quality (Chowdhury and Husain 2020; Phung et al. 2015; Marinović and Ruždjak 2015; Muangthong and Shrestha 2015; Olsen et al. 2012; Zhang et al. 2009a, b; Zeilhofer et al. 2006; Simeonov et al. 2003) and can be applied to characterize, evaluate, and validate tempo-spatial variability of freshwater quality (Chowdhury and Al-Zahrani 2014; Singh et al. 2005). Hydrochemical and statistical analyses have been used in various studies to evaluate groundwater resources and their hydrogeochemical characteristics in the Arabian Peninsula, including Saudi Arabia (Nazzal et al. 2015; Al-Omran et al. 2018), the United Arab Emirates (Mohamed and Hassane 2016), and Yemen (Nasher and El-Sagheer 2012; Nasher et al. 2013; Saleh et al. 2018).
The Geographical Information System (GIS) plays an essential role in environmental data management, enabling the presentation of multiple scenarios to managers and scientists to forecast spatial distribution and data trends, thereby avoiding impending environmental crises (Igboekwe and Akankpo 2011; Zeilhofer et al. 2007). In addition, GIS has essential applications in water resource management and pollution control, as demonstrated in various studies, including those conducted in Al-Howban Basin, Taiz-Yemen (Aqeel et al. 2017), El Khairat deep aquifer, Tunisia (Gebrehiwot et al. 2011), and Al-Ula Area, Saudi Arabia (Toumi et al. 2015).
Limited studies have been conducted in Taiz, Yemen, where the current study area is located. Metwali (2003) evaluated the physicochemical and bacteriological quality of drinking water in Taiz city and reported that most of the water samples, especially from private wells, contained a high concentration of total coliforms. Al-Amry (2009) reported that groundwater in the Hidehran and Alburayhi basin, northwest of Taiz city, has excessive fluoride ion concentrations, which pose a high risk to humans through ingestion, and that the majority of groundwater samples are oversaturated with respect to calcite and undersaturated with respect to fluoride. Ahmed Al-Shargabi (2015) concluded that the majority of Taiz city's groundwater is unsuitable for drinking due to the high concentrations of various pollutants. Naser et al. (2020) assessed the intensity and spatial extent of fluoride in groundwater in the southern part of upper Wadi Rasyan, Taiz. The results showed that 71% of the samples exceeded the WHO guideline of 1.5 mg/l, with wide variation in fluoride content within the same aquifer and water type. Alansi et al. (2021) reported heavy metal contamination in groundwater around the Al-Buraihi sewage station in Taiz, attributed to treated sewage, which makes the water unsuitable for irrigation and can have adverse effects on plants, animals, and human life.
Despite the effectiveness of multivariate statistical analysis in interpreting water quality data, no previous study has applied this technique to water quality management in the cities of Yemen. To determine the loadings on the major principal components (PCs) and varifactors (VFs), and the pollution sources, principal component analysis (PCA) and factor analysis (FA) were performed on 15 WQP, including potassium (K), alkalinity (T.ALK), sulfate (SO4), calcium (Ca), magnesium (Mg), hardness (T.H), total dissolved solids (TDS), sodium (Na), iron (Fe), and pH. GIS was used to generate a WQI map for Wadi Al-Dabab in Taiz, Yemen. Finally, the possibility of reducing the number of WQP was discussed.

Study area
The Wadi Al-Dabab study area is about 9 km southwest of Taiz city. Taiz city is about 100 km east of the Red Sea and 253 km south of Sana'a, the capital of Yemen (Fig. 1). Wadi Al-Dabab lies between north latitudes (1,502,000-1,444,000) and east longitudes (389,000-384,000) and covers 117 square kilometers. Springwater, tube wells, and manual wells are the major water sources for the populations living in the Wadi Al-Dabab region. The aquifer of Wadi Al-Dabab consists of two porous layers, which increases the probability of water contamination by natural and anthropogenic activities. Non-point pollution sources dispose of pollutants irregularly across the study area. Wastewater, industrial waste, municipal solid waste, oils, and structural waste are the primary non-point pollutants. These sources affect the water quality in the basin, resulting in elevated risks to approximately 24,140 people who frequently use these water sources.
The rocks in the study area belong to the Cenozoic Volcanic Group. They were formed by volcanic activity that affected Yemen during the Tertiary age as a result of the tectonic movements that formed the Red Sea and the Gulf of Aden. The rock units can be divided into three groups:
• The basal volcanic rocks, which appear as hills and plateaus on both sides and in the middle of Wadi Al-Dabab, and are hard or fractured volcanic rocks made of basalt, scoria, rhyolite, andesite, and diabase.
• The granitic tectonic intrusions, which appear in the eastern part of the study area.
• The alluvial sediments of the Quaternary age, consisting of boulders, gravel, sand, silt, and clay, which were formed in fan environments and cover most of the stream courses in the main valley and the mountain heights.
As for the geological structures, the most important rifts appear through a major rift extending from east to west and a group of local rifts extending northwest-southeast perpendicular to the main rift (Fig. 2).

Sampling and sample preparation
The National Water Resource Authority (NWRA) in Sana'a, Yemen, reviewed previous technical studies on water quality for Wadi Al-Dabab and Taiz city. These studies were conducted as part of the periodic monitoring program for water quality in the Taiz governorate. Based on these studies, 15 sampling points were identified in Wadi Al-Dabab for water quality sampling (Fig. 3). The NWRA adopted criteria for selecting water facilities for sampling based on hydrogeological and socioeconomic variables, including comprehensive representation of water layers and reservoirs, different water sources and their qualitative behavior, sampling from polluted and residential areas, and consideration of the different uses of the water sources.
The NWRA took several measures to ensure the accuracy and quality of the field samples including visiting the sites, assigning two teams to take five samples per day over three days from the water fields, adhering to technical conditions and standards when taking field samples, and properly sterilizing and preserving the samples for laboratory analysis. Field electrical conductivity, pH, temperature, and turbidity measurements were also taken. The team followed standards for water sampling, including selecting standard plastic bottles, sterilizing tools and containers, and taking samples with experienced laboratory specialists. All laboratory instruments and parameters used in

Sample analysis
The WQP were analyzed in the laboratory of Taiz's General Authority for Water Resources. The laboratory specialists and field sample collectors collected five samples daily and conducted laboratory analysis immediately after collection. Both chemical and biological tests were performed on the samples, and field measurements (electrical conductivity, pH, and temperature) were taken for each sample. Calcium, sodium, potassium, chloride, sulfate, nitrate, iron, and fluoride were measured directly in the laboratory, whereas total dissolved solids, total hardness, total alkalinity, magnesium, and bicarbonate were calculated.

Calculation and statistics
The study aimed to evaluate the water quality of a specific area using multivariate statistics and water quality index (WQI). The laboratory analysis involved both physical and chemical tests such as electrical conductivity, hydrogen ion concentration, temperature, calcium, sodium, potassium, chloride, sulfate, nitrate, iron, fluoride, total dissolved solids, total hardness, total alkalinity, and bicarbonate. PCA and FA were employed to identify the main factors influencing water quality using the chemical parameters obtained from the laboratory analysis. The WQI was calculated based on the physical and chemical parameters to provide a single value representing the overall quality of water. In summary, the study used a combination of sample collection, laboratory analysis, multivariate statistics, and the WQI to assess the water quality of the study area.

Scatterplots and pairwise correlations
Scatterplots and pairwise correlations for the WQP in Wadi Al-Dabab were investigated using the JMP and the

Principal component analysis (PCA)
Using PCA, the original correlated variables are transformed into new uncorrelated variables known as PCs. The most important parameters identified by PCA can be used to describe the majority of the data set (Helena et al. 2000). PCA can also reduce data dimensionality, resulting in fewer transformed variables that reflect the majority of data variance (Jackson et al. 1991; Helena et al. 2000). The PC can be represented by the equation below:

$$z_{ij} = a_{i1}x_{1j} + a_{i2}x_{2j} + \cdots + a_{im}x_{mj} \qquad (1)$$

where z = a component score, i = component number, a = component loading, x = a measured value of a variable, j = sample number, and m = number of variables.
The factor analysis (FA), which reduces the contribution of less important variables to create a new set of variables called varifactors (VF), further simplifies the transformed variables obtained from PCA. The VF include hypothetical, unobservable, and latent water quality variables (Vega et al. 1998; Helena et al. 2000). The significant PCs are extracted using the normalized variables in PCA to exclude the contribution of the less significant variables (Bu et al. 2010; Zhang et al. 2009a, b). As a result, information derived from a considerably larger collection of original correlated variables can be extracted from a smaller number of uncorrelated variables. The FA can be represented by the equation below:

$$z_{ji} = a_{f1}f_{1i} + a_{f2}f_{2i} + \cdots + a_{fm}f_{mi} + e_{fi} \qquad (2)$$

where z = measured variable, f = factor score, a = factor loading, e = residuals, m = the number of factors, and i = sample number. The PCs are considered significant if eigenvalues ≥ 1.0 (Chowdhury and Al-Zahrani 2014; Shrestha and Kazama 2007; Varol et al. 2012). This study
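As a rough illustration of the PCA step and the eigenvalue ≥ 1.0 criterion described above, the following sketch standardizes a data matrix, takes the eigen-decomposition of its correlation matrix, and keeps the components that meet the threshold. The function name `pca_kaiser` and the synthetic data are assumptions made for illustration; the study itself reports using statistical software, not this code.

```python
import numpy as np

def pca_kaiser(data, eig_threshold=1.0):
    """PCA on standardized data, keeping components with eigenvalue >= eig_threshold."""
    x = np.asarray(data, dtype=float)
    # Standardize each column (parameter) to zero mean and unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Correlation matrix of the parameters; its eigenvalues sum to the number of parameters.
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= eig_threshold
    explained = eigvals / eigvals.sum()  # fraction of total variance per component
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = z @ eigvecs[:, keep]  # component scores, one row per sample
    return eigvals, explained, loadings, scores

# Synthetic stand-in data (NOT the study's measurements): 15 samples, 4 parameters.
rng = np.random.default_rng(0)
base = rng.normal(size=(15, 2))
synthetic = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=15),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=15),
])
eigvals, explained, loadings, scores = pca_kaiser(synthetic)
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative variance explained:", np.round(np.cumsum(explained), 3))
```

Working from the correlation matrix rather than the covariance matrix is what makes the eigenvalue ≥ 1.0 rule meaningful, since each standardized parameter then contributes exactly one unit of variance.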

Water quality index (WQI)
The water quality index is a score that reflects the composite impact of various WQP on the overall water quality (Sahu and Sikdar 2008). The WQI is used to extract easily usable and understandable statistics from complicated water quality data. The selected WQP (e.g., pH, TH, TDS, HCO3, SO4, Cl, Fe, NO3, etc.) were assigned different weights (wi) based on their relative importance for drinking purposes and according to their impacts on health to calculate the WQI (Sahu and Sikdar 2008). Equation (3) is used to calculate the relative weight (Wi) for each parameter:

$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \qquad (3)$$

where wi = weight of each parameter, and n = number of parameters. Then Eq. (4) is used to assign a quality rating scale (qi) for each parameter.
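The weighting step can be sketched in a few lines. The relative weight follows the definition given for Eq. (3) (each weight divided by the sum of all weights). The quality rating used here, qi = 100 × Ci/Si (measured concentration over its guideline value), and the aggregation WQI = Σ Wi·qi are assumptions based on the commonly used weighted arithmetic form, since Eq. (4) is not reproduced in the text. All weights, guideline values, and concentrations below are hypothetical.

```python
def relative_weights(weights):
    """Relative weight W_i = w_i / sum(w_i), as defined for Eq. (3)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def wqi(concentrations, standards, weights):
    """Weighted arithmetic WQI (assumed form): sum of W_i * q_i, with q_i = 100 * C_i / S_i."""
    rel = relative_weights(weights)
    index = 0.0
    for name, w_rel in rel.items():
        q = 100.0 * concentrations[name] / standards[name]  # assumed quality rating
        index += w_rel * q
    return index

# Hypothetical weights, guideline values (mg/l), and one hypothetical sample (mg/l).
weights = {"TDS": 5, "Cl": 3, "SO4": 2}
standards = {"TDS": 1000.0, "Cl": 250.0, "SO4": 250.0}
sample = {"TDS": 1500.0, "Cl": 300.0, "SO4": 200.0}

print("relative weights:", relative_weights(weights))
print("WQI for the hypothetical sample:", round(wqi(sample, standards, weights), 1))
```

Under this assumed form, a sample sitting exactly at every guideline value scores 100, so values above 100 indicate that the weighted concentrations exceed the guidelines on balance.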

Results and discussions
Descriptive statistics
Table 1 shows the summation of the anions and cations, which was used to check for errors or discrepancies in the water sampling, sample analysis, or data entry (Rice et al. 2017). Since all potable waters are electrically neutral, the sums of the cations and anions must balance when expressed as milliequivalents per liter; the balance is considered acceptable if the difference is within 1.5-5% (Rice et al. 2017). In this study, the overall difference was calculated to be 3.2% (Table 2). The percentage difference was calculated using the following equation:

$$\%\,\text{difference} = 100 \times \frac{\sum \text{cations} - \sum \text{anions}}{\sum \text{cations} + \sum \text{anions}}$$

The statistical summary of 15 WQP is shown in Table 2. The maximum likelihood methodology was used to estimate the statistical distributions, and the Chi-square and Kolmogorov-Smirnov methods were used to test the fits. The log-normal distributions were found to be the best fits for the WQP. Each parameter's minimum and maximum values were calculated. The spatial distributions of the fifteen parameters were characterized. The conductivity (EC) values were in the range of 1160-9520 µS/cm, with higher values in the northern part of the study region (Fig. 4a). Figure 4b shows that the water was alkaline in most of the study area, with pH in the range of 6.15-8.8; pH was high only in the N1-040 well and low only in the N2-089 well. As seen in Fig. 4c, total hardness (T.H) concentration was high in most of the sampling region and ranged from 250 to 1998 mg/l. The presence of industrial waste caused an increase in total dissolved solids (TDS) concentration, which varied from 754 to 6188 mg/l (Fig. 4d). The potassium (K) concentration, which ranged from 0.59 to 17.16 mg/l, was low in the whole study area except in the N1-045, N1-052, and N2-089 wells (Fig. 4e). The sodium (Na) concentration ranged from 39.1 to 1239.7 mg/l, with the highest concentration toward the north (Fig. 4f). The magnesium (Mg) concentrations ranged from 24.08 to 348.4 mg/l, with the highest concentration in the northern part (Fig. 4g).
Similarly, the highest concentration of calcium (Ca) was found in the northern part, and its concentration ranged from 60.12 to 405.2 mg/l over the whole study area (Fig. 4h). The bicarbonate (HCO3) concentration was in the range of 57.34-830.82 mg/l and was high over the entire study area except in the N1-040 and N1-052 wells, which were within the allowable range. In the N2-103 well, HCO3 was under the permissible range (Fig. 4i). The spatial distribution of chloride (Cl) is shown in Fig. 4j; it ranged from 35 to 1617.7 mg/l in the study area, with the concentrations being highest toward the north. Sulfate (SO4) concentrations ranged from 30 to 1710 mg/l over the whole site, with the highest values in the northern section of the study area (Fig. 4k). Figure 4l shows the spatial distribution of nitrate (NO3) concentrations, ranging from 2.39 to 146.19 mg/l, with higher concentrations in the N1-052 and N2-089 wells. The iron (Fe) concentrations varied from 0 to 1.7 mg/l, with higher values in the N2-139 well (Fig. 4m). The fluoride (F) concentrations were in the range of 0.1-3.46 mg/l, with a low value in N2-103 and high values in the northern part (Fig. 4n). Figure 4o shows the spatial distribution of total alkalinity (T.ALK), which was in the range of 48.7-681 mg/l.
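The cation-anion balance check mentioned at the start of this section can be sketched as follows. The sketch assumes the usual percent-difference form, 100 × (Σcations − Σanions)/(Σcations + Σanions), with concentrations converted from mg/l to meq/l using rounded equivalent weights (molar mass divided by ionic charge). The ion set, the helper names, and the sample values are hypothetical and do not come from Table 1.

```python
# Equivalent weights in mg per meq (molar mass / ionic charge), rounded to two decimals.
EQUIVALENT_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "Cl": 35.45, "SO4": 48.03, "HCO3": 61.02, "NO3": 62.00,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3", "NO3")

def to_meq(sample_mg_per_l):
    """Convert a {ion: mg/l} mapping to {ion: meq/l}; ions must be in EQUIVALENT_WEIGHT."""
    return {ion: conc / EQUIVALENT_WEIGHT[ion] for ion, conc in sample_mg_per_l.items()}

def ion_balance_percent(sample_mg_per_l):
    """Percent difference between the cation and anion sums (assumed standard form)."""
    meq = to_meq(sample_mg_per_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)

# Hypothetical sample in mg/l, for illustration only.
hypothetical_sample = {
    "Ca": 100.2, "Mg": 24.3, "Na": 46.0, "K": 3.9,
    "Cl": 106.4, "SO4": 96.1, "HCO3": 183.1, "NO3": 6.2,
}
balance = ion_balance_percent(hypothetical_sample)
print(f"ion balance: {balance:.2f}%")
```

A positive result means the cation sum exceeds the anion sum; the magnitude is what is compared against the acceptance range quoted in the text.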

Water quality assessment
The water quality of Wadi Al-Dabab was assessed using the WQI. The relative weight of each WQP was estimated based on its relative importance (Table 3) (Gebrehiwot et al. 2011), and the WQI was computed for each well (Table 4). It should be mentioned that 26.7% of the samples fell into the
Figure 5 shows the WQI map of the study area, which was created using GIS. Accordingly, the water quality of Wadi Al-Dabab was classified as good (17.2%), poor (66.7%), very poor (8.3%), and unsuitable (7.9%) as a drinking water source.
, Cl, and Na, NO3 had very weak and weak positive correlations with all other parameters. SO4 showed similar very weak positive correlations with HCO3 and T.ALK, as did Cl with HCO3 and T.ALK, HCO3 with K, Na, and Ca, and T.ALK with K, Na, and Ca. Very weak negative correlations were found between pH and all parameters except NO3, between Fe and K, and between NO3 and SO4, Cl, and Na. In general, a significant increase in Cl, Na, and SO4 indicates that dissolved minerals are found in a large part of the aquifer. Sodium, conductivity, chloride, and sulfate form a strong cluster (Fig. 6), which reflects a possible common source for these ions. The strong correlation between calcium and sulfate and the moderate correlation between magnesium and sulfate indicate that calcareous magnesium materials exist in the study area (Singh et al. 2011). SO4 and K have a strong correlation, indicating that their input sources might be similar (Nair et al. 2018). The findings show that calcium and magnesium bicarbonate add alkalinity to water. Some WQP showed nonlinear patterns, such as total alkalinity with conductivity, nitrate, and TDS. These nonlinear relationships require further investigation to better understand the data correlations and their effects on water quality.
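The pairwise correlation screening discussed above can be illustrated with a short sketch that computes Pearson correlation coefficients and lists the parameter pairs whose absolute correlation reaches a chosen cut-off. The cut-off of 0.9, the function name `strong_pairs`, and the five-sample data set are assumptions for illustration; they are not the study's data or its software workflow.

```python
import numpy as np

def strong_pairs(data, names, r_min=0.9):
    """List parameter pairs whose absolute Pearson correlation is at least r_min."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= r_min:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical values: EC is made exactly proportional to TDS, pH varies independently.
tds = np.array([754.0, 1200.0, 2500.0, 4000.0, 6188.0])
ec = tds / 0.65
ph = np.array([7.1, 6.9, 7.8, 7.0, 7.4])
example_data = np.column_stack([tds, ec, ph])
example_pairs = strong_pairs(example_data, ["TDS", "EC", "pH"])
print(example_pairs)
```

Pearson correlation only captures linear association, which is why the nonlinear patterns noted in the text would need a different treatment, such as a transformed or fitted relationship.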

Principal component analysis (PCA)
The number of significant principal components (PCs) was determined using the criterion of eigenvalue ≥ 1.0 (SAS Inc 2018). This study found four significant PCs, which explained 54.2%, 15%, 10.2%, and 6.9% of the variance, respectively. According to the findings, the first four PCs can explain 86.2% of the total variance in WQP in Wadi Al-Dabab.
To determine the significant WQP, it is necessary to understand how the WQP contribute to the major PCs. This study used a threshold value of 0.55 to identify the significant loadings of WQP on the major PCs. This threshold value suffices to distinguish between small and large loadings (Chowdhury and Al-Zahrani 2014; Chowdhury and Husain 2020). Figures 7 and 8 show the loadings and scatterplots on the significant PCs for Wadi Al-Dabab. The loadings of F, SO4, Cl, K, Na, Mg, Ca, T.H, TDS, and EC on PC1 were higher than the 0.55 threshold (Fig. 7) and were therefore significant. HCO3 and T.ALK had significant loadings on PC2. Fe had a significant loading on PC3, and NO3 had a significant loading on PC4. Figure 8 presents the biplots of the four significant PCs. According to the PC1 × PC2 plot, HCO3 and T.ALK were significantly related only to PC2, whereas F, SO4, Cl, K, Na, Mg, Ca, T.H, TDS, and EC were all substantially associated with PC1. One or two parameters from each cluster may be sufficient to represent the water quality. As shown in Fig. 6, a strong cluster with correlation coefficient (r) in the range of 0.67-1 was formed by F, HCO3, and T.ALK. Also, Na, Cl, TDS, EC, Mg, and T.H formed a strong cluster with r in the range of 0.9-1. The four PCs were mainly associated with F, SO4, K, Cl, Ca, Mg, Na, T.H, TDS, EC, HCO3, T.ALK, Fe, and NO3. Excluding pH, fourteen WQP
Table 5 Correlation of different parameters
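The 0.55 loading threshold described in this section amounts to a simple filter over the loading matrix. The sketch below shows one way to apply it; the function name and the loading values are hypothetical and are not the values plotted in Fig. 7.

```python
def significant_parameters(loadings, names, threshold=0.55):
    """Return, per component, the parameters whose absolute loading exceeds the threshold."""
    result = {}
    n_components = len(loadings[0])
    for pc in range(n_components):
        result[f"PC{pc + 1}"] = [
            name for name, row in zip(names, loadings) if abs(row[pc]) > threshold
        ]
    return result

# Hypothetical loadings (rows = parameters, columns = components), for illustration only.
parameter_names = ["TDS", "HCO3", "Fe"]
example_loadings = [
    [0.95, 0.10],
    [0.30, 0.88],
    [0.20, -0.15],
]
selected = significant_parameters(example_loadings, parameter_names)
print(selected)
```

The absolute value is used so that strongly negative loadings are treated as significant too; a parameter that appears under no component, like Fe in this toy example, would be a candidate for exclusion.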

Factor analysis (FA)
FA was applied to the four major PCs to obtain the VFs (Table 6). According to absolute loading values, the factor loadings were classified into strong (> 0.75), moderate (0.75-0.50), and weak (0.50-0.30) (Lin et al., 2003). SO4, Na, Cl, Mg, Ca, TDS, T.H, and EC had strong positive loadings on VF1, which explained 54.2% of the overall variance, while F and K had moderate loadings on VF1. VF2 explained 15% of the overall variance and had strong positive loadings from T.ALK and HCO3, while VF3 explained 10.2% of the overall variance and had a strong negative loading from Fe. VF4 had a substantial positive loading from NO3 and explained 6.86% of the total variance. To evaluate the possibilities of data reduction, the loadings of the original parameters on the major VFs were assessed. If the original parameter's loading on the major VFs is higher than 0.7 or 0.75, it contributes significantly to temporal variation (Varol et al. 2012; Liu et al. 2003; Table 4). TDS can be selected as the significant parameter since TDS, EC, Cl, T.H, Mg, and Na are all strongly correlated, as demonstrated in Table 5. As a result, a total of twelve parameters (SO4, Cl, Na, Mg, Ca, T.H, TDS, EC, HCO3, T.ALK, NO3, and Fe) are required to explain 86.24% of the data variance (12 of the 15 parameters, i.e., 80%) (Table 4) (Liu et al. 2003). The linear/nonlinear relationships can help to reduce the number of WQP further. Table 7 shows examples of nonlinear relationships for selected parameters, which can be extended to more parameters, resulting in a further reduction of WQP.
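The loading classes quoted above (strong > 0.75, moderate 0.75-0.50, weak 0.50-0.30) and the data-reduction arithmetic can be expressed as a small sketch. The label "negligible" for loadings below 0.30 and the treatment of the boundary values are assumptions, since the text does not specify them; the example loadings are hypothetical.

```python
def classify_loading(value):
    """Classify a factor loading by its absolute value using the classes quoted in the text."""
    magnitude = abs(value)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # assumed label; the text does not name this class

def retained_fraction(n_retained, n_total):
    """Fraction of the original parameters kept after data reduction."""
    return n_retained / n_total

# Hypothetical loadings on one varifactor, for illustration only.
example = {"TDS": 0.97, "K": 0.62, "pH": -0.35, "Fe": 0.05}
for name, loading in example.items():
    print(name, classify_loading(loading))

# Twelve of fifteen parameters retained, as reported in the text.
print("retained fraction:", retained_fraction(12, 15))
```

Retaining 12 of 15 parameters corresponds to the 20% reduction mentioned in the abstract.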

Conclusions
Based on the assessment of WQP in Wadi Al-Dabab, it is evident that the area is highly susceptible to pollution from both natural and anthropogenic sources. The two aquifer layers are vulnerable to contamination, with the shallow free water table being directly affected by pollutants due to its unique properties. On the other hand, the deep confined aquifer composed of fractured rock depends on fractured zones for water production and has limited wells, making it less important than the shallow water table. However, both layers are potential targets for pollution, and the area is vulnerable to contamination from various sources including household waste, industrial waste, and construction waste. Sources of pollution in Wadi Al-Dabab include wastewater and industrial waste that reach the area from Hazran, mixed with sewage water that flows into the area. Used oils, construction waste (e.g., saws), and other waste from car workshops in liquid and solid phases also reach the area from Hazran. Household waste on the banks of Wadi Al-Dabab, which has no general drainage, is disposed of by digging wells in the valley.
The study analyzed the spatial variability of WQP in the area and showed that water quality in Wadi Al-Dabab is deteriorating, with most of the water not suitable for drinking purposes unless treated. The chemical analysis revealed that parameters such as chloride, nitrate, magnesium, iron, hardness, bicarbonate, conductivity, total alkalinity, calcium, sodium, total dissolved solids, and sulfate exceeded the WHO and Y.S limits in at least one well. Moreover, the WQI map indicates that 66.7% of the study area had poor water quality, 8.3% very poor, and 7.9% unsuitable for drinking. In light of these findings, effective source protection measures and necessary monitoring plans should be implemented to prevent further deterioration of water quality. The study emphasizes the importance of regular monitoring of WQP, especially in areas prone to pollution. The results can provide useful information to decision makers and local communities in developing effective management strategies to preserve the water resources in Wadi Al-Dabab.