1 Introduction

The freshwater resources available to meet water supply demands make up only a small percentage of the world's water resources, and a significant part of this freshwater is provided by groundwater [1]. Most people worldwide rely on groundwater for drinking and agricultural purposes [2, 3]. Because the use of groundwater for domestic and agricultural purposes contributes to economic and social stability, its demand has risen globally [4]. With the expansion of human settlement, groundwater quality has deteriorated because of natural processes and human activities, with significant negative effects on the hydrogeological environment and human health [5, 6]. Natural contamination is caused by inorganic substances in rocks and soils, whereas man-made contaminants enter groundwater through sewage disposal, agricultural runoff, and excessive use of fertilizers [7, 8]. Groundwater provides more than 52% of Khartoum's overall water supply, and the Nile River supplies the remainder. In Khartoum state, people rely heavily on groundwater for drinking and agriculture because Nile River water is highly turbid and pumping surface water to agricultural fields is expensive. The lack of drinking water delivery infrastructure has led to extensive groundwater abstraction, which in turn has caused groundwater quality to deteriorate [9]. Continuous evaluation and monitoring of groundwater quality are therefore crucial for managing and securing water resources sustainably [10]. In this study, a groundwater quality index, multivariate statistical methods, and human health risk assessment were employed to assess the suitability of groundwater for drinking purposes in northern Khartoum state.

The groundwater quality index (GWQI) is a numerical value, calculated from physical, chemical, and biological parameters of groundwater, that characterizes and classifies groundwater quality [11]. Classifying water quality with a GWQI is more efficient than conventional approaches in which each measured parameter is only compared with a given water quality standard [12]; the conventional approaches are also costly and time-consuming. Because the major ions largely determine groundwater quality, the chemical composition of groundwater strongly influences the GWQI. Several water quality indices have been developed over the past few decades, and the choice of index depends on the intended outcomes and the characteristics of the input groundwater data [13,14,15]. Combining a GWQI with geographical information systems (GIS) is an effective way to describe spatial changes in groundwater facies [16], and GIS has recently become a widely used tool for monitoring and developing groundwater resources. Investigations that combine GIS with hydrochemical methods to identify the sources of groundwater pollution are therefore beneficial for monitoring groundwater quality [15]. Numerous studies using GWQI and GIS for groundwater quality assessment have been reported in the literature [17,18,19,20].

The GWQI, however, has limitations and cannot identify the sources of contamination [21]. For this reason, multivariate statistical methods were also applied. Multivariate statistical methods allow effective analysis and evaluation of the physicochemical parameters of groundwater [22, 23]. Approaches such as correlation analysis (CA), hierarchical cluster analysis (HCA), and principal component analysis (PCA) have been used to reveal similarities between groundwater samples based on their chemical characteristics and to infer the factors that govern groundwater quality [24]. Groundwater quality is mainly controlled by lithological diversity, geological processes, and human activities [25]. A comprehensive assessment of groundwater quality requires studying the effects of these factors, and multivariate statistical methods are a powerful tool for doing so. The application of multivariate statistical methods to evaluate groundwater quality and to identify the natural and anthropogenic sources of groundwater pollution is widely reported in the literature [26,27,28,29,30].

The methods mentioned above can indicate the extent of contamination in groundwater; however, they cannot reflect the risks that the contaminants may pose to human health. The concept of health risk assessment has therefore steadily evolved and become popular in recent years. By linking groundwater contamination to human health through the toxicity data of the pollutants, health risk assessment aims primarily to determine the potential human risk level and to provide recommendations for pollution prevention and control. This technique has frequently been used to evaluate the potential adverse health effects of human exposure to polluted water [2, 31,32,33,34,35]. Nitrate contamination of groundwater is detrimental to human health, and several researchers have described it as one of the world's most complicated problems, especially in arid and semi-arid agricultural areas. Consumption of polluted water has been reported to cause 485,000 diarrheal deaths each year [36]. It is therefore essential to examine groundwater quality and quantify the health risks associated with leachate pollution [37]. To ensure that pollution sources can be appropriately regulated, the hydrogeological and hydrogeochemical properties of the aquifer system must also be fully understood.

Several hydrochemical studies have been conducted in northern Khartoum state to evaluate the suitability of groundwater for drinking purposes [24, 38, 39]. However, these studies compare the concentrations of the physicochemical parameters with local and global standards without accounting for the sources of groundwater contamination or the human health consequences of consuming the polluted water. This research aims to comprehensively evaluate hydrochemical characteristics and non-carcinogenic human health risks using an integrated approach that combines GWQI, multivariate statistical methods, and human health risk assessment (HHRA) models. We hypothesize that calculating a water quality index and applying multivariate statistics make it possible to assess overall groundwater quality from multiple parameters, simplify complex hydrochemical data, identify the key factors that control groundwater quality, and assess their potential implications for human health risk. This research presents a novel framework that offers a more nuanced understanding of groundwater contamination and supports decision-making to protect this vital resource for future generations.

2 Methods and materials

2.1 Study area

The study area is situated on the eastern side of the Nile River in northern Khartoum state, Sudan (Fig. 1a). It covers about 389 km² and is bounded to the west by the Nile River (Fig. 1b). The region lies in the northern Savanna belt, which is characterized by a hot climate and low humidity, with annual precipitation of 100 to 200 mm [40]. Approximately 80% of the population resides near the Nile River, and population density gradually decreases with distance east and west of the river. The study area is agricultural land where farming is the main source of livelihood for residents, and farms extend along the Nile River and across the central part of the area. The region is generally flat, with an average elevation of 340 m above mean sea level (a.s.l.), rising gradually to 440 m in the eastern part (Fig. 1c). The drainage system is dominated by the Nile River and local networks of ephemeral streams (wadis and khors), which mostly flow east–west. From oldest to youngest, northern Khartoum state consists of three main geological units: Precambrian crystalline rocks, the Cretaceous Nubian formation, and recent Quaternary deposits [41]. Figure 2 shows the geological map of the study area with the main geological units. The Precambrian basement rocks, including gneisses, schists, and granites, crop out outside the area, particularly along its northern and eastern edges [42]. They are unconformably overlain by the Cretaceous Nubian formation, which is regarded as the main groundwater aquifer in the study area [43]. This formation is composed of conglomerates, sandstone, and mudstone [44, 45]. Recent deposits include Qoz sand, wind-blown sands, and alluvial wadi deposits [46]. Groundwater occurs under confined to semiconfined conditions in the loosely cemented sandstone layers of the Nubian formation, whereas silts and clays act as aquitards and aquicludes [47].
The main sources of groundwater recharge to the Nubian formation are the Nile River and ephemeral streams, because direct recharge from precipitation is limited [9, 48]. Groundwater levels range from 21 m above mean sea level (AMSL) close to the Nile River to 42 m AMSL in the eastern and northern portions of the study area. As a result, groundwater flow shows diverse directions (Fig. 3), although it is mainly from west to east.

Fig. 1 Location of the study area in Khartoum state (a), the geographical distribution of groundwater samples (b), and the digital elevation model of the study area (c)

Fig. 2 Map showing the main geological units and geomorphological features in the study area

Fig. 3 Map of groundwater levels showing the main flow direction in the study area

2.2 Groundwater sampling

During the post-monsoon season of 2018, 17 groundwater samples were collected from privately owned wells distributed across the study area (Fig. 1). The Khartoum State Water Corporation (KSWC) analyzed the samples for 11 physicochemical parameters: hydrogen ion activity (pH), electrical conductivity (EC), total dissolved solids (TDS), total hardness (TH), chloride (Cl⁻), sulfate (SO₄²⁻), nitrate (NO₃⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), and bicarbonate (HCO₃⁻). pH, EC, and TDS were measured on site with a portable multiparameter instrument, calibrated beforehand to ensure accurate measurements, to avoid changes in these parameters during storage. The samples were collected in polyethylene bottles that had been rinsed several times with the sampled groundwater to minimize contamination and obtain representative samples. The bottles were then stored at 5 °C and later sent for hydrochemical analysis. Once the results were received, the accuracy of the analysis was checked by the electrical balance (EB) between cations and anions, expressed in milliequivalents per liter, using Eq. 1 [49]. The calculated EB values fall within ±5%, indicating good analytical accuracy.

$$\left( {{\text{EB\% }}} \right) = \frac{{\sum {\text{cations }} - \sum {\text{anions }}}}{{\sum {\text{cations}} + { }\sum {\text{anions}}}} \times 100$$
(1)
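As a sketch of how Eq. 1 can be evaluated, the following Python converts concentrations from mg/L to meq/L using equivalent weights (molar mass divided by the absolute ionic charge) and returns EB%. The ion set mirrors the parameters listed in this section; K⁺ and CO₃²⁻ were not among the reported parameters and are therefore omitted. The function and dictionary key names are illustrative assumptions, not part of the original workflow.

```python
# Equivalent weights in mg/meq: molar mass (g/mol) divided by |ionic charge|.
EQ_WEIGHT = {
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
    "Na": 22.990,
    "Cl": 35.453,
    "SO4": 96.06 / 2,
    "HCO3": 61.017,
    "NO3": 62.004,
}
CATIONS = ("Ca", "Mg", "Na")
ANIONS = ("Cl", "SO4", "HCO3", "NO3")


def to_meq(sample_mg_l):
    """Convert ion concentrations (mg/L) to meq/L; non-ionic keys are ignored."""
    return {ion: c / EQ_WEIGHT[ion] for ion, c in sample_mg_l.items() if ion in EQ_WEIGHT}


def electrical_balance(sample_mg_l):
    """Eq. 1: EB% = (sum cations - sum anions) / (sum cations + sum anions) * 100."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return (cations - anions) / (cations + anions) * 100.0
```

A sample would be accepted under the criterion used here when `abs(electrical_balance(sample)) <= 5`.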

2.3 Hydrochemical analysis

A Piper plot is used to characterize the groundwater facies in the study area. This graphical representation helps visualize the hydrogeochemical conditions and the dominant hydrochemical facies. The relative proportions of cations and anions, in percent of total milliequivalents, are plotted in two triangular fields and then projected onto a central diamond. The cations (Na⁺, Mg²⁺, and Ca²⁺) and anions (SO₄²⁻, HCO₃⁻, NO₃⁻, and Cl⁻) served as input for the Piper diagram. The upper part of the diamond represents permanent hardness and depicts water with high concentrations of Ca²⁺, Mg²⁺, Cl⁻, and SO₄²⁻. Samples projected in the left corner are rich in Ca²⁺, Mg²⁺, and HCO₃⁻, reflecting temporary hardness. Alkaline bicarbonate water (Na⁺ and HCO₃⁻) is projected at the bottom of the diamond, and saline water (Na⁺ and Cl⁻) plots mostly on the right side.
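The input to both Piper triangles is each ion's share of the total milliequivalents in its group. A minimal Python sketch of that normalization follows; it assumes NO₃⁻ is added to the Cl⁻ apex of the anion triangle (a common convention, not stated in the source), and the function names are hypothetical.

```python
# Equivalent weights in mg/meq (molar mass / |charge|).
EQ_WEIGHT = {"Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990,
             "Cl": 35.453, "SO4": 96.06 / 2, "HCO3": 61.017, "NO3": 62.004}


def _percent(group):
    """Express each member of a group as a percentage of the group total."""
    total = sum(group.values())
    return {k: 100.0 * v / total for k, v in group.items()}


def piper_percentages(sample_mg_l):
    """Return (cation %, anion %) on a meq basis for the two ternary fields.

    Assumption: NO3 is lumped with Cl at the same apex of the anion triangle.
    """
    meq = {ion: c / EQ_WEIGHT[ion] for ion, c in sample_mg_l.items() if ion in EQ_WEIGHT}
    cations = {"Ca": meq.get("Ca", 0.0), "Mg": meq.get("Mg", 0.0), "Na": meq.get("Na", 0.0)}
    anions = {
        "Cl+NO3": meq.get("Cl", 0.0) + meq.get("NO3", 0.0),
        "SO4": meq.get("SO4", 0.0),
        "HCO3": meq.get("HCO3", 0.0),
    }
    return _percent(cations), _percent(anions)
```

The two percentage triplets locate a sample in each triangle; plotting software then projects them onto the diamond.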

To visualize the hydrochemical parameters across the study area, spatial distribution maps were produced using the inverse distance weighting (IDW) interpolation technique. The maps are used to visualize differences in groundwater quality and to identify patterns, trends, and anomalies. IDW is a widely used geospatial technique for converting point data into a raster (grid) surface. The value at each raster cell is estimated as a weighted average of the known values at the surrounding data points, with weights that decrease with distance (Eq. 2):

$$Z_{j} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{Z_{i} }}{{d_{i}^{p} }}} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{1}{{d_{i}^{p} }}} \right)}}$$
(2)

where Zj is the estimated value at cell j of the raster grid, Zi is the known value at data point i, di is the distance between cell j and data point i, n is the number of data points, and p is a power parameter that controls how quickly the influence of a point decreases with distance. Common values of p range from 1 to 3. Larger values give more weight to the closest points and produce a more local surface with peaks around individual data points, whereas smaller values allow more distant points to contribute and produce a smoother surface. IDW thus assumes that closer data points have a greater influence on the interpolated values, and the result is sensitive to the choice of p and to large gaps in the input data. In this research, because the distances between the stations are relatively long, p = 1 was used, which lets more distant stations contribute to each cell estimate.
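Eq. 2 can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the GIS implementation used in the study: coordinates are planar, distances are Euclidean, and a cell that coincides with a sample simply takes that sample's value. The names `idw` and `idw_grid` are hypothetical.

```python
import math


def idw(x, y, points, p=1.0):
    """Eq. 2: inverse distance weighted estimate at (x, y).

    points: iterable of (xi, yi, zi) known samples.
    p: power parameter (the study uses p = 1).
    """
    num = 0.0
    den = 0.0
    for xi, yi, zi in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # the cell coincides with a sample: return its value
        w = 1.0 / d ** p
        num += w * zi
        den += w
    return num / den


def idw_grid(xs, ys, points, p=1.0):
    """Evaluate IDW on a regular grid; rows follow ys, columns follow xs."""
    return [[idw(x, y, points, p) for x in xs] for y in ys]
```

With samples of 0 at (0, 0) and 10 at (4, 0), the estimate at (1, 0) is 2.5 for p = 1 but 1.0 for p = 2: raising p pulls the estimate toward the nearest sample.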

2.4 Groundwater quality index (GWQI)

The GWQI is a rating technique that provides an overall representation of groundwater quality based on physical and chemical parameters. The method reduces multidimensional groundwater quality data to a single numerical value. The main steps in calculating the GWQI are assigning weights, calculating the rating scale, and aggregating the sub-indices. In this study, eleven physicochemical parameters (pH, EC, TDS, TH, Cl⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, and HCO₃⁻) measured in 17 samples were used to calculate the GWQI. Weights are assigned to the parameters according to their influence on groundwater quality, and Eq. 3 is then used to determine the relative weight (Wi) of each parameter. Table 1 lists the assigned weights and the corresponding relative weights used in this investigation. In GWQI calculations, weights are generally assigned based on the author's knowledge of the hydrochemical characteristics of groundwater [50], and this subjectivity is controversial among hydrogeologists. To avoid this drawback, this study follows a statistically based approach: weights are assigned according to the correlation between each physicochemical parameter and TDS [51], because TDS is considered a primary parameter for understanding groundwater contamination [52]. Parameters strongly correlated with TDS receive the highest weights, and vice versa [11]. In the second stage, the rating scale is calculated with Eq. 4, using the standard limits recommended by [53]. In the final step, the weighted sub-indices are aggregated with Eqs. 5 and 6 to obtain the index value. The classification of groundwater samples according to their GWQI values is given in Table 2.

$${\text{Wi}} = \frac{{{\text{wi}}}}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {\text{wi}}}}$$
(3)
$${\text{Ri}} = { }\frac{{{\text{Xi}}}}{{{\text{Xs}}}}{ * }100$$
(4)
$${\text{SI}} = {\text{Wi*Ri}}$$
(5)
$${\text{GWQI}} = \sum {\text{SI}}$$
(6)
Table 1 The weights and the relative weights of the physicochemical parameters
Table 2 Classification of groundwater based on GWQI as given by [52]

where wi is the weight assigned to parameter i, n is the number of parameters, Wi is the relative weight, Ri is the rating scale, Xi is the measured value of the parameter, Xs is the standard prescribed by WHO [53], and SI is the sub-index of each physicochemical parameter used in the GWQI calculation.
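The three steps of Eqs. 3 to 6 chain together as in the following Python sketch. The weights and standards in the usage example are hypothetical placeholders, not the values in Table 1 or the WHO limits, and the sketch applies Eq. 4 uniformly to every parameter exactly as written.

```python
def gwqi(sample, weights, standards):
    """Return (GWQI, sub-indices) following Eqs. 3-6.

    sample:    parameter -> measured value Xi
    weights:   parameter -> assigned weight wi
    standards: parameter -> drinking-water standard Xs
    """
    total_w = sum(weights.values())
    sub_indices = {}
    for param, w in weights.items():
        rel_w = w / total_w                                 # Eq. 3: Wi
        rating = sample[param] / standards[param] * 100.0   # Eq. 4: Ri
        sub_indices[param] = rel_w * rating                 # Eq. 5: SI
    return sum(sub_indices.values()), sub_indices           # Eq. 6


# Hypothetical weights and standards, for illustration only.
WEIGHTS = {"TDS": 5, "Na": 4, "Cl": 3, "NO3": 2}
STANDARDS = {"TDS": 1000.0, "Na": 200.0, "Cl": 250.0, "NO3": 50.0}
index, parts = gwqi({"TDS": 500.0, "Na": 100.0, "Cl": 125.0, "NO3": 25.0}, WEIGHTS, STANDARDS)
```

Because the relative weights sum to 1, a sample whose every parameter equals its standard scores exactly 100, and the example sample (every value at half its standard) scores 50.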

2.5 Multivariate statistics

The factors affecting groundwater quality in the study area were investigated using multivariate statistical methods, including Kendall correlation analysis, principal component analysis (PCA), and hierarchical cluster analysis (HCA). First, correlation analysis is applied to determine the degree of interrelation between the physicochemical parameters and the role of each parameter in overall groundwater salinity [54]. The advantage of the Kendall rank correlation coefficient is that it is a non-parametric measure that can be used for continuous and ordinal variables regardless of their distribution. It ranges from −1 to +1: a value of −1 indicates a perfect negative monotonic relationship, +1 a perfect positive monotonic relationship, and 0 no monotonic relationship. The magnitude of the coefficient represents the strength of the monotonic relationship.
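The following pure-Python sketch computes Kendall's coefficient by counting concordant and discordant sample pairs. The source does not say which variant was used; this sketch assumes tau-b, which corrects for tied values, and the function names are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def kendall_matrix(columns):
    """Pairwise tau-b for a dict of parameter name -> list of values."""
    names = list(columns)
    return {a: {b: kendall_tau_b(columns[a], columns[b]) for b in names} for a in names}
```

Because only the ordering of values matters, a monotonic but non-linear relation such as y = x² on positive x still gives a coefficient of exactly +1.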

Second, PCA is used in this study to identify the sources of contamination and the key factors affecting groundwater chemistry in the study area. PCA extracts principal components (PCs), which are linear combinations of the original variables [55]. It reduces the dimensionality of the data so that the interrelationships among a large number of variables can be examined and their common underlying components elucidated [56]. In PCA, the correlated variables are transformed by an orthogonal transformation into a set of uncorrelated variables (the PCs), which are powerful in identifying hidden variables that affect the data set. The first PC is associated with the highest variance, and each subsequent component explains less variance than the one before it [57]. PCA is frequently performed using the singular value decomposition (SVD) of the data matrix. In this research, the physicochemical parameters of the groundwater samples are arranged in a data matrix X that serves as the input for PCA, in which each of the n rows represents one groundwater sample and each of the p columns represents one physicochemical parameter (Eq. 7). The matrix X is then decomposed using Eq. 8 [58]:

$$\text{X} = \left[ \begin{array}{cccc} \text{pH}_{1} & \text{EC}_{1} & \cdots & \text{HCO}_{3,1} \\ \text{pH}_{2} & \text{EC}_{2} & \cdots & \text{HCO}_{3,2} \\ \vdots & \vdots & \ddots & \vdots \\ \text{pH}_{i} & \text{EC}_{i} & \cdots & \text{HCO}_{3,i} \\ \vdots & \vdots & \ddots & \vdots \\ \text{pH}_{n} & \text{EC}_{n} & \cdots & \text{HCO}_{3,n} \end{array} \right]$$
(7)
$${\text{X}} = {\text{FW}}^{{\text{T}}} + {\text{E}},$$
(8)

where F is the N × M factor score matrix (N is the number of groundwater samples and M is the number of extracted principal components), W is the L × M factor loading matrix (L is the number of physicochemical parameters), E is the N × L matrix of residuals, and T denotes the matrix transpose; N and L correspond to n and p in Eq. 7. Each observed variable is thus expressed as a linear combination of the components plus a residual. PC1 accounts for the largest variance in the data, and the subsequent components account for progressively smaller variances.
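A minimal NumPy sketch of Eq. 8 via SVD follows. It assumes the columns of X are standardized to z-scores first (the parameters are in different units; the source does not state its scaling), and it returns loadings as unit-length eigenvectors. Loadings expressed as variable-component correlations, as often reported in hydrochemical studies, would be `W * s[:M] / sqrt(n - 1)` instead. The demo matrix is random placeholder data with the study's 17 × 11 shape, not the measured samples.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA of an n x p data matrix (Eq. 7) via SVD, returning F, W, E of Eq. 8.

    Columns are standardized to z-scores first (an assumption). Columns with
    zero variance must be removed beforehand to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per PC, descending
    F = U[:, :n_components] * s[:n_components]   # N x M factor scores
    W = Vt[:n_components].T                      # L x M loadings (unit eigenvectors)
    E = Z - F @ W.T                              # N x L residuals
    return F, W, E, explained


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(17, 11))  # placeholder for the 17 x 11 data matrix
F, W, E, explained = pca_svd(X_demo, n_components=3)
```

Keeping all components drives the residual matrix E to zero, while truncating to M components leaves in E the variance the retained PCs do not explain.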

Finally, HCA is conducted to classify the groundwater samples according to their chemical properties [59]. Cluster analysis divides the objects of a data set into groups according to their common characteristics, and the concept of distance between data points serves as the basis for clustering. The objective is to minimize distances within groups and maximize distances between groups. HCA groups the groundwater samples according to a specified criterion so that the samples within each cluster are similar; the resulting clusters therefore show high internal homogeneity and high between-cluster heterogeneity [60]. The results of HCA are illustrated by dendrograms, which simplify the complexity of the original data and depict the clusters and their similarities graphically. This study used a circular dendrogram to represent the extracted clusters. The most commonly used measure of distance (dissimilarity) in hierarchical clustering is the Euclidean distance. The Euclidean distance (DE) between two objects is calculated with Eq. 9:

$${\text{D}}_{{\text{E}}} \left( {{\text{X}}^{{\left( {\text{i}} \right)}} ,{\text{X}}^{{\left( {\text{j}} \right)}} } \right) = \sqrt {\left\{ {\left( {{\text{X}}^{{\left( {\text{i}} \right)}} - {\text{X}}^{{\left( {\text{j}} \right)}} } \right)^{{\text{T}}} \left( {{\text{X}}^{{\left( {\text{i}} \right)}} - {\text{X}}^{{\left( {\text{j}} \right)}} } \right)} \right\}}$$
(9)

where X(i) and X(j) are two multivariate observations from a population of p random variables (X1, …, Xp), and T denotes the transpose.
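The sketch below pairs Eq. 9 with a naive agglomerative procedure in pure Python. The source does not state which linkage criterion was used (Ward's method is also common), so average linkage here is an assumption, as is the idea that samples would be standardized before clustering because the parameters have different units. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Eq. 9: D_E = sqrt((a - b)^T (a - b)) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def average_linkage_clusters(samples, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts with one cluster per sample and repeatedly merges the pair of
    clusters with the smallest mean pairwise Euclidean distance until
    n_clusters remain. Returns lists of sample indices.
    """
    dist = [[euclidean(a, b) for b in samples] for a in samples]
    clusters = [[i] for i in range(len(samples))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters
```

Recording the merge distances at each step would give the heights needed to draw a dendrogram.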

2.6 Human health risk assessment (HHRA)

Long-term consumption of polluted water can be seriously detrimental to human health and cause a variety of diseases, so appraising health risks is essential for managing and evaluating water quality. The link between groundwater contamination and public health can be determined through health risk assessment and cumulative hazard [32]. The HHRA is performed in four stages [61]. In the first stage, the potential hazards and the vulnerable populations are identified. In the second stage, the average daily doses for the target groups are computed from the level of the harmful substance in the area. The third stage assesses exposure according to the quantity and duration of the individuals' exposure to contamination. In the fourth stage, these elements are integrated to determine the human health risk (Fig. 4). In this study, the assessment addresses the non-carcinogenic human health risk from nitrate and is accomplished by estimating the average daily intake and the hazard quotient. The average daily intake is calculated with Eq. 10:

$${\text{ADI}} = { }\frac{{{\text{C}}_{{\text{i}}} {\text{*IR*EF*ED}}}}{{\text{ABW*AET}}}$$
(10)

where ADI is the average daily intake of the polluted groundwater [mg/(kg·day)], Ci is the concentration of NO₃⁻ in the groundwater sample (mg/L), IR is the daily ingestion rate (L/day), EF is the exposure frequency (365 days/year), ED is the exposure duration (years), ABW is the average body weight (kg), and AET is the average exposure time (days).

Fig. 4 The protocol followed during the assessment of human health risk based on NO₃⁻ pollution, proposed by [62]

The hazard quotient is the ratio of the average daily intake to the oral reference dose of the contaminant. It is calculated with Eq. 11:

$${\text{HQ}}_{{\text{i}}} = { }\frac{{{\text{ADI}}}}{{{\text{RD}}_{{\text{i}}} }}$$
(11)

where \({\mathrm{HQ}}_{\mathrm{i}}\) is the hazard quotient of the detected parameter and \({\mathrm{RD}}_{\mathrm{i}}\) is its oral reference dose. The values of the input parameters used to assess the human health risk are summarized in Table 3.
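Eqs. 10 and 11 translate directly into two small Python functions. The exposure values and reference dose below are placeholders for illustration only, not the values in Table 3; the concentration 10.2 mg/L is the maximum NO₃⁻ reported in this study. The sketch also assumes that the averaging time for non-carcinogenic effects equals ED × 365 days, and it follows the usual convention that HQ > 1 flags a potential non-carcinogenic risk.

```python
def average_daily_intake(c_mg_l, ir_l_day, ef_days_yr, ed_yr, abw_kg, aet_days):
    """Eq. 10: ADI = (Ci * IR * EF * ED) / (ABW * AET), in mg/(kg*day)."""
    return c_mg_l * ir_l_day * ef_days_yr * ed_yr / (abw_kg * aet_days)


def hazard_quotient(adi, rfd):
    """Eq. 11: HQ = ADI / RfD. HQ > 1 flags potential non-carcinogenic risk."""
    return adi / rfd


# Placeholder adult exposure values, for illustration only.
ADULT = {"ir_l_day": 2.5, "ef_days_yr": 365, "ed_yr": 30, "abw_kg": 70.0}
ADULT["aet_days"] = ADULT["ed_yr"] * 365  # averaging time for non-carcinogens
RFD_NO3 = 1.6  # placeholder oral reference dose, mg/(kg*day)

adi = average_daily_intake(10.2, **ADULT)
hq = hazard_quotient(adi, RFD_NO3)
print(f"ADI = {adi:.4f} mg/(kg*day), HQ = {hq:.3f}, risk flag: {hq > 1}")
```

With EF = 365 days/year and AET = ED × 365, the exposure terms cancel and ADI reduces to Ci × IR / ABW, so these placeholders give ADI ≈ 0.364 mg/(kg·day) and HQ ≈ 0.23. Repeating the calculation with child values for IR and ABW shows why children usually carry the higher HQ.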

Table 3 The parameters used in assessing health risks for children and adults

3 Results and discussion

3.1 Groundwater hydrochemistry

In this study, 11 physicochemical parameters are used to assess groundwater quality in northern Khartoum state: pH, EC, TDS, TH, Cl⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, and HCO₃⁻. The results of the hydrochemical analysis and the descriptive statistics of the analyzed parameters are shown in Table 4. The pH of the groundwater samples ranges from 7.1 to 8.6, indicating neutral to alkaline water. Spatially, pH is highest in the east and lowest in the central part of the study area. The EC of the groundwater samples ranges from 317 to 1500 µS/cm, and all samples except sample 14 have an EC below 1500 µS/cm. TDS is a key indicator of groundwater contamination; in this investigation, it ranged from 190 to 6225 mg/L. A possible cause of this variation in TDS is rock-water interaction. TH values vary from 124 to 1172 mg/L, and the maximum exceeds the standard prescribed by the World Health Organization (WHO) [53]. TH is closely related to the amounts of Ca²⁺ and Mg²⁺ in groundwater [63]. Cl⁻ ranges from 4 to 2120 mg/L, with the highest concentration in sample 14 and the lowest in sample 15. The SO₄²⁻ concentration varies from 3 to 1500 mg/L from the west to the east of the study area. NO₃⁻ ranges from 0.07 to 10.2 mg/L, with the highest concentration in sample 14 and the lowest in sample 1. The primary source of NO₃⁻ is likely agricultural activity [24]. Ca²⁺ ranges from 16 to 133 mg/L, with the highest concentration in sample 14 in the eastern part of the study area. Mg²⁺ ranges from 8.5 to 202 mg/L; dolomite dissolution and silicate weathering are two common sources of Mg²⁺.
Because Mg²⁺ primarily controls TH, the spatial variations of TH and Mg²⁺ show a similar pattern. Na⁺ is a major ion in the groundwater chemistry of the area. Its concentration increases progressively from west to east, with the highest value (1884 mg/L) in S14 in the eastern part of the study area and the lowest (14.4 mg/L) in S15. Silicate weathering and halite dissolution might explain the high Na⁺ concentrations [64]. HCO₃⁻, the predominant anion in the area, ranges from 130 to 620 mg/L. Sources of HCO₃⁻ include calcite and dolomite dissolution or ion exchange [65].

Table 4 The results of the hydrochemical analysis and descriptive statistics for the analyzed parameters

Figure 5 shows the spatial variation of the physicochemical parameters interpolated with the IDW method. The concentrations of nearly all parameters except Ca²⁺, HCO₃⁻, and NO₃⁻ increase from the west to the east of the area. Nile River water contains high levels of Ca²⁺ and HCO₃⁻ because of the geological characteristics of its upstream sources [9]. As Nile water infiltrates into the aquifer, it raises the Ca²⁺ and HCO₃⁻ concentrations of the groundwater, so, unlike the other parameters, these two ions show high values near the river. The other parameters show the reverse pattern because the infiltrating fresh water dilutes them while Ca²⁺ and HCO₃⁻ dominate. NO₃⁻ shows a different pattern, with high values in the central part of the area, which corresponds to the agricultural lands. The large fluctuations in parameter concentrations are likely due to variations in the mineral composition of the host aquifer. For instance, the high salinity may be explained by the dissolution of halite or by the presence of volcanic patches within the Cretaceous sandstone formation. Halite and other salt minerals are highly soluble and dissolve when in contact with undersaturated water [66], which can form local high-salinity pockets [67] and produce large spatial differences in concentration.

Fig. 5 Geographical variation of the physicochemical parameters used in this study; the black polygons in the upper left corners of the maps represent the Nile River

3.2 Hydrochemical facies

Evaluating hydrogeochemical facies helps identify the groundwater flow characteristics and chemical evolution of groundwater. In this study, a Piper diagram is used to evaluate the geochemistry of the groundwater samples and thereby reveal the hydrochemical facies. Figure 6 shows the Piper plot, in which the groundwater samples are plotted in the cation and anion triangles and then projected into the central diamond. In the anion triangle, most samples are dominated by HCO₃⁻, whereas in the cation triangle the cations contribute almost equally to the overall chemistry, with a relatively strong influence of Na⁺. In the central diamond, most groundwater samples (70%) belong to the Ca–Mg–HCO₃ hydrochemical facies. These samples are located in the western part of the area, within the zone of Nile River influence, and are directly recharged by the Nile River. Their chemical composition is therefore influenced by that of the Nile River water, which is rich in Ca²⁺ and HCO₃⁻; dissolution of minerals, especially calcite and dolomite, can also produce this water type. Five samples (S1, S10, S11, S16, and S17), located in the central part of the study area, plot in the mixed water zone, which results from the mixing of different water types. S14, located in the eastern part of the area, plots in the Na–Cl hydrochemical facies. The change in groundwater facies from west to east is likely produced by cation exchange, in which Ca²⁺ in groundwater is replaced by Na⁺, together with mineral dissolution along the flow path [68]. Rock-water interaction and cation exchange are therefore the main factors influencing groundwater chemistry east of the Nile River.
However, because the groundwater samples were collected only in the post-monsoon period, seasonal climatic effects on the groundwater facies could not be assessed, and the facies may differ slightly in other seasons. Collecting both pre-monsoon and post-monsoon samples is therefore crucial for a comprehensive understanding of the hydrochemical conditions in the study area.

Fig. 6 Piper diagram showing the different hydrochemical facies of the groundwater samples

3.3 Groundwater quality index (GWQI)

Assigning a weight to each parameter is the most crucial step in developing the GWQI model [11]. In this research, the correlation coefficients between the physicochemical parameters are used to determine appropriate weights. TDS is given the highest weight because its concentration determines the suitability of groundwater for drinking, and the strength of each parameter's relationship with TDS reflects its role in the overall groundwater chemistry [69]. The GWQI is calculated to assess the suitability of groundwater for household use. The estimated GWQI values range from 20.5 to 419.5, which places the samples in three of the GWQI classes (Table 2): approximately 76.5% of the samples are classified as excellent water, 17.6% as good water, and 5.9% as unfit for human consumption. As shown in Fig. 7, which depicts the spatial variation of the GWQI, excellent and good water types cover most of the study area, and groundwater quality gradually changes from the western to the eastern part. S3 has the lowest GWQI and S14 the highest. The high GWQI of S14 is driven by TDS, TH, Ca²⁺, Mg²⁺, Na⁺, Cl⁻, HCO₃⁻, and SO₄²⁻, whose abundance in the groundwater is probably controlled by rock-water interaction [23]. According to the calculated GWQI, the groundwater in the study area is suitable for drinking except at S14. However, some samples (S1 and S17) are significantly affected by specific physicochemical parameters, so care must be taken when using this groundwater for drinking.

Fig. 7

The geographic distribution of the calculated GWQI in the study area

3.4 Multivariate statistical analysis

3.4.1 Correlation analysis

The Kendall correlation matrix of the measured parameters is illustrated in Fig. 8. TDS is correlated with EC, HCO3−, Na+, TH, SO42−, and Cl−, with correlation coefficients of 0.84, 0.71, 0.68, 0.63, 0.62, and 0.51, respectively. Since TDS is the main parameter used to characterize the ionic content of groundwater, these correlations suggest that these ions contribute substantially to groundwater salinity. TDS is uncorrelated with NO3− and pH, suggesting that these parameters contribute little to the overall salinity. TH is correlated with Mg2+ (0.69), SO42− (0.63), and HCO3− (0.63), indicating that these ions contribute more to groundwater hardness than the other parameters.
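Kendall's tau compares the ranks of paired observations, so it captures monotonic rather than strictly linear relationships and is less sensitive to extreme samples such as S14. A minimal sketch of building such a matrix, assuming SciPy is available and using made-up values; SciPy's `kendalltau` computes the tau-b variant by default, and which variant the study used is not stated here.

```python
import itertools

import numpy as np
from scipy.stats import kendalltau


def kendall_matrix(data):
    """Pairwise Kendall tau (SciPy's default tau-b) for a dict of equal-length series."""
    names = list(data)
    mat = np.eye(len(names))  # tau of a variable with itself is 1
    for i, j in itertools.combinations(range(len(names)), 2):
        tau, _pvalue = kendalltau(data[names[i]], data[names[j]])
        mat[i, j] = mat[j, i] = tau
    return names, mat


# Hypothetical values (mg/L; EC in uS/cm), not the study's measurements.
samples = {
    "TDS": [310, 420, 515, 640, 1200],
    "EC": [480, 650, 800, 990, 1850],
    "NO3": [12, 3, 25, 8, 15],
}
names, mat = kendall_matrix(samples)
print(names)
print(np.round(mat, 2))
```

Here TDS and EC rise together in every pair of samples, giving tau = 1, while the scrambled NO3 ranks give a weak tau of 0.2, mirroring the weak TDS–NO3− relationship described above.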

Fig. 8

Kendall correlation matrix with its color scale showing the correlation between the physicochemical parameters

The inter-ionic relationships between the physicochemical parameters were examined graphically to evaluate their origins and understand the chemical composition of the groundwater. Figure 9a shows the linear regression between Na+ and Cl−, ions commonly released by mineral dissolution. If Na+ and Cl− were derived from halite dissolution, their concentrations would plot close to the 1:1 line [18]. Most of the samples in this investigation fall along or above the 1:1 line, with one sample below it (1:15). Halite dissolution therefore appears to be the primary source of the high Na+ and Cl− concentrations in the study area. A Na+/Cl− ratio greater than 1 suggests that silicate weathering may be an additional source of Na+ in groundwater. Another possible reason for a high Na+/Cl− ratio is that Na+ is introduced into groundwater by ion exchange [70], which is consistent with Fig. 9a.
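Ionic ratios such as Na+/Cl− are normally evaluated in milliequivalents, so concentrations reported in mg/L must first be divided by each ion's equivalent weight (molar mass divided by charge). The sketch below assumes mg/L inputs, standard molar masses, and an arbitrary ±0.1 band for reading a ratio as lying on the 1:1 line; the helper names are hypothetical.

```python
# Equivalent weights in g/eq (molar mass / |ionic charge|).
EQ_WEIGHT = {
    "Na": 22.990,
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
    "Cl": 35.453,
    "HCO3": 61.016,
    "SO4": 96.06 / 2,
}


def to_meq(ion, mg_per_l):
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l / EQ_WEIGHT[ion]


def na_cl_ratio(na_mg_l, cl_mg_l):
    """Na+/Cl- equivalent ratio; close to 1 for pure halite dissolution."""
    return to_meq("Na", na_mg_l) / to_meq("Cl", cl_mg_l)


def interpret_na_cl(ratio, tol=0.1):
    """Rough reading of the ratio; the +/- tol band around 1 is an assumption."""
    if abs(ratio - 1.0) <= tol:
        return "halite dissolution"
    if ratio > 1.0:
        return "excess Na+ (silicate weathering or cation exchange)"
    return "Na+ deficit (possibly reverse ion exchange)"


# Hypothetical sample: 115 mg/L Na+ and 106 mg/L Cl-.
r = na_cl_ratio(115, 106)
print(round(r, 2), interpret_na_cl(r))
```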

Fig. 9

Linear regression between (a) Cl− and Na+, (b) SO42− and Ca2+, (c) HCO3− and Ca2+ + Mg2+, (d) SO42− + HCO3− and Ca2+ + Mg2+, (e) HCO3− + SO42− and Ca2+ + Mg2+, and (f) Na+ − Cl− and Ca2+ + Mg2+ − SO42− − HCO3−, revealing the inter-ionic reactions. The black arrows indicate the deviation of the groundwater samples from the 1:1 (red), 2:1 (green), and 1:2 (blue) lines

Gypsum is naturally one of the primary sources of Ca2+ and SO42−. Figure 9b displays the relationship between Ca2+ and SO42−. Nearly all of the samples, except for two, fall near the 1:2 line (1.25:2.4), indicating that gypsum dissolution is a potential source; in addition, other sources, such as fertilizer application in agriculture, may have released Ca2+ into groundwater. In Fig. 9c, Ca2+ + Mg2+ is roughly equivalent to HCO3− in most of the groundwater samples, which supports the dissolution of calcite and dolomite as a primary source of Ca2+ and Mg2+. Figure 9d plots SO42− + HCO3− against Ca2+ + Mg2+. If the samples plot well above the 1:1 line, dissolution of gypsum, dolomite, and calcite is the dominant process in the aquifer system; if they fall below it, silicate weathering is indicated [71]. Samples along the 1:1 line indicate that the hydrochemistry is influenced by the dissolution of both carbonate and silicate minerals. In this study, 76% of the groundwater samples fall along the 1:1 line and 24% fall below it, indicating that both carbonate and silicate weathering control groundwater chemistry in the study area.
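The Fig. 9d reading can be expressed as a simple ratio test. This sketch assumes inputs already converted to meq/L, reads "above the 1:1 line" as Ca2+ + Mg2+ exceeding SO42− + HCO3−, and uses an arbitrary ±0.2 tolerance for "along the line"; all function names are hypothetical.

```python
def ca_mg_ratio(ca, mg, hco3, so4):
    """(Ca2+ + Mg2+) / (HCO3- + SO4 2-), all inputs in meq/L."""
    return (ca + mg) / (hco3 + so4)


def weathering_regime(ratio, tol=0.2):
    """Interpret the ratio relative to the 1:1 line (tolerance is assumed)."""
    if ratio > 1.0 + tol:
        return "carbonate and gypsum dissolution"
    if ratio < 1.0 - tol:
        return "silicate weathering"
    return "mixed carbonate and silicate weathering"


def share_on_line(ratios, tol=0.2):
    """Percentage of samples whose ratio lies within tol of 1."""
    on_line = sum(1 for r in ratios if abs(r - 1.0) <= tol)
    return 100.0 * on_line / len(ratios)


# Hypothetical: 13 of 17 samples on the line, 4 below it.
ratios = [1.0] * 13 + [0.5] * 4
print(round(share_on_line(ratios), 1))
```

With 13 of 17 samples on the line, the share is about 76%, the same proportion quoted above.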

The ion exchange process in the aquifer was also studied using linear regressions between the major ions. The plot of HCO3− + SO42− against Ca2+ + Mg2+ is used for this purpose (Fig. 9e); a shift of the samples to the right of the 1:1 line indicates ion exchange [72, 73]. Some samples plot to the right of the 1:1 line, indicating that ion exchange contributes to the hydrochemistry of the study area. Ion exchange was further examined by comparing (Ca2+ + Mg2+ − SO42− − HCO3−) with (Na+ − Cl−). Assuming that Cl− comes solely from precipitation, groundwater unaffected by ion exchange will cluster near zero on the Na+ − Cl− axis once Cl− is subtracted from Na+. Similarly, subtracting SO42− and HCO3− from Ca2+ + Mg2+ should yield values near zero if calcite, dolomite, and gypsum dissolve without ion exchange [74]. If ion exchange is the dominant process, however, the samples will align along a straight line with a slope of −1. In this study, the scatter plot (Fig. 9f) shows a good linear fit (R2 = 0.83) with a slope of approximately −0.78, although some samples plot near zero on both axes. This indicates a partial influence of ion exchange on the groundwater system. Thus, the dissolution of calcite, dolomite, and gypsum, together with ion exchange, is likely the primary control on groundwater quality evolution.
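The Fig. 9f test reduces to an ordinary least-squares fit of (Ca2+ + Mg2+ − SO42− − HCO3−) against (Na+ − Cl−), where a slope near −1 points to ion exchange. A minimal NumPy sketch, assuming meq/L inputs and Na+ − Cl− on the x-axis; the function name is hypothetical.

```python
import numpy as np


def ion_exchange_fit(na, cl, ca, mg, so4, hco3):
    """Fit y = slope * x + intercept and return (slope, intercept, R^2).

    x = Na+ - Cl-, y = Ca2+ + Mg2+ - SO4 2- - HCO3- (all in meq/L).
    """
    x = np.asarray(na, dtype=float) - np.asarray(cl, dtype=float)
    y = (np.asarray(ca, dtype=float) + np.asarray(mg, dtype=float)
         - np.asarray(so4, dtype=float) - np.asarray(hco3, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical data constructed to follow an ideal exchange line (slope -1).
slope, intercept, r2 = ion_exchange_fit(
    na=[2, 3, 4, 5], cl=[1, 1, 1, 1],
    ca=[1, 1, 1, 1], mg=[0, 0, 0, 0],
    so4=[1, 1, 1, 1], hco3=[1, 2, 3, 4],
)
print(round(slope, 2), round(r2, 2))
```

A slope shallower than −1, such as the −0.78 reported for Fig. 9f, is read in the text as ion exchange acting alongside mineral dissolution rather than dominating it.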

3.4.2 Principal component analysis (PCA)

PCA is a pattern recognition technique used to reduce the dimensionality of hydrochemical data and improve visualization and analysis. In this analysis, 11 PCs were identified, but only PCs with eigenvalues greater than one were retained for interpretation, and loadings greater than 0.3 were considered to have a significant impact on a component. Three PCs (PC1, PC2, and PC3) (Table 5) were retained, accounting for 86.07% of the total variance. PC1 accounts for most of the variance (60.18%) and has high loadings of TDS, EC, TH, Cl−, SO42−, Mg2+, and Na+. As the PC loadings in Table 6 show, PC1 comprises the predominant cations and anions, which appear to be associated with mineral dissolution [23]. This component may also represent groundwater salinity, since TDS, Na+, and Cl− have the largest loadings on it. PC2 accounts for 15.16% of the total variance and shows high positive loadings of NO3−, Ca2+, and HCO3−. As the study area is agricultural land, the presence of these parameters in groundwater is likely due to agricultural contamination [75, 76]. Since NO3− is weakly correlated with the salinity parameters and its primary source is agricultural activity, this PC may indicate contamination from fertilizers and irrigation return flow [77]. Calcium fertilizers can release NO3− into groundwater: calcium cyanamide reacts with water to form calcium carbonate and ammonia (Eq. 12), and the ammonia is subsequently oxidized to NO3− (Eqs. 13 and 14) [78]. This interpretation is supported by the spatial distributions of NO3−, HCO3−, and Ca2+, which show similar trends, with high concentrations near the agricultural fields in the central part of the study area.

$$\mathrm{CaCN_2} + 3\,\mathrm{H_2O} \rightarrow \mathrm{CaCO_3} + 2\,\mathrm{NH_3}$$
(12)
$$\mathrm{NH_3} + \mathrm{O_2} \rightarrow \mathrm{NO_2} + 3\,\mathrm{H}$$
(13)
$$\mathrm{NO_2} + \mathrm{H_2O} \rightarrow \mathrm{NO_3} + 2\,\mathrm{H}$$
(14)
Table 5 Eigenvalues of the correlation matrix and the cumulative variance of the principal components
Table 6 Extracted PCA loadings for PC1, PC2, and PC3

PC3 accounts for 10.72% of the total variance and is associated with a high loading of pH. The spatial variation of pH differs from that of the other parameters, and pH correlates weakly with salinity, with higher pH values observed in less mineralized groundwater. This suggests that pH is controlled mainly by non-natural sources. Abdelsalam et al. [67] and Mohammed et al. [79] correlated pH with the ammonia (NH3) content of groundwater in the eastern Nile area and found that organic contamination is the main source of high pH values. Accordingly, PC3 is likely related to organic contamination, mainly from septic tanks.
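The retention rules described above (eigenvalues greater than 1, loadings above 0.3) can be sketched with NumPy as an eigendecomposition of the correlation matrix of standardized data. Loadings are computed here as eigenvectors scaled by the square root of their eigenvalues, i.e., variable–component correlations; whether the study used this scaling, raw eigenvectors, or a rotation is not stated in this section, so treat it as an assumption. The synthetic data are hypothetical.

```python
import numpy as np


def pca_kaiser(X, loading_cut=0.3):
    """PCA on the correlation matrix, keeping PCs with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    # Standardize each parameter (column) to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()   # % of total variance per PC
    keep = eigvals > 1.0                          # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    significant = np.abs(loadings) > loading_cut  # loadings treated as significant
    scores = Z @ eigvecs[:, keep]                 # sample scores on retained PCs
    return eigvals, explained, loadings, significant, scores


# Synthetic data: columns 0 and 1 nearly collinear, column 2 independent.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
X = np.column_stack([base, 2 * base + 0.01 * rng.normal(size=50), rng.normal(size=50)])
eigvals, explained, loadings, significant, scores = pca_kaiser(X)
print(np.round(eigvals, 2), np.round(explained, 1))
```

Because the correlation matrix has ones on its diagonal, the eigenvalues always sum to the number of parameters, which is why an eigenvalue above 1 marks a component that explains more variance than any single standardized parameter.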

3.4.3 Hierarchical cluster analysis (HCA)

Hierarchical cluster analysis (HCA) was used to classify the groundwater samples according to their physicochemical parameters. The HCA was performed on the extracted PC scores, and the samples were grouped into three clusters based on their chemical composition. Figure 10 shows the HCA dendrogram. The first cluster consists of S1 only. Some parameters of this sample, including TDS, TH, Na+, and HCO3−, exceed the WHO standard limits [53]. S1 is classified as good water based on its calculated GWQI (96.6) and can thus be described as moderately mineralized. Its TDS exceeds 1000 mg/L, so it is classified as brackish water, and prolonged use of this water for drinking may pose health risks [53]. The second cluster includes 88% of the samples (S2–S13 and S15–S17), with a similarity of 75% in chemical composition. The samples in this cluster are characterized by low concentrations, and the maximum values of the parameters are below the WHO [53] standards. Cluster 2 samples have a lower chemical content than cluster 1, and the similarity between the two clusters is 65%. The third cluster consists of S14, which has a much higher TDS (6225 mg/L) and is therefore unsuitable for human consumption [53]. This cluster shows no similarity with clusters 1 and 2, since all of its physicochemical parameters exceed the WHO allowable limits [53]. S14 is highly mineralized and is also unsuitable for drinking based on the GWQI. Based on the HCA, the groundwater samples in the study area can be classified as low (Cluster 2), moderately (Cluster 1), and highly (Cluster 3) mineralized.
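Clustering on PC scores can be sketched with SciPy's hierarchical clustering routines. Ward linkage on Euclidean distance and the synthetic scores below are assumptions chosen for illustration, not the study's settings; `scipy.cluster.hierarchy.dendrogram` can draw the tree corresponding to Fig. 10, and the percentage similarities quoted in the text would require rescaling the linkage distances, which this sketch does not do.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(pc_scores, n_clusters=3, method="ward"):
    """Agglomerative clustering of samples (rows) on their PC scores.

    Ward linkage on Euclidean distance is an assumption; the study's linkage
    rule and distance measure are not stated in this section.
    """
    tree = linkage(np.asarray(pc_scores, dtype=float), method=method)
    # Cut the tree so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical PC scores for 17 samples: one moderate outlier (index 0),
# a tight main group (indices 1-15), and one extreme outlier (index 16).
main_group = np.array([[0.1 * i, 0.05 * i] for i in range(15)])
scores = np.vstack([[[5.0, 5.0]], main_group, [[50.0, 50.0]]])
labels = cluster_samples(scores)
print(labels)
```

The synthetic layout loosely mimics the pattern described above: a single moderately distinct sample, one large homogeneous group, and one extreme sample that joins the tree last.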

Fig. 10

Dendrogram showing the clusters of groundwater samples based on their chemical contents

3.5 Evaluation of health risk

The average daily intake (ADI) and hazard quotient (HQ) of NO3− were determined for adults and children. Table 7 displays the computed ADI and HQ for northern Khartoum state. The ADI ranges from 0.003 to 0.46 mg/(kg·day) for adults and from 0.05 to 7.95 mg/(kg·day) for children, while the HQ ranges from 0.002 to 0.288 for adults and from 0.03 to 4.97 for children. Figure 11a and b show the spatial distribution of HQ for adults and children, respectively. The eastern and southwestern regions of the study area exhibit the highest HQ for both groups. In 11 groundwater samples (64.7%), the HQ for children exceeds the safe level (HQ > 1), indicating a potential adverse effect on their health. Because chemical fertilizers are widely used in the eastern part of the study area, agricultural activities such as crop cultivation and irrigation return flow are likely to cause the high NO3− concentrations. Animal feedlots, septic tanks, and sewage sludge discharge are additional potential sources [80]. Natural factors such as mineral dissolution may also contribute to the high salinization observed in the eastern part of the study area.
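The HQ values follow the usual non-carcinogenic formulation, ADI = C × IR × EF × ED / (BW × AT) and HQ = ADI / RfD. The reported ranges imply RfD ≈ 1.6 mg/(kg·day) for NO3− (0.46/0.288 ≈ 7.95/4.97 ≈ 1.6). The intake rates, body weights, exposure durations, and concentration in this sketch are hypothetical and will not reproduce Table 7.

```python
# Reference dose for nitrate in mg/(kg*day); 1.6 is consistent with the
# ADI/HQ ratios reported in the text (e.g. 0.46 / 0.288).
RFD_NO3 = 1.6


def average_daily_intake(c_mg_l, ir_l_day, bw_kg, ef_days_yr=365, ed_yr=30, at_days=None):
    """ADI = C * IR * EF * ED / (BW * AT), in mg/(kg*day).

    For non-carcinogenic risk, AT is usually ED * 365 days (the default here).
    """
    if at_days is None:
        at_days = ed_yr * 365
    return c_mg_l * ir_l_day * ef_days_yr * ed_yr / (bw_kg * at_days)


def hazard_quotient(adi, rfd=RFD_NO3):
    """HQ = ADI / RfD; HQ > 1 flags a potential non-carcinogenic risk."""
    return adi / rfd


# Hypothetical exposure parameters (not those used for Table 7).
adult = dict(ir_l_day=2.5, bw_kg=70, ed_yr=30)
child = dict(ir_l_day=0.78, bw_kg=15, ed_yr=6)
for label, params in [("adult", adult), ("child", child)]:
    hq = hazard_quotient(average_daily_intake(40.0, **params))
    print(label, round(hq, 2), "risk" if hq > 1 else "acceptable")
```

Children's lower body weight relative to their water intake raises their ADI for the same NO3− concentration, which is why HQ exceeds 1 for children in more samples than for adults.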

Table 7 The calculated daily intake and hazard quotient of NO3− for adults and children
Fig. 11

The areal variation in the hazard quotient for (a) adults and (b) children in the study area

Elevated NO3− levels in drinking water can cause methemoglobinemia (blue baby syndrome) in infants and weight loss in children [81]. Other possible effects of NO3− include exhaustion, weakness, and dizziness [82]. Therefore, NO3− remediation is required to ensure that groundwater can be used safely for drinking. In the study area, anthropogenic activities, including the use of open septic tanks and nitrogen-based fertilizers in agriculture, are associated with the NO3− content of groundwater. Reverse microfiltration is a potential treatment method for contaminated groundwater and can be very efficient when combined with a strong-base anion-exchange resin that removes NO3− through ion exchange. Strengthening the groundwater management plan and raising public awareness of NO3− pollution are also essential. Finally, appropriate legislation, the use of organic fertilizers, and the modernization of sewage systems are all important components of an effective pollution management approach.

4 Conclusions

This research investigated the groundwater quality of north Khartoum state, Sudan, and its suitability for drinking using GWQI, multivariate statistical techniques, and human health risk assessment methods. The main conclusions of the study are summarized as follows:

  • The general physicochemical analysis revealed that groundwater samples in the eastern part are more highly mineralized than those in the west, and groundwater facies gradually change from Ca–Mg–HCO3 to Na–Cl due to the cation exchange process. Because the samples were collected in the post-monsoon period, slight changes in groundwater facies are expected in the pre-monsoon period.

  • GWQI was calculated from the analyzed physicochemical parameters, and most of the groundwater samples (76.4%) were classified as excellent water. Only one sample was considered unsuitable for drinking.

  • Multivariate statistical approaches proved effective in assessing groundwater quality. Based on the physicochemical parameters, PCA and HCA were employed to identify the main factors influencing groundwater quality and to categorize the samples by their degree of mineralization. Three PCs accounted for 86.07% of the total variance and pointed to mineral dissolution as the main control on groundwater quality, followed to a lesser extent by agricultural practices and organic pollution. Additionally, HCA categorized the groundwater samples into three classes: low, medium, and highly mineralized.

  • The non-carcinogenic human health risk was assessed based on NO3− concentration for adults and children. The hazard quotient indicated no health risk for adults, whereas for children 64.7% of the groundwater samples exceed the permissible limit (HQ > 1). Remedial measures are therefore suggested to ensure the safe use of groundwater.

  • The integrated methodologies proved robust and effective for identifying and assessing groundwater quality for drinking purposes in northern Khartoum state. However, this study recommends applying combined carcinogenic and non-carcinogenic health risk assessments to support a sustainable groundwater supply. For instance, carcinogenic parameters such as chromium and arsenic can have severe health consequences, including kidney and stomach cancer. This research also recommends implementing a groundwater monitoring system to identify and characterize potential contaminant sources, which can help determine the extent of contamination and design appropriate mitigation measures. Such monitoring would allow sampling and data collection in both pre- and post-monsoon periods, supporting a comprehensive assessment of the groundwater resources in the study area.