1 Introduction

Groundwater (GW) is the largest available source of freshwater, supporting both human needs and economic development. The tremendous growth in agricultural, industrial, and domestic activities in recent years has significantly increased the demand for high-quality water to meet these rising needs [1]. Water quality degradation is one of today's key environmental challenges, as pollutants infiltrate groundwater, compromising its quality. Once contaminated, remediation through artificial flushing or treatment is often infeasible or prohibitively expensive [2,3,4].

Nitrate contamination is a common issue in groundwater in many parts of the world [5]. According to the World Health Organization (WHO) and the European Community (EC), the maximum contaminant level (MCL) of nitrate is 50 mg/L. In the United States Environmental Protection Agency (EPA) guidelines, the MCL for nitrate is set at 44 mg/L in drinking water. Additionally, the EC specifies a guide level (GL) of nitrate at 25 mg/L [6, 7]. Human activities are the primary source of nitrate pollution in groundwater [8, 9]. Studies indicate an increase in nitrate concentration correlates with global agricultural production [9], particularly due to the use of mineral fertilizers, which significantly elevate nitrate levels in groundwater [9,10,11]. Poultry and livestock farming are also notable agricultural contributors to this issue [6, 12]. Other anthropogenic sources of groundwater pollution include municipal leachate, solid waste disposal, sewerage systems, industries, municipal waste, garages, and fuel stations [13, 14]. From a land-use perspective, agricultural lands and urban areas are major contributors to groundwater pollution [13].

The depth to the groundwater table is a key determinant of groundwater contamination vulnerability [13, 15,16,17]. It correlates significantly with the time nitrate contaminants need to reach groundwater [18, 19]. Nitrate concentration is often considered a potential indicator of contaminant movement, as its presence in groundwater is generally not natural [8].

However, variability in spatial data and constraints complicates groundwater monitoring, making remediation costly and challenging [16]. Therefore, spatial information on hydrogeology is crucial for the effective management and protection of groundwater quality [20] as well as for groundwater exploration. Spatial interpolation can be a valuable tool in addressing this challenge [21,22,23,24,25]. However, the choice of interpolation methods can lead to significant discrepancies in the results [26, 27], impacting predictions and outcomes [23]. Many studies have been conducted on this aspect, but they have yielded varying results. Among the diverse interpolation methods, none is unequivocally optimal [28], and no universally accepted method exists [22]. Therefore, selecting a suitable interpolation method for a particular case is essential and requires evaluating the available methods [23, 28]. Geographical Information System (GIS) is a leading tool with significant potential for addressing various environmental problems. The emergence of geostatistical analysis as an innovative tool bridging the gap between geostatistics and GIS has enabled widespread analysis of spatial variations in groundwater characteristics [1, 13, 21,22,23,24,25,26].

The Elalla-Aynalem catchment is a critical wellfield, supplying 96% of Mekelle City’s water [29]. Although water quality pressure is a significant concern in the area, research on the groundwater table and nitrate levels is scarce. As a result, optimized spatial maps of groundwater table depth and nitrate concentration in the Elalla-Aynalem catchment are not well developed. Exceptions include ref. [13], which examined the contamination potential of the Elalla-Aynalem catchment, and ref. [30], which characterized nitrate concentration in Aynalem groundwater 20 years ago. Additionally, ref. [14] investigated the impact of municipal solid waste landfill leachate on Mekelle City’s water resources, though the study did not encompass the entire catchment. Ref. [29] focused on groundwater management practices and database development, ref. [31] assessed the groundwater resources of the Aynalem wellfield, and ref. [32] studied the implications of groundwater quality for corrosion problems in the Mekelle area.

This research aims to develop optimized spatial maps for the groundwater table and nitrate concentration by evaluating geospatial models of the Elalla-Aynalem catchment.

1.1 Description of the study area

The study area, known as the Elalla-Aynalem Catchment, is located in the northern part of Ethiopia, within the Tigray regional state, surrounding Mekelle Town. Geographically, it lies between 1,480,144 m and 1,503,274 m North and 539,743 m and 575,683 m East, covering an area of 493 km². It is bounded to the north by the Agulae catchment, to the east by the Danakil Basin, to the south by the Chelekot Sub-basin, and to the west by the Giba River. The catchment’s elevation ranges from 1750 m in the west to 2634 m in the east. The Aynalem and Elalla Rivers are the major drainage systems of the catchment, flowing primarily from east to west before joining the Giba River, part of the Tekeze drainage system, which is a main tributary of the Atbarah River in Sudan. The drainage pattern of the catchment is predominantly dendritic (Fig. 1). The main rivers follow structurally weak planes associated with the Mekelle fault, which intersects various lithological units, while the tributaries follow the general slope inclination. The region experiences monthly minimum and maximum temperatures of 9.4 °C and 26.9 °C, respectively, with an annual rainfall of 577 mm, based on 20 years of data recorded at Mekelle Airport.

Fig. 1 Study area’s location map

1.2 Geology and hydrogeology of the study area

The dominant geological formation in the area is Agula shale, interspersed with hills composed of dolerite sills and uplifted limestone (Fig. 2a). Based on the groundwater piezometric level, the flow direction is towards low-altitude areas [33], with groundwater moving toward the major river valleys.

Fig. 2 Geological map (adopted from Abreha 2014 [33]) (a) and land use and land cover (b) of the area

1.3 Topography, soil, and land use land cover

Most of the Elalla-Aynalem catchment is flat with some hills and is dominated by clay loam soil, followed by clay and sandy loam. Agricultural land, natural vegetation, bare land, and built-up (settlement) areas are the main land use and land cover types in the study area (Fig. 2b).

1.4 Data sources and methods

1.4.1 Data sources

Six years of static water level and nitrate concentration data were collected from the Tigray Region Water Works and Construction Enterprise (TWWCE), Tigray Region Water Works, Design, Study, and Supervision Enterprise (TWWDSSE), Mekelle Water Supply Office, Tigray Region Water Resource Development Bureau, and Tekeze Dip Wells Drilling P.L.C. (Appendices 1 and 2).

The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Digital Elevation Model (DEM) with a 30 m resolution was downloaded from the United States Geological Survey (USGS). Additionally, ten data points for contaminant source areas were collected directly from the study area.

1.4.2 Depth to groundwater and Nitrate concentration

In this study, 123 data points for depth to groundwater and 71 data points for nitrate concentration were used in the analysis, after removing 4 groundwater-level points whose locations did not match the study area.
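The location screening described above can be sketched as a simple extent check. This is a minimal sketch, assuming the bounding coordinates reported in Section 1.1 serve as the study-area extent; the actual screening presumably relied on the catchment boundary itself, and the function names and sample records below are hypothetical.

```python
# Extent of the study area as reported in Section 1.1 (metres).
# Assumption: a rectangular extent stands in for the true catchment polygon.
EASTING_RANGE = (539_743.0, 575_683.0)
NORTHING_RANGE = (1_480_144.0, 1_503_274.0)


def inside_extent(easting, northing):
    """Return True when a point lies within the rectangular extent."""
    return (EASTING_RANGE[0] <= easting <= EASTING_RANGE[1]
            and NORTHING_RANGE[0] <= northing <= NORTHING_RANGE[1])


def screen_points(records):
    """Split (easting, northing, value) records into kept and dropped lists."""
    kept = [r for r in records if inside_extent(r[0], r[1])]
    dropped = [r for r in records if not inside_extent(r[0], r[1])]
    return kept, dropped


# Hypothetical records: (easting, northing, depth to groundwater in m).
wells = [
    (550_000.0, 1_490_000.0, 12.4),
    (560_500.0, 1_495_300.0, 31.0),
    (600_000.0, 1_490_000.0, 8.7),  # easting lies outside the extent
]
kept, dropped = screen_points(wells)
print(len(kept), "kept;", len(dropped), "dropped")
```

A rectangular check only catches gross coordinate errors; points inside the rectangle but outside the catchment would need a point-in-polygon test against the boundary.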

1.4.3 Interpolation methods

Two groups of interpolation methods were employed in this study: geostatistical methods, or the kriging family (Ordinary, Universal, Simple, and Empirical Bayesian), and deterministic methods (Inverse Distance Weighting, Radial Basis Functions, Global Polynomial Interpolation, and Local Polynomial Interpolation). According to ref. [34], all these spatial interpolation methods can be represented as weighted averages of the measured data and share the general estimation formula in Eq. (1).

$$F\left({X}_{p}\right)=\sum_{i=1}^{m}{\lambda }_{i}F\left({X}_{i}\right)$$
(1)

where F(Xp) is the estimated value at point Xp of the plane, F(Xi) is the measured value at sampled point Xi, λi is the weight assigned to sampled point Xi, and m is the number of sampled points used in the estimation.

The major difference between all interpolation methods lies in how the weight values for each point within the neighborhood are calculated.

1. Geostatistical models

Geostatistical interpolation employs more complex models that incorporate statistical correlations between sampled points, also referred to as spatial autocorrelation. These methods are widely recognized for their ability to characterize spatial variability, interpolate between sampled points, and generate prediction maps [1, 35].

a. Ordinary Kriging

Ordinary Kriging estimates values at unsampled locations as a weighted average of neighboring data points, assuming a constant but unknown mean. The weights are derived from the fitted semivariogram and are constrained to sum to one, so known points closer to the unsampled location generally receive higher weights, reflecting their greater influence on the estimate.
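The weighting scheme of Ordinary Kriging can be illustrated with a small NumPy sketch. It is not the ArcGIS Geostatistical Analyst implementation used in this study: the spherical semivariogram, its parameters (`nugget`, `sill`, `range_`), and the sample wells are all assumptions chosen for illustration.

```python
import numpy as np


def spherical_variogram(h, nugget, sill, range_):
    """Spherical semivariogram (assumed model); gamma(0) is set to 0."""
    h = np.asarray(h, dtype=float)
    ratio = h / range_
    gamma = np.where(
        h < range_,
        nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3),
        sill,
    )
    return np.where(h == 0.0, 0.0, gamma)


def ordinary_kriging(xy, z, target, nugget=0.0, sill=1.0, range_=1.0):
    """Return (estimate, kriging variance, weights) at one target point."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(z)

    # Semivariances between all pairs of sampled points.
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)

    # Kriging system: the extra row and column hold the Lagrange
    # multiplier that forces the weights to sum to one.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical_variogram(dists, nugget, sill, range_)
    a[n, n] = 0.0

    b = np.ones(n + 1)
    b[:n] = spherical_variogram(
        np.linalg.norm(xy - target, axis=1), nugget, sill, range_
    )

    solution = np.linalg.solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = float(weights @ z)
    variance = float(weights @ b[:n] + lagrange)
    return estimate, variance, weights


# Hypothetical wells on a 1 km square (coordinates in km, depths in m).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
depths = [10.0, 14.0, 12.0, 20.0]
est, var, w = ordinary_kriging(wells, depths, (0.5, 0.5), sill=25.0, range_=3.0)
print("estimate:", est, "variance:", var, "weights:", w)
```

The square root of the returned variance corresponds to the kind of prediction standard error that kriging can map alongside the prediction itself.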

b. Simple Kriging

Simple Kriging is the most basic form of kriging, characterized by its simple mathematical formulation. It assumes that the mean is known and constant across the study area, and it relies on the overall average of the observed data points for its estimations.

c. Universal Kriging

Universal Kriging is also known as kriging with a trend or kriging in the presence of a drift. A spatial trend, or drift, is any detectable tendency for the values to change as a function of the coordinate variables. The method splits the random function into a drift, modeled as a linear combination of deterministic functions, and a random residual component.

d. Empirical Bayesian Kriging

Empirical Bayesian Kriging automatically optimizes its parameters by using multiple semivariogram models rather than a single semivariogram. This method is particularly suitable for moderately nonstationary data and small datasets, enabling accurate predictions. It ensures that the parameters required to build a valid kriging model are automatically fine-tuned.

2. Deterministic interpolation

Deterministic interpolation refers to methods that use simple mathematical models to estimate unknown values based on surrounding known points [26]. The techniques used include Inverse Distance Weighting (IDW), Radial Basis Functions (RBF), Global Polynomial Interpolation (GPI), and Local Polynomial Interpolation (LPI).

a. Inverse Distance Weighting (IDW)

This method assumes that the weighted average of known data points in the local area surrounding an unsampled location determines the attribute value at that location. Points closer to the unknown data point are considered more similar to it compared to those located farther away.
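As an illustration of this weighting, the sketch below implements a basic IDW estimator in plain Python. The power parameter of 2 is a common default assumed here, not a value reported by the study, and the sample wells are hypothetical.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    points: list of (x, y) tuples with known values
    values: measured values at those points
    power:  distance exponent (assumed default of 2)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sample, so return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical nitrate samples (coordinates in km, concentrations in mg/L).
samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
nitrate = [10.0, 20.0, 60.0]
print(idw(samples, nitrate, (0.5, 0.5)))
```

Dividing each weight by the weight total gives the λi of Eq. (1), so the weights sum to one and the estimate always stays between the smallest and largest measured value.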

b. Radial Basis Function (RBF)

According to ref. [36] Radial Basis Function (RBF) methods use a series of exact interpolation techniques, and the surfaces generated by this method pass through each measured sample value. Each basis function has a unique shape, resulting in a slightly different interpolation surface. It approximates or smooths a function from a set of known and scattered data points. The approximation depends on the distance from a known center point or origin.
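The exactness property, where the surface passes through every measured value, can be demonstrated with a short NumPy sketch. The multiquadric basis function and the shape parameter `epsilon` are assumptions for illustration; the text does not state which basis function was selected.

```python
import numpy as np


def fit_multiquadric_rbf(xy, z, epsilon=1.0):
    """Fit an exact multiquadric RBF interpolant and return a predictor.

    Assumed basis: phi(r) = sqrt(1 + (epsilon * r)^2).
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)

    # One basis function is centred on every sample point.
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    basis = np.sqrt(1.0 + (epsilon * dists) ** 2)

    # Solving basis @ coeffs = z makes the surface honour each sample.
    coeffs = np.linalg.solve(basis, z)

    def predict(target):
        r = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
        return float(np.sqrt(1.0 + (epsilon * r) ** 2) @ coeffs)

    return predict


# Hypothetical wells (coordinates in km, depth to groundwater in m).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
depths = [5.0, 9.0, 15.0, 30.0]
surface = fit_multiquadric_rbf(wells, depths)
print(surface((0.5, 0.5)))
```

Changing the basis function or `epsilon` changes the surface between the samples but not at them, which is why different RBF variants give slightly different maps.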

c. Global Polynomial Interpolation (GPI)

This is a fast yet imprecise method that applies multi-level regression across the entire dataset. The resulting surface can be heavily influenced by data points near the borders or edges [37]. This technique is particularly useful when the known data points represent an undulating surface [26].
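A global polynomial surface is essentially a least-squares trend surface fitted to all points at once. The sketch below, assuming a polynomial in the two coordinates and using NumPy's least-squares solver, shows the idea; the function name, default order, and sample data are illustrative and are not taken from the study.

```python
import numpy as np


def fit_global_polynomial(xy, z, order=1):
    """Least-squares polynomial trend surface over the whole dataset."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    x, y = xy[:, 0], xy[:, 1]

    # Exponent pairs (i, j) with i + j <= order, e.g. 1, y, x for order 1.
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.column_stack([x ** i * y ** j for i, j in terms])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)

    def predict(px, py):
        return float(sum(c * px ** i * py ** j
                         for c, (i, j) in zip(coeffs, terms)))

    return predict


# Hypothetical samples lying on the plane z = 2 + 3x - y.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
values = [2.0, 5.0, 1.0, 4.0, 7.0]
trend = fit_global_polynomial(points, values, order=1)
print(trend(5.0, 5.0))
```

Because one polynomial must fit every point, the surface smooths local detail and is not exact at the samples. With large projected coordinates, centring and scaling x and y before fitting keeps the design matrix well conditioned.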

d. Local Polynomial Interpolation (LPI)

LPI generates a grid surface suitable for short-variance interpolation and is best applied to points within a specific neighborhood, rather than across the entire dataset [38].

1.5 Visualization and exploratory data analysis

Figure 3 illustrates the methods and techniques employed in this study to predict the groundwater table and nitrate concentration. The process begins with screening data values through visual analysis to identify mismatched coordinate points. Basic summary statistics, including means, medians, variances, skewness, histograms, and normal plots, are used to describe the data and detect outliers that could negatively impact spatial prediction. Notably, the variogram is highly sensitive to outliers [28]. A log transformation was applied, and the histogram was regenerated. A total of 123 data points for depth to groundwater and 71 for nitrate concentration were used in this study. Depth to groundwater and nitrate concentration were interpolated using Ordinary, Universal, Simple, Empirical Bayesian, Inverse Distance Weighting (IDW), Radial Basis Functions (RBF), Global Polynomial Interpolation (GPI), and Local Polynomial Interpolation (LPI) methods using geostatistical analyst tools.

Fig. 3 Methodology used for spatial prediction of groundwater table and nitrate concentration

About 160 tests were conducted to find the optimal model for depth to groundwater and nitrate concentration in the study area. The final eight optimal models were then compared, and the best one was selected. Cross-validation [39], which is commonly used to validate the accuracy of an interpolation method [1], was used to compare the different interpolation approaches. The evaluation criteria were the root mean square error (RMSE) and the mean error (ME) [26, 40]; the RMSE is defined as follows:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left[{z}^{*}\left({x}_{i}\right)-z\left({x}_{i}\right)\right]}^{2}}$$
(2)

where z*(xi) and z(xi) are the estimated and measured values at location xi, respectively, and n is the number of observations.
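The mean error is not written out above. Assuming the standard definition and the same symbols and sign convention as Eq. (2), it can be expressed as:

$$ME=\frac{1}{n}\sum_{i=1}^{n}\left[{z}^{*}\left({x}_{i}\right)-z\left({x}_{i}\right)\right]$$

Under this convention, an ME close to zero suggests an unbiased predictor, a positive ME indicates overestimation on average, and a negative ME indicates underestimation. Because positive and negative errors cancel in the ME, it complements the RMSE rather than replacing it.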

These criteria were selected because they are effective, simple, and widely used to measure prediction accuracy across a broad range of fields. Interpolation and mapping were performed using the ArcGIS Geostatistical Analyst.

The RMSE quantifies the difference between the known and predicted values and was used to compare the interpolation methods, as ref. [1] did for the pH of groundwater. The method with the lowest RMSE was taken as the best prediction method among all the techniques tested in this study [1]. The workflow for spatial prediction and model evaluation of the groundwater table and nitrate concentration is shown in Fig. 3.
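The comparison logic can be sketched as a leave-one-out cross-validation loop that scores each candidate method by RMSE and ME. This is a simplified stand-in for the ArcGIS cross-validation used in the study: the two predictors (`nearest_neighbour`, `global_mean`) are deliberately trivial placeholders for the eight interpolation methods, and all names and data are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error, as in Eq. (2)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mean_error(predicted, observed):
    """Mean error using the predicted-minus-observed sign convention."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def nearest_neighbour(points, values, target):
    """Placeholder predictor: value of the closest sample."""
    distances = [math.dist(p, target) for p in points]
    return values[distances.index(min(distances))]


def global_mean(points, values, target):
    """Placeholder predictor: mean of all remaining samples."""
    return sum(values) / len(values)


def leave_one_out(points, values, predictor):
    """Predict each sample from all the other samples."""
    predictions = []
    for i, target in enumerate(points):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predictions.append(predictor(other_points, other_values, target))
    return predictions


def rank_methods(points, values, methods):
    """Return (name, RMSE, ME) tuples sorted by increasing RMSE."""
    results = []
    for name, predictor in methods.items():
        predictions = leave_one_out(points, values, predictor)
        results.append((name, rmse(predictions, values),
                        mean_error(predictions, values)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical transect of wells (coordinates in km, depths in m).
wells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
depths = [1.0, 2.0, 3.0, 4.0]
candidates = {"nearest neighbour": nearest_neighbour, "global mean": global_mean}
for name, score, bias in rank_methods(wells, depths, candidates):
    print(f"{name}: RMSE={score:.3f}, ME={bias:.3f}")
```

Any function with the same `(points, values, target)` signature can be added to `candidates`, so kriging or IDW estimators could be ranked in the same way.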

2 Results and discussion

2.1 Exploratory analysis

In Table 1, the first row (depth to groundwater) and the third row (nitrate concentration) give the statistics of the raw data, obtained without applying any transformation, while the second and fourth rows give the statistics after log transformation. After log transformation, the kurtosis values for depth to groundwater (3.26) and nitrate concentration (4.03), being greater than 3, indicate a leptokurtic distribution, which is more peaked than a normal distribution. The skewness values for depth to groundwater (-0.59) and nitrate concentration (-0.95), being less than 0, indicate a left (negative) skew in the data distribution. The mean (2.60) and median (2.71) for depth to groundwater are similar, indicating that the data approximate a normal distribution. Similarly, the mean (2.76) and median (3.13) for nitrate concentration are relatively close. When skewness coefficients are close to zero and kurtosis coefficients approximate the value of 3, the data are considered to follow a normal distribution [41, 42]. Log transformation was therefore used to bring the data closer to a normal distribution, and semi-variograms were subsequently calculated.
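The normality checks discussed above can be reproduced with a few lines of Python. This sketch assumes moment-based (population) skewness and non-excess kurtosis, for which a normal distribution scores 0 and 3; the exact estimators behind Table 1 are not stated, so small differences from the tabulated values would be expected. The sample data are hypothetical.

```python
import math


def shape_statistics(data):
    """Return the mean, moment-based skewness, and non-excess kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def log_transform(data):
    """Natural-log transform; all values must be strictly positive."""
    return [math.log(x) for x in data]


# Hypothetical right-skewed nitrate concentrations (mg/L).
nitrate = [0.5, 2.0, 4.0, 8.0, 15.0, 40.0, 120.0, 330.0]
print("raw:", shape_statistics(nitrate))
print("log:", shape_statistics(log_transform(nitrate)))
```

Comparing the two printed summaries shows whether the transformation moves skewness toward 0 and kurtosis toward 3, which is the criterion applied in the text.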

Table 1 Basic statistics of the raw data and the log-transformed data

2.2 Geostatistical analysis

2.2.1 Prediction of groundwater level and nitrate concentration

Cross-validation results revealed that geostatistical methods outperformed deterministic interpolation methods in predicting both depth to groundwater and nitrate concentration in the Elalla Aynalem catchment.

Empirical Bayesian Kriging (EBK) was identified as the best interpolation technique for predicting depth to groundwater, achieving the lowest root mean square error (RMSE), followed by Simple Kriging. Moreover, Simple Kriging was found to be the most effective method for predicting nitrate concentration followed by Empirical Bayesian Kriging (EBK).

Among deterministic methods, Inverse Distance Weighting (IDW) was the best predictor for the dataset, followed by Radial Basis Function (RBF), which performed better than other methods (Table 2).

Table 2 Cross validation performance of prediction generated maps by different interpolation methods for depth to groundwater and nitrate concentration

The results also showed that the Global Polynomial Interpolation (GPI) model was the least accurate spatial interpolation model based on the statistical indicators. The preference ranking of models for predicting the groundwater table was as follows: EBK > SK > OK > UK > IDW > RBF > LPI. For nitrate concentration, the ranking was SK > EBK > IDW > OK > UK > RBF > LPI. Studies conducted in Morocco, Iraq, and Qatar [21, 22, 34] also concluded that geostatistical methods better represented groundwater levels than deterministic interpolation methods.

The groundwater levels in the study area ranged from 1 m to 69.9 m, with a mean of 18.1 m and a standard deviation of 13.4 m. Based on classifications by refs. [13, 15], the groundwater table depths were divided into six classes: < 4.6 m, 4.6–9.1 m, 9.1–15.2 m, 15.2–22.8 m, 22.8–30.4 m, and > 30.4 m, covering 64.41 km² (13.1%), 137.2 km² (27.8%), 202.8 km² (41.2%), 46.15 km² (9.4%), 41.69 km² (8.5%), and 0.77 km² (0.16%), respectively. These classes run from shallow-depth zones (maximum vulnerability) to high-depth zones (minimal vulnerability), under the assumption that other factors remain constant (Fig. 4). Shallow water table depths in the flat areas of the northwest and central parts of the study area are associated with high groundwater contamination rates, because shallow depths give pollutants a short travel time and distance to reach the groundwater [15, 16].
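The class assignment and area shares above can be checked with a short script. The class breaks and areas are taken from the text; the treatment of values that fall exactly on a break (assigned to the deeper class) and all names are assumptions for illustration.

```python
import bisect

# Class breaks (m) and mapped areas (km²) as reported in the text.
CLASS_BREAKS_M = [4.6, 9.1, 15.2, 22.8, 30.4]
CLASS_LABELS = [
    "< 4.6 m", "4.6–9.1 m", "9.1–15.2 m",
    "15.2–22.8 m", "22.8–30.4 m", "> 30.4 m",
]
CLASS_AREAS_KM2 = [64.41, 137.2, 202.8, 46.15, 41.69, 0.77]


def depth_class(depth_m):
    """Return the class label for a depth to groundwater in metres.

    Assumption: a depth equal to a break goes to the deeper class.
    """
    return CLASS_LABELS[bisect.bisect_right(CLASS_BREAKS_M, depth_m)]


def area_shares(areas_km2):
    """Return each class area as a percentage of the total mapped area."""
    total = sum(areas_km2)
    return [100.0 * area / total for area in areas_km2]


print(depth_class(18.1))  # class containing the reported mean depth
for label, share in zip(CLASS_LABELS, area_shares(CLASS_AREAS_KM2)):
    print(f"{label}: {share:.2f}%")
```

Summing `CLASS_AREAS_KM2` gives a quick consistency check against the 493 km² catchment area reported in Section 1.1.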

Fig. 4 Map of depth to groundwater using different interpolation methods

The nitrate concentration in the Elalla-Aynalem catchment ranges from 0.21 to 336.1 mg/L and is classified into three levels according to ref. [43]: < 20 mg/L, 20–50 mg/L, and > 50 mg/L (Fig. 5). According to ref. [44], nitrate concentrations exceeding 13 mg/L NO₃⁻ indicate anthropogenic influences, so the observed concentrations point to significant anthropogenic interference in the study area. Most areas with nitrate concentrations above 50 mg/L are located in regions with shallow groundwater tables and near pollutant sources, particularly in the northwest and central parts of the catchment (Fig. 6c). Studies by refs. [30, 33] also found that many water resource observations in the area do not meet WHO water quality standards, largely due to anthropogenic effects around Mekelle.

Fig. 5 Map of nitrate concentration using different interpolation methods

Fig. 6 Prediction standard error map of (a) groundwater level, (b) NO₃ concentration, and (c) NO₃ concentration and pollutant source locations

2.2.2 Prediction standard error of groundwater level and nitrate concentration

Figure 6a and b illustrate the spatial distribution of uncertainty in the predicted groundwater level and nitrate concentration across the study area. For groundwater level, the uncertainty ranges from 1.73 to 5.83 m in areas near the measured points and gradually increases to 23.91 to 29.67 m in regions farther from the observed values. Similarly, the uncertainty for nitrate concentration varies between 0.31 and 31.33 mg/L in the neighborhood of measured points and increases to 188.35 to 247.48 mg/L in areas distant from them. These maps provide a valuable basis for identifying optimal sites for groundwater potential and for monitoring and controlling contaminants in the area. They are particularly useful for enhancing the accuracy of planning, development, and decision-making processes by stakeholders in the Elalla-Aynalem catchment.

3 Conclusion

This study conducted a comparative analysis of eight widely used interpolation techniques to identify the best models for spatial interpolation of groundwater table depth and nitrate concentration in the Elalla-Aynalem Catchment. Empirical Bayesian Kriging (EBK), followed by Simple Kriging, was identified as the best technique (lowest RMSE) for predicting groundwater table depth, while Simple Kriging, followed by Empirical Bayesian Kriging, performed best for mapping nitrate concentration. The findings show that geostatistical methods outperformed deterministic interpolation methods in representing both groundwater table depth and nitrate concentration. Additionally, the optimal spatial interpolation methods generated uncertainty maps associated with groundwater table depth and nitrate concentration in the study area. The northwest and central parts of the study area have shallow water table depths, with most of these regions exhibiting nitrate concentrations above 50 mg/L and being located near pollutant sources. This indicates significant anthropogenic interference in the area. Geostatistical models provided an effective framework for analysis and demonstrated high capability in handling spatial datasets for this and similar study contexts. This research is expected to be valuable for planning, development, and decision-making processes by stakeholders in the Elalla-Aynalem region. Finally, the study highlights the need for further investigation, particularly seasonal hydrochemical analyses of pollutant indicators and the identification of both point and non-point sources of pollutants, as these remain gaps in the current research.