Introduction

Groundwater, occupying the interstitial spaces within soil layers (Todd and Mays 2004), constitutes a vital global resource, encompassing approximately 23.4 million cubic kilometers of water volume (Frappart and Ramillien 2018). The significance of groundwater lies in its pivotal role in sustaining life, supporting agricultural and industrial activities, and serving as a critical source of freshwater in semi-arid and densely populated regions worldwide (Panahi et al. 2017; Shojaei and Rahimzadegan 2022). The balance between recharge and extraction governs groundwater storage, making continuous monitoring crucial, especially in regions facing drought or overexploitation, which can lead to resource depletion (Frappart and Ramillien 2018).

There are different methods to monitor groundwater level (GWL) changes, including water level measurement at groundwater wells, numerical modeling, and satellite estimation. Water table measurements are performed using classical instruments at groundwater wells (Rostami et al. 2020; Senthilkumar and Elango 2011) and yield precise values. However, point measurements are accurate and reliable only at the measured points, and extrapolating them to other locations introduces uncertainty (Masood et al. 2022). Numerical modeling of groundwater, which has been used extensively worldwide, is a very useful tool for evaluating and predicting groundwater resources (Khalaf and Abdalla 2015; Lyazidi et al. 2020; Mohanty et al. 2013). In this regard, various numerical models and tools have been developed and applied around the globe, such as MODFLOW (Calderón Palma and Bentley 2007; Jang et al. 2012; Tahershamsi et al. 2018; Wang et al. 2008), Water—Global Analysis and Prognosis (WaterGAP) (Döll et al. 2012; Fatolazadeh and Goïta 2021), Groundwater Modeling System (GMS) (Khalaf and Abdalla 2015), MIKE SHE (Shu et al. 2018), HydroGeoSphere (HGS) (Erler et al. 2019), Monte Carlo analyses (Wang et al. 2020), and Groundwater Spatiotemporal Data Analysis Tool (GWSDAT) (Jones et al. 2014). However, numerical groundwater modeling depends on point-based groundwater monitoring and other field data, which limits its use in regions where such data are scarce.

Satellite estimations provide cost-effective, real-time data that are used in different disciplines (Tariq et al. 2020, 2021, 2022; Tariq and Shu 2020), including the estimation of GWL changes (Alshehri and Mohamed 2023; Chanu et al. 2020). Among satellite estimations of GWL changes, the Gravity Recovery and Climate Experiment (GRACE) has increasingly been employed (Alshehri and Mohamed 2023; Amiri et al. 2023; Rahimzadegan and Entezari 2019). Studies utilizing GRACE data have shed light on GWL variations in many regions, including the Central Valley (Liu et al. 2019; Thomas and Famiglietti 2019; Valley 2009) and some other aquifers in the USA (Rateb et al. 2020), and the Middle East (Voss et al. 2013), especially Iran (Abou Zaki et al. 2019; Amiri et al. 2023; Forootan et al. 2014; Joodaki et al. 2014; Rahimzadegan and Entezari 2019). GRACE data have exhibited remarkable reliability, demonstrating strong correlations with observations at groundwater wells (Rateb et al. 2020).

Groundwater modeling, facilitated by powerful programs such as MODFLOW, especially via helpful graphical user interfaces (GUIs) such as GMS, complements satellite-based assessments and offers valuable insights into regional groundwater hydrodynamics (Behera et al. 2022; Khalaf and Abdalla 2015; Lyazidi et al. 2020; Al-Taiee and Hasan 2006; Mohanty et al. 2013; Senthilkumar and Elango 2011; Tahershamsi et al. 2018; Wang et al. 2020). These models help to understand groundwater flow patterns, evaluate the impact of interventions such as underground dams, and forecast future groundwater levels under various extraction scenarios.

As previous studies have shown, GRACE data have proven valuable in investigating GWL changes on global and regional scales. However, their applicability on a local scale requires comparison with observational data for validation and accuracy assessment. Moreover, the more recent GRACE-FO mission requires further investigation to ascertain its performance compared to GRACE. Additionally, the performance of GRACE and GRACE-FO in estimating GWL changes compared to groundwater modeling results has received less attention in previous studies. Hence, the primary objective of this study is to evaluate the accuracy of GRACE and GRACE-FO estimations of GWL changes on a local scale using observational data from groundwater wells in five selected provinces in Iran. Moreover, the significance of trends in GWL changes from the different data sources was evaluated using the Mann–Kendall and Sen’s slope tests. Additionally, the satellite estimations of GWL changes for one of the provinces were evaluated against groundwater modeling results. This research endeavors to enhance our understanding of satellite-based monitoring of GWL changes on a local level, contributing to informed decision-making and sustainable water resources management.

The study area and used data

Study area selection

In the current research, five provinces in Iran were selected for analysis: East Azerbaijan, Khorasan Razavi, Golestan, Sistan and Baluchistan, and Fars (Fig. 1). These provinces were chosen due to the availability of extensive hydrologic and hydrogeologic data, making them suitable for a comprehensive assessment.

Fig. 1

The map of the selected provinces, along with the locations of the observational wells used in this study

In addition to comparing satellite data with observations from groundwater wells in the five selected provinces, this research incorporates numerical modeling for a more detailed investigation of the Azarshahr plain aquifer in East Azarbaijan province and compares the modeling results with satellite data. The modeling was performed using MODFLOW through GMS 10.4.

Azarshahr plain aquifer is geographically situated between 45°49’ to 46°21’ eastern longitudes and 37°37’ to 37°45’ northern latitudes (Fig. 2). This aquifer is a part of the Urmia Lake catchment, one of Iran's thirty basins, covering a total area of 457 square kilometers, including approximately 124 square kilometers of alluvial plains. The elevation of the highest and lowest points within the study area is 3100 and 1282 m above mean sea level, respectively (Moghaddam 2004).

Fig. 2

The geographic extent of the Azarshahr plain aquifer located in East Azarbaijan province

Used data

The dataset utilized in this study was provided from four primary sources, namely GRACE data, GRACE-FO data, data from the global land data assimilation system (GLDAS) project, and observational data from groundwater wells. Each of these data sources is briefly introduced below.

GRACE and GRACE-FO used data

The twin GRACE satellites were launched in March 2002 through a cooperation between NASA (National Aeronautics and Space Administration, USA) and DLR (German Aerospace Center). Their primary mission was to monitor changes in Earth's gravity field. After their retirement in 2017, they were succeeded by the GRACE-FO satellites, launched in May 2018. The laser ranging interferometer (LRI) carried by GRACE-FO provides higher measurement accuracy than the microwave ranging used by GRACE (Abich et al. 2019; Kornfeld et al. 2019).

The missions of both of the satellites included tracking ice sheet and glacier changes (Ciracì et al. 2020), monitoring total water storage (Kornfeld et al. 2019; Rahimzadegan and Entezari 2019), investigating ocean mass changes (Uebbing et al. 2019), drought monitoring (Shojaei and Rahimzadegan 2022), and flood potential (Xiong et al. 2021). GRACE and GRACE-FO provide useful information on global groundwater depletion (Alshehri and Mohamed 2023; Amiri et al. 2023; Pfeffer et al. 2022). GRACE data are useful for groundwater monitoring on a regional scale at monthly to seasonal scales (Masood et al. 2022).

This study used 163 monthly data series from GRACE (Level 3, RL06) covering the period from 2002 to 2017, and 17 data series from GRACE-FO (Level 3, RL06) covering the period from 2018 to 2020 (Table 1). Data from the following four centers, which process and convert satellite measurements into equivalent water height, were employed: GFZ (German Research Centre for Geosciences), JPL (Jet Propulsion Laboratory, NASA), CSR (Center for Space Research at the University of Texas, Austin), and CNES (National Centre for Space Studies, France), with mass concentration blocks (mascons) (Deng and Bailey 2020; Oleson et al. 2013).

Table 1 Characteristics of the used data for this study

Data from GLDAS project

The GLDAS project is a global land surface project led by scientists from NASA, Goddard Space Center (GSC), National Oceanic and Atmospheric Administration (NOAA), and National Centers for Environmental Prediction (NCEP) (Rodell et al. 2004; Xia et al. 2019). Its mission was to combine satellite data and data provided by standard terrestrial methods using land surface models, flow mode simulator, and optimum surface flow.

The GLDAS project incorporates several hydrological models, including the community land model (CLM), the NOAH land surface model, Mosaic, and variable infiltration capacity (VIC) (Rodell et al. 2004; Xia et al. 2019). Among these models, the higher accuracy of CLM was reported by Lo et al. (2010), Niu and Yang (2006), and Rahimzadegan and Entezari (2019). CLM combines three models: that of the National Center for Atmospheric Research (NCAR), the atmosphere-biosphere transfer model, and the Land Surface Model (LSM) developed at the Chinese Institute of Atmospheric Physics (IAP) (Oleson et al. 2013). CLM can calculate soil moisture in ten layers with a total thickness of 4.33 m from the ground surface (Dai et al. 2003). In this research, soil moisture, snow water equivalent, and vegetation moisture equivalent data were extracted from the CLM model, obtained from the GLDAS project (https://ldas.gsfc.nasa.gov/). The soil moisture was extracted both for the entire soil thickness (upper-most layer and middle layer) and for the middle layer only (Oleson et al. 2013).

Fig. 3

The generated maps for the Azarshahr plain: a Thiessen polygons, b transmissivity map, and c geological map (JI: light gray, thin-bedded to massive limestone (Lar Fm.); K1l: massive to thick-bedded orbitolina limestone; Kush: limestone, argillaceous limestone, tile red sandstone and gypsiferous marl; Kvbv: basaltic volcanic; M3p1: pyroclastic and claystone with vertebrate fauna remains (Maragheh Fm.); M3p2: ash flows and associated rocks (Maragheh Fm.); M3p3: ash flows and associated pyroclastic rocks, conglomerate, sandstone and shale (Maragheh Fm.); Plsav: Pliocene rhyolitic to rhyodacitic subvolcanic; PLvib: andesitic to basaltic volcanic; Qsl: salt lake; Qt1: high level piedmont fan and valley terrace deposits; Q3: lake; Qtr: travertine)

Data from observation wells

Data from observation wells shown in Table 2, transmissivity maps, and geological maps were employed to construct the conceptual model for the Azarshahr aquifer (Fig. 3). All data were acquired from the Iran Water Resources Management Company (https://www.wrm.ir/).

Table 2 Time series of the data used in groundwater modeling

Methodology

The methodology used in this study is summarized as a flowchart in Fig. 4.

Fig. 4

The flowchart of the methodology

Pre-processing of the observations

First, 2500 wells were selected as the most suitable in terms of data availability during the desired time period. Then, 1944 wells with water level changes of more than 200 cm per month were eliminated from the prepared data set because such rates of change are unreasonable. Each of the remaining wells was assigned a weight proportional to the inverse of its distance to the center of the pixel in which it is located, and the weighted monthly water level change for each pixel was obtained (Rahimzadegan and Entezari 2019).
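
The weighting step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the planar coordinates in kilometers, the small distance floor, and the choice to apply the 200 cm filter per observation (the study removed whole wells) are all assumptions made for illustration.

```python
import math


def pixel_weighted_change(wells, pixel_center, max_change_cm=200.0, eps_km=1e-6):
    """Inverse-distance-weighted mean water level change for one pixel.

    wells: list of (x_km, y_km, change_cm) tuples for wells inside the pixel
           (planar coordinates are an assumption of this sketch).
    pixel_center: (x_km, y_km) of the pixel center.
    Observations whose absolute monthly change exceeds max_change_cm are skipped.
    Returns None when no observation survives the filter.
    """
    cx, cy = pixel_center
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, change in wells:
        if abs(change) > max_change_cm:
            continue  # unreasonable rate of change
        distance = math.hypot(x - cx, y - cy)
        # eps_km avoids division by zero for a well exactly at the center
        weight = 1.0 / max(distance, eps_km)
        weighted_sum += weight * change
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


wells = [(1.0, 0.0, -10.0), (2.0, 0.0, -40.0), (0.0, 3.0, 500.0)]
pixel_value = pixel_weighted_change(wells, (0.0, 0.0))
```

In this example the third well is filtered out, and the nearer of the two remaining wells contributes twice the weight of the farther one.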

The data acquired from GRACE and GRACE-FO were converted to GWL changes by subtracting the GLDAS output from them. A time lag approach was also used, following the methodology suggested by Rahimzadegan and Entezari (2019). The rationale behind the time lag was twofold. Firstly, it accounts for the time required for the entire aquifer to respond to changes in groundwater levels across all observation wells. Secondly, it accounts for the time it takes for the satellites to capture changes in groundwater levels within the aquifer. The existence of a time lag between GWL changes estimated by GRACE and those measured at groundwater wells has been demonstrated by other researchers, such as Rzepecka and Birylo (2020) and Jyolsna et al. (2021). The investigation involved estimating correlation coefficients (R) between satellite-derived and observed groundwater levels, and was performed twice: once without any time lag, and once after introducing time lags of 1 and 2 months to the observational data. The resulting correlation coefficients were then analyzed and compared graphically, as presented in the results section.
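
The two steps described above, subtracting the GLDAS components and correlating with a time lag, can be sketched as follows. This is an illustrative sketch rather than the study's code: the function names are hypothetical, all series are assumed to be aligned monthly anomalies in the same units, and the direction of the shift (satellite month t paired with observation month t + lag) is an assumption, since the text only states that the lag is applied to the observational data.

```python
import math


def groundwater_anomaly(tws, soil_moisture, snow, canopy):
    """Subtract GLDAS storage components from GRACE total water storage."""
    return [t - (sm + sn + cw)
            for t, sm, sn, cw in zip(tws, soil_moisture, snow, canopy)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_r(satellite, observed, lag_months):
    """R between satellite[t] and observed[t + lag_months] (assumed direction)."""
    if lag_months < 0 or lag_months >= len(observed) - 1:
        raise ValueError("lag_months must be non-negative and leave >= 2 pairs")
    n = min(len(satellite), len(observed) - lag_months)
    return pearson_r(satellite[:n], observed[lag_months:lag_months + n])


# Synthetic example: the observed series repeats the satellite signal one month later.
satellite = [0, 1, 0, -1, 0, 1, 0, -1]
observed = [5, 0, 1, 0, -1, 0, 1, 0]
r_by_lag = {lag: lagged_r(satellite, observed, lag) for lag in (0, 1, 2)}
best_lag = max(r_by_lag, key=r_by_lag.get)
```

Scanning a small set of candidate lags and keeping the one with the highest R mirrors the comparison of 0-, 1-, and 2-month lags described in the text.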

Checking the significance of data

To assess the significance of data obtained from observation wells, modeling results, GRACE, and GRACE-FO, two distinct approaches were employed. The first approach involved considering the moisture data acquired from the entire soil layers, encompassing both the upper-most layer and the middle layer, as obtained from the CLM data (Oleson et al. 2013). The second approach considered the moisture data acquired only from the middle soil layer (Wahr et al. 1998).

To establish the significance, a T-test with two independent samples was applied, and the resulting significance level (P-Value) was determined from the test results (Okoroiwu and Akwiwu 2019; Vafadar et al. 2023). The T-test compares the means of two statistical samples. Specifically, two random groups, regardless of their sample sizes, are drawn from two different populations, and their averages are compared to assess whether they differ significantly (Nosratpour et al. 2022; Panda et al. 2007).
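
As a rough illustration of the test statistic, the sketch below computes the pooled-variance (Student) form of the two-sample T-test. The text does not state whether equal variances were assumed, so the pooled form is an assumption here, and the function name is hypothetical. The P-value then follows from Student's t distribution with the returned degrees of freedom, for example from a statistical table or package.

```python
import math
from statistics import mean, variance


def two_sample_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_stat, dof


t_stat, dof = two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```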

Investigating changes in groundwater levels

To investigate possible trends in the data acquired from GRACE, GRACE-FO, and observational wells, the Mann–Kendall statistical test (a non-parametric test) was conducted. Additionally, the Sen’s slope statistical test was employed to quantify any trend over the period of 2002–2020.

Mann–Kendall test

The Mann–Kendall test, originally introduced by Mann (1945), relies on two hypotheses. The null hypothesis (the zero assumption) denotes randomness in the time series without any discernible trend, while the alternative hypothesis (the one assumption) signifies the presence of a trend in the time series (Panda et al. 2007).

The Mann–Kendall test yields the index \({Z}_{s}\). Positive and negative signs of \({Z}_{s}\) imply increasing and decreasing trends in the studied time series, respectively, while a value of zero implies the absence of a trend (Vinushree et al. 2022). The null hypothesis (zero assumption) is rejected and the alternative hypothesis (one assumption) is accepted if \(|{Z}_{s}|>{Z}_{1-\frac{\alpha }{2}}\) (Frimpong et al. 2022). The value of \({Z}_{1-\frac{\alpha }{2}}\) can be determined from the table of the standard normal distribution. Significance levels of 5% (\(\alpha =0.05\)) and 20% (\(\alpha =0.2\)) were considered in this research for the data acquired from the satellites and the observational wells, respectively. Accordingly, if \(|{Z}_{s}|>1.96\) at a significance level of 5%, or if \(|{Z}_{s}|>1.29\) at a significance level of 20%, the time series exhibits a significant trend and the null hypothesis is rejected (Panda et al. 2007).
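
For readers who want to reproduce the index, the following is a minimal sketch of the standard Mann–Kendall computation, including the usual correction of the variance for tied values. It is not taken from the study: the function name and return layout are hypothetical, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(series, alpha=0.05):
    """Return (S, Z_s, significant) for a time series using the Mann-Kendall test."""
    n = len(series)
    # S counts concordant minus discordant pairs (later value vs. earlier value)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standardized statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return s, z, abs(z) > z_critical


# Synthetic, steadily declining series (illustrative values only)
s_stat, z_s, is_significant = mann_kendall([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], alpha=0.05)
```

A negative `z_s` flagged as significant corresponds to a significant decreasing trend in the sense described above.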

Sen’s slope test

Inspired by the statistical method presented by Theil (1950), Sen (1968) devised a nonparametric approach for investigating temporal changes in a time series. While the Mann–Kendall test indicates the existence or absence of a trend in a time series, the Sen's slope test is employed to quantify the trend (Frimpong et al. 2022; Vinushree et al. 2022; Yusuf et al. 2018). Sen’s slope has been used in trend investigations of different time series data, such as temperature (Frimpong et al. 2022), water vapor (Makama and Lim 2020), evapotranspiration (Pourmansouri and Rahimzadegan 2020), precipitation (Nosratpour et al. 2022), and GWL changes (Vinushree et al. 2022).

A positive or negative sign of the median slope calculated in this test indicates an increasing or decreasing trend, respectively, while a value of zero implies no trend in the time series (Yusuf et al. 2018).
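
The median slope can be computed as in the short sketch below, which takes the median of the slopes between all pairs of points. It assumes equally spaced observations (for example, monthly values), so the result is expressed per time step; the function name is hypothetical.

```python
from statistics import median


def sens_slope(series):
    """Median of pairwise slopes for an equally spaced time series."""
    n = len(series)
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(n - 1)
              for j in range(i + 1, n)]
    return median(slopes)


declining = sens_slope([5.0, 3.0, 1.0, -1.0])
with_outlier = sens_slope([0, 1, 2, 30, 4, 5])
```

Because the estimate is a median, the single outlier in the second series does not change the slope, which is one reason this estimator is often paired with the Mann–Kendall test.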

Numerical modeling of groundwater using GMS 10.4

Numerical modeling was carried out exclusively for the Azarshahr aquifer located in East Azerbaijan province. The data related to discharge wells, observation wells, and qanats located in the aquifer were used as input data in the GMS interface. The modeling process encompassed steady-state simulations for September 2018 and transient simulations for the period from October 2018 to September 2019. Validation was subsequently conducted for the period from October 2019 to March 2021. The following sections elucidate the procedures of building the steady-state and transient models, along with their calibrations and validations.

For the steady state model, the hydrogeological data including hydraulic conductivity, recharge rate, as well as boundary conditions were incorporated into the model. The hydraulic conductivity and recharge rate parameters were calibrated both manually and automatically using PEST package of MODFLOW (Deng and Bailey 2020; Tahershamsi et al. 2018).

For the transient model, the previously calibrated steady-state model was employed (Behera et al. 2022). Since the Azarshahr aquifer is unconfined, specific yield was also used in the model, together with data from observation and discharge wells. The transient model was recalibrated to optimize the storage coefficient and recharge rate. Subsequently, the model's accuracy was validated by extending the time period to include October 2019 to March 2021. The objective was to determine the model's capability to forecast the future state of the aquifer.

Comparing GRACE-FO data and groundwater modeling results

After calibrating and validating the transient model, the water levels obtained from the model were spatially interpolated using interpolation methods such as kriging (Rostami et al. 2020), inverse distance weighting (IDW) (Balakrishnan et al. 2011), spline (Balakrishnan et al. 2011), and Thiessen polygons (Ghosh et al. 2020). The changes in interpolated water levels within each pixel were averaged and subsequently compared to the changes in water levels obtained from GRACE-FO using R values and parametric statistical T-tests.

Results and analysis

Comparison of GRACE and GRACE-FO data

The primary objective of this comparison is to assess the accuracy of the two satellite-based GWL estimations. The R values and the significant findings from the comparisons are presented in this section.

Evaluating GRACE and GRACE-FO data without applying the time lag

Initially, the R values between the liquid water equivalent thickness (LWET) data obtained from satellites and the data measured at observation wells were estimated and recorded as “GRACE LWET” and “GRACE-FO LWET” columns in Table 3. Subsequently, the satellite data were modified by subtracting the output values from GLDAS project, and the R values between the modified GRACE and GRACE-FO data and the data from observation wells for the entire soil thickness were estimated and shown as the “LWET-CLM” columns in Table 3. Furthermore, the R values between the data obtained from observation wells and the modified satellite data solely for the middle soil layer were shown as the “MLLWET-CLM” columns in Table 3.

Table 3 R values between GWL changes measured at the observation wells and LWET estimated by GRACE and GRACE-FO in the studied provinces before applying the time lag

Analysis of Table 3 indicates that, despite the abovementioned modification, the R values from all four data centers are low for all provinces except Fars, most probably because of inaccurate data. In line with the methodology explained earlier, it is therefore essential to apply a 1–2 month time lag to the observational data relative to the satellite data.

Assessment of GRACE and GRACE-FO data with applying time lag

The graph of the measured data at the observational wells and the data obtained from GRACE satellite with and without applying a time lag of 1–2 months is shown in Fig. 5, which confirms the positive effect of applying the time lag.

Fig. 5

The graphical presentation of the comparison between data acquired from observation wells and data from the GRACE satellite, with and without the application of a time lag of 1–2 months for the studied provinces, including a Fars, b Khorasan Razavi, c Sistan and Baluchestan, d East Azerbaijan, and e Golestan

Based on the findings from Fig. 5, time lags of one and two months were applied to the observational data for comparison with GRACE-FO and GRACE data, respectively, to improve the R values (Table 4).

Table 4 R values between GWL changes from observations and from estimations by GRACE and GRACE-FO in the studied provinces, after the application of the time lag

The positive impact of the time lag on the R values is evident in Table 4, which reports the outcomes after a one-month lag was applied to the observational data for comparison with GRACE-FO and a two-month lag for comparison with GRACE. Figure 5 shows the corresponding time series of the observations and GRACE satellite data for the various provinces.

Analysis of R values with applying time lag

A comparison of Tables 3 and 4 shows that the R values between the GRACE and GRACE-FO estimates and the data from observation wells improved considerably in all provinces after time lags were applied to the observational data. Comparing Tables 4 and 5, the highest correlation coefficients were obtained when a two-month time lag was applied to the observations for comparison with GRACE data, and a one-month time lag for comparison with GRACE-FO data. Notably, the GFZ data center and the GLDAS-CLM hydrological model yielded the highest R values for both satellites in the middle soil layer. Although in some provinces, such as Fars (Table 3), good R values were obtained between observations and GRACE-FO data even without a time lag, in other provinces, such as Khorasan Razavi and Golestan (Tables 3 and 4), a two-month time lag was necessary to achieve good R values. Overall, relatively satisfactory results were obtained in all five studied provinces after applying the time lags. The close agreement among the results from all of the examined data centers supports their accuracy. The difference between the time lags applied for GRACE and GRACE-FO can be attributed to the technical difference in the instruments the two missions use to measure the inter-satellite distance. As mentioned before, GRACE uses microwaves, while GRACE-FO uses a laser with higher accuracy.

Table 5 The R values between the averaged groundwater model results interpolated by four interpolation methods, and the GRACE-FO data from the four data centers

Analysis of groundwater modeling results

Calibration of steady state model

The outcomes of the steady state modeling of the Azarshahr aquifer are shown in Fig. 6. Before calibration, the correlation line in Fig. 6a shows a lack of consistency between groundwater levels obtained from observations and the model. After calibration, however, the two datasets exhibited improved consistency. The calibration process involved fine-tuning several essential parameters, including the hydraulic conductivity coefficient, rainfall-induced infiltration rate, inlet and outlet boundaries, transmissivity at the boundaries, river condition, infiltration rate from the river, and bedrock depth (Fig. 7).

Fig. 6

The comparison of observed GWLs in Azarshahr aquifer with steady state model results a before calibration, and b after calibration. c and d present the errors in piezometers before and after calibration

Fig. 7

Maps in steady state for Azarshahr aquifer; a flow direction, b 3D model, and c boundary conditions

Manual and automatic calibration using the PEST package of MODFLOW was employed to optimize these parameters, leading to the results shown in Fig. 6b. With a threshold limit of 1 m set during the initial run of the model, there was close agreement between observations and modeling results in some piezometers, as shown in Fig. 6c. Overall, however, the correlation between observations and modeling results before calibration was not satisfactory, with significant discrepancies observed in some piezometers. The correlation coefficient between observations and modeling results reached its highest value after calibration, as shown in Fig. 6d.

Calibrating the transient model

Following the steady state model calibration, the transient model was subjected to automatic calibration using the PEST package. In this process, parameters such as specific yield, outlet and inlet boundaries of the studied aquifer along with their corresponding groundwater levels, precipitation rate, river discharge, and recharge rate were fine-tuned. The calibration was performed for the period from October 2018 to September 2019. The results of the calibration are shown in Fig. 8. As the figure shows, the calibration reduced the errors and notably improved the correlation coefficient between the observed and modeled groundwater levels during the transient state.

Fig. 8

Comparison of Groundwater Levels in Transient state

Figure 8 shows the groundwater levels obtained from observations compared to the model results for three different periods of 1, 6, and 12 months after calibration. The graph demonstrates the model’s ability to closely replicate the observed groundwater levels over the examined periods (Fig. 9).

Fig. 9

Azarshahr aquifer maps resulted from transient model for a HK before calibration b HK after calibration and c SY after calibration

Validation of groundwater modeling results

After the calibration, the transient model underwent a rigorous validation process using data for the time range of October 2019 to March 2021. Groundwater levels from observation wells and discharge rates of pumping wells were integrated into the model. The validation results, shown in Fig. 10, showcase a remarkable agreement between the observations and the modeling results, with significantly low errors and high R values. As a result, the model was deemed reliable for future forecasting purposes.

Fig. 10

Observed groundwater levels compared to modeling results during Validation Stage

Figure 10 displays a comprehensive comparison of the observed groundwater levels against the model results during the validation stage. The graph presents three distinct periods: (a) 1 month, (b) 9 months, and (c) 18 months. The model’s good performance is evident from its close agreement with the observed groundwater levels.

Comparison of satellite data with modeling results

The monthly groundwater levels obtained from the transient modeling were averaged using four interpolation methods (IDW, spline, kriging, and Thiessen polygons) within GIS. Notably, the results yielded by these interpolation methods closely corresponded to each other. Subsequently, the monthly averages were compared with GRACE-FO data, initially without applying a time lag (Table 5), and satisfactory R values were observed. However, higher R values were achieved by applying a one-month time lag to observational data. This closer alignment with the satellite data acquired from the four data centers further validated the model.

Analyzing the results of the parametric statistical T-test

To assess the significance of the correlation between the observations at groundwater wells and the data obtained from GRACE and GRACE-FO, the independent parametric statistical T-test was conducted. The results, both with and without the application of the time lag, are shown in Table 6. The table includes two main columns: “CSR,” considering the total thickness of the soil, and “Middle CSR,” considering the thickness of only the middle soil layer. These test results serve to further validate the reliability and statistical significance of the correlations observed in the studied area.

Table 6 Significance check of the correlation of observations and GRACE estimations in the studied area

With the parametric statistical T-test applied at an initial significance level of 0.05, Table 6 reveals significant correlations between the observations from groundwater wells and the GRACE satellite data in all provinces except East Azarbaijan. This discrepancy could be attributed to the lower quality of data in that province. Nevertheless, when the significance level was raised from 0.05 to 0.1, the relationship in East Azarbaijan became significant as well.

Similarly, Table 7 shows the significance of the correlation between the observations at groundwater wells and the data obtained from the GRACE-FO satellite, with and without applying time lags. According to the findings, a significant relationship exists between the observations and GRACE-FO satellite data in all provinces except East Azarbaijan and Fars, which could be due to comparatively lower quality data in those regions. To address this issue, a higher level of significance (e.g., 0.1 or higher) was employed where necessary (Okoroiwu and Akwiwu 2019). With the significance level raised from 0.05 to 0.1, the P-Values fell below the threshold for significance, establishing a significant relationship between the observations and GRACE-FO data in East Azarbaijan and Fars provinces as well. The significance level should be selected using engineering judgment based on the specific circumstances of each case.

Table 7 The significance levels between the observations and GRACE-FO data in the studied areas

In the subsequent analysis, the independent parametric statistical T-test was used to examine whether a significant relationship exists between the groundwater modeling results and GRACE-FO data. As Table 8 shows, the significance level (P-Value) between the spatially averaged changes in water level derived from groundwater modeling and the GRACE-FO satellite data is less than 0.05. Therefore, a significant relationship exists between groundwater modeling results and GRACE-FO data.

Table 8 The significance levels between the groundwater modeling results and GRACE-FO data in the studied areas

The results of Mann–Kendall and Sen’s slope statistical tests

Table 9 shows that in all provinces, for both GRACE and GRACE-FO data, the absolute value of Zs is higher than 1.96, and the Sen’s slope is negative. Similarly, for data obtained from observation wells, the absolute value of Zs is greater than 1.29, and the Sen’s slope is negative in all provinces except East Azarbaijan and Golestan. These findings indicate the presence of a significant decreasing trend in the time series data. Therefore, the overall trend of GWL change in all provinces is negative, showing a declining trend, which is in line with the results acquired by Iranian Water Resources Management Organization (IWRMO) (2018).

Table 9 The results of trend analysis of the changes in groundwater level measured at groundwater wells and obtained from GRACE and GRACE-FO, using Mann–Kendall and Sen’s slope tests

Additionally, Fig. 11 illustrates the trend of changes in groundwater level derived from satellite data, which substantiates the presence of a trend in the time series. However, it is worth noting that East Azarbaijan and Golestan provinces exhibit no significant trend, which could be due to the lower quality of the data time series in those provinces, as noted earlier.

Fig. 11 The trends of changes in GWL from GRACE and GRACE-FO satellites over the period 2002–2019 in a Fars, b Khorasan Razavi, c Sistan and Baluchestan, d East Azarbaijan, and e Golestan provinces

As Table 9 shows, the Sen's slope estimated from the observations indicates a negative trend for all provinces. Fig. 11 supports this finding, further confirming that GWL is decreasing in all of the studied provinces. The negative trend shown in Fig. 12 aligns with the results of the statistical tests and reinforces the conclusion that groundwater levels are declining over time in the studied regions.

Fig. 12 The trend of changes in groundwater level observed at the monitoring wells during the period 2002–2019 in the provinces of a Fars, b Khorasan Razavi, c Sistan and Baluchestan, d East Azarbaijan, and e Golestan

Of particular interest is the steeper downward trend of groundwater levels in East Azarbaijan and Golestan provinces compared to the other provinces. This might be due to the higher agricultural water demand and consequently increased groundwater consumption in those provinces.

These findings indicate a significant declining trend in groundwater levels in the studied areas, which calls for more attention and appropriate management strategies to make the consumption of groundwater resources sustainable. Optimized management of groundwater use for agriculture requires revised groundwater extraction regulations, a vital factor in mitigating the declining trend of groundwater level and guaranteeing the long-term sustainability of groundwater resources in those provinces.

Notably, the provinces of Fars and Khorasan Razavi exhibit more severe declines in groundwater level. This could be attributed to the specific regional climate, low precipitation, and high groundwater demand to meet agricultural needs.

Discussions

The results of this study confirmed that GRACE and GRACE-FO can effectively monitor GWL changes on a local scale. However, the time period of GRACE and GRACE-FO estimations is relatively short, which may introduce some uncertainties into their estimations. On the other hand, because the observational data are obtained only at the locations of groundwater wells, they also carry some uncertainties compared to satellite estimations with a spatial resolution of 1.

The results of such studies on GWL changes using different data sources, especially satellite estimations, are useful for the sustainable development goals (SDGs) (Hu et al. 2019; Li et al. 2020; Saqr et al. 2021; Shamsudduha et al. 2020), particularly decent work and economic growth, sustainable cities and communities, and responsible consumption and production (Allen et al. 2018; Hák et al. 2016).

This study confirmed the performance of GRACE data and the existence of a time lag between them and the observational data, which is in line with previous studies (Chanu et al. 2020; Fatolazadeh and Goïta 2021; Khorrami and Gunduz 2021; Rahaman et al. 2019; Rahimzadegan and Entezari 2019). Moreover, the better performance of GRACE-FO compared to GRACE in estimating GWL changes has been demonstrated in a few recent studies, but only at large scales (Fatolazadeh and Goïta 2021; Frappart and Ramillien 2018), not at local scales. Furthermore, few studies have attempted to compare the estimated GRACE and GRACE-FO GWL changes with hydrological models, and their results were promising (Fatolazadeh and Goïta 2021; Pfeffer et al. 2022).

Conclusion

This study aimed to evaluate the accuracy of GRACE and GRACE-FO satellite estimations of GWL changes on a local scale in five provinces in Iran. Observations at groundwater wells in those provinces, along with the results of groundwater modeling in one province, were used for the assessment. The R values between the observational data and GRACE data were calculated as 0.53, 0.42, 0.4, 0.51, and 0.36 for the provinces of Fars, Khorasan Razavi, Sistan and Baluchestan, East Azarbaijan, and Golestan, respectively. Similarly, the R values between the observational data and GRACE-FO data were calculated as 0.95, 0.67, 0.72, 0.78, and 0.3 for the same provinces, which indicated a higher reliability of GRACE-FO data. This was likely due to the use of a laser instrument for distance measurement in GRACE-FO, compared to the microwave-based instrument in GRACE. The statistical tests demonstrated significant relationships between the GWL changes estimated by GRACE and GRACE-FO and the observational data at a significance level of 5% in the three provinces of Khorasan Razavi, Golestan, and Sistan and Baluchestan, and at a significance level of 10% in the two provinces of East Azarbaijan and Fars. In addition, the Mann–Kendall test revealed significant trends in all provinces except East Azarbaijan and Golestan. The discrepancies in these provinces may be attributed to data quality issues. The groundwater modeling results for the Azarshahr aquifer in East Azarbaijan province were compared with GRACE-FO data and exhibited a reliable correlation, confirmed by the correlation coefficient and the results of the parametric statistical T-test. Overall, the results of the study indicated that in regions lacking sufficient observational data, GRACE and GRACE-FO data can be effectively used for monitoring GWL changes.
Notably, all five studied provinces experienced a substantial decline in groundwater levels, especially the northern and northwestern provinces, primarily due to extensive agricultural activities that rely heavily on groundwater for irrigation. Given the significance of groundwater resources in arid and semi-arid areas such as Iran, a detailed management plan is imperative to optimize the use of these valuable resources and to mitigate further decline in groundwater levels.