Assessment of changes in regional groundwater levels through spatio-temporal kriging: application to the southern Basin of Mexico aquifer system

A common approach for calculating the spatial distribution of groundwater level changes consists in choosing a set of different times, interpolating the groundwater level data available at each time over a spatial grid, and then calculating changes in each period by subtracting the interpolated values for these times. However, this can produce misleading results when the data are available in different positions for consecutive times. This paper presents an alternative procedure based on the interpolation of the groundwater level with spatio-temporal kriging, the assessment of the temporal groundwater elevation changes over a regional semiconfined aquifer, and the estimation of their error standard deviations. A comparative analysis of cross-validation results and error standard deviations provides a quantitative measure of the superiority of the introduced approach with respect to the one given in the literature. Moreover, the spatio-temporal case produces more reasonable estimates than the spatial kriging, notably fewer extreme recoveries and drawdowns, in an area under high water stress, such as the upper aquifer of the southern part of the Basin of Mexico aquifer system.


Introduction
Analyzing groundwater level changes at the regional scale constitutes an essential tool for evaluating the response of aquifers to climatic variation and management policies.Its correct interpretation is critical for hydrogeologists since it provides information about groundwater storage changes for a specific period and can also trigger warning alarms regarding undesirable water level decline rates.Thus, it represents a valuable asset to better understanding groundwater dynamics.The reliability of the spatial distribution of groundwater level changes for a given period depends on the quantity and quality of the available data at the initial and final times of the period.Measurement errors, differences between the number and position of monitored piezometers or wells at the initial and the final monitoring dates, and the estimation method used to calculate the variations are among the largest sources of estimate uncertainty.
Evaluation of the spatial distribution of temporal changes in groundwater levels can be done by using depth to groundwater (DG) or groundwater elevation (GE) estimates.Among the different existing approaches to estimate DG and GE, the geostatistical one has often been used with good results.
Table 1 summarizes the papers that have focused on estimating DG, GE, and temporal changes in groundwater levels using geostatistics, including this one.The papers are reported according to three categories, depending on their approach: spatial and temporal separately, multivariate, and spatio-temporal (ST).They are ordered chronologically within each category.For the first kind of analysis, groundwater levels at different times are considered independent spatial random fields, and groundwater levels at different positions are treated as independent temporal series.In the multivariate analysis groundwater levels at each time are figured as different random fields, temporally correlated.Finally, the ST analysis considers groundwater levels at all positions and times as a spatially and temporally correlated random field.Only the last two kinds of analysis account for the ST correlations of groundwater levels.
Most geostatistical analyses of groundwater changes over time were applied to unconfined porous aquifers and, just recently, to semiconfined aquifers, karstic aquifers, and aquifer systems or several aquifers laterally connected.
Concerning spatial scales, there are three ranges: local (10 km 2 ), medium ( 10 2 km 2 ), and regional ( 10 3 km 2 ).It is important to note that it has been only recently that ST analyses on regional scales have been published.The density of positions per kilometer for the spatial analyses or the average spatial density for the ST analyses is reported in Table 1 as a measure of the scarcity of spatial data.
Two kriging methods were applied for the ST approach, ordinary and residual ordinary kriging.Concerning the ST structure assumed for the covariances, three kinds have been used in the reported works, product-sum, sum-metric, and Spartan.Also, a non-Euclidean distance was used to represent better the effects of contrasting hydraulic conductivities on hydraulic levels of aquifers connected laterally.
The only works that used a ST approach and reported groundwater level temporal changes are Ruybal et al. (2019aRuybal et al. ( , 2019b)).The temporal changes for the reported periods were calculated by subtracting the groundwater elevation (GE) estimates, using ST kriging, for the initial and final dates of the period.However, they did not report standard deviation (SD) or variances for groundwater level change errors.On the other hand, of those using spatial kriging to estimate groundwater level changes, only Ahmadi and Sedghamiz (2007) included the spatial distribution of SD.For each well, DG in the final year was subtracted from that in the initial year, and afterward, interpolation was done using ordinary kriging.
Differently from the other contributions, this paper assesses the spatial distribution of the temporal changes in groundwater levels through spatial and ST kriging for the upper aquifer of the Southern part of the Basin of Mexico Aquifer System (SBMAS) and their estimate error standard deviations.It is worth noting that the standard deviations of the ST kriging estimate differences are presented for the first time.This work also contributes to establish a systematic procedure for ST geostatistical analysis by using the non-separability index for the choice of the appropriate class of the ST variogram model that fits the surface of the empirical variogram (De Iaco and Posa 2013;Cappello et al. 2018).Concerning the characteristics of the application, this is one of the few works that apply ST geostatistics to a semiconfined aquifer under high water stress on a regional scale.
The structure of this paper is as follows."Materials and methods" section presents the background for the spatial and the ST geostatistical-based approaches used in this research to estimate the GE level and the spatial distribution of groundwater-level temporal changes."Main characteristics of the study area" section describes the main demographic, climatic, physical, and hydrogeological features of the region where the analysis is done and the conceptual model of the groundwater flow dynamics."Geostatistical modeling" section presents spatial and ST geostatistical analyses for the GE data and compares the GE's spatial and ST kriging estimates and their changes between two specified years.Finally, the last two sections provide some remarks on the obtained results compared with previous studies and some conclusions regarding open issues.

Materials and methods
The spatial distribution of the temporal changes in GE for a given period is calculated as the difference in the values of the GE spatial interpolations for the period's initial and final dates.One of the most common alternatives for interpolating GE for a given date is using spatial geostatistical methods such as ordinary kriging.However, when data at different sample locations for the initial and final dates are available and used as input for interpolations, inconsistencies in the spatial distribution of temporal changes can be presented since kriging produces smoothed estimates depending on data availability.
In the present paper, the spatial distribution of temporal changes in GE for the upper aquifer of the SBMAS is calculated from 2002 to 2007 using spatial and ST geostatistical techniques.Subsequently, the GE differences are statistically compared.
For the spatial approach, the GE map of each year is estimated using data from the same year only.GE is assumed to be a finite realization of a real-valued spatial random field {Z(s)∶ s ∈ D} where s is a spatial location and D ⊆ ℝ d ( d ≥ 2 ) is the spatial domain.The random field is assumed to be second-order stationary, with an expected value constant over the domain, i.e., E[Z(s)] = m, ∀s ∈ D , and a covariance which is only dependent on the separation vector h , i.e., Under second order stationarity, this relationship between covariogram and variogram holds: Estimation problems can be addressed by resorting to the kriging method, such as ordinary kriging, which is used when the expected value is assumed constant and unknown (Chilès and Delfiner 2012).Applying kriging for estimation purposes implies knowledge of a spatial correlation model.Thus, it is necessary to estimate and model a measure of spatial correlation, like the variogram, based on the sample data.
It is important to recall that the variogram model selected to fit the empirical variogram must be conditionally negative definite; for this reason, well-known functions, i.e., spherical, exponential, Gaussian, and hole effect, which satisfy this condition, are widely used (Journel and Huijbregts 1978;Cressie 1993;Christakos 1984).
For the ST approach, data from the whole dataset are used to obtain the estimates for the initial and final years of the period fixed for computing the variation.In a ST context, the observations of the GE are assumed to be a finite realization of a real-valued ST random field, which is denoted with {Z(s, t)∶ (s, t) ∈ D × T} , where (s, t) is a ST location, D ⊆ ℝ d ( d ≥ 2 ) is the spatial domain, and T ⊆ ℝ is the temporal domain.Under second-order sta- tionarity, the random function Z is characterized by a constant expected value, i.e., E[Z(s, t)] = m, ∀(s, t) ∈ D × T , and a covariance or a variogram function which depends on the ST lag (h s , h t ) , i.e., C st (s, t; s � , t � ) = C st (h s , h t ) , where h s = (s − s � ) and h t = (t − t � ) .Similarly to the spatial domain, the following relationship between covariogram and variogram holds: In order to face estimation problems in space-time, the ordinary kriging estimator Ẑ(s, t) can be used (Chilès and Delfiner 2012).
As in the spatial case, the ST kriging system requires knowledge of the variogram model.For this aim, structural analysis for variogram estimation and modeling has to be conducted.To model the sample ST variogram, the literature offers a wide list of classes of variogram functions to choose from.In particular, the product class, where space and time are treated separately (Rodriguez-Iturbe and Mejía 1974;Posa 1993) has represented the base to generate other parametric families of ST variogram or covariance functions (De Iaco et al. 2001;Ma 2002Ma , 2003)).Otherwise, various classes of non-separable ST models were also developed by Cressie and Huang (1999), Gneiting (2002), De Iaco et al. (2002), among others.
In this context, the non-separability index, firstly introduced by Rodriguez and Diggle (2010) and then generalized by De Iaco and Posa (2013), was proposed in the literature in order to choose the appropriate class of model for describing the empirical ST correlation structure.After the selection of the appropriate class of ST variogram or covariance model and the estimation of the corresponding parameters, the specific model can be used for prediction purposes.
In the following the main notions concerning the non-separability index and the product-sum model will be recalled.The focus on the product-sum model is justified since its type of non-separability is consistent with the one shown by the empirical ST correlation structure of the GE data under study.

The non-separability index and its interpretation
In the literature, various possible types of non-separability for ST covariance functions have been provided.In De Iaco and Posa (2013) the definition of non-separability was adequately detailed and furtherly tested in Cappello et al. (2018).By recalling the definition in De Iaco and Posa (2013), the non-separability index for a ST stationary covariance function C st (h s , h t ; ) depending on a vector of parameters , is expressed as follows: where st (h s , h t ; ) is the spatial-temporal correlation function, satisfying the conditions  st (h s , h t ; ) > 0,  st (h s , 0; ) > 0 and  st (0, h t ; ) > 0.
According to this definition, a ST stationary covariance function C st is: By changing the direction of the above inequalities, uniform negative non-separability or poitwise negative nonseparability are defined.
In variogram form, the empirical non-separability index is computed as reported below: where Ĉst (0, 0) is the estimated variance, ̂ st (h s , h t ) is the empirical ST variogram, ̂ st (h s , 0) and ̂ st (0, h t ) are the purely sample spatial and temporal variograms, respectively.
De Iaco and Posa (2013) proposed a classification of some well known classes of covariance models, on the basis of the above mentioned definition.Among them, the product-sum covariance model, selected in the case study described hereafter, belongs to the class of models with uniformly negative non-separability.
From a computational point of view, box plots of empirical non-separability ratio, computed for both spatial and temporal lags, can help to efficiently identify the types of non-separability.The R package covatest (Cappello et al. 2020) is used to assess the non-separability, characterizing the ST correlation structure of the data under study.

The product-sum ST model
The product-sum model (De Cesare et al. 2001;De Iaco et al. 2001) can be formalized in terms of the covariance function as follows: with k 1 > 0, k 2 ≥ 0, k 3 ≥ 0 and where C t and C s are valid temporal and spatial covariance models, respectively.
As widely detailed in De Iaco et al. (2001), the Eq. ( 3) can be written in variogram form: where s and t represent valid spatial and temporal variogram models, C s (0) and C t (0) correspond to the sill values.
By applying the variogram property according to which the value of the variogram at the origin is zero, it implies that: and where k s and k t satisfy the following equations: (2) The coefficients k 1 , k 2 , and k 3 can be obtained as reported below: where C st (0, 0) is called global sill.
By recalling the Eqs.( 5) and ( 6), the Eq. ( 4) can be expressed as follows: where Iaco et al. (2001), this simplified form of st (h s , h t ) includes only k as the parameter, which depends on the global sill value C st (0, 0) .The Eq. ( 12) is admissible if and only if the following condition for k is satisfied: Further details are available in De Iaco et al. (2001).

Drift
When the expected value of a ST random function Z depends on the space location, on time or both, the second-order stationarity hypothesis is not satisfied for Z.In that case, the random function can be expressed as: where m(s, t) represents a deterministic function (known as drift) and R(s, t) , known as the residual, is supposed to be a zero mean second-order stationary ST random function modeling the space-time fluctuations around m(s, t) (Kyri- akidis and Journel 1999).To represent the function m(s, t) , a trend surface (typically polynomial functions) can be obtained by a least-squares fit to the data.In this way, zero mean stationary data (residuals) are obtained and used to calculate the variogram.

Main characteristics of the study area
Mexico City is settled in the southern part of the Basin of Mexico (BM) (Fig. 1), which is surrounded by volcanic sierras with altitudes higher than 3,500 meters above sea level (masl, from now on, m) and with alluvial fans, flood plains composed by sand, silt, clay and volcanic materials deposited in the central part of the valley (Arce et al. 2019).The BM is bounded to the south by the Sierra Ajusco-Chichinautzin, at the east by the Sierra Nevada, at the west by the Sierra de las Cruces and north by the Pachuca Sierra.It is considered an endorheic basin with no natural outflows; the natural drainage initially converged to the inferior part of the basin, giving origin to a system of 5 lakes with superficial and groundwater inputs (Herrera and Dumars 1995).Most of the groundwater recharge happens in the surrounding ranges, producing surface and subsurface flows towards the center of the basin, which originally were discharged through springs in the mountain areas and through the bottom of the lakes, mixing with water coming from surrounding rivers (Carrillo-Rivera et al. 2008;Bücker et al. 2017).
During the Spanish colonial period (1521-1821), the basin underwent artificial modifications as part of human-induced changes in the hydrogeological system.These alterations included the construction of catchment and drying channels.Additionally, to control flooding and facilitate the urban expansion of the city, the five lakes originally present in the area until the mid-19th century were subsequently dried.In more recent times, a deep sewage system was developed, redirecting rainwater and wastewater from the basin to a neighboring basin located to the north in the state of Hidalgo (Herrera and Dumars 1995).
The metropolitan area of Mexico City extends over a large portion of the study area (3,985 km 2 ), and according to the 2020 population and housing census (INEGI 2022), more than 21 million inhabitants live there, comprising 17.3% of the country's total population.This anthropological change has influenced the local hydrology through the intensive demand for groundwater for urban and industrial supply, and through urbanization of the basin and modification of local weather patterns.
According to Ávila-Carrasco et al. ( 2023), on average, the cumulative annual rainfall is of the order of 700 mm, which is distributed irregularly.The highest precipitation (up to 1400 mm/year) occurs in the mountains and ranges, where forest and natural areas are still conserved, while the lowest precipitation (400 mm/year) occurs in the lower and the plain regions at the center of the study area, where most of the urbanization and impermeable areas are located.However, there is still a recharge input in the plain area, related to urban parks, channels, agricultural areas, flooding areas, and leakage from the water supply, and sewer systems.Regarding the changes in temperature, the highest values occur from April to May, and the lowest from November to February, with an average value of 21 °C, conditioning related parameters such as evapotranspiration, the available soil moisture, and the infiltration process.Arce et al. (2019) cited several studies that have established the geology and stratigraphy of the BM during the past half century (Vázquez-Sánchez and Jaimes-Palomera, 1989), as well as recent studies defining the evolution of stratigraphic relations of the volcanic ranges around the basin.An initial hydrostratigraphic division was proposed by Mooser and Molina (1993) and subsequently modified by Ortega Guerrero et al. (1997), based on hydraulic properties of strata, conceptualizing the basin from the upper part to the bottom as: (1) low-permeability upper unit (LPUU, the upper aquitard), formed by Quaternary lacustrine clays, initially acting as a semi-confining unit for the permeable upper unit, with variable depth (5 to 130 m) and spatially placed where natural lakes were present, vertically connected to the lower unit and nowadays hydraulically disconnected in some areas due to decrease in piezometric level in the lower formation (Rudolph et al. 1991), having two recharge inputs (artificial recharge coming from sewer and supply systems, urban parks, channel, flooding areas, and natural recharge coming mainly from rain); (2) permeable upper unit (PUU, the upper aquifer in exploitation), formed by Quaternary-age alluvial and volcanic materials, as well as andesitic and dacitic Pliocene-age material with secondary porosity associated with fractures, and with 600 m depth on average having recharge inputs from leakage in the LPPU, natural recharge coming from rain inputs where the LPPU is not spatially present in the area and natural recharge from the surrounding mountains and ranges; (3) lowpermeability lower unit (LPLU, the lower aquitard), formed by Oligocene and Eocene age igneous rocks and sandstones, shales and limestones of the Late Cretaceous age, with very low permeability and porosity (Mooser et al. 1996), considered as a hydraulic lower limit for the PUU; (4) permeable lower unit (PLU, the lower aquifer), formed by Early Cretaceous age rocks (limestones), with high permeability caused by intense fracturing and the presence of dissolution channels, considered as a potential aquifer with limited quality for urban consumption.A more detailed description of the hydrostratigraphic units is presented in Herrera-Zamarrón et al. (2020).This study's efforts are centered on analyzing the temporal changes in GE in the PUU of the southern part of the BM aquifer system.Its limits are shown in Fig. 1.The National Commission of Water (Comisión Nacional del Agua, CONAGUA), which is the national entity in charge of water management, has divided the surface that comprises the LPUU and PUU hydrostratigraphic units (for administrative and legal purposes) into the Metropolitan Mexico City (MMC) Aquifer (2,104 km 2 ), the Texcoco Aquifer (934 km 2 ) and the Chalco-Amecameca Aquifer (947 km 2 ), corresponding to 41.5 percent of the total area of the BM (CONAGUA 2018).

Geostatistical modeling
This section will present a detailed description concerning the structural analyses of the GE data, the GE kriging estimates, and the GE changes between two specified years, for the spatial and ST cases.

Data description and exploratory analysis
The available GE data include annual observations for 21 years (from 1997 to 2017), distributed in the three administrative aquifers.They have been used to study, for the sake of simplicity, the temporal GE changes between the years 2002 and 2007; so that in the following, the complete dataset has been used for the ST analysis, while the spatial analysis has been conducted using GE data from the specified years.
Figure 2 shows the well positions in the dataset with special symbols in order to distinguish wells with data for 2002 and wells with data for 2007.The positions shown with black dots indicate wells with at least one data for the complete set of years.All the monitoring wells are production wells, which are irregularly distributed wells extracting water from the upper aquifer (PUU).Note that, although the measurements taken in active pumping wells could be heavily influenced by the exploitation strategy, a period of 24 hours (after the pump is turned off) must be considered for the GE measurements in the field according to the Mexican water agencies, so that the static groundwater level is reached.So, it is assumed that the GE measurements are static hydraulic heads.
In order to evaluate groundwater level differences in the valley area (where most of the wells are located), the raster layers are clipped by the topographic elevation of 2,300 m, shown with a blue dash line in Fig. 2, where the majority of the dataset is present.
It is worth pointing out that the average density of wells is larger compared with the regional studies reported in Table 1.However, while some areas and years lack information (for example, a broad area between the MMC and Texcoco aquifers), others are very dense.In some regions, 2002 and 2007 data can be complemented by data in other positions and years.For example, in the Chalco aquifer, several wells with data for 2007 lack data for 2002.Also, a portion of the MMC aquifer has no data for 2002 and 2007, but a significant amount of data for other years.So, depending on data availability, different spatial positions and GE values are used as data input for the spatial and ST analysis.The data from isolated wells outside the cropped area were removed.
The spatial and ST trends are modeled by using the second-degree polynomials (least-squares fitting), as shown in Table 2.Note that the independent variables x and y are the UTM coordinates in the west-east and north-south axes, respectively, and t denotes the time in years.By analysing the basic statistics in Tables 3 and 4 a more symmetric distribution can be detected for the GE residuals.After removing the trend from the data, the residuals for both, the spatial and the ST case, are used in further steps.
The ST database comprises 3,699 measurements over time at 474 spatial locations, with an average of 176 locations per year.Concerning the data per well, eight have the longest time series (21 annual data corresponding to the analysis period), and the average data per well is 8.
Figure 3 shows the box plots of the GE residuals for each year, which are very similar in the period 1997-2010, then the dispersion of the data tends to increase from 2011 to 2017.

Structural analyses
In this step, the spatial and spatio-temporal variograms have been estimated and modeled.First of all, the omnidirectional spatial variograms of the residuals of GE for 2002 and 2007 have been estimated and the exponential models have been fitted to each of them, as shown in Fig. 4. The parameters of the selected spatial models are given in Table 5.It is worth  pointing out that no significant anisotropy has been detected from the sample spatial directional variograms.
By focusing on the ST case, the sample ST variogram of the residuals is obtained using 7 spatial and 14 temporal lags.The empirical surface of the ST variogram and the corresponding marginals, together with the marginal models, are illustrated in Fig. 5.In particular, the sample spatial and temporal variograms reported in Fig. 5b and c are drawn by using all the couples of data from the ST dataset.
The spatial lag size is 3 km, and each temporal lag has an extension of one year.Note that the spatial and temporal lags are fixed by considering the maximum distance among the sample locations and the length of the time series for each sample spatial point.The temporal marginal (variogram constructed using the time series) as well as the spatial marginal (variogram obtained using observations distributed in the spatial domain) are estimated and modeled before producing the ST variogram model.
In order to choose the appropriate class of ST variogram model, the sample non-separability ratios (2) are determined and classified by spatial and temporal lags (Fig. 6).From the box plots shown in Fig. 6 it is evident that the sample ratios are always less than one, which implies that a uniform negative non-separability is detected.Thus, the product-sum model is considered a suitable choice, since its theoretical type of nonseparability is consistent with this evidence (De Iaco and Posa Fig. 3 Box plots of the GE for the years 1997-2017, together with the number of spatial locations per year 2013).Indeed, the type of non-separability of the product-sum is always uniformly negative, so that its non-separability index reflects the behavior of the empirical one.
The parameters of the selected ST product-sum model are shown in Table 6 and are such that the model is strictly positive definite (De Iaco and Posa 2018).

Cross-validation
In the following, the leave-one-out cross-validation procedure has been carried out in order to check the adequacy of the fitted spatial and ST models.
In Fig. 7 the scatter diagrams of the GE residuals and the estimated ones are shown, as well as the mean error (ME), the mean absolute error (MAE), the root mean square error (RMSE), the normalized root mean square error (NRMSE) and the Spearman correlation coefficients.For the spatial cases (Fig. 7a and c), the highest estimation error (overestimation) has been registered on the southwestern border of the domain and has been due to the presence of high observed residuals in its spatial neighborhood.In general, the dispersion of the cloud around the bisector has been caused by the reduced number of close neighbors for most of the estimates.
On the other hand, when using the ST approach for crossvalidating the residuals for 2002 and 2007 (Fig. 7b and d)), the goodness of the estimates significantly increases.These statistics provide a quantitative measure of the obtained improvement, thanks to the high performance for the selected ST variogram model and the geometry of the sample points in ST.

Groundwater elevation estimates
In this section, spatial and ST kriging estimates are obtained on a grid with 1,000 m × 1,000 m cells size, covering a sur- face of 1,585 km 2 over the plain area (2,300 m), by adding the corresponding spatial or ST trend (Table 2) to the kriging residual estimates.
Figures 8 and 9 illustrate the spatial and ST kriging contour maps of the GEs and the distribution of the estimated values through their box plots and histograms for 2002 and 2007, respectively.
In the following, the corrections provided by using the ST approach with respect to the spatial one have been underlined.
In 2002 the GE minimum values are very similar (2,167 m and 2,172 m for the spatial and ST cases, respectively).However, there is a difference of 46 m between the maximum values (2,246 m and 2,292 m).Moreover, from the Fig. 8c and d, a moderately left-skewed distribution emerges for both the spatial and ST case, with a slightly wider left tail in the last one.
A similar behavior of the GE distributions can be found in 2007 (Fig. 9), with a difference of 11 m between the maximum values (2,259 m and 2,270 m) and a GE minimum value slightly smaller for the ST case than in the spatial one (2,163 m and 2,155 m).
Figure 8a and b for 2002 and Fig. 9a and b for 2007 show also that the highest GEs are close to the mountains in the periphery, as especially highlighted for the ST case.For 2002, the minimum values are found in the MMC aquifer, for both, the spatial and ST case.However, in the ST contour map, values lower than 2,180 m and more extended areas with values lower than 2,190 m are also found in the Texcoco aquifer.
For 2007, the minimum estimates are obtained in the Texcoco aquifer in both cases, although values lower than 2,180 m are more evident for the ST kriging, as shown in Fig. 9b.Finally, a groundwater divide between the MMC and the Texcoco aquifers (close to the boundary of Mexico City), already recognized in previous works (Durazo and Farvolden 1989), is more clearly defined in the ST interpolations.Also, the Chalco aquifer has a consistent zone with high values in all the contour maps, which is better defined in the ST maps.
In addition, Fig. 10 illustrates the contour maps for the spatial and ST estimation error SD of GEs concerning the year 2002 and the distribution of their values through the box plots and histograms.It is worth pointing out that, for both the spatial and ST maps, the largest values are detected in the boundary between the MMC and the Texcoco aquifers, an area with almost no wells (see Fig. 2).Naturally, the smallest values are in the zones with a higher density of available data.The SD values tend to be smaller for the ST case with respect to the spatial one, due to the new geometry of the ST samples points and the ST model used to describe both the correlation in space and time together with the ST interaction.Indeed, 75% of the values are larger than 16.6 m in the spatial case, on the other hand, 75% of them are smaller than 14.5 m in the ST case.Moreover, by analysing Fig. 10c and d, it is evident that the estimation error SD distribution in the ST case is more symmetric than the one in the spatial case.
Similarly, Fig. 11 compares the contour maps of the spatial and ST estimation error SD of GEs for the year 2007.The general behavior of the distribution of the estimation error SD is similar to that of 2002, although those of the spatial case change more than the ST one.Note that the number of wells with data for 2007 are significantly less than for 2002, thus as expected, the ST estimates are more robust to the lack of data for one specific year.
Remark When comparing the spatial and ST interpolations, the main remarks are that the latter show larger areas characterized by the same low groundwater level (2,180 m) in both years.A groundwater divide between the MMC and the Texcoco aquifers is more clearly defined in the ST contour maps, which may be related to the large thickness of the aquitard in that area.Also, a zone consistently presents high values in the Chalco aquifer and is better defined in the ST maps.Considering the cumulative effect of historical withdrawals on the aquifer system, the cross-validation results, the lower SD error estimates for extended portions of the aquifer, and the more consistent error estimate SD distribution, it can be reasonably assumed that the ST estimates reflect more consistently the behavior of the natural response of a heavily pumped aquifer.

Groundwater level changes
In this section, the temporal changes of the GE, and their corresponding estimation error SD are estimated and mapped over the valley area (2,300 m) covered by the aquifer using a geographical information system.Finally, the results of the spatial and ST kriging approaches are compared and evaluated.
The spatial and ST kriging GE difference (GE in 2002minus GE in 2007) over the aquifer and the distribution of their values through their box plots and histograms are shown in Fig. 12.While in the spatial case, the level differences are between -41.2 m and 33.7 m with a mean of 2.21 m and a median of 6.5 m, in the ST case, they are between -21.5 m and 55.1 m with a 4.3 m mean, very close to the median.As shown by the box plot (Fig. 12c), most of the estimated differences for the ST kriging highlight a GE recovery up to 5 m (-5 m in the reported results) and a drawdown up to 10 m, which seems to be reasonable for the studied area considering the GE evolution shown by the observed data at each well for the interval under study.From a direct cell count, 85.4% of the GE difference values present recovery up to 5 m (-5 m in terms of differences) or drawdown up to 10 m, for the ST kriging and only 45.8% for the spatial kriging.
In particular, in the ST case there are more extended areas with drawdowns of about 5 m, meaning that for the period of analysis, the drawdowns of the groundwater level in the aquifer system is about 1 m/year.In both cases, there are consistent drawdowns between the latitudes 2,130,000 and 2,150,000, almost on a diagonal line that crosses from west to east in the ST kriging map, but only crossing the Texcoco aquifer and a portion of the MMC aquifer in the spatial kriging map.Note that the spatial kriging map has more extended areas with drawdowns larger than 10 m, which is heavily influenced by the available spatial data in years 2002 and 2007.
Checking GE recovery (negative values), 17.3% of the analyzed area is obtained for the ST kriging estimates and 32.5% for the spatial kriging.The last percentage for the spatial kriging provides an implausible pattern for a heavily pumped aquifer.Considering these percentages, the ST kriging recovery and drawdown results for the five years are reasonably smaller than those for the spatial case.
Zones with recoveries of 25 to 40 m (negative values) are observed in the northwestern area for the spatial kriging; on the other hand, they are estimated as drawdowns in the ST case.It is worth noting that both cases correspond to areas with large SDs of the estimation error (Fig. 13a), although smaller for the ST kriging (Fig. 13b).
Figure 13 compares the contour maps for spatial (a) and ST (b) estimation error SD for GE differences and the distribution of their values through their box plots (c) and histograms (d).The SD values are significantly smaller for the ST kriging (as shown in Fig. 13c), which has 75% of the values smaller than 6 m; in contrast, for the spatial case, 75% of the SD values are larger than 17.2 m, and only a few values are less than 6 m.The contour maps (Fig. 13a, b , where E 1 (s, t 1 ) = Ẑ(s, t 1 ) − Z(s, t 1 ) and E 2 (s, t 2 ) = Ẑ(s, t 2 ) − Z(s, t 2 ) are the GE estimation error SD for the position s and time t 1 and t 2 .On the other hand, for the ST case, the estimated error variance of the GE changes is s, 2002), E 2 (s, 2007)] ; the estimated error SD of the GE changes is reduced more in the ST case since the covariance between the GE estimate errors at the two years is considered (Fig. 13b).
Uncertainty in the GE change estimates is also checked by computing, for each grid cell, the absolute value of the coefficient of variation (ACV, i.e. the ratios between the SD and the absolute value of the GE difference estimate).75% of the ACV values are less than 3.5 for the spatial kriging and less than 1.5 for the ST kriging.Therefore, the latter considerably reduces the error estimate uncertainty of the groundwater level change.A complete report of the ACV values is presented in the electronic supplementary material (ESM) and the maps that show details on the spatial distribution differences between the two approaches for the CV and ACV of the GE changes.

Discussion
In previous works, temporal changes in groundwater levels, for a given period, were calculated by using two procedures based on spatial kriging.In the first approach, for each well, the groundwater level in the period's final year is subtracted from the corresponding level in the period's initial year, and afterward, kriging interpolation is done (for example, Ta 'Any et al. 2009 andAhmadi andSedghamiz 2007).This procedure is possible when the data are available for the initial and final years of the evaluated period at the same wells or if the missing data are filled in.In the second approach, temporal changes are calculated by subtracting the groundwater levels after interpolating each year's data over a spatial grid (for example, Sun et al. 2009).However, when data are unavailable in the same set of wells for the initial and final years of the evaluated period, this approach, although possible, may produce misleading results.
This work proposes an alternative for the second approach, in which the GE interpolation is obtained with ST kriging and then takes advantage of the available observations not only over space, but also in time.Ruybal et al. (2019a) used a similar approach to calculate temporal changes in piezometric levels.However, their emphasis was on analyzing the piezometric levels, and the temporal changes were only presented in maps.On the other hand, Ruybal et al. (2019b) focused on evaluating groundwater storage changes in the Denver Basin Aquifer System (USA), applying the ST approach of their previous work, emphasizing the hydrological implications and highlighting uncertainty in the volumetric predictions on the basis and the estimated errors of the groundwater level temporal changes, as evaluated in this work.Furthermore, in contrast with Ruybal et al. (2019a), in this work, the non-separability index (Cappello et al. 2018) is used to choose the appropriate class of the ST variogram model.Also, the SD of the estimate errors of the GE temporal change has been computed and presented for the first time using ST kriging.The results have highlighted the improvements in terms of more realistic estimates and reduced estimation error SD, obtained when data at other times are involved in the interpolation process.Ruybal et al. (2019a) proposed a list of pending issues in the ST geostatistical applications to Hydrogeology (Table 1).These were: (1) assessment of the applicability of additional space-time variogram models to predict groundwater levels; (2) evaluation of aquifers on a larger scale with an extent of thousands of kilometers; (3) further assessment of ST kriging for confined, semiconfined, and unconfined systems; (4) evaluation of longer time periods; (5) evaluation of more methods to remove global groundwater trends or drift; and (6) advocating for the advantages of ST applications on irregular or sparse groundwater datasets, which is common in groundwater applications.
Referring to the above mentioned issues this study has given a significant contribution to the successful application of the ST approach: (i) on a regional aquifer; (ii) on the semiconfined and unconfined conditions in the SBMAS; (iii) by using a long time series length (21 years); (iv) on the irregularly distributed GE dataset (even with broad areas lacking data between the MMC and Texcoco aquifers), where the ST geostatistical application dramatically reduces the uncertainty in the estimation error SD.
Finally, it is worth recalling that in the presented analysis, the trend has been removed from the data, so that the structural analysis and kriging have been conducted on the residuals.Once the estimated residuals have been obtained, the trend is added in order to determine the estimated values for the variable under study.This is a consolidated way to proceed in the presence of a trend, as also shown in the published papers on the same subject referred to in Table 1.However, considering that after subtracting the trend, as done when applying residual kriging, the experimental variogram of the residuals presents a bias (see, for example, Matheron 1971), the adoption of other methods that avoid this problem, like universal kriging or IRF-k kriging, represents a pending issue that could be addressed in future works.

Conclusions
The analysis conducted in this work focused on estimating groundwater level changes in the upper area of the SBMAS.For this aim, the performance of the spatial and ST approaches was evaluated, and estimation error SDs of the GE changes were presented.The improvements of applying the ST kriging for the GE estimates compared to the spatial kriging were quantified on the basis of the cross-validation results and the estimate error SDs, which were useful to better distinguish relevant hydrological behaviors.Especially for the computed differences in the GE, the ST kriging produces more reasonable results than the spatial kriging, since less extreme recoveries and drawdowns, associated to smaller SD of the estimate errors, were highlighted.
A thorough ST hydrogeological analysis of the groundwater level changes for the period 1997 to 2017 in the upper aquifer of the SBMAS and their different sources of uncertainty (including the interpolation error of the groundwater level changes) should be performed in the future.Moreover, the complex geostatistical approach, both in the spatial domain (De Iaco et al. 2003;De Iaco et al. 2013;De Iaco 2017;De Iaco, 2023b) and in the spatio-temporal context (Cappello et al., 2021a(Cappello et al., , b, 2022;;De Iaco, 2022, 2023a), holds significant potential for modeling groundwater velocity, as demonstrated in the work of De Iaco and Posa (2016).

Fig. 1
Fig. 1 Study area.Limit of the Basin of Mexico (BM) and the Southern Basin of Mexico Aquifer System (SBMAS)

Fig. 2
Fig. 2 Well locations for the complete set of data, from 1997 to 2017 (black dots).The position of wells with data for 2002 and 2007 are shown with different symbols (orange diamonds and yellow squares respectively).Where symbols overlap, data from more than one year concur

Fig. 4 5Fig. 5 a
Fig. 4 Sample spatial variograms (black points) and its fitted variogram models (continuous line) of the GE residuals for the year a 2002, b 2007

Fig. 6 Fig. 7
Fig. 6 Box plots of sample nonseparability ratios grouped for a spatial and b temporal lags

Fig. 8
Fig. 8 Comparison of GE for the year 2002: a spatial and b ST kriging contour maps, c box plots and d histograms

Fig. 9
Fig. 9 Comparison of GE for the year 2007: a spatial and b ST kriging contour maps, c box plots, and d histograms ) are similar to the maps of the estimated error SDs of the GEs for 2002 (Fig. 10) and 2007 (Fig. 11).However, the spatial estimated error SD values are of course larger than the ones in 2002 and 2007, s i n c e Var[E 1 (s, 2002) − E 2 (s, 2007)] = Var[E 1 (s, 2002)] +Var[E 2 (s, 2007)]

Fig. 10
Fig. 10 Comparison of the standard deviation of GE for the year 2002: a spatial and b ST kriging contour maps, c box plots, and d histograms

Fig. 11
Fig. 11 Comparison of the standard deviation of GE for the year 2007: a spatial and b ST kriging contour maps, c box plots, and d histograms

Fig. 12
Fig. 12 Comparison of groundwater level temporal changes: a spatial and b ST kriging contour maps, c box plots, and d histograms

Fig. 13
Fig. 13 Comparison of the estimation error SD of the groundwater level difference: a spatial and b ST kriging contour maps, c box plots, and d histograms

Table 1
Summary of reviewed papers on geostatistical methods for estimating groundwater elevation (GE), depth to groundwater (DG), and its temporal changes, and the present study S&T Spatial and temporal separately, S Spatial, M multivariate, ST Spatio-temporal, SD Spatial Density, ASD Average Spatial Density

Table 3
Basic statistics of the GE data for years2002 and 2007 (spatial case)

Table 4
Basic statistics of the GE data for the 1997-2007 period (spatio-temporal case)