Geostatistical analysis of groundwater levels in a mining area with three active mines

Mining activities can significantly impact groundwater reservoirs in their vicinity. Different approaches have been employed, with varying success, to investigate the spatial variability of groundwater levels in mining areas. Typical problems include the small sample size, the non-Gaussian distribution of the data, and the clustering of sample locations near the mines. These conditions complicate the estimation of spatial dependence. Under sparse and irregular sampling conditions, stochastic methods, which can provide estimates of prediction uncertainty, are preferable to deterministic ones. This research focuses on the comparison of two stochastic methods, stochastic local interactions (SLI) and universal Kriging (UK), using water level data from 72 locations around three mines in Northern Greece. UK is a well-known, variogram-based geostatistical method, while SLI is a computationally efficient kernel-based method that can cope with large spatial datasets. The non-Gaussian distribution of the data is handled by means of a flexible, data-driven Gaussian anamorphosis method that uses kernel functions. The spatial prediction performance of both methods is assessed based on cross-validation. UK performs better than SLI, due to the fact that the former incorporates a linear trend function. On the other hand, a comparison of the two methods using data from a single mine that contains only 28 measurement locations shows that SLI performs slightly better than UK. The prediction uncertainties for both methods are also estimated and compared. The results suggest that SLI can provide better estimates than classical geostatistical methods for small sample sizes that do not allow reliable estimation of the variogram model.


Introduction
The quantity, quality, and the degree of spatiotemporal variability of groundwater are key factors for the wealth of human communities and ecosystems; therefore, they play a significant role in effective groundwater resources management.Both human interventions, such as excessive pumping, and natural processes, such as a decline in recharge, can cause fluctuations in groundwater levels.Level variations could be long-term (on the order of years) or short-term (on the order of months).Global studies confirm the connection between excessive groundwater pumping and variations in groundwater levels (Islam et al. 2022;Tatas et al. 2022).Dewatering efforts for underground mining operations significantly reduce groundwater levels for a long time and in many cases may dry up nearby springs.Near-mine ecosystems typically depend on groundwater for survival.Water pumping for mining activities can lead to groundwater level drops which can be harmful to local communities and ecosystems (Schrader and Winde 2015;Davies et al. 2020).Therefore, monitoring the spatial distribution of groundwater levels around mining areas is an important goal for management purposes.
Various geostatistical methods have been used to investigate changes in the spatial distribution of groundwater levels.Some studies employ Kriging methods using only primary information (Budiman et al. 2022;Evans et al. 2020), while others involve auxiliary variables (Varouchakis and Hristopulos 2013a, b;Zirakbash et al. 2020;Calzolari and Ungaro 2012;An et al. 2015) or implement hybrid machine learning approaches (Parasyris et al. 2021;Manzione and Castrignano 2019;Theodoridou et al. 2017;Tapoglou et al. 2014).In mining areas, groundwater levels distribution is usually investigated through the use of Kriging methods such as ordinary Kriging, indicator Kriging and co-Kriging (Dai et al. 2011;Abiye et al. 2018;Keegan-Treloar et al. 2021).
Geostatistical methods typically rely on the existence of data that sufficiently sample the spatial variability in the area of interest.In mining areas, the monitoring stations are often clustered around small areas of interest (pits, mines or mine sectors).This practice prevails because such areas require careful monitoring due to increased environmental risks.The clustered pattern of the measurement locations hinders the reliable estimation of the variogram model, especially if the number of data points inside each cluster is small (Goovaerts 1997;Kovitz and Christakos 2004).Another problem commonly faced by geostatistical methods is the skewness of the data distribution.This is usually handled by means of nonlinear transforms (e.g., lognormal, Box-Cox) which aim to normalize the marginal data distribution (this procedure is known as Gaussian anamorphosis; Wackernagel 2003;Chilès and Delfiner 2012).
This work has two goals: First, it presents an alternative approach for spatial groundwater levels analysis that can overcome the difficulties related to estimating spatial dependence.This approach is based on the stochastic local interaction (SLI) model.This model imposes spatial correlations by means of local couplings (interactions); the SLI model can increase the computational efficiency of spatial prediction for correlated data (Hristopulos 2015;Hristopulos et al. 2021).Secondly, a recently proposed data-driven (nonparametric), kernel-based approach is used to conduct Gaussian anamorphosis (Pavlides et al. 2022;Agou et al. 2022).This method is more flexible than nonlinear transforms based on explicit functions and does not suffer from the problems that Hermite polynomials face in the tails of the distribution.
Groundwater level data (meters below surface) from three mines operating in the same region (northern Greece), are studied.Groundwater is used to support mining activity in the study area.Groundwater levels are investigated by means of universal Kriging (UK) and SLI.Uncertainty estimates for both SLI and UK are provided.Prediction intervals (at 5 and 95% probability levels) are estimated.Leave-one-out cross-validation (LOOCV) measures indicate that the performance of UK and SLI is adequate considering the small sample size-although both methods slightly overestimate a few low values.UK performs slightly better than SLI if the entire dataset of the three mines is taken into account; however, SLI performs better than UK even for smaller sample sizes (individual mines).MATLAB code for SLI is freely available at Hristopulos (2020b).
The rest of this paper is structured as follows: section "Theory and methods" reviews the theory for the spatial prediction methods employed (UK and SLI) and for Gaussian anamorphosis; section "Data description and preprocessing" includes the data description, exploratory analysis and data normalization; section "Results: universal Kriging" presents the application of UK to the groundwater level data including cross-validation analysis; section "Results: SLI" focuses on the application of SLI over the entire area and separately to one of the three mines, along with the respective cross-validation measures.Lastly, section "Discussion" comments on the results, and the final section presents conclusions and open questions for further study.

Theory and methods
The spatial distribution of variables is used using the mathematical framework of random fields (Adler 1981;Christakos 1992;Chilès and Delfiner 2012;Hristopulos 2020a).A random field is a collection of dependent random variables distributed over the spatial domain of interest.Herein, the random field concept is used to model groundwater levels.
A scalar random field X(s) where s ∈ ℝ d is the position vector, represents a random function D ⊂ ℝ d → ℝ , where D is the spatial domain; in this study d = 2 and s = (s 1 , s 2 ) ⊤ , where ⊤ denotes the transpose of a vector.A random field is fully determined by means of the n-point joint probability density functions, where n = 1, 2, … denotes a positive integer (the set of all positive integers is denoted by ℕ ).The Gaussian assumption is often used for the joint density functions.
The data are assumed to represent a partial sample of the random field; they are denoted by the set of sampling sites {s 1 , … , s N } and the respective water level measurements {x 1 , … , x N } (lowercase letters denote specific realizations of the random field).The field is reconstructed on a rectangular map grid which comprises the set of points (nodes) {z 1 , … , z P } .Estimation points will be denoted in general by u ∈ ℝ d , where u refers either to a point in or one of the sampling points (e.g., for cross-validation analysis).
For practical purposes, random fields are often decomposed into a deterministic part (trend) and a stochastic part (fluctuation) as follows: The function m X (s) is assumed to represent the mathematical expectation of the random field, i.e., m X (s) = [X(s)] .The expectation [X(s)] of a random field at the point s repre- sents a stochastic average over all probable configurations of the field (as defined by means of the joint probability distribution; Hristopulos 2020a).Hence, m X (s) is a determin- istic function and typically it reflects slow spatial variations (1) X( ) = m X ( ) + X � ( ) (trend) of the random field X(s) ; usually, m X (s) is modeled as a polynomial.Herein, only first-degree polynomial trends are considered, i.e., where A = (a 0 , a 1 , a 2 ) ⊤ is the vector of trend coefficients, and s = (s 1 , s 2 ) are the coordinates of the two-dimensional (2D) position vector s.
The fluctuation X � ( ) is a random field that represents the finer-scale, stochastic component of X(s) .It is obtained by removing the trend from X(s) and is thus equal to the field's residuals (Pavlides et al. 2015).Since it is assumed that m X (s) = [X(s)] , it follows that [X � (s)] = 0.A random field is considered stationary if the joint n-point probability distributions (for all n ∈ ℕ) do not depend on the spatial location but only the relative configuration of the points.Since this condition is not easily testable, the condition of weak stationarity is used in practice.Weak stationarity means that the expectation of the random field is constant and the covariance is a function purely of the time lag (not the particular instant in time).A stationary Gaussian random field is thus fully determined from the mean function m X (s) and the covariance function c X (r) , where r ∈ ℝ d is the space lag between two points s and s � = s + r and c X (⋅) is a positive-definite function.It is assumed that the data sites comprise the set of points {s i } n i=1 while the groundwater level measurements at these sites will be denoted by the data vector x = (x 1 , … x n ) ⊤ .

Variogram modeling
The models attempting to represent spatial processes rely on the spatial correlations between the data.These correlations are reflected in the covariance function that characterizes the random field; however, for practical reasons, it is more convenient to estimate the variogram function instead of the covariance function (Chilès and Delfiner 2012;Kitanidis and Vomvoris 1983).
The variogram function X (r) for stationary random fields is connected to the covariance function c X ( ) as shown in Eq. ( 3).
where r is the distance vector between two points.Thus, the variogram function can be used interchangeably with the covariance function if the field of the fluctuations X ′ is considered a stationary field.The variogram function is defined as follows w h e r e Var i s t h e va r i a n c e o p e r a t o r, i .e ., The empirical (experimental) variogram is obtained from the data using Matheron's moment estimator (Matheron 1962) or Cressie's robust variogram estimator (Cressie 1993).The empirical variogram is then fitted to a theoretical model in order to account for all possible pair distances.Various admissible variogram models can be found in Goovaerts (1997); Chilès and Delfiner (2012); Hristopulos (2020a).The tested variogram models are fitted to the empirical variogram, using the weighted least squares method (Olea 2006).The spherical variogram model which is used herein satisfies the equation where 2 is the variance of the random field, |r| is the spatial lag, and is the correlation length under the assumption of statistical isotropy (i.e., lack of directional preference; Hristopulos 2020a).

Kriging
Kriging refers to a group of stochastic spatial interpolation methods.They are by construction linear, unbiased, and minimum variance estimators.Kriging methods have a large range of applications in the mining and environmental sectors (Cressie 1990;Varouchakis et al. 2018;Varouchakis and Hristopulos 2013b).Kriging provides significant benefits compared to deterministic interpolation methods (Gong et al. 2014;Varouchakis and Hristopulos 2013a;Pavlides et al. 2015).
The value of the random field X(u) at an unmeasured loca- tion u ∈ is estimated by means of a linear combination of the measurements at n(u) nearby points s 1(u) , … , s n(u) , where s i(u) ∈ {s 1 , … , s N } is a neighbor of u for all i = 1, … , n(u) .A map of the spatial distribution of the field is obtained by repeating the estimation process at every node of a suitably selected grid.Such maps can be accompanied by estimates of the prediction variance at each point.The variance can adequately represent the prediction uncertainty if the data probability distribution is Gaussian.If the data follow a skewed probability distribution, the Kriging-based uncertainty estimates are not reliable (Agou 2016).
Simple Kriging (SK) assumes that the random field is stationary and its mean, m X , is known.The SK estimator is obtained by means of the following equation (Krige 1951;Cressie 1990): (5) In Eq. ( 6), i (u) are the Kriging weights at the target point u .
The Kriging weights are calculated by solving the following linear system where is the covariance (or variogram) matrix of the fluctuation random field X � (⋅) at the data loca- tions, i.e., [C X ] i,j = c X (s i − s j ) for i, j = 1, … , n(u) , = ( 1(u) , … ,  n(u )) ⊤ is the vector of the Kriging weights, and ⊤ is the covariance (or variogram) vector between the unknown point u and each of the n( ) neighbor points that contribute to the esti- mate (Chilès and Delfiner 2012).The Kriging estimate x(u) at u is obtained by replacing in Eq. ( 6) the x i with the respec- tive data values, m X with the estimate of the mean, and the Kriging weights with the solution of Eq. ( 7) for .

Universal Kriging
For many datasets the constant mean assumption is not suitable; even if the assumption is reasonable, it is not always possible to accurately determine the value of the mean (e.g., if the sample is small).Several variations of Kriging have been suggested to address this issue, such as ordinary, regression, and universal Kriging (Goovaerts 1997;Schabenberger and Gotway 2004;Cressie 1990).
In this study, universal Kriging (UK) is employed, which is also known as Kriging with a drift.This method assumes that the trend m X (s) depends on the coordinates s in terms of simple basis functions (Mesić Kiš 2016;Kitanidis 1997).Assuming a simple linear polynomial trend, m X (s) = a 0 + a 1 s 1 + a 2 s 2 , where the a 0 , a 1 , a 2 are real-valued, unknown coefficients, the UK estimator at location u = (u i , u j ) is In Eq. ( 8), the polynomial coefficients of the trend m X are considered unknown and should be estimated along with the covariance coefficients from the data, thus increasing the uncertainty of the prediction.The linear weights are calculated as the solution of Eq. ( 9) where the vector B contains both the n(u) weights and three Lagrange multipliers (one for each trend coefficient) used to enforce the no-bias condition.The covariance matrices and are expressed as follows where s k;1 and s k;2 are the coordinates of the point n(u) , 1 (u) , 2 (u) and n(u) is suppressed.The covariance (or variogram) used in Eq. ( 10) should be based on the residuals which are obtained by subtracting the trend from the data).However, the trend is considered unknown, and this creates a cyclical problem.Methods to address this issue are suggested in Kitanidis (1997); Goovaerts (1997).If the trend is first estimated based on the data and then removed to obtain the residuals, a more appropriate methodology is regression Kriging.
In this study, the variogram model parameters are estimated by means of maximum likelihood estimation (MLE) Fletcher (2000); Hristopulos (2020a).The spherical variogram model, given by Eq. ( 5), is used.The maximization of the likelihood is equivalent to the minimization of the negative logarithmic likelihood (NLL).The latter is given by where x = (x 1 , … x N ) ⊤ is the data vector, m is the vec- tor of the trend estimates at the sampling locations, i.e., m i = a 0 + a 1 s i + a 2 s j , and det is the determinant of the covariance matrix .The trend parameters (a 0 , a 1 , a 2 ) are estimated along with the parameters ( 2 , ) of the covariance model by minimizing the NLL.
The minimum error variance of the UK prediction, 2 UK (u) , is obtained from the following equation (Kitanidis  1997, p. 127):

Kernel functions
Kernel functions are routinely used in nonparametric estimation methods (Ghosh 2018).Herein, kernel functions are used for two purposes: (1) To define the SLI weights (see section "Stochastic local interaction model"), and (2) to estimate the cumulative distribution function (CDF) of the data.The SLI model is discussed in section "Stochastic local interaction model".The CDF estimation is described in the following. (10) A kernel function K(⋅) ∶ ℝ → ℝ satisfies the following requirements (Ghosh 2018): • K(x) is nonnegative, i.e, K(x) ≥ 0 and symmetric func- tion, i.e., K(−x) = K(x) for all x ∈ ℝ.
is also a kernel ∀ > 0.
The parameter adjusts the kernel's range via the bandwidth parameter b = 1∕ .Larger values of b imply a longer kernel range.Table 1 summarizes some kernel functions that are used in this study.At this point, a comment on notation is necessary: the symbol h is typically used to denote the bandwidth.Since in the current study kernel functions are used for CDF estimation and in SLI models, to distinguish between the two kernel bandwidths b is used for CDF estimation (b has units of [X]) and h in SLI modeling (h has units of distance).
Next, the nonparametric CDF estimation procedure is described.Let {x [i] } N i=1 represent the ordered sample of the water level values x : this is defined so that for any . Then, the nonparametric, kernel- based estimate of the CDF (KCDE), FK (⋅) , can be obtained from the following weighted sum (Pavlides et al. 2022) where K(⋅) is the CDF kernel step defined by means of the integral (Ghosh 2018; Hristopulos 2020a) The KCDE-based FK (x) has several advantages compared to the empirical staircase CDF estimate F(x) .For example, while F(x) is discontinuous, FK (x) provides smoothed kernel steps by adapting the bandwidth b to the data.A continuous estimate of the CDF is especially useful for numerical simulations (Pavlides et al. 2022).For the KCDE, Eq. ( 13), a plug-in bandwidth b is used, based on the algorithm developed by Botev et al. (2010).This bandwidth selection method was successfully tested for non-Gaussian data (Pavlides et al. 2022).

Stochastic local interaction model
The SLI model employs a local representation that improves the computational efficiency of spatial prediction for large datasets (Hristopulos 2015(Hristopulos , 2020a;;Hristopulos et al. 2021).SLI is based on a conditional probability density function (PDF) defined by means of an energy functional that involves local interactions between neighboring sites.The energy functional represents the "probability cost" for specific spatial configurations (patterns) of field values.The local interactions are implemented by means of kernel functions with locally adaptive kernel bandwidths h.The SLI model is expressed mathematically by means of a precision matrix J .The term "precision matrix" refers to the inverse of the covariance matrix.In classical geostatistical analysis, the precision matrix is obtained by inverting the covariance.In contrast, in SLI the precision matrix is constructed first.This representation leads to semianalytical expression for the prediction (conditional mean) of the random field, while the conditional variance is easily expressed in terms of precision matrix elements.Assuming a Gaussian distribution, the conditional mean and variance fully determine the conditional probability distribution at the prediction point.Hence, SLI prediction avoids the computationally costly inversion of the covariance matrix required by Kriging methods.
The PDF of an SLI model is given by means of the following Boltzmann-Gibbs exponential expression (Hristopulos 2003(Hristopulos , 2020a)), where v is a set of model parameters The constant Z(v) is a normalization factor of the PDF which is obtained by integrating exp [−H(x;v)] over all possible val- ues of the data vector x .However, the value of Z(v) is not needed in spatial prediction.

Precision matrix formulation
The SLI energy functional H(x;v) , where v is the vector of SLI parameters (to be defined in the following), can be expressed in terms of the precision matrix as follows: In Eq. ( 16), v � = (c 1 , , , k) ⊤ is a reduced parameter vector and J(v � ) is a symmetric precision matrix given by the fol- lowing equation where , c 1 > 0 are SLI parameters, I is the identity matrix, h is the vector of kernel bandwidths, and J 1 is the interaction network matrix (gradient precision submatrix); the latter is determined by the sampling pattern, the kernel function, and the kernel bandwidth as explained in Hristopulos (2015); Hristopulos et al. (2021).
The SLI vector v = (m X , c 1 , , , k) ⊤ contains the fol- lowing parameters: m X is the mean value (assumed con- stant), c 1 controls the amplitude of the term which "mim- ics" the sum of square gradients, and controls the overall amplitude of the fluctuations.Finally,  > 1 and k ∈ {2, 3, 4, …} are parameters that control the bandwidth vector h = (h 1 , … , h N ) ⊤ .The integer parameter k determines the order of the near-neighbor used to define the local bandwidth, while is a global multiplication factor.The bandwidth h q for a data point s q depends on the distance of s q from its k-nearest neighbor s k(q) as follows In SLI estimation, k is preselected, while m X , , c 1 , are inferred from the data by means of MLE.If a superposition of basis functions is used to model the trend instead of a constant m X , the coefficients of the superposition can also be estimated by means of MLE.Such a trend analysis is not used in the current application of SLI.

SLI prediction
In order to predict the water level at an unmeasured point s p (in this section s p is used instead of u for notational con- venience), it is inserted in the energy functional H so that it interacts with the neighboring sampling points via precision matrix elements J p,n , for n = 1, … , N .The "updated" energy function H(x, x p ;v) now defines, in terms of the Boltzmann-Gibbs PDF, the marginal conditional distribution at s p .Due to the local nature of the interactions, most of the J p,n values are zero.The mode of the conditional ( 16) distribution at s p is then given by the following predictive equation (Hristopulos 2015;Hristopulos et al. 2021) where xSLI (s p ) is the prediction, x q − m X is the fluctuation at location s q and J q,p , J p,p represent elements of the precision matrix.Note that the SLI prediction of Eq. ( 19) is independent of the scale coefficient .
The precision matrix uses kernel-based weights w p,q to embody interactions between nearby points.More specifically, there are two weights for each pair of points s n and s m as follows where d n,m = ‖s n − s m ‖ is the Euclidean distance between the points.The bandwidths h n , h m are determined from the sampling density in the neighborhoods of s n and s m respec- tively.Since the sampling patterns in the neighborhoods of s n and s m can be quite different, in general it holds that w m,n (h) ≠ w n,m (h).
The entries of the gradient precision submatrix are determined from the weights as follows (Hristopulos 2015;Hristopulos et al. 2021) where n,m = 1 if n = m and n,m = 0 if n ≠ m is the Kro- necker delta.
The SLI conditional variance at s p is given approxi- mately by Hristopulos et al. (2021) As evidenced in Eq. ( 23), higher values of c 1 tend to reduce the variance.This happens due to increasing energy cost for large increments in the energy functional H(x;v) .In addi- tion, prediction points with more neighbors (points s q such that w q,p > 0 or w p,q > 0 ) have lower conditional variance than prediction points in sparsely sampled regions.Similar to the Kriging variance, the SLI conditional variance does not explicitly depend on the data values.The dependence is indirect, to the extent that the data influence the estimation of the model parameters.

Gaussian anamorphosis
In many cases, spatial data follow a skewed PDF that deviates significantly from the normal distribution.In the present study, the water level distribution is clearly asymmetric (see section "Data description and preprocessing").Nonlinear, monotonic transformations such as the logarithm, Box-Cox, normal-score, modified Box-Cox versions, and deformed logarithmic transforms are used to restore normality (Box and Cox 1964;Varouchakis et al. 2012;Goovaerts 1997;Hristopulos 2020a;Hristopulos and Baxevani 2022).The application of normalizing transforms is often referred to as "Gaussian anamorphosis".The spatial analysis is then carried out using the transformed data.Finally, predictions in the original domain are derived by applying the inverse transformation.
In this study, Gaussian anamorphosis is performed using KCDE (see section "Kernel functions") coupled with the normal scores transform.KCDE derives FK (x) from the water level data {x 1 , … , x N } , using kernel smooth- ing to avoid the discontinuities of the staircase CDF function.Normalized estimates, {x * 1 , … , x * N } , are then calculated based on the normal scores transform.These conform with the standard normal distribution N(0, 1) .Spatial interpolation of the normal scores leads to predictions {x * 1 , … , x * P } .Finally, an inverse transform is used to map the normalized predictions {x * 1 , … , x * P } back to water level values {x 1 , … , xP }.
The inverse transform in Gaussian anamorphosis is implemented via a lookup table that maps the predictions of the normalized values back to water level values.The inverse transform is denoted by x = F−1 K (Φ(x * )) , where Φ(⋅) is the CDF of the standard normal distribution.Hence, the lookup table implements the mapping x * → x.
To create the lookup table, the expression Eq. ( 13) for FK (x) is used to calculate the CDF for a large set of discre- tization points {z l } L l=1 (herein, L = 10, 000 ).The table thus generated comprises L rows and three columns that correspond to the discretization points {z l } L l=1 , the respective normal scores, and the corresponding CDF values { FK (z l )} .The discretization points are linearly distributed between the minimum and maximum possible values, i.e., z min = x min − h and z max = x max + h where x min = min{x 1 , … , x N } and x max = max{x 1 , … , x N } as shown in Pavlides et al. (2022).The inverse transform of any prediction x * is calculated as follows: first, the respective Φ(x * ) is determined; next, the closest probability level to Φ(x * ) in the lookup table is deter- mined; if this corresponds to FK (z o ) , where z o ∈ {z 1 , … , z L } , the inverse transform of x * is given by x = z o .

Assessment of predictions
Validation measures are used to assess the quality of the spatial prediction achieved by the models.Furthermore, stochastic methods provide estimates of the prediction uncertainty for each location.
The predictive accuracy of both UK and SLI was assessed using LOOCV (Chilès and Delfiner 2012).In LOOCV the sampling sites s n , n = 1, … , N = 72 and the corresponding measurements are cyclically removed from the data one at a time, and the water level, xn at the point removed is estimated from the remaining 71 drillholes.The estimation error  n = x n − x(s n ) , for n = 1, … , 72 is then calculated.
Ninety percent prediction intervals (i.e., based on a 90% confidence level) are used to characterize the uncertainty of the predictions.These intervals are calculated in two steps.First, prediction intervals for each point are generated for the transformed (normalized) data based on the prediction variance; the latter is given by Eq. ( 12) for UK and Eq. ( 23) for SLI.The 90% prediction interval at location u for either method is given by where x * 0.05 (u) = x * (u) − 1.645(u) is the 5% quantile, and x * 0.95 (u) = x * (u) + 1.645(u) is the 95% quantile of the Gaussian predictive distribution at u .The water level pre- diction intervals at u are obtained by applying the inverse of the normalizing transformation to the 5% and 95% quantiles, x * 0.05 (u) and x * 0.95 (u) respectively.The inversion employs the principle of quantile invariance under monotonic transformations (Hristopulos 2020a) and is implemented by means of the lookup table defined in section "Gaussian anamorphosis".Thus, the respective water level quantiles x0.05 (u) and x0.95 (u) are obtained.To better visualize the uncer- tainty of the prediction, the 90% prediction interval range R = x0.95(u) − x0.05 (u) is displayed.

Data description and preprocessing
The study area comprises three mines operating in the same region in northern Greece.The area is geologically composed of various metamorphic and igneous rocks; biotite gneisses with intercalations of marble horizons are located in the area; granodiorites and acid to intermediate subvolcanic rocks intrusions into the gneisses cover small areas.The study area is characterized hydrogeologically as semipermeable, and its vertical profile comprises three hydrostratigraphic units.Secondary porosity (fractures) is responsible for hydrological connectivity.In certain areas, the rocks exhibit strong fragmentation which increases with depth thus creating pathways for slow groundwater flow (ENVECO 2010).The data used in this study comprise the 10-year average (2011-2020) of biannual water level measurements (below surface, BSL), from 72 drillholes.All the values are negative since water level is measured in terms of meters below surface.The measurement boreholes are located around the three mines operating in the area of interest (see Fig. 1b): • Mine A (28 drillholes) • Mine B (28 drillholes) • Mine C (16 drillholes)

Preliminary data analysis
This section presents a statistical summary of the temporally averaged water levels from the entire area.The frequency histogram and the data locations are shown in Fig. 1.The median distance of a borehole to its second-nearest neighbor ( k = 2 ) is 488.15 m.Table 2 shows the summary statistics of the data.
Based on the skewness value in Table 2 and the shape of the histogram, the data deviate strongly from the normal (Gaussian) distribution.As discussed in section "Kriging", the Kriging variance does not adequately represent the prediction uncertainty for non-Gaussian data.To address this issue, Gaussian anamorphosis is employed in order to normalize the data (see section "Gaussian anamorphosis").

Gaussian anamorphosis of water level data
In order to address the non-Gaussian distribution of the data, the data-driven Gaussian anamorphosis method is employed.The method is described in section "Gaussian anamorphosis".First, KCDE is applied with the triweight kernel (see Table 1) to provide a nonparametric CDF, FK (x) , as discussed in Pavlides et al. (2022).The plug-in kernel bandwidth was calculated at h = 9.33 m (h in KCDE refers to water level, not geographical distance).
Figure 2 shows the KCDE-derived FK (x) in comparison to the staircase Fs (x) .In contrast with the latter, FK (x) is a continuous curve which was constructed using the lookup table .Figure 3 shows the histogram and the normal probability plot for the transformed data {x * 1 , … , x * N } which are obtained after normalization.As is evident in these plots, the transformed data {x * 1 , … , x * N } conform to the standard normal distribution N(0, 1) .Spatial prediction is carried out using the normalized data.The predictions {x * 1 , … , x * P } obtained of the map grid are transformed back to water level values {x 1 , … , xP } with the help of the lookup table as explained in section "Gaussian anamorphosis".

UK cross-validation
The UK predictive accuracy was assessed using LOOCV (section "Assessment of predictions").In LOOCV sampling sites s n , n = 1, … , N = 72 and the corresponding measurements are cyclically removed from the data, and the value of the water level, xn at the point removed is esti- mated from the remaining 71 drillholes.The estimation error  n = x n − x(s n ) , for n = 1, … , 72 is calculated.The histogram of UK estimation errors for all mines in the study area is shown in Fig. 7. Standard validation metrics are shown in Table 3.The values of the data range from x max = −1 m to = −208 m.Most of validation errors are in [−50, 50] m, with the exception of a few locations that return errors greater than 50 m (in magnitude).The presence of few high validation errors is not surprising considering the small sample size and the significant spatial dispersion of the sampling points.These factors lead to large prediction intervals (as shown in the uncertainty map of Fig 5b).Taking into account the preceding factors, the UK interpolation performance is considered adequate.

Results: SLI
The water level in the study area is estimated using SLI (see section "Stochastic local interaction model") with the quadratic kernel (see Table 1 Using the Eqs.( 19) and ( 23), the optimal SLI predictions x * SLI (u) and respective variances 2 SLI (u) are estimated.The same map grid is used for UK (see section "Results: universal Kriging").In contrast with Kriging, the search neighborhood in SLI is automatically adjusted by the algorithm through the parameter and the bandwidth vector h instead of being set by the user.For comparison purposes, the SLI predictions are used only at the map-grid nodes where UK predictions are available.The normality of the transformed data {x * 1 , … , x * N } allows using the standard deviation of the SLI predictions to determine prediction intervals using the method presented in section "Assessment of predictions".
The results are presented in Figs. 8 and 9.The SLI algorithm returns large search neighborhoods as implied by = 4.18 (see section "Stochastic local interac- tion model"): the bandwidth Eq. ( 18) dictates that the local bandwidth h u at the point u is the distance to the k-near- est neighbor ( k = 2 ) multiplied by .As shown in Fig. 1, k-nearest neighbor distances vary from small (43 m) to large (2,977 m).
As is evident in Figs.5b and 8b, the SLI prediction interval range has a minimum value, min u∈ {R SLI;90 (u)} , of 31.9 m.For UK, the prediction interval has a lower minimum, i.e., min u∈ {R UK;90 (u)} = 0.7 m.The reason is that UK explicitly minimizes the prediction variance, and thus it generates a smaller range for prediction intervals.

SLI application to mine B
Stochastic local interaction can handle large due to the sparse structure of the model (Hristopulos 2015;Hristopulos et al. 2021).However, the locally adaptive nature of the SLI predictor also allows model estimation and prediction even for small samples.In this section SLI is applied to the dataset for mine B. This mine involves 28 sampling points, a sample size which is considered small for most stochastic methods.The MLE initialization values for mine B are the same as for the entire a r e a : v 0 = (c 1 = 2.5,  = 0.1,  = 1.25, m X = −0.56)⊤ .The optimal SLI parameters obtained by MLE are: v = (c 1 = 2.881,  = 0.046,  = 0.938, m X = 0.163) ⊤ .
The maps for xSLI (u) , x0.05 (u) , x0.05 (u) and the range R SLI;90 are shown in Fig. 10.The prediction of the water level values using only data from mine B is similar to the SLI implementation using the data from the entire area.The water level in the northwestern area of the mine is lower than in the southeast, although SLI for mine B gives slightly higher predictions for the low water level area in the northeast than SLI for the entire area.The prediction uncertainty using only data from mine B follows a similar spatial distribution as the interval shown in Fig. 8b.However, the uncertainty is lower in the northwestern area and the eastern part of the mine than the SLI implementation using all the data, albeit it is still relatively high.  1 3 The differences for the mine B results between Figs. 8-9 and Fig. 10 are expected since the maps of Fig. 10 are produced with fewer data that determine a different SLI model.

SLI cross-validation
Leave-one-out cross-validation is used to assess the SLI predictive performance, first for the SLI model based on the entire area and then for the SLI model obtained for mine B. The validation metrics for the SLI estimation are shown in Table 4.The histograms of the errors are shown in Fig. 11.The data values for the entire area range between x max = −1 m to x min = −208 m.However, the low- est LOOCV prediction is min i={1,2,…,N} x(s i = −121.9m, i.e., the minimum is significantly overestimated.One reason for this behavior is the difficulty of estimating the minimum value, once it is removed, in a sparsely sampled For the SLI using data from the entire area, there are a few large errors (  > 50 m).For SLI predictions using only mine B data, there are three locations that return high validation errors.In both cases, the SLI overestimates the water lever in both mine B and mine C.However, considering the range of data values, the clustering of the data locations and the small number of data points (especially for mine B), and the LOOCV measures over the study areas (Table 4), the SLI performance is considered adequate overall.

Discussion
This section provides a comparative discussion of the UK and SLI cross validation results.For ease of reference, the validation measures are gathered in Table 5.In the case of UK, the validation measures for Mine B are derived using the variogram model estimated from the entire area, as 28  data locations are considered too few to reliably estimate the variogram model.The SLI parameters were inferred separately for mine B, and thus a different set of SLI parameters was used (see section "SLI application to mine B") than for the entire area.UK validation measures are slightly better for the entire area.This result is attributed to the fact that UK employs a linear trend Eq. ( 2) to formulate predictions, while SLI is employed with a constant mean (see Eq. 19).This choice creates a disadvantage for SLI, since the spatial distribution of the data over the entire area does not seem to comply with the stationarity assumption.This shortcoming will be fixed in future implementations of SLI.
Both methods perform adequately given the small sample size, although they significantly overestimate the (few) low values as evidenced in the MaxAE values in Table 5 and the error histograms of Figs. 7 and 11.In addition, both methods give high uncertainties in areas that are distant from the sampling sites.However, these shortcomings point primarily to sampling, not methodological deficiencies.
The differences between SLI and UK are illustrated in Fig. 12.As expected from the results in the previous sections, the prediction interval (PI) range of SLI, R SLI , is generally higher than the PI range for UK, R UK .Kriging methods minimize the prediction variance 2 UK , which affects the R UK .SLI prediction is instead based on maximizing the conditional PDF by means of the energy functional H(x;v) which can lead to higher prediction variances at many prediction points.
SLI gives significantly higher estimates than UK at certain points in mine B. The UK predictions drop significantly in the north of the study domain, i.e., for s 2 > 8000 m (see Fig. 5), while SLI predictions remain high (see Fig. 10).The linear trend in the UK model improves the validation measures compared to a preliminary analysis based on ordinary Kriging.On the other hand, the sudden drop of the water level in the northern section of mine B, forces UK to model a significant linear slope in the trend function, leading to  In Fig. 13, the LOOCV absolute error for each of the data sites is plotted against the range of the 90% prediction interval R SLI (s) for SLI or R UK (s) for UK.As evidenced in the figure, UK gives both the highest and lowest PI values.Furthermore, locations with high absolute error values ( |x(s) − x(s)| > 50 m) have high PI range ( R(s) > 100 m).Thus, for both methods high absolute error values are concentrated in areas with high uncertainty.
In addition, both methods estimate prediction intervals that are wide enough to accommodate the prediction of the other method at most locations.In both cases, the LOOCV results are reasonable given the small size of the dataset.The SLI method allows for constructing a spatial model that can give reasonable predictions even with a dataset as small as that of mine B (28 points).

Conclusions
In this research, the estimation of groundwater levels in sparsely sampled areas with mining activity is investigated.This case study involves the aggregate water level below surface for three mines in northern Greece.The recently proposed kernel-based KCDE method (Pavlides et al. 2022) is used to estimate the non-Gaussian distribution of water level and to conduct Gaussian anamorphosis.The predictive performance of two stochastic interpolation approaches is compared, UK and the more recently developed SLI method.SLI is a spatial interpolation method designed to provide computational efficiency for large spatial and spatiotemporal datasets (Hristopulos 2015;Hristopulos and Agou 2020;Hristopulos et al. 2021).While the primary objective of SLI is to efficiently extract correlations from large data, as shown herein, it can also be applied to small datasets that do not allow reliable estimation of the variogram function which is needed for the application of Kriging methods.
The cross-validation measures show evidence of reasonable interpolation performance.However, in certain areas the 90% prediction intervals are rather wide, indicating high prediction uncertainty.UK takes into account a linear trend function and minimizes the prediction variance, thus allowing for tighter prediction ranges than SLI.On the other hand, SLI provides an automatically adaptable search radius, computational efficiency, and the ability to estimate the spatial model even with the very sparse dataset of mine B. As discussed in Section "Discussion", the introduction of a linear trend model based on a small dataset can be misleading.While UK cross-validation measures are slightly better for the entire area, application of SLI mine B leads to slightly better performance than UK.Furthermore, the trend in areas outside the data clusters leads to rapid decrease of the UK predicted water levels (see Fig. 5) that can not be confirmed based on the available data.
The SLI predictor can be modified to include trend functions as in UK-for example, promising results were obtained by incorporating a polynomial trend in the spatiotemporal SLI predictor (Hristopulos and Agou 2020).Management of water resources is important in areas of mining activity.Future studies could focus on spatiotemporal prediction methodologies in order to allow monitoring seasonal changes of the water level due to mining activities.Furthermore, the KCDE-based estimate of the CDF can be used in conjunction with conditional simulation methods, such as Kriging polarization (Chilès and Delfiner 2012;Hristopulos 2020a;Olea 2012), in order to better capture the spatial variability of water level.
Funding Open access funding provided by HEAL-Link Greece.

Fig. 1 a
Fig. 1 a Frequency histogram of the 72 water level measurements (m below surface).b drillhole locations for the three mines: mine A (blue o), mine B (black x), mine C (red +).Coordinates are in meters ).The transformed data values {x * 1 , … , x * N } are used for SLI interpolation and the predictions are subsequently back to meters BSL.The SLI model parameters c 1 , , and m X are estimated by MLE.The initialization values for the MLE a r e : v � 0 = (c 1 = 2.5,  = 0.1,  = 1.25, m X = −0.015)⊤ .The SLI model parameters estimated by MLE are v � = (c 1 = 0.926,  = 0.032,  = 4.18, m X = −0.015)⊤ .

Fig. 4 Fig. 5 a
Fig. 4 Variogram of the residuals (determined by removing the MLEbased linear trend).The spherical model is shown with a continuous curve (red line), and the experimental variogram is shown by circle markers (blue circle)

Fig. 6
Fig. 6 UK-based maps of the lower and upper limits of 90% prediction intervals: a lower quantile x0.05 (u); b upper quantile x0.95 (u) .Coordinates are in meters area; this effect is similar to the Kriging smoothing effect (Yamamoto 2005; Pavlides et al. 2015).Another reason is that a trend function in the SLI model was not used.For mine B, the values of the data range from x max = −1 m to x min = −142 m.SLI again overestimated the lowest value yielding a minimum of −39.2 m.

Fig. 8 a
Fig. 8 a SLI prediction map of water level (m below surface level) for the three mines.b Range of water level 90% prediction intervals.Coordinates are in meters

Fig. 10
Fig. 10 SLI maps for mine B: a predicted water levels (m below surface); b range of the 90% water level prediction intervals; c lower quantile x0.05 (u) ; (d) higher quantile x0.95 .Coordinates are in meters

Table 1
Common kernel functions (|x| ≤ 1) is an indicator function used to define the range of compactly supported kernels:

Table 2
Summary water level data statistics.SD standard deviation The sample contains 72 data points that represent the average of biannual measurements over the 10-year period at each sampling site.All statistics are measured in meters below sea level, except for skewness (dimensionless measure)

Table 3
Validation measures (LOOCV) for assessment of UK inter-

Table 4
Validation measures for the SLI estimations Pearson's correlation coefficient, ME is the mean error, MAE is the mean absolute error, MaxAE is the maximum absolute error and RMSE is the Root Mean Square Error

Table 5
Leave-one-out cross validation measures for SLI and UK Pearson's correlation coefficient, ME mean error, MAE mean absolute error, MaxAE maximum absolute error and RMSE root mean square error.UK validation measures for mine B use the variogram model inferred from the data over the entire area