Introduction

Coastal zones are one of the most complex and dynamic systems since their landform changes rapidly (timeframe of days and weeks) due to the combined action of tidal flows, currents and waves on coastal sediments (Raper et al. 2005). Moreover, grain size analysis and textural characteristics of surficial coastal sediments provide useful information to define and reveal the hydrodynamic condition as well as the deposition process. The modelling of many environmental and engineering applications in the coastal zone as well as for risk assessment against coastal hazards requires the knowledge of the grain size and the distribution of the surficial coastal sediments. As a result, measurements of grain size parameters are important for the understanding and calculation of sediment transport and critical parameters for modelling coastal erosion and vulnerability (Boumboulis et al. 2021), offshore and geotechnical engineering (Zananiri and Vakalas 2019), coastal zone management and coastal protection works such as beach nourishment. Hence, maps of surficial sediment spatial distribution in coastal and nearshore zone are important to provide information about the processes and mechanisms of the environment for sustainable management and protection.

However, the collection of sediment samples is impossible to be carried out along the entire coastline and a lack of information regarding the coastal sediment distribution is created. For this reason, it is necessary to predict the grain size of the sediments in all unsampled positions. To overcome this constraint the application of spatial interpolation and geostatistical techniques has been recommended in varied scientific fields for parameter distribution, especially in rainfall and precipitation data (Vicente-Serrano et al. 2003; Moral 2010; Borges et al. 2016; Pellicone et al. 2018; Ananias et al. 2021), hydrogeology and groundwater analysis (Elumalai et al. 2017; Charizopoulos et al. 2018; Karami et al. 2018; A. Antonakos A., Lambrakis 2021) environmental and ecological research (Kalivas et al. 2013; Zarco-Perello and Simões 2017; Tziachris et al. 2017; Qiao et al. 2018), engineering geology and soil liquefaction (Boumpoulis et al. 2021; Kokkala and Marinos 2022) and soil sciences (Brus et al. 1996; Bourennane et al. 2000; Bishop and Mcbratney 2001; Robinson and Metternicht 2006). Geostatistical methods for spatial distribution are also recommended for mapping and prediction of sediment properties and the generation of coastal and seabed sediment maps (Verfaillie et al. 2006; Goff et al. 2008; Stephens et al. 2011; Lark et al. 2012; Liu et al. 2012). Meilianda et al., (2012), implemented multivariate geostatistics named for the production of a high-resolution median grain size distribution map with bathymetry as auxiliary information. (Jerosch 2013), developed a Cokriging model with three parameters (slope, bathymetry, and cost distance) to construct a predictive multi-parametric sediment type map. Bockelmann et al., (2018) applied kriging with external drift (KED) to interpolate estimates on relative proportions of the grain size fraction of mud and median grain size (D50) in surface sediments in the North Sea. Although there is a plethora of studies on the spatial distribution of sediments, almost all of them were carried out on the seabed and not near the shoreline, where most of the natural hazards have impacts and human activities occur. In this context, this study focuses on a zone which is close to the shoreline (extended from the backshore area to the 5-m sea depth contour) to produce maps which can be used in different applications that occur in the coastal zone.

All the aforementioned publications agree that the kriging geostatistical method in its various forms, has high prediction accuracy in sediment distribution maps, especially if combined with other highly correlated variables (Cokriging or Kriging with an External Drift). Contrary to the straightforward principle implemented for the deterministic methodologies, geostatistical techniques, have the advantage that they are including the parameter of randomness (stochastic) in the model. Another advantage of kriging is that it takes into account the spatial autocorrelation between neighbouring observations for calculating the interpolated surfaces and predicting values at unsampled places (Goovaerts 1999; Bockelmann et al. 2018). Recently, more advanced and complex methodologies have been developed for interpolation, such as Machine Learning (ML) (Bélisle et al. 2015; Appelhans et al. 2015; Tziachris et al. 2020; Sekulić et al. 2020; Didkovskyi et al. 2022), but a thorough review of this these techniques is beyond the scope of this paper.

In most of the research papers, in which a comparative study between spatial interpolation methods has been applied, geostatistical techniques (Kriging) outperformed the deterministic methodologies. Among the different Kriging techniques, the most efficient turned out to be the multivariate Co-Kriging (COK) methodology. However, if the variables used as auxiliary information in the Co-Kriging model are not correlated, then univariate Kriging models have better accuracy and performance than Co-Kriging. In any case, it is important to estimate the optimal spatial interpolation technique each time applied in a dataset for the best possible result.

The main objective of this study is the comparison of the predictive performance of different spatial interpolation methods, for surficial sediment distribution of the median grain size (D50) in the coastal zone of a model area existing in Western Greece. For this reason, four univariate geostatistical and three deterministic techniques were performed to produce sediment distribution maps. These methods include Ordinary Kriging (OK), Simple Kriging (SK), Bayesian Kriging (EBK), Universal Kriging (UK), Radial Basis Function (RBF), Local Polynomial Interpolation (LPI) and Inverse Distance Weighting (IDW). The prediction accuracy of those interpolation models was evaluated with a cross-validation method (RMSE: Root Mean Square Error and ME: Mean Error) and using the coefficient of determination (R2) between the measured and predicted values. The implementation and production of those maps can be provided and used as a raster file, for the calculation of many other parameters that affect coastal erosion, such as the geotechnical parameter for the calculation of the Coastal Vulnerability Index (CVI).

Study area

The model area of this research is the Gulf of Patras which extends from the Rio-Antirio Bridge in the Eastern part to the cape of Araxos in the Western part and is located in Northwest Peloponnese, Western Greece (Fig. 1). On the West, it is connected to the Ionian Sea and on the East with the Gulf of Corinth. The total length of the shoreline of the study area is around 30 km. The Gulf of Patras is a densely populated coastal zone, which suffers extensive erosion over time (Nikolakopoulos et al. 2019; Depountis et al. 2023) and is expected to be worst in the future not only due to climate change but also due to anthropogenic factors and urban development.

Fig. 1
figure 1

Study area (Gulf of Patras in Western Greece) with the positions of the sediment sampling. The coordinates correspond to the Hellenic Geodetic Reference System

The gulf of Patras constitutes a shallow marine system (Piper and Panagos 1979) with a maximum depth of water arising at 130 m. The main rivers discharging in the gulf are the Acheloos and Evinos rivers in the north and Peiros in the South. The Gulf of Patras is a Plio-Quaternary graben (Ferentinos et al. 1985) and the evolution of the gulf in the Quaternary occurred due to the interaction between the tectonic subsidence (rates at 3–5 mm/year at the central graben), global sea level rise and the river sediment supply (Chronis et al. 1991).

In this research, the coastal zone of the study area was divided into two zones: Sampling Zone 1 (Shoreline materials), which is extended between the backshore and foreshore and Sampling Zone 2 (Seabed materials), which is extended from the foreshore until the 5-m depth contour (Fig. 1). Sampling Zone 1 is mainly consisted of coarse sands and gravelly sands, while Sampling Zone 2 has fine sands and silty sands. Furthermore, sampling zone 1 and 2 were divided into 3 different sub-regions for better map visualization and simplification of results: 1) Sub-region of Vrachnaika-Roitika-Kaminia, 2) Sub-region of Alissos-Kato Achaia-Niforeika and 3) Sub-region of Kalamaki-Ioniki Akti-Karnari (Fig. 1).

Research methodology

The main aim of this research is the assessment of the optimal spatial interpolation technique for sediment distribution maps comparing different spatial distribution methodologies. To accomplice this a stepwise methodological approach was followed: 1) Data collection-sediment sampling-grain size analysis, 2) Spatial interpolation analysis in a GIS environment and 3) Construction of sediment distribution maps utilizing the optimal interpolator. As shown in Fig. 2 the methodology starts with the collection and sampling of coastal sediments in the study area, which is subjected to grain size analysis in the laboratory, providing with this way useful information to define and reveal the depositional environment.

Fig. 2
figure 2

Flowchart showing the methodological approach used in this study

Subsequently, grain size analysis results were inserted in a GIS environment and specifically the median grain size (D50) to produce spatial interpolation maps applying geostatistical and deterministic methodologies. Cross-validation of the interpolated results is required to find the optimal spatial interpolator. Errors, Pearson Correlation and R2 are calculated for each different interpolator and the one with the higher appropriateness index (AI) value will be considered the optimal.

Data collection-grain size analysis and depositional environment

A total of one hundred fifty-one (151) samples of coastal sediments from seventy-five (75) different positions were collected along the coastline, which were taken from the top 0 to 10 cm at each sample position. Seventy-five (75) samples were taken from Sampling Zone 1 and seventy-six (76) from Sampling Zone 2. A global Positioning System (GPS) was used to accurately record the sampling positions. Sampling in several areas was difficult due to the morphology of the coast (pebbles, full coverage with algae, etc.) and its difficult access to the sample positions due to the existence of various structures and houses near the coastline.

The collected samples of surficial coastal sediments were subjected to grain size analysis with the dry sieving methodology and were classified according to the ASTM standards. The calculated parameters of sediments, which were extracted from the grain size distribution curve and used for the soil classification are the D50, D30, and D10 particle size, the uniformity coefficient (Cu) and the curvature coefficient (Cc). These are the geometric properties of a grading curve, which describes a particular type of soil. Statistical parameters such as mean, sorting, skewness and kurtosis using the GRADISTAT V.4 software and the Folk and Ward (1957) logarithmic method (Blott and Pye 2001) were also calculated.

The depositional environment of the sediment samples was estimated with the linear discrimination function (Basanta K. Sahu 1964), which is used in many research papers for revealing the deposition environment (Alsharhan and El-Sammak 2004; Theodorakopoulou et al. 2012; Baiyegunhi et al. 2017; Emmanouilidis et al. 2018). To discriminate the depositional environments of the aeolian and beach the following equation was used:

$$Y1=-3.5688M +3.7016{r}^{2} -2.0766SK +3.1135KG$$
(1)

where M = mean, r2 = Sorting, SK = Skewness and KG = Kurtosis. If Y1 is <  − 2.7411, the suggested environment is the aeolian and if Y1 is >  − 2.7411, the deposition environment is the beach (Basanta K. Sahu 1964).

To distinguish the depositional environments between the beach and shallow marine depositions the following equation was used:

$$Y2=15.6534M+65.7091{r}^{2}+18.1071SK +18.5043KG$$
(2)

If Y2 < 65.3650 beach deposition is indicated and if Y2 > 65.3650 a shallow marine environment is suggested (Basanta K. Sahu 1964).

Spatial interpolation analysis in GIS-Semi-variogram and Kriging interpolation

The initial step before the interpolation with geostatistical methods is the calculation of the experimental variogram (semi-variogram) from the data set. The semi-variogram graph explains the relationship between the point samples' distances and their possible semi-variogram values (Meilianda et al. 2012). The equation that describes the semi-variogram is:

$$\gamma \left(h\right)=\frac{1}{2N(h)}{\sum_{i}^{N(h)}{[Z}_{{(x}_{i})}-Z({x}_{i}+h)]}^{2}$$
(3)

where γ(h) is the estimated values of the semi-variogram, which symbolize the variance of the observation values separated by a distance h, xi is the position of the samples Z(xi) and (xi + h) are the positions of the samples Z(xi + h), N(h) is number of Z(xi) pairs of measurements separated by h.

When the distance of a pair of a sample is low, the difference between sampled values is also expected to be low. This means that samples located nearby are very possible to have similar values, but samples located at a larger distance from each other are expected to have different values, which are increasing as the distance is higher.

The semi-variogram diagram is consisted of and is described by three key parameters, the nugget, the sill and the range. Nugget (C0) represents the measurement and data errors or random spatial sources of variation at distances smaller than the sampling interval or both and represents the value of the initial variability. Range (a) is the distance where the semi-variogram reaches the total sill (C0 + C1) and after that distance, there is no spatial correlation of the data. Sill is the value that the semi-variogram reaches the range and represents the maximum variability, while partial sill is the sill minus the nugget (C1—C0). The use of the nugget-sill ratio (C0/C0 + C1) was applied for the estimation of the spatial dependence of the variables (Jerosch 2013; Adhikary and Dash 2017; Tziachris et al. 2017). A ratio of less than 25% means strong spatial dependence, while a ratio between 25 and 75% indicates a moderate spatial dependence and a ratio over 75% shows a weak spatial dependence (Cambardella et al. 1994).

Spatial interpolation analysis in GIS-Interpolation methods

Four geostatistical techniques (OK, UK, SK and EBK) and three deterministic (IDW, RBF and LPI) algorithms were implemented to interpolate the median grain size D50 in the coastal zone of the gulf of Patras, with the Geostatistical Analyst tool of the ArcGIS 10.8 software. Generally, deterministic approaches use either the degree of smoothing (LPI and RBF) or the distance between the points, such as the IDW. Geostatistical models are based on the statistical properties of the data and in the semi-variogram. Kriging as a geostatistical interpolator can be described with the acronym of BLUE-Best Linear Unbiased Estimator and rely on an assumption of a linear relationship between the response and explanatory variables (Aalto et al. 2013; Burrough and McDonnell, 1998; Goovaerts 1999). It is considered the best because the expected squared difference between the estimated value and the known is the minimum for all possible linear estimators. Linear since the estimator is formed by linear weighing of the available samples. Unbiased estimator because it defines that the expected error is equal to zero. Finally, Kriging is mentioned as an extract estimator due to that estimates a value in an unknown position where there are no data and samples.

Inverse Distance Weighting (IDW)

IDW is one of the most used methods among deterministic interpolation techniques. This method assumes that the measured values at a closer distance have greater weight than those further away. The influence of a known value is inversely related to the distance from the unknown data point. Consequently, this method gives greater weights to values closest to the prediction position and the weights reduce as a function of distance. IDW determines cell values using a linearly weighted combination of a set of sample points.

$$Z{(x}_{o})=\frac{\sum_{i=1}^{n}\frac{{x}_{i}}{{d}_{ij}^{r}}}{\sum_{i=1}^{n}\frac{1}{{d}_{ij}^{r}}}$$
(4)

where Z(xo) is the estimated unknown value, x is the set of spatial coordinates (x1, x2), and r is the weight related to the distances dij which is the distance between the estimation point of the n data points.

Local Rolynomial Interpolation (LPI)

Local Polynomial Interpolation (LPI) is a deterministic interpolation technique, which fits many polynomials mathematical functions, in contrast with Global Polynomial Interpolation (GPI) which fits a unique polynomial function to the entire surface (Johnston et al. 2001). Therefore, LPI fits the specified order (zero, first, second and third) using all points only within the defined overlapping neighbourhood, which is used as a value for each prediction in the fitted polynomial at the centre of the neighbourhood (Johnston et al. 2001). GPI is useful for identifying long-range trends in the dataset, whereas LPI can produce surfaces that capture the short-range variation.

Radial Basis Function (RBF)

The Radial Basis Function (RBF) is an interpolator which is based on a form of artificial neural networks i.e. input layer, hidden layers, and output layer (Johnston et al. 2001; Ali et al. 2021). In RBF, the generated surface requires passing through each measured point while minimizing the total curvature of the surface (Johnston et al. 2001). In addition, RBF can predict values above the maximum and below the minimum. The interpolator basis functions that are covered by the RBF include a thin-plate spline, tension-based spline, completely regularized spline, multi-quadric function and inverse multi-quadric spline.

Ordinary Kriging (OK)

The OK interpolation method is the most common and simple among the Kriging techniques, which incorporates statistical properties of the measured data (spatial autocorrelation) (Bhunia et al. 2018). Ordinary kriging is also described by the acronym BLUE, as mentioned above. OK assumes that the constant mean is unknown and defined as the mean of samples (local mean). As mentioned above, the first step for the application of Kriging is the estimation of the semi-variogram. The equation used for OK interpolation is:

$${Z}_{OK}{(x}_{o})=\sum_{i=1}^{N}{\lambda }_{\iota }{{\rm Z}(x}_{i})$$
(5)
$$\sum_{i=1}^{N}{\lambda }_{\iota }=1$$
(6)

where ZOK(xo) is the interpolated value for point (xo), Z(xi) is the known value, λi is the OK weight for the Z(xi) value. In addition, λi values must be evaluated to obtain an unbiased estimation and to minimize the error variance (Pellicone et al. 2018).

Simple Kriging (SK)

In contrast with OK, the application of Simple Kriging (SK) presupposes the assumption of stationarity. SK considers μ to be known and constant all over the study area, unlike with the OK type, where the μ is unknown and is considered to fluctuate locally, maintaining the stationarity within the local neighbourhood (Moral 2010). The equation used for SK interpolation is:

$${Z}_{SK}{(x}_{o})=\sum_{i=1}^{N}{\lambda }_{\iota }{{\rm Z}(x}_{i})+\left(1-\sum_{i=1}^{N}{\lambda }_{\iota }\right)\mu$$
(7)

where μ is a known stationary mean

Universal Kriging (UK)

Universal Kriging is an interpolation method with a changing mean and the trend is modelled as a linear combination of functions of the spatial coordinates. The UK also uses a linear trend function μ(xi) rather than relying on a constant trend function μ, like the other Kriging interpolators. It is similar to OK because it considers that the trend (μ) is not constant over the whole field but depends on the spatial position of the observation. According to Goovarets (1997), the UK is a univariate method, but some studies classify it as multivariate because it uses coordinate information (Li and Heap 2014).

Empirical Bayesian Kriging (EBK)

According to (Krivoruchko and Gribov 2019) EBK consists of two geostatistical models: the intrinsic random function kriging (IRFK) and the linear mixed model (LMM). In EBK, the stochastic spatial process is represented locally as a stationary or nonstationary random field and the parameters of the locally defined random field are allowed to vary across space (Gribov and Krivoruchko 2020). Empirical Bayesian kriging also differs from other kriging methods by considering the error introduced by estimating the underlying semivariogram. Classic Kriging techniques obtained semi-variogram is evaluated from known data positions and is considered as the single, true semi-variogram and it is used to make predictions at unknown positions by not taking the uncertainty in the semi-variogram estimation into account, thus underestimating the standard errors of predictions (Schneider and Martinoni 2001; Pellicone et al. 2018). EBK is a geostatistical interpolation method that automates the most difficult aspects of building a valid kriging model through a process of subsetting the study area, coupled with multiple simulations to obtain the best fit (Krivoruchko and Gribov 2019). This process finally creates a spectrum of semi-variograms and each of these is an estimate of the true semi-variogram for the subset (Pellicone et al. 2018).

Optimal interpolation-cross validation methods

The cross-validation method was applied for the evaluation and the prediction accuracy in the unknown values of the different interpolation techniques to find the optimal one. Consequently, the objective of cross-validation is to give information about which model provides the most accurate predictions and how well the prediction works (Schneider and Martinoni 2001).

In the present study, the prediction accuracy of the cross-validation method was evaluated using the statistical parameters of Mean Error (ME), Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root-Mean-Square-Error (RMSE). In an accurate model that provides good results and predictions, the ME must be close to 0 and RMSE should be as small as possible The two most commonly used for cross-validation are RMSE and ME. Statistical parameters of the errors were calculated using both the Geostatistical Analyst Tool in the ArcGIS software and the statistical software R.

Mean Error is the measure of the bias of the prediction and can be calculated with the following equation:

$$ME=\frac{{\sum }_{i=1}^{n}[{Z}_{{x}_{i}}^{*}-{Z}_{{x}_{i}}]}{n}$$
(8)

RMSE measure the difference between the predicted and the observed values and estimates the standard deviation of the residuals. It can be calculated by the following equation:

$$RMSE=\sqrt{\frac{{{\sum }_{i=1}^{n}[{Z}_{{x}_{i}}^{*}-{Z}_{{x}_{i}}]}^{2}}{n}}$$
(9)

In addition, for the cross validation the Pearson correlation (P. Correlation) and the coefficient of determination (R2) between the observed and predicted values were used. The spatial dependence of the variables in the semi-variogram was estimated by the sill-nugget ratio.

In order to simplify the evaluation of the optimal interpolator selection, an appropriateness index (AI) (Ali et al. 2021) was applied in the calculated cross-validation parameters. The computation of AI requires the normalization of the values into non-dimensional using the minimum–maximum normalization method. The methodology of the appropriateness index (AI) is described in Ali et al. (2021).

Results

Grain size analysis

As it was mentioned before, the coastal sediment samples were divided into two different zones (Zone 1 and 2), which represent the shoreline (Zone 1) and the seabed materials (Zone 2). The descriptive statistics of the coastal sediments from the grain size analysis in Zones 1 and 2 are presented in Table 1.

Table 1 Descriptive statistics of coastal sediments derived by the grain size analysis (D50: arithmetic and LogD50: logarithmic)

Grain size analysis, classification, and statistical parameters indicate that Sampling Zone 1 is mainly consisted of coarse and gravelly sands, with D50 = 1.97 mm. On the other hand Sampling Zone 2 mainly consisted of fine and silty sands, with D50 = 0.63 mm.

Sediments of Ζone 1 are poorly to very poorly sorted, with a coarse to symmetrical skewness and very platykurtic to leptokurtic kurtosis. On the other hand Sediments of Ζone 1 are poorly to moderately sorted and well-sorted, with a coarse to very fine skewness and platykurtic to very leptokurtic kurtosis. An example of the geometrical parameters required for the classification and description of the sediment samples according to the ASTM standards is presented in Table 2 while the interrelationship between the grain size parameters (sorting, mean, skewness and kurtosis) of the two sampling zones is presented in bivariate plots in Fig. 3.

Table 2 Grain size analysis and classification of the Sub Region 1: Vrachnaika-Tsoukaleika-Kaminia
Fig. 3
figure 3

Bivariate plots between the statistical parameters (Mean, Sorting, Skewness and Kurtosis) of the two Sampling Zones

In Sampling Zone 1 (shoreline materials), the grain size analysis shows that the coarser sediments (gravels and cobbles) are located in Sub-region 2 in the mouth of the Peiros River with a grain size of 4.5 to 5.5 mm. Coarse sediments (coarse sands and gravelly sands) are also existing in all sub-regions, especially in Vrachnaika, Kaminia, the western part of Alissos, Niforeika and a small part of Karnari. The rest of Sampling Zone 1 consists of medium to coarse sands with a grain size of around 1 to 2.5 mm. On the contrary, Sampling Zone 2 (seabed materials) is mainly covered by fine sediments with a grain size of around 0.2 to 0.7 mm, except in a few areas such as the Peiros River and small parts of Alissos, Kaminia and Karnari, which appear with coarser sediments.

Implementation of Basanta K. Sahu, (1964) functions for the differentiation of the deposition environments between aeolian/beach and beach/shallow marine depositions show that most of the sediment samples belong to the beach/shallow marine environment (Fig. 4).

Fig. 4
figure 4

Differentiation of deposition environment in the study area

The statistical normal distribution of the sampling density was tested by plotting histograms in an arithmetic scale (Fig. 5a and 5c), applying the Q-Q plots which represent the data distribution against the expected theoritical normal distribution as well as the Kolmogorov–Smirnov tests which examine the null hypothesis that a set of data becomes from a normal distribution. According to these tests, for both Zones 1 and 2, the sampling data sets are not normally distributed. Even though Kriging can provide good interpolation results in non-normal data, the optimal results can be interpolated when the data are normal or close to normal distributions. Hence, the sampling data sets were log-transformed to achieve distributions as much closer to normal (Fig. 5b and 5d). After the log transformation, the sampling data set of Zone 1 is normally distributed (Kolmogorov test: p-value = 0.742), while the sampling data set of Zone 2 is not (Kolmogorov test: p-value < 0.05).

Fig. 5
figure 5

Frequency distribution of a) D50 mm and b) LogD50 mm, of sediments in Sampling Zone1 and c) D50 mm and d) LogD50 mm in Sampling Zone 2

A co-kriging interpolation with auxiliary parameters was also applied, using the depth of water (m) and the mud content (%). However, the Pearson correlation coefficient, which is the linear correlation between two datasets, for those two variables to the median grain size (D50 mm) showed a very low correlation and the run of a co-kriging model was not feasible to give a better interpolation solution for the current study.

Semi-variogram modeling

The construction of a semi-variogram requires the selection of input data, which are affecting significantly the final interpolating results. Those inputs are the number of minimum and maximum search neighborhood especially for the geostatistical methodologies, the order of polynomial for LPI, kernel function for RBF and the power for IDW (Table 3). The selection of the final values of each input was decided so that the final results obtained the lowest possible errors in the cross-validation outputs.

Table 3 Selection value of input data for each interpolator

The number of minimum and maximum search neighborhood is five (5) and (10) respectively, while the power function (p) for IDW is equal to 2 (p = 2). The special environment of the coastal zone makes difficult the interpolation of long distanced samples either because there are no beaches in several areas due to erosion or because the D50 values vary per area across the coastline. This is the reason of the selection of those specific values for search neighborhood and power function (p). Therefore, in order to make the interpolation as realistic as possible, a relatively small number of search neighbors and a power equal to 2 were chosen (for IDW), giving a greater weighting to the closest points. The selection of Completely regularized Spline as kernel function and the selection of a first order polynomial in LPI interpolator was finally decided because they had the lowest errors in cross validation.

Furthermore, based on the inputs data of Table 3 and for each geostatistical interpolation method (except EBK), the omnidirectional semi-variogram parameters nugget, partial sill, and range along with the nugget-sill ratio were calculated (Table 4).

Table 4 The semi-variogram parameters for each interpolation method with the Nugget-Sill ratio

In LogSK_Zone 1 the nugget-sill ratio is 10.74% whereas in LogSK_Zone 2 is 59.18%, indicating a strong and moderate spatial dependence, respectively. LogUK_Zone 1 and LogUK_Zone 2 have the highest ratio with 28.79% and 55.75% respectively, indicating a moderate spatial dependence. LogOk Zones 1 and 2 were estimated with the best nugget-sill ratio of 10.42% and 19.00%, respectively, showing a strong dependence between the variables. The range of Sampling Zone 1 was lower (780.34 to 2712.19 m) compared with the range of Sampling Zone 2 (1467.7 to 4737.06 m). Therefore, LogOK_Zone 1 was calculated with the lowest range (780.34 m) and LogOK_Zone 1 with the highest (4737.06 m).

In all cases, the spherical model was chosen because it yielded the smallest errors among the other models (spherical, exponential and Gaussian). The omnidirectional semi-variogram for the OK method and their fitted models can be seen in Fig. 6.

Fig. 6
figure 6

Omnidirectional semi-variogram for Ordinary Kriging (OK) for spherical, gauss and exponential model for Zone 1

Sediment distribution and prediction maps

The produced sediment distribution map using deterministic and geostatistical methods, for both Ζones 1 and 2 and the three sub-regions (Vrachnaika-Roitika-Kaminia, Alissos-Kato Achaia-Niforeika, Ioniki Akti-Karnari-Araxos) are presented in Figs. 7 and 8. Generally, the visual examination of the sediment distribution maps indicates similar and smooth spatial patterns with minor differences between the different spatial interpolation techniques.

Fig. 7
figure 7

Sediment distribution map of LogD50 (mm) with the deterministic approaches

Fig. 8
figure 8

Sediment distribution map of LogD50 (mm) with the geostatistical approaches

In the deterministic distribution maps of LogD50 (Fig. 7), the main differences observed are: 1) parts of the Vrachnaika seabed materials, were estimated with higher grain size values among the different deterministic interpolation techniques (higher values in RBF in comparison with IDW and LPI method), 2) IDW indicates higher values from RBF and LPI interpolation in Tsoukaleika, Kaminia, Alisos and Peiros River areas (seabed zone), 3) in Alikes area the shoreline materials were calculated with higher grain size with the LPI and lower with the RBF and IDW method, 4) IDW and RBF has higher values in comparison with LPI in Kalamaki shore materials, 5) in Ioniki akti area for seabed zone the IDW and LPI has higher values of D50 than RBF and, 6) in Karnari area the IDW predicted with higher grain size and lower with RBF and LPI (shoreline zone).

In the geostatistical prediction maps of LogD50 (Fig. 8), the depicted grain sizes of the distributed sediments, reveal fewer differences than the deterministic ones. The most significant differences between the maps are: 1) in the Vrachnaika-Tsoukaleika-Kaminia area all the interpolations show similar behaviour in the distribution and prediction of D50 (mm), with small variations only in the western part of Kaminia and Alisos area, where the OK method calculated with higher values compared to EBK, SK and UK, 2) SK shows lower D50 values and UK higher compared to the other methods in Peiros River zone 2, 3) OK, EBK and SK indicate similar behaviour in Kato Achaia, Niforeika, Kalamaki and Ioniki akti for both zones, except the UK, which depicted with small variations in the rest of the area and, 4) in Karnari the OK, EBK and SK show similar trends and are characterized with higher sediment grain size in comparison with the UK spatial interpolation.

Cross validation

Comparison between deterministic and geostatistical methods

A cross-validation method was applied to compare the performance of the different interpolation methods used in this study for Zone 1 and 2. The results of cross-validation are presented in Table 5.

Table 5 Cross-Validation results of LogD50 (mm) predictions for Zone 1 (Shore) and 2 (Bed). The presented values in the table are normalized using the minimum–maximum normalization method

The result shows that geostatistical methods outperform the deterministic. Statistical errors are lower almost in all geostatistical methods in comparison with the deterministic methods except UK, which has the highest error among all the spatial interpolation techniques. In addition, the Pearson Correlation and the coefficient of determination (R2) in the linear regression model between the measured versus the predicted values was higher in the geostatistical (kriging) interpolations compared with the deterministic, with the exception again of the UK which has low values of R2.

The optimal interpolation method based on AI for Zone 1 proved to be the OK method, while in position 2 and 3 were the EBK and SK, respectively. Correspondingly, the optimal interpolation method for Zone 2 proved to be the EBK method, followed by the deterministic technique RBF in position 2 and SK in position 3 (Table 5). Despite that RBF is deterministic method, proved to be in the second place in Zone 2, since the calculated errors are low and the P. Corellation and R2 values are high. The cross-validation results can also be observed in the heatmap of Fig. 9, where can be ssen the optimal performance of the OK_Shore and EBK_Bed (with yellow color) compared to the other interpolation methods.

Fig. 9
figure 9

Heatmap with the normalized values (from 0 to 1) of cross-validation methodology for each statistical parameter and spatial interpolation technique

The scatter plots between the measured versus the predicted values of the sediment median grain size (LogD50) are presented in Fig. 10 for both the deterministic and the geostatistical interpolations, respectively.

Fig. 10
figure 10

Scatter plots of Measured vs Predicted LogD50 values for the interpolations

Comparison between deterministic methods

The optimal interpolation from the deterministic methodologies for both zones 1 and 2 has the RBF interpolation, which has the lowest errors, the highest P. Correlation, R2 and AI (0.698 in Zone 1 and 0.813 in Zone 2) in both zones. IDW and LPI in Zone 1 have similar behaviour in errors values, however IDW has a better performance in P. Correlation and R2 and as a result the AI index ranked it in higher position. On contrary, in Zone 2 LPI proved to be more efficient since has the smallest ME and RMSE errors and higher P. Correlation, R2 and AI (IDW = 0, LPI = 0.745). In any case, the error values for the three different deterministic approaches were very close and agrees with the small differences which were found during the visual examination of the maps.

Comparison between geostatistical methods

According to Table 4, in the shoreline Zone 1, the lowest RMSE, MSE, MAE, P. Correlation and R2, and as a result the highest AI index (0.924) has the Ordinary Kriging (OK). While in the seabed Zone 2, the lowest ME, P. Corellation and R2, and the highest AI index (0.949) has the Empirical Bayesian Kriging (EBK). Despite that SK was not the best interpolator for sediment distribution for none of the two zones, it indicates similar RMSE and R2 with OK in Zone 1 and higher R2 than OK in Zone 2. Therefore, the error values obtained from the cross-validation method are very close in all interpolations except the UK which had the worst interpolator in all cases.

Results and calculated values from the cross-validation agree with the visual examination of the distribution maps, especially in the SK, OK and EBK interpolations, which depict similar distributions with negligible differences (except in a few parts of the study area). This fact is confirmed by cross-validation since the calculated prediction errors are very close in those three interpolations. On the contrary, the UK method has significant differences in both the distribution maps and the error parameters compared to the other three methods. Therefore, the best and more accurate prediction for Zone 1 is the OK and for Zone 2 is the EBK interpolation.

Discussion-Conclusions

The performance of different spatial interpolation methods for the production of sediment distribution maps is examined in this research, with the ultimate goal of using the respective maps in coastal vulnerability index calculations and applications related to coastal management.

For this purpose four geostatistical (OK, UK, SK and EBK) and three deterministic techniques (IDW, RBF and LPI) were applied in the coastal sediments of the Gulf of Patras in Western Greece, to interpolate the median grain size D50 (mm) and choose the optimal technique for this particular area. The final results indicate that the geostatistical interpolations outperform the deterministic ones.

Many researchers have investigated in the past which interpolation technique has better performance in several other geoenvironmental cases. In most of them results showed that the geostatistical methods usually have a better performance than the deterministic ones (Li and Heap 2011; Yao et al. 2014), especially if combined with auxiliary parameters (Moral 2010; Kalivas et al. 2013; Pellicone et al. 2018; Ananias et al. 2021). Nevertheless, in some studies, deterministic methods outperformed the geostatistical ones. According to Adhikary and Dash (2017) and Wen et al. (2022) RBF outperform the OK method (similar to the Sampling Zone 2 of the current research) in the prediction of groundwater table. Zarco-Perello and Simões, (2017) estimated that the IDW outperform the OK method in the prediction of distribution of the habitat sessile organisms in marine ecosystems. Qiao et al., (2018) recommended that the prediction accuracy of IDW was higher than OK in the spatial prediction of As concentration in the soils of Beijing.

Comparing the deterministic interpolations in already published papers has been observed that the calculated optimal interpolator for each study was different. In this study RBF was calculated as the optimal interpolator among the other deterministic interpolation methods (LPI and IDW). Wang et al., (2014) compared LPI, RBF and IDW interpolations for estimating the spatial distribution of precipitation in Ontario, Canada and showed that LPI was the best method, while Bhunia et al., (2018) proved also that LPI had higher prediction accuracy than IDW and RBF in the spatial distribution of soil organic carbon (SOC). Wu et al., (2019) compared LPI, RBF and IDW interpolations for estimating spatial distribution of the historical hydrographic data of the Mississippi River and found that RBF outperformed IDW and LPI, while Adhikary and Dash, (2017) showed that RBF performance was higher compared to IDW. Chen et al (2016) recommend that IDW is a better interpolator rather than LPI for determining fishery resources density in the Yellow Sea.

In this study the geostatistical spatial interpolations provide similar results regarding which one of them generates higher prediction accuracy. The general trend indicates that OK and EBK have the best performance. However, the optimal interpolator is not always the same. Bhunia et al., (2018), made a comparison between OK and EBK for the distribution of soil organic carbon (SOC) and found that OK has lower errors and the highest R2, while Daya and Bejari, (2015) calculated that OK is more efficient as an interpolator compared with the SK when modelling the Cu concentration in Chehlkureh deposits, in SE Iran. Ali et al., (2021) calculated the normal annual rainfall in Pakistan based on different interpolation methods and revealed that EBK and empirical Bayesian kriging regression prediction (EBKRP) methods are more efficient than the OK, SK and UK. However, the results of the geostatistical analysis show that it is possible to generate similar spatial distribution maps with different Kriging interpolation methods (OK, UK, SK, EBK) with small differences and similar error and R2 values, similar to this research. Moral, (2010) showed that if using OK, SK and UK, the generated precipitation maps are similar both in visual comparison and prediction errors.

For the production of sediment distribution maps, the majority of research papers implement multivariate methods (Co-Kriging, KED etc.) with the use of bathymetry and mud content as auxiliary information (Verfaillie et al. 2006; Stephens et al. 2011; Meilianda et al. 2012; Bockelmann et al. 2018; Zananiri and Vakalas 2019). Jerosch, (2013), developed a Co-kriging (COK) model with three parameters (slope, bathymetry, and cost distance from river) to construct a predictive multi-parametric sediment type map and compared OK-COK for sediment distribution recommending the use of a combination of both. Verfaillie et al., (2006) and Stephens et al., (2011), generated seabed sediment prediction maps with the implementation of OK and KED with bathymetry Digital Elevation Model (DEM) as an auxiliary variable and the KED proved to be better than OK. However, if the auxiliary variables are not correlated, the interpolation is not better compared to univariate geostatistics and that was the reason for not using the Co-Kriging method in this study. This happens because the interpolation in the shore materials of the study area (sampling Zone 1) was performed without big proportions of mud content and shallow water until the 5-m depth contour (sampling Zone 2), since the production of the maps was carried out for coastal erosion purposes and not for seabed sediment distribution, where a bathymetry parameter can be used for Co-Kriging interpolation.

In any case, spatial interpolation of any value depends on the data set, sampling density, case study, and the scientific field that is applied (e.g. sediment distribution, precipitation etc.). Hence, the comparative study between the different interpolator methods is recommended before any final decision can be made on which spatial interpolation technique is better to implement with the highest predictive power and the smallest prediction errors. In this way, it is feasible to find the most appropriate method for a given data set to generate the most accurate prediction maps (Al-Mamoori et al. 2021; Wen et al. 2022). Those maps and their results can be implemented in a plethora of other applications; e.g. for modelling many environmental and engineering applications in the coastal zone. In addition, these maps can be incorporated as inputs in risk assessment analysis against coastal hazards (sea level rise, coastal erosion etc.), especially in models such as the Coastal Vulnerability Index (Boumboulis et al. 2021).

The output map of this study can be used in the assessment and calculation of the Coastal Vulnerability Index (CVI) of the Gulf of Patras. Thus, it was very important to find which interpolation method may give the best prediction regarding the distribution of the median (D50 mm) grain size of the surficial coastal sediments, since it is a significant input parameter for the calculation of CVI. Furthermore, the data and the produced maps can be used by the local authorities in further assessments towards the understanding of processes, planning and management of the coastal area of the Gulf of Patras.

Future research for the enhancement of the results of this study could be a) the application of spatial interpolation techniques with different and more advanced methodologies such as Machine Learning (ML) and b) the development of a comparative study between the spatial interpolation techniques and the influence of those parameters in the final interpolated results and the cross-validation values (RMSE, ME etc.). In addition, different cross-validation techniques could be implemented like Monte Carlo and k-fold, to assess the potential error difference between those methods.