Statistical analysis of the best GIS interpolation method for bearing capacity estimation in An-Najaf City, Iraq

The presence of an economical solution to predict soil behaviour is essential for new construction areas. This paper aims to investigate the ultimate interpolation method for predicting the soil bearing capacity of An-Najaf city-Iraq based on field investigation information. Firstly, the engineering bearing capacity was calculated based on the in-site N-SPT values using dynamic loading for 464 boreholes with depths of 0–2 m, using the Meyerhof formula. The data then were classified and imported to the GIS program to apply the interpolation methods. Four deterministic and two geostatistical interpolation methods were applied to produce six bearing capacity maps. The statistical analyses were performed using two methods: the common cross-validation method by the coefficient of determination (R2) and root mean square error (RMSE), where the results showed that ordinary kriging (OK) is the ultimate method with the least RMSE and highest R2. These results were confusing so, the backward elimination regression (BER) procedure was applied to gain the definite result. The results of BER show that among all the deterministic methods, the IDW is the optimal and most significant interpolation method. The result of geostatistical methods shows that EBK is the best method in our case than the OK method. BER also applied to all six methods and shows that IDW is the ultimate significant method. The results indicate no general ultimate interpolation method for all cases and datasets type; therefore, the statistical analyses must be performed for each case and dataset.


Introduction
The Allowable bearing capacity is an influential geotechnical parameter used to decide the most convenient foundation for a particular structure. Bearing capacity is essential for the type and depth of foundations to prevent damages, especially those resulting from loads and earthquakes . A Geodatabase for soil geotechnical properties can help save a high portion of the total project cost. Generally, it is a complicated, time-consuming, and expensive process to create a Geodatabase for a specific area because it requires considerable data. Instead, using interpolation methods to predict the non-spatial points from the existing ones can be very helpful.
Different methods for spatial interpolation have been developed. Many factors affect the predictions and the results of spatial interpolation of the methods. This reason increases the difficulty in selecting the optimal input data method (Li and Heap 2014). Spatial interpolation methods are increasingly used in many disciplines, such as botany, geology, agriculture, hydrology, and climatology (Bernardi et al. 2017;Xu et al. 2015;Emadi and Baghernejad 2014;Hadi and Tombul 2018;Eldrandaly and Abu-Zaid 2011;Hernandez-Stefanoni and Ponce-Hernandez 2006).
Multiple approaches and methodologies were used to estimate soil engineering properties using GIS interpolation tools. Kravchenko and Bullock (1999) evaluated three interpolation methods (i.e., inverse distance weighting IDW, ordinary kriging OK, and lognormal ordinary kriging LOK) to determine the best interpolation method for mapping soil properties (Kravchenko and Bullock 1999). El May et al. (2010) study the urban area using the environmental geologic map using the GIS program to map the geotechnical zoning for safe construction (El May et al. 2010). Orhan and Tosun (2010) used the GIS program to prepare geotechnical maps for several soil properties that fully visualize the study area's whole sub-surface. The zones maps are implemented to specify the appropriate foundation soils in the residential (Orhan and Tosun 2010). Ali and Fakhraldin (2016) investigate the soil's physical and chemical properties to give base data, which can be used for future construction design (Ali and Fakhraldin 2016). Sohaib K. et al. conducted many studies to interpolate soil properties in developing a geotechnical database for An-Najaf city in (Al-Mamoori 2017;Al-Mamoori et al. 2018, 2019a, b, 2020a. Selecting the convenient interpolation technique is the most important factor to gain the most accurate results as different interpolation methods will give different results. Among diverse interpolation methods, none is unequivocally optimal; thus, the best method for a particular case will only be obtained by applying the possible interpolation methods and comparing them. There were many studies regarding this aspect. For example, Luo et al. (2008) carried out a study to compare seven interpolation methods to choose the best method for mean daily wind speed grids; the data used were from 190 points across England and Wales. Among the seven methods technique (TSA, IDW, LPI, TPS, UK, Cokriging, and OK), the results indicated that OK and UK are the optimal interpolation methods for this particular case (Luo et al. 2008). Another study compares three interpolation methods (i.e. IDW, radial basis function (RBF), and Kriging (including OK, simple kriging SK, and UK) to interpolate groundwater depth. These methods were applied to data belong to 22 years from forty-eight wells in the Minqin oasis area. The results showed that SK is the best method for interpolating groundwater depths (Sun et al. 2009). Robinson and Metternicht (2006) implement and compare four interpolation methods (i.e., OK, LOK, IDW, and Splines) to interpolate some soil properties. The results have shown no single interpolation method that can produce the most accurate results to generate continuous maps of soil properties (Robinson and Metternicht 2006).
The purpose of this paper is to statistically determine the optimal GIS-based interpolation method for soil Bearing Capacity data collected from 464 sites. Six GIS-based spatial interpolation methods have been implemented.

Study area
31.933 latitudes and 44.2833 longitudes bound An-Najaf city (including Kufa). It covers an area of about 105.1 km 2 An-Najaf city has 81 neighbourhoods Fig. 1.
Arid and semi-arid are the main characteristics of An-Najaf's climate where the summer is hot and dry (Sabreen et al. 2020;Farhan et al. 2019), with an average temperature of 45 °C. The winter is short, cold, and wet, with an average temperature of 24 °C. The average annual rainfall is 100 mm in the wet period and 30 mm in the dry (Jasim et. al. 2020;Kareem et al. 2021).

Data
The study is dependent on data from soil samples collected from 464 sites in the study area. The utilized data were taken from the "National Center for Construction Laboratories and Researches (NCCLR)-Babylon" based on a cooperation memo. NCCLR-Babylon is a division of the "Iraqi National Center for Construction Laboratories and Research (NCCLR)". The data were collected for 464 sites, as shown in Fig. 2 (Governorate of Babylon 2016).

Methods
Bearing capacity data were performed by ArcGIS 10.7 using Geostatistical Analyst Wizard GAW. GAW has two types of interpolation techniques: deterministic and geostatistical. Each type has a group of interpolation methods. Most of the methods depend on point data to produce a surface or grid (Bolstad 2016;Ogryzek et al. 2020;Peng et al. 2017, Abbass Jasim et al. 2017. In this paper, both interpolation techniques were used for mapping soil engineering bearing capacity point data. The deterministic interpolation techniques, including four methods (IDW), local polynomial interpolation (LPI), radial basis functions (RBF) and global polynomial interpolation (GPI). The geostatistical interpolation technique has two methods ordinary kriging (OK) and empirical bayes kriging (EBK).

Interpolation methods
Two groups of interpolation methods have been used in this study which are as the followings:

Deterministic methods
These methods produce surfaces depending on either the resemblance or the degree of homogeneity of the examined points. These methods can be divided into global and local. Global methods utilize all the data to predict the unknown values, while Local methods use only the data within neighbourhoods to predict the unknown values from the observation points. GAW provides four deterministic interpolation techniques. One of them is a global interpolator (i.e., GPI), and the other three are local interpolators (i.e., IDW, LPI, and RBF). Deterministic methods are precise interpolators and can produce a surface that goes along the control points. Furthermore, imprecise interpolators can produce a value that differs from its value at the point's location (i.e. GPI and LPI) (Eldrandaly and Abu-Zaid 2011). In this paper, all four methods were used (IDW, LPI, RBF, and GPI), and below is some information about each method: Inverse distance weighted (IDW) It presumes that the connection and resemblance rate among contiguous points is commensurate to the distance between them. Each interpolated point has a weight equal to the inverse of its distance from the point (i.e. the closest place to the known point has more weight than the far ones) (Robinson and Metternicht 2006; Setianto et al. 2013;Bhunia et al. 2018). In the IDW method, the unknown points value is estimated using the following equation (Eq. 1).

Fig. 2 Soil bearing capacity samples' location
The optimal power in the IDW surface in our study is (1.427).
where:z(x 0 ): the interpolated value, n: the values of total sample data, x i : the ith data value, h ij : the separation distance between interpolated value and the sample data value, ß: the weighting power.
Local polynomial interpolation (LPI) LPI can generate a grid surface that picks up the short-range variance (Johnston et al. 2001). It is suitable for interpolating points only within the area of a specific neighbourhood rather than the whole dataset (Hani and Abari 2011). Thus, the overlapping among neighbourhoods may occur, while the cell's value at the centre of any neighbourhood is estimated as predicted. The optimal Goodness of Fit in the study area is (2.39).

Radial basis function (RBF)
RBF creates a surface grid to go through all measured points. Consequently, it is considered a precise technique. It generates similar values to those measured at the same point, and the predicted values may fluctuate above and below the maximum and minimum of the observed values (Pham et al. 2020) (Li et al. 2011). RBF is an inappropriate method if there is a great value change in short distances (Johnston et al. 2001). This method is similar to IDW, but the fluctuation of predicated value. There is also a smoothening parameter to control the smoothness of the resulting surface (Biancolini et al. 2018), and the optimal kernel parameter in the study area is (0.41).
Global polynomial interpolation (GPI) GPI is a fast and imprecise method that applies multiple regression to the whole dataset (Johnston et al. 2001;İmamoğlu and Sertel 2016). It is more suitable for slightly and gradually changing data (İmamoğlu and Sertel 2016). In this interpolation technique, the points at the border and edge can significantly influence the resulted surface. The orders allowed in the GPI are ten, and each order suits a specific characteristic in the data (Johnston et al. 2001).
A first-order (linear) is used in this study because it suits one plane among the data as equation discerned: where.z x i , y i : the datum at location. (x i , y i ), β i : the parameters. ε (x i , y i ): the random error.

Geostatistical interpolation methods
Geostatistical methods are known as the Kriging family. It is a statistically powerful interpolation method that depends on the spatial correlation that controls the distance or direction among sample points and can be used to explain surface spatial variation. Specify the number of the points or all points within a certain radius to determine each location's result value. This method is widely used in soil science and geology (Belkhiri and Narany 2015;Elumalai et al. 2017;Gharechelou et al. 2016;Reza et al. 2016). There are three types of kriging available in geostatistical analyst; ordinary kriging (OK), empirical Bayesian, and areal interpolation. In this study, only two methods (i.e. OK and EBK) were applied (Gómez-Hernández 2016). The Areal Interpolation (AI) was excluded because it is suitable for polygons, not for points.

Ordinary kriging (OK)
The OK method combines the measured statistical properties of the data. The measuring of the strength of statistical correlation in this method is a function of distance where spatial correlation fades, and the partial sill corresponds to the maximum variability in the absence of spatial dependence. In our case study, the optimal partial sill is (10.145). The semivariogram is used by the kriging approach to express the spatial autocorrelation. OK estimates Z*(x 0 ) and error estimation Variance where c(h) is semivariance, h is the lag distance, Z is the parameter of the points values, N(h) is the number of Empirical Bayesian kriging (EBK) EBK is the most complex method because it implies that the supposed semivariogram is the right one for linear interpolation (Javari 2017;Samsonova et al. 2017;Krivoruchko and Butler 2013;Krivoruchko 2012). The parameters in the newly developed EBK are automatically optimized using several semivariogram models rather than a single semivariogram. Therefore, it differs from other methods of classical Kriging. The steps used in the EBK method are an estimation and simulation of the semivariogram model using available data and simulated data (Li and Heap 2014).

Assessment of interpolation results
Two approaches were used to evaluate the optimal and best performance interpolation method: cross-validation and backward elimination regression BER method. For each approach, the interpolation methods were separated into Deterministic and Geostatistical. Comparative analysis was then applied to all methods. Separating the interpolation methods into Deterministic and Geostatistical will help researchers use one of the two techniques (i.e., deterministic or geostatistical).

Cross-validation
The six interpolation methods were verified individually by using cross-validation with predicted soil bearing capacity values. The comparison analysis among six interpolation methods was done depending on the (root mean square error (RMSE), mean absolute error (Dungca et al. 2017), and the coefficient of determination (R 2 )) (Adhikary and ith measured value. x intp : the studied variable's average interpolated value. x meas : the studied variable's average measured value.

Backward elimination regression BER
This method starts with a model containing all the K candidate regressors or variables. One by one excludes the regressor with the smallest partial F-statistic if its F-statistic is insignificant, e.g., if the variable has an insignificant F value. This variable is excluded from the model, and then a new regression model will be repeated on the remaining variables to exclude to insignificant one and so on till reaching the most significant variable ( The BER procedure using SPSS has been applied three times to determine the best interpolation among all of them. The second time among the deterministic methods and the third time among geostatistical methods.

GIS Interpolation maps
Six different interpolation methods' maps for 464 soil Bearing Capacity samples with class interval (1 Ton/m 2 ) have been carried out using ArcGIS 10.7 (Geostatistical Wizard) Fig. 3. The Bearing Capacity range is between (4-15 Ton/m 2 ). The deterministic methods show a spatial distribution similarity between (IDW and RBF) maps from the first visual impression. At the same time, the (LPI and GPI) straight and continuous zones are more similar to spline. Geostatistical methods show differences in smoothing because the OK map seems rougher class borders than EBK. Visual interpretation can approximate

Comparison of interpolation methods using Cross-validity procedure
The cross-validation method was applied to obtain the predicted values for 464 soil samples' points to assess the more accurate interpolation methods. Coefficient of determination (R 2 ), mean relative error (MRE), root mean square error (RMSE), and summation Error (SUM) have been computed for six methods Table 1. The results showed that the IDW shows a stronger R 2 (0.63) but a higher value in the other parameters in the deterministic methods. In contrast, the results were confusing in the geostatistical methods because EBK shows very slight strong in R 2 while OK has low better values in the errors parameters Table 1. This confusion could happen, in our case, due to the small differences in the tests' values and human error compared to more than one parameter. For more accurate and nonhuman errors, the BER model has been applied. It is worth mentioning that R 2 is considered an important indication for good spatial sampling. In our case, the best R 2 is around 0.6 because the spatial distribution is not good because the sampling was according to the construction projects, and the distance among points is irregular. Therefore, R 2 is very useful for sampling location quality, and in engineering must be at least 0.7 or more.

Backward elimination regression analysis
BER using SPSS software was used to specify in a definite and straightforward way the best interpolation methods. BER, as mentioned, is the best and simple statistical way to examine and determine the optimal method because it removes the least significant variable in the model each time. Then repeats the model again and again for the remaining variables until it reduces them to one most significant variable. BER has been applied to the two interpolation group methods (deterministic, geostatistical) separately and combined. The BER test results show that the IDW is the optimal and most significant interpolation method among all the deterministic methods. GPI is considered the most inappropriate and insignificant method because it is removed in the first model, followed by LPI removed in the second model. The third model is between the remaining two methods IDW and RBF Table 2. At the same time. The result of geostatistical methods shows that EBK is the best method in our case than the OK method, which is eliminated in the first model Table 3. BER also applied on all six methods and shows that IDW is the ultimate significant method among them by using GAW. The fifth model is between the two deterministic methods IDW and RBF. The most interesting is that EBK was removed in the first model due to the impact and significance of IDW Table 4.
The results indicate that the researcher should choose one type of method in each case study, whether deterministic or geostatistical, according to the case study and dataset and make analysis and comparison to determine the best method among them.

Spatial variation of soil bearing capacity
The allowable bearing capacity final map produced using the IDW and EBK interpolation method is the best one for preliminary engineering designs in An-Najaf city Figs. 4, 5. The maps show a similar spatial distribution of bearing capacity values. The allowable bearing between (10.1 to 15 Ton\m 2 ) is spatially distributed in the north, east, and west of the study area. This spatial distribution may be due to the impact of well compacted geological silty sand of Injana formation (Upper Fars) and Dibdibba formation. Bearing capacity below (10.1 Ton\m 2 ) is spatially distributed in the middle neighbourhoods with direction NW-SE, and this could be because of an ancient fluvial deposit and anthropogenic canals.

Conclusions
With economic development and population growth in An-Najaf city, the need for many new buildings is arises, most of which are constructed in new areas where there is a lack of information about the soil layers' properties. Therefore, the presence of an economical solution to predict soil behaviour is essential. In this research, six different methods have interpolated the bearing capacity to determine the ultimate one using two statistical methods: the cross-validation and BER methods.
By comparing the results of the two statistical methods, it is concluded that the BER method gives strong and more confident results than the cross-validation method as it eliminates the weakest variable at each run until the strongest variable remained. Also, dividing the interpolation methods into deterministic and geostatistical will give more accurate results. Based on this conclusion, researchers recommend using BER to compare the interpolation methods for any studied feature after dividing the interpolation methods into deterministic and statistical. The best Deterministic Interpolation Method for the Bearing Capacity of our case and dataset is the IDW. While the best geostatistical interpolation method is the EBK and the best overall interpolation method is the IDW. The produced map of bearing capacity by the IDW interpolation method indicates that the allowable Bearing Capacity ranges between 4 and 15 Ton\m 2 . This kind of map proposes a robust database for the study area; also, they ease understanding the data visually. As well, using these maps will help saving expenses along with time and effort. Another advantage of producing geotechnical maps for soil bearing capacity is to guide engineers and decision-makers to decide the best choice for any construction design, the most suitable foundation design, and the appropriate treatment needed. Finally, the results of this paper might be used for laying the foundations of smart cities.

Conflict of interest
The authors stated that there are no conflicts of interest of any kind.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.