Inverse distance weighting method optimization in the process of digital terrain model creation based on data collected from a multibeam echosounder

This paper presents the optimization of the inverse distance weighting method (IDW) in the process of creating a digital terrain model (DTM) of the seabed based on bathymetric data collected using a multibeam echosounder (MBES). There are many different methods for processing irregular measurement data into a grid-based DTM, and the most popular of these methods are inverse distance weighting (IDW), nearest neighbour (NN), moving average (MA) and kriging (K). Kriging is often considered one of the best methods in interpolation of heterogeneous spatial data, but its use is burdened by a significantly long calculation time. In contrast, the MA method is the fastest, but the calculated models are less accurate. Between them is the IDW method, which gives satisfactory accuracy with a reasonable calculation time. In this study, the author optimized the IDW method used in the process of creating a DTM seabed based on measurement points from MBES. The goal of this optimization was to significantly accelerate the calculations, with a possible additional increase in the accuracy of the created model. Several variants of IDW methods were analysed (dependent on the search radius, number of points in the interpolation, power of the interpolation and applied smoothing method). Finally, the author proposed an optimization of the IDW method, which uses a new technique of choosing the nearest points during the interpolation process (named the growing radius). The experiments presented in the paper and the results obtained show the true potential of the IDW optimized method in the case of DTM estimation.


Creating a seabed DTM
Research on seabeds is the most common and important task conducted by hydrographic institutions, and the created models are the basic information repository for further analysis and visualization. Users of GIS systems, including those containing bathymetric data, demand reliable, accurate and up-to-date data and place high demands on the dynamics of data processing, visualization and analysis in real time.
To create a seabed digital terrain model (DTM), it is first necessary to take marine measurements. Modern measuring systems with devices that enable the recording of observation results in a continuous manner (e.g. multibeam echosounders-MBESs) can obtain a very large amount of information about the shape of the seabed in a relatively short time. These systems record the location and depth (spatial coordinates x, y, z) of many millions of points during one measurement session. The processing of such a large amount of data that is additionally irregularly dispersed in space requires the use of specially prepared numerical methods and appropriately selected data processing algorithms (Burrough and McDonnell 1998). In GIS systems, DTM of the seabed surface is often described using a grid structure (regular square network). There are many methods of creating a grid based on measurement data, with the most commonly used interpolation methods being minimum curvature, natural neighbour, nearest neighbour, modified Shepard's method, radial basis function, inverse distance to a power, triangulation with linear interpolation, moving average, kriging, polynomial regression and methods based on artificial intelligence (Chin-shung et al. 2004;Maleika et al. 2012;Rishikeshan et al. 2014). These methods use several different algorithms to determine the new values at grid nodes. The selection of the optimal interpolation method for unevenly distributed measurement data depends on a number of features that characterize the dataset: the homogeneity of data dispersion, number of points per area unit, population variance (degree of data changeability) and the type of surface reflected by the data.
The inverse distance weighting (IDW) method, a deterministic spatial interpolation model, is one of the most popular methods adopted by geoscientists and geographers and has been implemented in many GIS packages. The general premise of this method is that the attribute values of any given pair of points are related to each other, but their similarity is inversely related to the distance between the two locations (Lu and Wong 2008).
The disadvantage of using the IDW method in processing large datasets is the computing time. Compared with other methods, the DTM creation time can be up to 5 times longer than the fast-moving average method . A reasonable question is whether it is possible to optimize the general IDW method for processing large bathymetric datasets in the process of creating a grid-based DTM. The goal of such optimization is to significantly increase the speed of the calculations while maintaining and perhaps even increasing the accuracy of the created model.

Source of errors and IHO standards
A significant problem when creating DTM seabed models is the inability to accurately determine the accuracy of the created model. This is because the actual shape of the surface being described is unknown; consequently, the model cannot be compared with the real surface. In practice, assessing the accuracy of the model boils down to estimating and summing the errors arising at the subsequent stages of data processing. Usually, persons performing tests using MBES simplify the estimation of the DTM accuracy to determine the measurement accuracy of the device and ignore other possible sources of error (such as the ship motion parameters, positioning system, MBES configuration, bottom type, occurring depths, interpolation methods used and grid resolution). They do so because of the difficulty in estimating the values of these errors and because the magnitudes of these errors, as shown by numerous studies, are usually much smaller than the measurement error (Bannari and Kadhem 2017;Kannan et al. 2015). It is estimated that for high-class MBES devices and measurements carried out in shallow waters (up to 20-30 m), this error is approximately 5 cm. More about determining this type of error can be found in (Maleika 2012).
Worldwide, organizations involved in hydrographic surveys comply with the standards for hydrographic surveys (International Hydrographic Organization 2008) issued by the International Hydrographic Organization (IHO). This publication includes the minimum standards for hydrographic surveys, which enables the determination of the spatial uncertainty of data. This allows the users of the information to use the survey results safely (merchant ships, navy, GIS users, etc.). The formula supplied by the IHO is used to compute the maximum allowable total vertical uncertainty (TVU) at a 95% confidence level (International Hydrographic Organization 2008): where a represents the portion of the uncertainty that does not vary with depth, b is a coefficient that represents the portion of the uncertainty that varies with depth and d is the depth.
In the research presented in this article, based on the IHO guidelines, it was assumed that a = 0.25 and b = 0.0075, which corresponds to the special class (as defined by the IHO (International Hydrographic Organization 2008), contains areas in which the highest quality of DTM and chart is necessary, e.g. channels, harbours, shallow waters and rivers). For these parameters and the test data used, the total DTM error is 25-30 cm (depending on the test surface). All hydrographic works, including the use of interpolation methods in the creation of DTM, should be carried out in accordance with the above standards.
In many papers, the authors investigate DTM interpolation accuracy, but most studies are based on data determined from test models at random (number and distribution of measuring points) and directly (depth reading without considering the measurement error). This does not correspond to reality, where the distribution of measurement points is quite characteristic for MBES (and depends on the sea survey parameters), and each individual measurement is burdened with some random measurement error. Therefore, the results obtained and the conclusions regarding the properties of the interpolation methods used for these specific data may be incorrect. The author believes that the use of an innovative method of determining test data and the test procedure described in the "Research" section increase the reliability of the results obtained and the conclusions drawn.
There are other works describing the impact of the applied interpolation methods on the accuracy of the created DTMs. For example, in (Amante and Eakins 2016), the authors study the accuracy of interpolated bathymetry in digital elevation models. In (Amante 2018), the authors estimated the coastal DEM uncertainty.

Methods
In this study, the IDW method is used to interpolate spatial data, which is based on the concept of distance weighting. This method can be used to estimate unknown depth data from the known (adjacent) measured depths (Burrough and McDonnell 1998;Bedient and Huber 1992). The IDW formulas are given by Eqs. 2 and 3 (Chen and Liu 2012): where R p represents the unknown depth data (cm); R i represents the depth data measured by MBES (cm); N is the number of points (in the search radius area); w i represents the weighting of each depth; d i is the distance from each depth to the calculated grid node; and α is the power and is also a control parameter, generally assumed to be two. Several studies (e.g. (Simanton and Osborn 1980;Tung 1983)) have experimented with variations in power, examining its effects on the spatial distribution of information from precipitation observations. The IDW gridding method can be either an exact or a smoothing interpolator. With IDW, data are weighted during interpolation such that the influence of one point relative to another declines with distance from the grid node. Weighting is assigned to data using a weighting power (α) that controls how the weighting factors drop off as the distance from a grid node increases (Golden Software Support 2019). Considering that the measurement data from MBES are burdened with some random measurement error (Maleika et al. 2011), it seems reasonable to use the IDW method with a low-power value so that the method works slightly more smoothly. This thesis is checked in test studies.
First, reasonable parameters for each interpolation technique were identified, as inappropriate parameters can cause significant artefacts in the interpolated surface, especially with IDW (Akkala et al. 2010;Burrough and McDonnell 1998). Numerous combinations of the interpolation parameters were evaluated using a brute-force methodology. The parameters for each interpolation technique that resulted in the lowest median percent deviation for the entire study area were identified and then used in all subsequent analyses. The parameters for IDW were a power of 2 and a variable search radius of 8 points.
The IDW interpolation method is commonly used in the processing of various spatial data, e.g. soil moisture distribution (Srivastava et al. 2019), and surface water volume estimation (Fuentes et al. 2019), and in creating digital elevation models (Salekin et al. 2018), air pollution models (Peng et al. 2018) and many others.
In most GIS software (e.g. Surfer), the IDW method is implemented in such way that it assigns values to points by averaging the data inside a fixed search area (defined by the user). There is also an option to define a minimum number of neighbours inside the search area, which allows us to calculate a new point value. Otherwise, the point node is set as a blank. The power parameter may also be changed.

Research
Preparing test data A major problem in testing and searching for the optimal interpolation methods used in creating a DTM is the inability to clearly identify errors arising in this process. This situation is the result of different representations of the input (unevenly distributed points) and output (evenly spaced points). To solve Fig. 1 Scheme for preparing test data using a virtual survey simulator. Based on the MBES source dataset, both a highresolution grid (1) and reference grid (2) were created. Using the high-resolution grid (1) and virtual survey software (3), a new dataset with properties similar to those of MBES source data was calculated (4). Based on these data, we can examine the properties of the interpolation method obtained, and in this way, DTM (5) is compared with the reference grid (2), which gives us the ability to pinpoint the interpolation error and model accuracy this problem, the proprietary virtual survey program (Maleika 2013) was used to prepare the test data, enabling the generation of xyz measurement sets based on a high-resolution grid. The created simulator includes survey parameters such as boat speed, operating parameters of the MBES device, MBES device error and profile system. As a result of the simulator's activities, a new test set of xyz points is received. Research on interpolation methods consists of creating subsequent models based on these generated sets using various interpolation parameters. Both of these models (reference and test) are described using a grid structure of the same size, so they can be directly compared, and the errors can be calculated (including the average error at the 95% confidence level). It should be clearly underlined that in the research procedure created in this way, the model error not only is an error resulting from the application of the selected interpolation method but also determines the total model error taking into account components such as single device measurement error (MBES), errors resulting from the parameters of the adopted structure grid (depending on, e.g. grid resolution) and errors depending on the adopted parameters of the marine survey. By changing only the parameters related to the interpolation of the measurement data, we can assess the impact of the interpolation method on the overall error of the model (for various variants), whose "adjustment" depends on the parameters listed above. The scheme for preparing the measurement data is shown in Fig. 1.
When preparing reference surfaces using a virtual survey, the following simulation parameters were determined: The parameters above correspond with the parameters used in the actual bathymetrical measurements, which were the foundation of the reference surfaces. These simulations give us high-density data (the distance between neighbouring points is approximately 3 cm and 10 cm).

Reference surfaces
The research was done using three reference surfaces prepared from real bathymetric data collected in 2017 by the Szczecin Maritime Office in the basin Szczecin Lagoon using the Simrad EM 3000 echosounder (Kongsberg Maritime AS, Norway) and WGS-84 coordinate system. Next, these real data were used to create reference surfaces (using the procedure described above). These surfaces differ significantly in their shapes (see Fig. 2 For each of these surfaces, a reference grid model was built (using IDW, kriging and the moving average interpolation method and then averaging). Their sizes and properties are presented in Table 1.

Testing procedure
The research used the GIS SURFER v8.0 software package (Golden Software, Golden-Colorado, USA). The IDW interpolation parameters such as search area, number of points,  power factor, and additional surface smoothing were modified during the research. All the created surfaces (DTMs) were compared with the reference surface, and the 95% confidence level of error was determined (in centimetres), as well as the computation time (in seconds). All the calculations were performed on a personal computer equipped with an Intel Core i7 processor (2.93 GHz, model 870), 4-GB RAM, HDD 2TB (ST32000641AS) and 64-bit Windows 7. The performance index for this configuration, calculated in Cinebench R15 (the CPU Test) (Cinebench 2020), is equal to 478 points.

Results
The first part of the study examined the impact of the size of the adopted search radius on the accuracy of the model. This parameter also indirectly affected the number of points involved in interpolation, and the maximum number was limited to 128. Tests were carried out for search radii equal to 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 2.0, 5.0 and 10.0 (meters). The value of the minimum required points inside the search radius was set to 4. If there were fewer points in the specified environment, a blank node was inserted. The results obtained are presented in Figs. 3, 4 and 5 (the error depends on the search radius, % of blank nodes and calculation time).
As can be seen (Fig. 3, gate surface), the model error in the initial phase slightly decreases (up to 3.3 cm) and then starts to increase quickly (up to 9.5 cm). The percentage of blanked nodes drops rapidly from 8 to 0%. Because our goal was the minimal value at the 95% confidence level with a minimum of blank nodes, the chosen optimum value of the search radius is approximately 0.4 to 0.8 m. Figure 4 (swing surface) shows that the 95% confidence level of the error rate remains constant at approximately 5 cm. The percentage of blanked nodes falls rapidly between 0.2 and 0.4 m. The value of the optimal search radius for this surface is in the range of 0.5 to 2.0 m. Figure 5 (wrecks surface) shows that the 95% confidence level of the error rate increases slowly from 8 to 13 cm. Because the density of the measurement data for this surface is twice as high, for all the values, the search radius is 0%. Consequently, the optimal search radius for this surface is low (0.2 m or 0.3 m).  Considering the results obtained for all three surfaces, it is easy to notice that the accuracy of the created model is the highest for low values of search radius (approximately 0.4-0.6 m). On the other hand, for the smallest radius values, the number of blank nodes may be too high for the created model to be accepted. In conclusion, the optimal result for the search radius is equal to 0.6 m (high model accuracy in the absence of blank nodes).
The total interpolation time (green bars in the chart above) is strongly dependent on the size of the adopted search radius and increases exponentially with its size. It is particularly notable that the value of the radius chosen is larger than 2 m. Appropriate selection of this parameter turns out to be crucial in accelerating the IDW method.
The next part of the study examined the impact on the accuracy and speed of model creation on the number of points (n) involved in IDW interpolation. The tests were carried out for n = 1, 2, 3, 4, 5, 6, 8, 10, 15, 20 and 50 points. In this phase of research, it was additionally assumed that the maximum search radius was 2.0 m.
Based on the results obtained (Fig. 6), it can be concluded that the accuracy of the model is the highest when approximately 6-10 nearest measuring points are involved in the interpolation. The obtained model errors in these cases are smaller than when all the points located in a specific search for interpolation are taken by several percent. The calculation time for this approach is quite short and does not significantly depend on the density of the measuring points.
As mentioned in the "Introduction", when using the IDW method, it is usually assumed that the value of the power parameter is 2. However, this is not obligatory, and this parameter can take any real values. By changing this value, we can determine how strong the influence of the points closer or farther away from this value is when calculating a new value, indirectly affecting the local sharpening or smoothing of the surface. In the next stage of research, we checked the effect of the power parameter on the accuracy and speed of interpolation. The tests were carried out for power = 0.25, 0.5, 0.75, 1, 1.5, 1.75, 2, 2.25, 2.5, 3 and 5. In these studies, search radius-= 0.5 m and number of points n = 6 were used. The obtained results are presented in Fig. 7.
Based on the results obtained, it can be clearly seen that the highest accuracy of the model is obtained for

Proposition of improved variant of IDW-growing radius
From the previous research, it can be concluded that the method of selecting points involved in interpolation has an impact on the accuracy of the created models and the speed of calculations. The commonly used approach in which a new value in a grid is calculated based on all the points located within a specific environment (search radius) is not optimal. Slightly better results are obtained when n nearest points are chosen for interpolation. However, it should be remembered that the specificity and standards associated with hydrographic works require that the new value be calculated based on the points lying not more than 1 m distance away.
Taking these dependencies into account, the author proposed a new approach to the IDW interpolation method, consisting of the abandonment of a fixed-size search radius in favour of a growing radius. In the proposed approach, the size of the interpolation window increases from 3 × 3 points until the required number of measuring points (P) is within the radius or until the specified limit value (maximum radius) is reached. Only the P nearest points (this value is determined by the operator) located within such a frame are involved in interpolation. If there are no measuring points in the maximum radius, a blank node is inserted.
The algorithm of the improved IDW variant begins with the declaration of: Consequently, the points inside the search area are ordered according to the squared radius (see Fig. 8a). Next, the closest P points inside the search area determined by radius R are chosen (see Fig. 8b). If the number of points is greater than or equal to P, we perform IDW interpolation by use of P nearest points. Otherwise, we enlarge the R as long asR < = R MAX or the number of points required is greater or equal P. We perform IDW interpolation by use of these points (see Fig. 8c); otherwise, we set a blank node. Fig. 7 Results of all three tested surfaces with different values of the power parameter. Light yellow background indicates the values that give us the best results Fig. 8 The improved IDW approach behaviour: a the search radius is set as the minimum, b the number of points in the search radius is too low and c the search radius is increased to contain a certain number of points Pseudocode of improved IDW variant: Table 2 presents the results obtained when creating test models using the author's growing radius algorithm. The obtained model accuracy and calculation times are compared with the best values obtained before. The series of modified interpolation tests were performed with the following parameters: P = 5, 8 or 10 points, R MIN = 3 × 3 points, R MAX = 10 × 10 points (adequate 'search radius' = 1 m) and R STEP = 1 point. The results with italicized values indicate the best results.

Tests of the improved IDW method
Based on the results, it can be concluded that: & the Best results (the smallest error) are obtained when the closest 5 points are taken for interpolation, & The dimensions of the window (growing radius) for which the best results are obtained is approximately 0.5-0.6 m, & The depth accuracy in the created models (using the improved variant of IDW) is higher by approximately 0.1 cm (3%), & The calculation time using the improved variant of IDW is approximately 4-5 times shorter, & The more points that are involved in interpolation, the slower the algorithm is-these differences are clearly noticeable when using the algorithm with a fixed search radius (exponential increase in computation time), and when using the growing radius method, the increase in calculation time is much slower and more linear, and & The feature of the growing radius method means that in practice, no blank nodes are created.
It should be noted that utilizing the new proposed interpolation method only slightly increases the accuracy of the created models but clearly speeds the interpolation process in comparison to the traditional method (fixed search radius). When using the growing radius method, the size of the frame adapts to the density of the measurement points (which can vary not only for data coming from different surveys but also within the process of creating a model of one surface). In the proposed method, it is the user (operator) who decides how many points are considered while calculating a new node. The use of a growing radius additionally results in a much smaller number of blank nodes appearing. In summary, the use of the modified IDW method in creating the models of the seabed based on the data from MBES allows for a significant reduction in the calculation time with an additional minimal increase in the accuracy of the created model.

Smoothing the surface
Because MBES source data have some random error, the model created on this basis is characterized by minor local disturbances. Although the process of data interpolation to a grid by its nature slightly smooths the created surface, the use of an additional smoothing algorithm can positively affect the accuracy of the created model. Therefore, in the last part of the research, the use of a smooth filter was considered an objective of analysis.
The tests were performed using the growing radius method, and the results are presented in Table 3.
Based on the results obtained, it is clear that the use of an additional smoothing filter increases the accuracy of the model by approx. 0.3-1.0 cm, while the calculation time increases by approx. 3-5%. The best results are obtained using the Gaussian filter. The use of an additional model smoothing filter slightly increases its accuracy but also locally averages the measurements, so its use should be an option for the system operator to choose. Figure 9 shows a fragment of the gate surface before and after applying the smoothing filter.

Discussion and conclusions
This paper presents research on IDW variants in the case of digital terrain model creation using bathymetric data from MBES. Because the method was tested on various shaped surfaces and the test data reflected a real sea survey with high accuracy, the test results can be considered reliable. The IDW variant proposed by the author, a new method of a growing radius with an optimal number of points required as well as application of a Gaussian smoothing, was classified as the best IDW variant (called modified IDW).
Based on the broader research conducted by the author (presented in ) (Maleika 2018) (Maleika 2015) (Forczmanski and Maleika 2015) (Maleika et al. n.d.)), the following can be seen: & Increasing the search radius above 1 m results in slightly larger errors with very long calculation times because the density of measurement data from MBES is very high and reaches up to 60 points per 1 m 2 of surface, & Considering too many measuring points during interpolation increases the model error and significantly increases the calculation time, Italic indicates the values that give us the best results Fig. 9 Fragment of "gate" surface before (left) and after (right) smoothing using a Gaussian filter & The best results are obtained for approximately 5-8 measuring points in the nearest neighbourhood (search radius 0.6 m), and & The modified IDW (with growing radius) method slightly increases the accuracy of the model, with a significant reduction in the computation time, which is a significant disadvantage of regular IDW in typical settings.
Based on the above hypothesis, the IDW method can be optimized by eliminating the fixed size of the "search radius" and focusing instead on several closest measurement points.
The experiments showed the true potential of the new modified method. Moreover, this method performed very well on all the tested surfaces with errors at a 95% confidence level of less than 4 cm for the gate, 5 cm for the swing and 10 cm for the wrecks surfaces. According to the IHO standards, the total model error for the test surfaces should not exceed 25 cm. For all the tested surfaces, the error was significantly below this value, which shows that the modified IDW method can be used (following the IHO standards) for this type of interpolation.
Comparing the results of the research to the author's earlier works related to the optimization of the kriging (Maleika 2018) and moving average (Maleika 2015) interpolation methods in the process of creating the seabed DTM, it can be concluded that the modified IDW method gives intermediate results, both in terms of model accuracy and calculation time. Using this method allows us to obtain a clearly smaller model error than using the moving average method, although the error is slightly worse than that using the kriging method, while the calculation time is much shorter than that using the kriging method, although the calculation time is slightly longer than that using the moving average method. According to the author, the IDW method is the most comprehensive in bathymetric data processing in creating seabed DTM models.
The modified IDW method highlighted in this article is strongly connected with the characteristics, location and density of measuring points obtained from MBES made in shallow waters (5-20 m). The changes in data (data obtained from other measuring equipment or from MBES but in deeper waters) can cause slightly different readings of the search radius and number of points in interpolation. In such cases, we should use the method described in this article. The method uses a simple pinpoint set of optimal parameters for different input data and then adjusts the IDW method. The method using the growing radius seems to be universal and independent of the characteristics of the input data.
The proposed IDW method improvement approach can be used in GIS software for modelling surfaces based on large amounts of measurement data.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.