5.1 Introduction

In this chapter, we discuss a number of challenges that occur when working with geospatial point clouds, e.g., outliers, data fusion, or domains without points. We illustrate the discussion with a terrestrial LIDAR point cloud, a seabed sonar point cloud, and the combination of both. We aim to represent the area covered by both point clouds with a smooth parametric surface. The focus is on LR B-spline surfaces, which have been shown to be appropriate in this context; see Skytt and Dokken [Sky22] for prominent examples.

The data acquisition of terrains and seabeds produces huge point clouds. The structure, or lack of structure, in the point clouds depends on the technology used to acquire them, i.e., on the sensor under consideration, be it a terrestrial laser scanner or a single- or multibeam sonar. An efficient downstream use of the acquired data requires structured and compact data representations. Locally refined (LR) B-spline surfaces are smooth and flexible surfaces: They provide a middle road between the rigid but effective regularity of raster surfaces and the highly flexible triangulated surfaces; see Chap. 1 for a detailed comparison. LR B-spline surfaces have been found convenient for representing terrains and seabeds [Sky15, Sky16], as they accurately represent the smooth part of the data while having the flexibility to adapt to local shape variations without globally increasing the data size of the mathematical surfaces.

An LR B-spline surface belongs to the class of locally refined spline surfaces. It is a piecewise polynomial surface defined on a rectangular domain composed of axis-parallel rectangular boxes (a mesh). In contrast to a tensor product (TP) spline surface, the boxes need not form a regular pattern. The concept of LR B-splines is described in detail in Chap. 2, which also includes an overview of alternative B-spline based locally refined surface methods.

The starting point for defining an LR B-spline surface is a TP B-spline surface. The adaptive refinement procedure inserts new meshlines into the surface description where the surface does not meet a prescribed accuracy requirement. The meshlines must satisfy the rule:

A new meshline must split the support of at least one TP B-spline, implying that the refinement increases the number of TP B-splines by at least one.

Refinement is performed in an iterative algorithm described in Chap. 3. First, the accuracy (L1 norm) of the current surface with respect to a given point cloud is computed. In a second step, the surface is refined where the accuracy does not meet the requirements. This process is repeated until the accuracy is found sufficient, meaning that the error does not exceed a predefined tolerance, or until the algorithm is stopped by some other constraint, normally the maximum number of iterations. Figure 5.1 summarizes the surface approximation algorithm; the loop is sketched in the code below.
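To make the flow concrete, the following C++ sketch outlines the loop of Fig. 5.1. The type and function names (Point, LRSurface, residuals, updateCoefficients, refineWhereOutOfTolerance) are illustrative stubs rather than the API of the actual implementation; the real approximation and refinement steps are the subject of Chap. 3.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };
struct LRSurface { /* LR B-spline representation, omitted in this sketch */ };

// Signed distances between the data points and the current surface (stub).
std::vector<double> residuals(const LRSurface&, const std::vector<Point>& pts) {
    return std::vector<double>(pts.size(), 0.0);
}
// One approximation step: least squares or MBA (stub), see Sect. 5.4.1.1.
void updateCoefficients(LRSurface&, const std::vector<Point>&) {}
// Insert new meshlines where the tolerance is violated (stub), see Sect. 5.4.1.4.
void refineWhereOutOfTolerance(LRSurface&, const std::vector<double>&, double) {}

LRSurface fitAdaptively(const std::vector<Point>& pts, double tol, int maxIter) {
    LRSurface s;  // start from a coarse TP B-spline surface
    for (int it = 0; it < maxIter; ++it) {   // stop criterion: iteration count
        updateCoefficients(s, pts);          // LS in early, MBA in later iterations
        std::vector<double> r = residuals(s, pts);
        bool withinTol = true;
        for (double d : r)
            if (std::abs(d) > tol) { withinTol = false; break; }
        if (withinTol) break;                // stop criterion: accuracy reached
        refineWhereOutOfTolerance(s, r, tol);
    }
    return s;
}
```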

Fig. 5.1 Approximation of a point cloud by an LR B-spline surface with adaptive local surface refinement

Fig. 5.2 The island Fjøløy in Norway

In the following, we illustrate the fitting of a data set combining observations from terrain and seabed. We highlight how to solve different challenges that arise in real cases, such as

  1. Outliers (outlier detection methods)

  2. Data voids (bounding of coefficients and trimming)

  3. Noise (selection of tolerance and method for surface approximation).

We will describe the methods in detail to guide the user through methodological answers to real problems. More precisely, we first present the data set to be approximated. Next, outlier detection procedures are compared. We then explain the concept of bounding the coefficients, highlight the advantages of the Multilevel B-spline Approximation (MBA), and conclude by explaining the concept of trimming to deal with data voids. An appendix is dedicated to the output format of the LR B-spline surfaces.

5.2 Description of the Data Set

Fjøløy is a 2.1 km\(^2\) island in the municipality of Stavanger on the southwest coast of Norway, see Fig. 5.2. In this area, terrestrial and bathymetry data are both available and represented in the same coordinate system. We select a set of corresponding land and sea data from the Fjøløy area, see Fig. 5.3. The terrestrial data set was obtained in 2016 with LIDAR. The bathymetry data was acquired in 2013 by a multibeam sonar and released for public use in the context of the project "Marine grunnkart pilot" (Marine base map pilot). The boat carrying the sonar has no access in very shallow water: There is a zone between land and sea where data is lacking (voids). In an ongoing project, data from this zone are acquired with LIDAR bathymetry.

Fig. 5.3 Selected data sets from land (brown) and seabed (grey). a Terrestrial points, b bathymetry points, and c both

The terrestrial data set shown in Fig. 5.3a consists of 2,579,974 points containing both land and the sea surface. Several buildings, trees and stones can be found in the covered area. The data set had been classified prior to reception: 73 points were identified as outliers, 1,643,865 as ground points, and 936,036 points are unclassified. The outliers are quite distinct. We consider this set as the reference to which we compare our outlier detection methods. The bathymetry data set, Fig. 5.3b, contains 25,107,199 unclassified points. Here no obvious outliers have been identified. The point cloud covers an area of \(800 \times 600\) m, and the height range of the terrestrial data is \([-64.75, 156.32]\) m, including outliers. After the removal of outliers, the range is \([-0.74, 76.4]\) m. The height range of the bathymetry data is \([-21.13, -1.2]\) m. The total area covered by the combination of land and sea point clouds is just below \(0.5 \text{ km}^2\).

We want to compute one LR B-spline surface combining both land and seabed as illustrated in Fig.Ā 5.3c. To that aim:

  1. We prepare the terrestrial data by removing the points from the sea surface.

  2. We let the x- and y-values of the data points parametrize the surface, which consequently corresponds to a function representing elevation.

  3. We eliminate outliers. This step is described in detail in the next section.

  4. We use biquadratic polynomials for the surface approximation with LR B-splines. This choice balances the need for smoothness and flexibility.

5.3 Outlier Detection

An outlier is defined as an observation that lies at an abnormal distance from other values in a random sample from a population. Different strategies exist to eliminate them.

5.3.1 Strategies for Outlier Detection

Outliers in a data set can come from contaminated data samples, incorrect sampling methods, or errors introduced by the sensor during data collection or analysis [Haw80, Cha17]. Outliers in large geospatial data sets can strongly influence the results of the surface fitting, i.e., they will be adjusted for as if they were normal observations. If they are not excluded prior to the approximation, ripples or non-smooth surfaces are likely to arise.

Outliers can occur as sparse, isolated points or in clusters [Wan15]; the latter are more challenging to filter. Visual approaches are not suitable for large point clouds, and automatic detection should be preferred. Some of the most popular methods for outlier detection in light detection and ranging (LIDAR) point clouds are reviewed in, e.g., Sotoodeh [Sot06]:

  1. Z-score or Extreme Value Analysis, as used in Sect. 5.3.2.2.

  2. Linear Regression Models (Principal Component Analysis, Least Mean Square).

  3. Probabilistic and statistical testing. Histograms, boxplots, the interquartile range (IQR, Sect. 5.3.2.1) and the Median Absolute Deviation are well known methods. From a statistical perspective, Grubbs' test can be used to identify a single outlier as the minimum or maximum value in a data set, and Rosner's test for multiple outliers. The statistical tests are often limited to univariate data sets that approximately follow a normal distribution.

  4. Clustering techniques. They are used to group similar data values into clusters having similar behaviour. Here it is assumed that outliers do not belong to any cluster, or only to small ones. Classification of ground points from a geospatial point cloud is related to outlier detection.

  5. Deep learning based methods. We cite Pang [Pan21] for a review of different possibilities.

  6. Surface or slope based methods. Roberts et al. [Rob19] test a set of surface-based and slope-based methods using data sets known to be challenging to classify. With surface based methods, the ground surface is approximated iteratively, and ground points are identified using buffer zones defined from the parametric surface. Slope based methods assume that variations in terrain are gradual within a local neighbourhood.

5.3.2 Comparison of Outlier Detection Methods for the Selected Data Set

The terrestrial data set presented in Sect. 5.2 contains both outliers and unclassified points representing houses, trees, low vegetation and similar objects. Our aim is to identify outliers, but it is also interesting to see the extent to which outlier detection methods separate ground points from vegetation and man-made objects. We will investigate the IQR and Z-score methods, as well as a method for detecting single outliers. The outlier detection is applied in the context of adaptive approximation of height data, see Chap. 3. The selected methods do not require any a priori estimate of the number of outliers. Moreover, they can easily be applied to a high number of data points.

The three methods are integrated in the adaptive approximation algorithm and applied at each iteration step in a regression setting: the distances between the points and the current approximating surface are the residuals being tested. The outlier detection is applied to subgroups of the data set, identified by selecting the points situated in one mesh cell. The group testing is intended to reduce the computational effort of outlier detection. A point group is subject to testing only if:

  • The maximum distance between the subset of the point cloud and the surface is larger than a threshold, which depends on the maximum and the average distance between the surface and all data points in the previous iteration step.

  • The local maximum distance has not decreased significantly since the last iteration, which is an indication of the presence of at least one outlier.

The accuracy results from the last iteration step are obtained after outlier removal. At the start of the computation, it is hard to distinguish between features and outliers. As the surface adapts to the point cloud, the distances at features become smaller than those at outliers; accordingly, the threshold for allowing an outlier test is gradually decreased at each iteration step. A possible form of this trigger is sketched below.
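The chapter does not give the exact threshold formula, so the following C++ sketch is only an assumption-laden illustration of such a trigger: the threshold combining the previous iteration's maximum and average distances, and the stagnation factor of 0.9 standing in for "no significant decrease", are both placeholders.

```cpp
// Decide whether the points in one mesh cell should be tested for outliers.
// The threshold formula and the stagnation factor are assumptions, not
// values taken from the chapter.
bool triggerOutlierTest(double cellMaxDist,      // max |residual| in the cell
                        double prevCellMaxDist,  // same quantity, previous step
                        double globalMaxPrev,    // max distance, all points, previous step
                        double globalAvgPrev) {  // average distance, previous step
    double threshold = 0.5 * (globalMaxPrev + globalAvgPrev);   // assumed form
    bool largeDistance = cellMaxDist > threshold;
    bool stagnating = cellMaxDist > 0.9 * prevCellMaxDist;      // no significant decrease
    return largeDistance && stagnating;
}
```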

5.3.2.1 The IQR Test

Here the residuals between a subset of the point cloud and the current surface are sorted according to their values. We call Q1 the first quartile of the residuals and Q3 the third one. Then \(IQR = Q3-Q1\) is the interquartile range of the residuals. We further define two fences \(f1 = Q1 - factor \times IQR\) and \(f2 = Q3 + factor \times IQR\). An outlier is defined as a point with a residual value outside the range of these fences. Often \(factor = 1.5\), justified by assuming that the residuals follow the normal distribution. This factor gives fences at approximately \(\mu - 2.7\sigma \) and \(\mu + 2.7\sigma \), where \(\mu \) is the mean and \(\sigma \) the standard deviation. This way, about 0.7% of the points are expected to be flagged as outliers. Unfortunately, there is no reason to believe that the residuals are normally distributed. A Student's t-distribution with a heavier tail is more probable, see Chap. 4. As an assumption of the Student's t-distribution implies a more demanding computation to find the correct factor, we also apply the factors 3 and 5 to our outlier detection and study the effect of this factor on the selected data set.
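A minimal sketch of the test follows. The linearly interpolated quartile rule is one of several common conventions and is our implementation choice here, not prescribed by the chapter.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Flag residuals outside the IQR fences f1 = Q1 - factor*IQR, f2 = Q3 + factor*IQR.
// Assumes a non-empty residual vector.
std::vector<bool> iqrOutliers(const std::vector<double>& r, double factor) {
    std::vector<double> s(r);
    std::sort(s.begin(), s.end());
    auto quartile = [&s](double q) {          // linearly interpolated quantile
        double pos = q * (s.size() - 1);
        std::size_t lo = static_cast<std::size_t>(pos);
        std::size_t hi = std::min(lo + 1, s.size() - 1);
        return s[lo] + (pos - lo) * (s[hi] - s[lo]);
    };
    double q1 = quartile(0.25), q3 = quartile(0.75);
    double f1 = q1 - factor * (q3 - q1);      // lower fence
    double f2 = q3 + factor * (q3 - q1);      // upper fence
    std::vector<bool> outlier(r.size());
    for (std::size_t i = 0; i < r.size(); ++i)
        outlier[i] = (r[i] < f1 || r[i] > f2);
    return outlier;
}
```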

Table 5.1 Outlier detection with IQR, various factors

Table 5.1 shows some results for outlier detection with the IQR method. The total number of outliers identified is 29,129, 8,389 and 2,615 for an IQR factor of 1.5, 3 and 5, respectively. All obvious outliers are caught, together with a certain amount of vegetation and house points depending on the factor under consideration. The example cells at the first iteration step are the same for all factors, and we see that the number of outliers found is reduced with increasing factor. Similar distances between the subsets of the point cloud and the surface lead to very diverse numbers of outliers. Fortunately, this is not necessarily a problem: Large distances can also be synonymous with a low accuracy due to lack of freedom in the surface. This occurs typically at the beginning of the adaptive process when the steepness and roughness of the terrain vary in the selected area.

Fig. 5.4 The results of outlier detection with IQR, blue points are classified as outliers while the remaining points are light blue. a IQR factor 1.5, b detail with factor 1.5, c factor 3, d factor 5

Figure 5.4 shows the result of the outlier classification. We see that much of the vegetation, the buildings and some points at the sea surface are also classified as outliers for factor = 1.5, in addition to the obvious outliers (which are always found). In areas where the majority of the points belong to trees, see Fig. 5.4b, the position of the surface is influenced by the vegetation points as well as the ground points. Ground points and vegetation points become equally likely to be classified as outliers. When the IQR factor is increased, the portion of the vegetation classified as outliers decreases, but is not eradicated, as illustrated in Fig. 5.4c and d.

The IQR outlier detection method removes many points and relies on the questionable assumption that the residuals are normally distributed. Classification of points from vegetation and buildings should be done with more accurate methods. However, the method is simple and can give useful results if applied with care.

5.3.2.2 Z-score

Similarly to the IQR algorithm, the Z-score method is based on the assumption that the data are normally distributed. We compute \(Z_i = \frac{r_i - \mu }{\sigma }\), where \(r_i\) is residual number i, \(\mu \) is the residual mean and \(\sigma \) the standard deviation. If \(Z_i\) falls outside the range \([-3,3]\), the point is considered an outlier. In theory, this should give roughly the same result as the IQR test with factor 1.5 for a normally distributed data set. In this method, the mean and standard deviation are computed explicitly, which places fewer assumptions on the distribution. Table 5.2 shows how outliers are detected with the Z-score method during the adaptive surface approximation. The method classifies fewer points as outliers than the IQR method; the total number detected was 5,487. In the first iteration step, only 1 or 2 points are found to be outliers in 27 of the cells tested. A point with a very large residual can mask other points that should also be considered outliers. These points can be found only when the most extreme cases have been removed. A sketch of the test is given below.
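A minimal sketch, assuming the residuals of the tested subgroup are already collected; using the sample standard deviation is an implementation choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Flag residuals whose Z-score lies outside [-cutoff, cutoff].
// Assumes at least two residuals.
std::vector<bool> zScoreOutliers(const std::vector<double>& r, double cutoff = 3.0) {
    double mean = 0.0;
    for (double v : r) mean += v;
    mean /= static_cast<double>(r.size());
    double var = 0.0;
    for (double v : r) var += (v - mean) * (v - mean);
    double sigma = std::sqrt(var / static_cast<double>(r.size() - 1));
    std::vector<bool> outlier(r.size());
    for (std::size_t i = 0; i < r.size(); ++i)
        outlier[i] = std::abs((r[i] - mean) / sigma) > cutoff;
    return outlier;
}
```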

Table 5.2 Outlier detection with the Z-score method
Fig. 5.5 Results of outlier detection with the Z-score method. Blue points are classified as outliers. a The result after one iteration, b the result after three iterations

Figure 5.5 highlights that the evident outliers are detected in the first iteration step, together with a few points corresponding to vegetation or buildings. In later iterations, more points close to the ground are added. Some points belonging to trees, bushes and houses are classified as outliers. Unfortunately, some points from a tree may be found to be outliers while others are not.

5.3.2.3 Detection Aimed at Single Outlier Points

The last outlier detection method to be investigated in this chapter is designed to fit within the context of adaptive surface approximation with local refinement, such as LR B-splines. It mainly aims at identifying single outlier points and has no direct link to the aforementioned statistical methods.

A threshold is used in a pre-processing step to check the cells for possible outliers: the points in a cell with a residual larger than the threshold are called candidate outlier points.

Each candidate outlier is compared to a group of nearby points not restricted by the cell boundaries. The number of points in this group varies, but should be close to 100. A set of characteristics is computed for the group of nearby points, both including and excluding the candidate outlier, to decide if it should be excluded:

  • Standard deviation: \(std_{with}\) and \(std_{without}\),

  • Average distance to the surface: \(MAE_{with}\) and \(MAE_{without}\),

  • The range between the minimum and the maximum signed distance to the surface: \(R_{with}\) and \(R_{without}\),

  • Number of points: \(n_{with}\) and \(n_{without}\).

For a candidate point to be classified as an outlier, the following rules must apply: \(n_{with}-n_{without} \ll n_{with}\), \(std_{with} \gg std_{without}\), \(MAE_{with} \gg MAE_{without}\) and \(R_{with} \gg R_{without}\). Furthermore, let \(z_o\) be the elevation of the candidate outlier point and \(z_p\) that of the closest neighbouring point, and let \(r_o\) and \(r_p\) be the corresponding residuals. Then we require \(|z_o-z_p| > 2\times tol\) and \(|r_o-r_p| > 2\times tol\), where tol is the approximation tolerance. Moreover, a steep slope between the candidate outlier and the neighbouring point is required. The combination of these criteria implies that groups of outliers will be detected only if the group contains few points and/or is very deviant from other points in the neighbourhood. The rule set is sketched in the code below.
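The sketch below encodes this rule set. The numeric factors that make the comparisons \(\ll\) and \(\gg\) operational are illustrative placeholders, not values from the chapter, and the slope criterion is omitted.

```cpp
#include <cmath>

// Characteristics of the neighbourhood group, computed with and without
// the candidate point.
struct GroupStats { int n; double stdev, mae, range; };

// Decision rule for a single candidate outlier. The factors 20 and 2.0 are
// assumptions chosen for illustration.
bool isSingleOutlier(const GroupStats& with, const GroupStats& without,
                     double z_o, double z_p,   // elevations: candidate, neighbour
                     double r_o, double r_p,   // residuals: candidate, neighbour
                     double tol) {             // approximation tolerance
    bool fewCandidates = (with.n - without.n) * 20 < with.n;  // n_with - n_without << n_with
    bool stdevDrops    = with.stdev > 2.0 * without.stdev;    // std_with >> std_without
    bool maeDrops      = with.mae   > 2.0 * without.mae;      // MAE_with >> MAE_without
    bool rangeDrops    = with.range > 2.0 * without.range;    // R_with  >> R_without
    bool heightJump    = std::abs(z_o - z_p) > 2.0 * tol;
    bool residualJump  = std::abs(r_o - r_p) > 2.0 * tol;
    // The chapter also requires a steep slope between the candidate and its
    // neighbour; that test is omitted in this sketch.
    return fewCandidates && stdevDrops && maeDrops && rangeDrops
        && heightJump && residualJump;
}
```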

Table 5.3 Outlier detection aimed at single outliers

Table 5.3 shows the number of points identified as outliers, along with some additional information. We note that the number of outliers is much lower than for the previous methods. After the most prominent outliers have been removed in the first step, the outlier threshold is reduced significantly. In the first step, the number of candidate outliers in a cell is one or two, and all candidates are classified as outliers. In the second and third steps, the number of candidates in a cell varies from 1 to 101, and in most cases no outliers are detected. Groups of points belonging to houses, trees and other vegetation are tested and found not to be obvious outliers.

Fig. 5.6 Results of outlier detection with the method aimed at single outliers. Blue points are classified as outliers. a The final result, b a detail

Fig. 5.7 Combination of terrain and sea data

Figure 5.6 shows the location of the identified outliers. Mostly, the obvious cases are detected, although a few points related to vegetation are included. As mentioned in the introduction, the data set contains 73 classified outliers, which were identified in a preprocessing step. The algorithm found 55 outliers, of which 49 also belong to the group of classified outliers. The current method is the one best adapted to the problem at hand, i.e., removing genuine outliers rather than trees and vegetation, but it is complex and has a limited theoretical foundation. Nevertheless, it seems that a tailor-made outlier detection is beneficial when combined with the adaptive method for surface generation.

5.4 Surface Approximation of the Selected Data Set

We use the points classified as ground in the terrestrial data set and remove the data points at the sea surface. This means that only points with a positive height component are included. The ground data is combined with the corresponding seabed data set, resulting in the point cloud shown in Fig. 5.7. We notice that there are some shallow water areas where points are missing. Furthermore, the point cloud density is considerably higher for the seabed part than for the terrain part: There are about 25 times more bathymetry points than terrestrial ones.

5.4.1 Selection of Methods and Parameters

Figure 5.1 gives an overview of the surface approximation algorithm. The process starts from a TP B-spline surface, which is adaptively refined in areas where the distance between the surface and the point cloud is larger than a given tolerance. Given a current LR B-spline surface, we can perform the actual approximation with a least-squares (LS) approach or multilevel B-spline approximation (MBA), see Sect. 3.3. LS approximation is a global approach with some best-fit properties, while MBA is an iterative, explicit, local approximation method. The latter is to some extent expected to smooth out extreme behaviour in the approximating surface. We normally apply LS approximation for a number of iterations in the adaptive algorithm before turning to MBA.

Data sets are subject to noise and may contain outliers. It is thus not obvious that the approximation should be pursued until all points have a distance to the surface smaller than a given tolerance. Normally, the process is stopped by a maximum number of iteration steps, but finding the optimal number of iterations is challenging. Computing the minimum of the AIC is an alternative way to find this optimum, but the process is time consuming. Moreover, it is a global method that does not take local variations in the point cloud into account, and a minimum does not always exist. A tolerance is applied to identify where the surface needs to be refined. This value should be defined depending on the measurement accuracy, information that is not always known. Also, the actual selection of new meshlines to insert influences the accuracy and the number of coefficients of the final surface. Various refinement strategies are discussed in [Sky22], and a short resume is given in Sect. 3.2. In the remainder of this section, we discuss the selection of methods (MBA, LS, combination of both, refinement strategy) and parameters (tolerance, number of iterations) for the selected data set.
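For reference, the AIC in its general form reads (Chap. 4 details the variant used for surface selection):

\[
\mathrm{AIC} = 2k - 2\ln \hat{L},
\]

where \(k\) is the number of estimated parameters, here essentially the number of surface coefficients, and \(\hat{L}\) is the maximized likelihood. A surface minimizing the AIC balances goodness of fit against model size.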

5.4.1.1 LS Approximation Versus MBA

Figures 5.8 and 5.9 compare (i) approximation with LS until about 33,000 coefficients are estimated, followed by a switch to MBA, and (ii) MBA for the entire computation, using a tolerance of 0.5 m. We see that for the LS approximation, both the number of unresolved points and the average distance in these points are lower than for MBA with the same number of coefficients. The difference is largest for few coefficients and diminishes as the number of coefficients increases.

Fig. 5.8 Number of unresolved points with respect to the number of surface coefficients

Fig. 5.9 Accumulated distance in points with distance more than 0.5 m, scaled with a factor of 1/10,000

Fig. 5.10 Results of surface generation, LS = green, MBA = brown. a The surfaces are roughly similar, b the point set is included in the figure to emphasize the areas without points

Fig. 5.11 Focus on areas without points. a Approximation with LS, b using MBA

Figure 5.10 shows the approximating surfaces using LS and MBA. The difference is small, but Fig. 5.11 makes it clear that MBA offers a smoother transition in areas with no points. LS approximation should be applied early in the approximation process for data sets with relatively uniform density, a low noise level and no outliers. For non-smooth data sets with voids, MBA should be the preferred choice.

When fitting point clouds with spline surfaces, overshoots may arise in areas with steep gradients, in particular with unevenly distributed data points. On the other hand, the surface is bounded by its coefficients due to the partition-of-unity property of the B-splines. By limiting the surface coefficients to a range slightly larger than the height range of the data set, extreme overshoots can be avoided; a sketch is given after this paragraph. Figure 5.12 focuses on a subset of the point cloud covering a part of the area depicted in Fig. 5.11. The subset contains 1,494,242 points, which are shown in Fig. 5.12a. The area is rough, includes parts without points and has steep climbs from the seabed to two islands. The adaptive approximation procedure starts from a biquadratic surface without inner knots and is allowed to continue for 12 iterations with refinement in alternating parameter directions. Using MBA, the result is almost the same whether or not the size of the coefficients is bounded, see Fig. 5.12b and c. This is not the case for the LS approximation. When the coefficients are bounded to a range slightly larger than the elevation range of the data points, the resulting surface, depicted in Fig. 5.12d, is quite well behaved in the areas without points, although less smooth than the MBA surfaces. Without a bound on the coefficients, the surface oscillates drastically in areas without points (Fig. 5.12e).
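The bounding mechanism amounts to clamping each estimated coefficient, as in the following sketch; the margin parameter and the function name are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Restrict the surface coefficients to [zMin - margin, zMax + margin], i.e.,
// a range slightly larger than the elevation range of the data points.
// Because the B-splines form a partition of unity, the surface itself is
// then confined to the same range.
void boundCoefficients(std::vector<double>& coefs,
                       double zMin, double zMax, double margin) {
    for (double& c : coefs)
        c = std::clamp(c, zMin - margin, zMax + margin);
}
```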

The LS approximation is combined with a smoothing term to ensure a solution in areas without points, see Chap. 3 for more details. The weights on the approximation term and the smoothing term sum to one. Normally, the weight on the smoothing term is kept low to emphasize approximation. We apply a higher weight (0.1) to study the effect of smoothing in challenging configurations, as shown in Fig. 5.12f. The extreme behaviour in Fig. 5.12e is avoided, but the surface is generally less smooth than the alternatives shown in Fig. 5.12b, c and d. The approximation accuracy is lower when a high weight on the smoothing term is applied; note that the approximation errors increase in the last iteration step in these cases. Otherwise, the accuracy does not differ much between the various approaches, see Table 5.4. LS approximation may become less accurate when the LR mesh gets very unstructured, in which case the algorithm switches to approximation with MBA. We stop the iteration just before this situation occurs, so the results are achieved purely with LS approximation or purely with MBA. We note that the bounds on the surface coefficients do not hamper the approximation accuracy.

Fig. 5.12 Focus on areas without points. Approximation with different selections of approximation method. a Data points, b approximation with MBA and bound on the coefficients (the surface is light blue and the points can be glimpsed in clear blue), c approximation with MBA and no coefficient bounds, d LS approximation with coefficient bounds, e LS approximation, no coefficient bounds, f LS approximation with high weight on the smoothing term (0.1) and no coefficient bounds

Table 5.4 Accuracy of the subset of the point cloud with LS approximation and MBA

5.4.1.2 When to Stop the Iteration

Figures 5.8 and 5.9 indicate that the gain in continuing the approximation after the surface has reached 20,000–30,000 coefficients is small. For approximation with MBA, when the number of coefficients increases from 21,572 to 76,110, the maximum distance decreases from 3.782 to 3.426 m, the average distance from 0.100 to 0.073 m, and the fraction of points outside the tolerance from 1.9 to 0.57%, while the computation time increases from 3 min 24 s to 4 min 28 s. We refer to Table 5.5 for the accuracy development for an increasing number of iterations.

When searching for an optimal surface approximation, a balance has to be found between the number of iterations, the MAE and other performance indicators, i.e., the maximum distance and the computational time for a given tolerance. The choice is left to the practitioner, who should judge the risk of fitting the noise as the number of iterations increases. An indication can be provided by searching for the minimum of the AIC, see Chap. 4. In our particular case, no minimum could be found. We link the lack of a minimum to the fact that the surface contains many details and is not smooth enough, i.e., a global criterion on its own is not sufficient to judge the goodness of fit.

5.4.1.3 Tolerance and Accuracy

A main concern regarding surface fitting is the accuracy of the approximation. This is especially important in areas like seabed shallows, where the noise level may be high due to sea vegetation and a narrow sonar swath resulting in multiple traversals by the boat carrying the sonar. The surface should accurately represent the main shape of the terrain, but not necessarily adapt to every little stone. The tolerance is used to determine where the surface needs refinement and consequently governs the achievable accuracy. It is a predetermined value that should reflect the precision of the measurements. A level of 2–3 times the measurement error can be considered appropriate, as discussed in Chap. 4. This is only a first indication, as the real error is normally larger than the precision of the measurement device, which is not always known; several scans are merged, and arbitrary objects, like power lines and fishes, may influence the result. Here we investigate the impact of the tolerance on the fitting.

The surface approximations in Figs. 5.8, 5.9, 5.10 and 5.11 were performed with a tolerance of 0.5 m. The algorithm was allowed to run for 12 iterations, and every mesh cell where the maximum distance between the surface and a point in that cell exceeded the tolerance triggered refinement. All B-splines with the cell in their support were refined in one parameter direction at a time, in the x-direction at odd levels and the y-direction at even levels. This corresponds to the refinement strategy called FA, see Chap. 3 for more details. The MAE dropped below the tolerance at iteration level 2 for both LS approximation and MBA, and touched 0.1 m at level 9. The tolerance of 0.5 m is selected somewhat arbitrarily, but is found to balance surface size and accuracy.

Table 5.5 Tolerance, number of iteration steps, MAE, number of coefficients and percentages of points with a distance to the surface in specified ranges

Table 5.5 presents some accuracy results for a selection of tolerances and maximum iteration levels. The setup used in Figs. 5.8, 5.9, 5.10 and 5.11 is highlighted in bold font. The difference in accuracy between the applied tolerances is remarkably small, while the numbers of coefficients differ greatly when a high number of iterations is applied. In the first iteration steps, the selected tolerance plays a limited role, and the approximation error triggers similar refinements for all applied tolerances.

Fig. 5.13 Point cloud coloured according to the distance to the surface. White points are closer than 0.4 m, green points lie below the surface and red points above. More saturated colour means larger distance. The size of the white points is reduced compared to the coloured points. a Tolerance 0.1 m, 12 iterations, b tolerance 0.6 m, 12 iterations

Figure 5.13 shows that the configuration of points with a residual smaller or larger than 0.4 m is relatively similar for the tolerances 0.1 and 0.6 m. Some differences can be spotted, mainly due to the increased point size used in the picture for points with a distance larger than 0.4 m. The surface adapted with a tolerance of 0.1 m has more points within this tolerance belt than the other surfaces, but the difference is negligible compared to the difference in the number of surface coefficients. The percentages of points within this small belt after 12 iterations are 79.3, 78.4, 77.4 and 75.8% for tolerances of 0.1, 0.4, 0.5 and 0.6 m, respectively. The roughness of the data does not allow a tighter approximation with a smooth surface. The majority of the points with a large distance to the surface belong to the seabed. This can be caused by the number of bathymetry points being much higher than the number of terrestrial points, but also by the bathymetry points being unclassified whereas the terrestrial points are classified as ground. The deviations are most prominent in shallow seabed areas.

5.4.1.4 Refinement Strategies

In Sect. 5.4.1.3, we saw that a tighter tolerance increased the number of surface coefficients considerably at later iteration levels without improving the accuracy significantly; the effect of the extra refinement is low. Similar results were found in [Sky22]. A rapid introduction of new meshlines leads to more coefficients for similar accuracy, but also to a lower computational time. A slower pace in introducing new degrees of freedom often leads to fewer coefficients and an acceptable computation time, while a very restrictive introduction can block further accuracy improvements and eventually lead to more surface coefficients that contribute little to an accurate approximation.

Table 5.6 Refinement strategies and associated accuracy results

Table 5.6 illustrates how different refinement strategies for defining new meshlines influence the approximation results. We stop the iteration after the surface has reached 20,000 coefficients; the number of iterations required is reported in column two. For the strategies whose names start with F and Mc, the refinement is triggered by mesh cells that contain points with a residual larger than the tolerance. For strategies starting with S and R, refinements are triggered by B-splines having such points in their support. If the strategy is marked "all", every such occurrence leads to refinement, while "tn" indicates that only mesh cells or B-spline supports with a relatively high number of out-of-tolerance points combined with a large distance to the surface trigger refinement. Strategies marked with B refine in both parameter directions at each iteration step, while strategies marked with A refine in alternating parameter directions. Strategies starting with F are full span strategies, meaning that all B-splines having the identified cell in their support are split. Mc denotes minimum span strategies; here only one B-spline is selected for refinement, and the selection criterion combines the size of the support and the number of associated out-of-tolerance points. For S strategies, the identified B-spline is refined in all knot spans, while for R only the knot spans containing the most out-of-tolerance points are refined. McA tn is the most and SB the least restrictive refinement strategy in the list. We refer to Chap. 3 for more details on each refinement strategy.

In Table 5.6, we compare the results for the different strategies after the last iteration. We see that the "A" strategies always need more computational time than the "B" strategies, since the refinement in "A" is performed in each direction separately, implying that the number of coefficients to estimate is higher. However, the final number of coefficients relative to the accuracy tends to be lower for the "A" methods, and the efficiency is higher. For this data set, SB has the lowest computational time but a high number of coefficients and the poorest efficiency among the recorded strategies. The best approximation efficiency is found for FA (marked in bold font). However, the efficiency measure does not take the value of a residual into account as long as it is smaller than the prescribed tolerance; here the actual distance could be considered as well. We see that some methods have a lower computational time than FA. Thus, if time is regarded as more important than the number of surface coefficients, FB and McB are good alternatives, preferably with some restrictions on the mesh cells that trigger refinement (tn). The results of this experiment fall well in line with the conclusions in [Sky22]. The choice of refinement strategy can also be seen as a model selection problem, following the concept described in Chap. 4.

Fig. 5.14 Computation of the trimmed surface. a The point cloud (in khaki green) is recursively divided into subsets bounded by polygons, b the composite polygons bounding the entire point cloud, c the polygons are approximated by a set of spline curves, d the final trimmed surface

5.4.2 Dealing with Missing Points and Voids: Trimming

In Computer Aided Design (CAD), trimming is used to remove extra lines or extra parts of an object, see, e.g., Marussig and Hughes [Mar18] for an overview of methods in Isogeometric Analysis (IgA). Trimming aims to optimize the modelling and visualization of the approximated surface. Here we apply it to handle data gaps and "cut" the domains where no points were available for fitting. Otherwise, the algorithm would try to approximate without data support, which often leads to unfavourable ripples or voids. We note that the parameterization and the mathematical description of the surface remain unchanged after trimming. The principle of trimming is summarized as follows:

  1. We bound the points by curves in the xy-plane. These curves are often B-spline curves or NURBS.

  2. The curves are arranged in one loop for the outer boundary and one loop for each hole, and associated with the parameter domain (the xy-plane for points parameterized by their x- and y-values) of the surface.

  3. The outer loop is oriented counter-clockwise, while any inner loops are oriented clockwise. By convention, only the areas of the surface situated to the left of such trimming loops are considered valid. Consequently, the loops divide the resulting trimmed patch into distinct parts, where the direction of the curves tells which parts of the domain are visible or not.

Figure 5.14 explains the computation of trimming loops and the trimmed surface. A polygon of horizontal and vertical lines in the xy-plane surrounding the points is computed in a recursive procedure; a sketch is given below. Depending on the density of the point cloud, a maximum recursion level is selected: A dense point cloud allows more recursions and consequently a more accurate polygon. The point cloud is recursively divided into blocks as shown in Fig. 5.14a, here with maximum recursion level two. The boundary lines of the blocks containing points are collected, while lines that occur twice are removed; this happens when two adjacent blocks both contain points. The resulting lines are sorted to create one or more polygons, see Fig. 5.14b. In Fig. 5.14c, the polygons are divided into pieces, each approximated by a spline curve, and finally, in Fig. 5.14d, the trimmed surface is shown.
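A sketch of the recursive block subdivision follows, with illustrative names and data structures. Edges registered exactly once bound the occupied region, while edges shared by two occupied blocks cancel.

```cpp
#include <array>
#include <map>
#include <vector>

struct Pt2 { double x, y; };
struct Box { double x0, y0, x1, y1; };
using Edge = std::array<double, 4>;  // x0, y0, x1, y1 of an axis-parallel segment

// Recursively subdivide the bounding box down to a fixed level and count the
// boundary edges of every occupied block. Edges counted twice are interior
// (shared by two occupied blocks) and are discarded afterwards; the edges
// counted once form the trimming polygon(s) of Fig. 5.14b.
void collectEdges(const Box& b, const std::vector<Pt2>& pts, int level,
                  std::map<Edge, int>& count) {
    bool occupied = false;
    for (const Pt2& p : pts)
        if (p.x >= b.x0 && p.x <= b.x1 && p.y >= b.y0 && p.y <= b.y1) {
            occupied = true;
            break;
        }
    if (!occupied) return;               // empty block: contributes nothing
    if (level == 0) {                    // leaf block: register its four edges
        ++count[{b.x0, b.y0, b.x1, b.y0}];
        ++count[{b.x1, b.y0, b.x1, b.y1}];
        ++count[{b.x0, b.y1, b.x1, b.y1}];
        ++count[{b.x0, b.y0, b.x0, b.y1}];
        return;
    }
    double xm = 0.5 * (b.x0 + b.x1), ym = 0.5 * (b.y0 + b.y1);
    collectEdges({b.x0, b.y0, xm, ym}, pts, level - 1, count);
    collectEdges({xm, b.y0, b.x1, ym}, pts, level - 1, count);
    collectEdges({b.x0, ym, xm, b.y1}, pts, level - 1, count);
    collectEdges({xm, ym, b.x1, b.y1}, pts, level - 1, count);
}
```

Sorting the remaining edges head-to-tail yields the polygons that are subsequently approximated by spline curves (Fig. 5.14c).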

5.5 Conclusion

Adaptive LR B-spline surface approximation is a flexible method to "transform data into information". In the context of approximating geospatial data, huge, noisy and scattered data sets from terrains or seabeds can be represented in a compact way. Surface approximation with LR B-splines has the following advantages:

  1. The computational time is manageable.

  2. The data storage is strongly reduced: Millions of points are condensed into a manageable number of coefficients to estimate.

  3. The adaptive approximation method is flexible. The MBA can be combined with the LS approximation: The LS method is used in the first iterations, and the smoothing term can be adapted to avoid fitting the noise. In the last iterations, the MBA allows an explicit yet very accurate fitting. Because it has similarities with the L1 norm, noise, outliers and data gaps can be handled while keeping the approximation smooth. This property is often needed for geospatial data sets.

  4. The refinement methods can be adapted to the data at hand (point density, presence of noise or outliers). Different parameters, such as the tolerance, the polynomial degrees of the spline surface or the refinement strategies, can be chosen individually.

  5. The fit of the approximation can be judged using simple statistical concepts such as the mean absolute distance, the number of points outside the tolerance or the maximum error. Additional statistical quantities, such as information criteria, can provide orientation for optimizing the surface approximation.

  6. The format is flexible and allows an export as a TP B-spline surface to standard GIS software.

  7. The C++ functions are freely available to permit a wide usage of the LR B-spline surface approximation, up to individual adaptation of the algorithms.

In this chapter, we have highlighted these properties and approximated a data set composed of seabed and terrain data recorded by sensors with different noise properties. More specifically:

  1. We have compared different pre-processing strategies to eliminate outliers, and found that the method identifying single outlier points, with no direct link to statistical methods, hits the target best: It reduced the risk of eliminating features that need to be approximated while still finding the real outliers.

  2. We have developed the concept of adaptive approximation, starting from a coarse mesh. A refinement is performed in cells where the error between the mathematical surface and the points exceeds a predefined tolerance.

  3. We have highlighted how to deal with data voids, which are a common challenge for many GIS data sets. Here the point density may be so low that no plausible surface approximation can be performed. We have shown that MBA performs well in such cases; it is a computationally advantageous method, as no minimization has to be performed.

  4. We have compared different parameter setups to achieve the best goodness of fit, e.g., the tolerance, the maximum number of iterations, and the refinement strategy. We have investigated different refinement strategies and shown that FA (full span refinement in one direction at each iteration) was the most favourable. We further showed how the tolerance affects the fitting of noise.

  5. We have explained how trimming can be performed to cut away domains without points, for which the fitting is unfavourable (ripples, oscillations).

The result of the surface approximation with LR B-splines is a mathematical surface with few coefficients in comparison to the huge number of points to approximate. The surface describes the underlying ground with high accuracy, which can be assessed by means of simple statistical quantities. Ongoing research aims to find the optimal surface with respect to the data at hand by setting, e.g., the tolerance less empirically. To that aim, concepts developed in Chap. 4 can be used for smooth and homogeneous point clouds. In Chap. 6, we will present further applications of the LR B-spline surface approximation, such as deformation analysis with LR B-spline volumes and the drawing of contour lines from the mathematical model.