
4.1 Introduction

A surface approximation of a point cloud can be performed either globally (non-adaptive methods) or with locally adaptive methods. The adaptive surface fitting with LR B-splines used in this SpringerBrief belongs to the latter category. LR B-splines can be viewed as a generalization of univariate non-uniform B-splines, see [Dok13] for more details. The approximation of point clouds is performed step by step, and the mathematical surface at each new iteration step depends on the result of the previous one [Sky22]. Contrary to Non-Uniform Rational B-spline (NURBS) surfaces, local refinement is allowed. This avoids overfitting in domains where no further refinement is necessary. We summarize the principle of adaptive surface fitting as follows:

  • The starting point is a Tensor Product (TP) B-spline space. This is used for defining the initial LR mesh and the first collection of TP B-splines.

  • The LR mesh is successively refined by inserting new meshlines: The first time in the initial LR mesh, later in the refined LR mesh. This insertion of meshlines is triggered from mesh cells, where at least one observation is associated with an error term higher than a given tolerance. The new meshline is extended to ensure that the support of at least one B-spline is completely traversed. The choice of the tolerance is linked with the level of accuracy needed, balanced by the computation time and number of surface coefficients to be estimated.

  • The point cloud is approximated and the result is an LR B-spline surface.

  • After a given number of iteration steps or as soon as no error term exceeds the tolerance, the final surface is computed.
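The steps above can be sketched as a loop. This is an illustrative, non-runnable sketch: `initial_tp_surface`, `approximate`, `residuals`, and `refine_mesh` are hypothetical placeholders for the LR B-spline operations described in Chap. 3.

```python
def adaptive_fit(points, tolerance, max_iterations):
    # Step 1: initial tensor product B-spline space defining the first LR mesh
    surface = initial_tp_surface(points)
    for step in range(max_iterations):
        # Step 3: approximate the point cloud (LS and/or MBA, cf. Chap. 3)
        surface = approximate(surface, points)
        errors = residuals(surface, points)
        # Step 4: stop as soon as no error term exceeds the tolerance
        if max(abs(e) for e in errors) <= tolerance:
            break
        # Step 2: insert meshlines in cells with out-of-tolerance points,
        # extended so that at least one B-spline support is fully traversed
        surface = refine_mesh(surface, errors, tolerance)
    return surface
```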

The multilevel B-spline approximation (MBA) proposed in [Lee97] is often combined with a least-squares (LS) approximation in the first steps; see Chap. 3 for more details on the procedure.

The output surface depends on different parameters that are often chosen empirically. In this chapter, we introduce a statistical criterion, the Akaike Information Criterion (AIC) [Aka73], to judge the goodness of fit of the approximation, in addition to more usual values such as the number of points outside tolerance or the mean absolute error (MAE). We further propose to investigate how the tolerance can be chosen with respect to the level of noise in the point cloud.

The remainder of the chapter is organized as follows: In the first section, we will describe the penalized model selection criteria within the context of surface approximation. Student's t-distribution will be introduced to face the challenge of outliers. Dedicated examples will show the potential of the AIC as a global indicator for the goodness of fit.

4.2 Surface Approximation and Penalized Model Selection Criteria

To illustrate the challenges of choosing an optimal model in the sense of the AIC, we will discuss two approaches: with and without a penalty term regarding the number of coefficients. We start with a data set of size \(n_{obs}\), which we approximate with an LR B-spline surface by setting, e.g., the tolerance, the maximum number of iterations, the refinement strategy, and the polynomial bidegree of the spline. We call the result of the fitting a model and consider k possible models. The vector \({\boldsymbol{c}}_k\) contains the estimated coefficients and has a length \(n_{cp_k}\). Each model has its own likelihood \(L\left( {\boldsymbol{c}}_k\right) \): it assigns a numerical value to the question of how "likely" the model is given the observations. It is convenient to work with the log-likelihood function for the model with the estimates \({\boldsymbol{c}}_k\), which is defined as \(l\left( {\boldsymbol{c}}_k\right) ={\log \left( L\left( {\boldsymbol{c}}_k\right) \right) \ }\). The likelihood is a measure of goodness of fit and has a meaning only when it is compared with the likelihood computed for another model.

Approach 1 without penalty term: Case 1

When performing a surface approximation, one could search for the optimal refinement level, i.e., the iteration step at which the algorithm should be stopped because the optimal model has been found. Here we would call model 1 the approximation at level 1, model 2 at level 2. To each model is associated a likelihood, computed from the vector of estimated coefficients. As the iteration step increases, its length increases accordingly, but the corresponding likelihood may increase only slightly. Searching for the maximum of the likelihood without penalizing for the number of coefficients could lead to overfitting and ripples in the approximated surface.

Approach 1 without penalty term: Case 2

If we make a first approximation of a scattered point cloud with a tolerance of 0.01, we obtain a parameter vector \({\boldsymbol{c}}_1\) of length \(n_{cp1}\); the approximation has a likelihood \(L\left( {\boldsymbol{c}}_1\right) \). In parallel, we can compute a second model by changing the tolerance to 0.005. Its likelihood is \(L\left( {\boldsymbol{c}}_2\right) \), with \({\boldsymbol{c}}_2\) of length \(n_{cp2}\gg n_{cp1}\). For both models, we stop the refinement after 5 iterations. Usually \(L\left( {\boldsymbol{c}}_1\right) \ne L\left( {\boldsymbol{c}}_2\right) \), and here we may suppose \(L\left( {\boldsymbol{c}}_1\right) <L\left( {\boldsymbol{c}}_2\right) \). This would lead to the conclusion that the second model is more appropriate to fit the data as its likelihood is higher. This statement is only partially true: the number of coefficients for the second model is much higher than for the first one. This difference may be unfavorable (i) from a computational point of view, (ii) if overfitting should be avoided due to the presence of noise in the data, or (iii) if a lean model is preferred for storage or subsequent use. Too many coefficients should be avoided, as ripples and oscillations may occur in the fitted surface.

The penalized criteria address the drawbacks raised in the first approach. In their simple form they are called the Bayesian Information Criterion (BIC) [Sch78] or the Akaike Information Criterion (AIC) [Aka73]. The two criteria are defined as:

$$\begin{aligned} BIC_k=-2l\left( {\boldsymbol{c}}_k\right) +{\log \left( n_{obs}\right) \ }n_{cpk} \text { and } \end{aligned}$$
(4.1)
$$\begin{aligned} AIC_k=-2\left[ l\left( {\boldsymbol{c}}_k\right) \right] +2n_{cpk}, \end{aligned}$$
(4.2)

respectively. They can be seen as statistical alternatives to more usual heuristic considerations: the first term in Eqs. 4.1 and 4.2 is the log-likelihood, i.e., a measure of the goodness of fit to the data. The second term is a penalty term, which accounts for the increase in complexity. When k models are compared with each other, the model with the smallest information criterion (IC) is chosen. Choosing the best model within the framework of IC can be seen as finding a balance between these two quantities. The reader is referred to [Bur02] for the detailed derivation of the IC. In the following, we come back to the two cases with the second approach, which accounts for a penalty term.
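As a minimal numerical illustration of Eqs. 4.1 and 4.2, the criteria and the selection rule (smallest value wins) can be written as follows; the candidate log-likelihoods and coefficient counts below are made-up numbers, not values from this chapter:

```python
import math

def bic(log_lik, n_obs, n_cp):
    # Eq. 4.1: BIC_k = -2 l(c_k) + log(n_obs) * n_cp_k
    return -2.0 * log_lik + math.log(n_obs) * n_cp

def aic(log_lik, n_cp):
    # Eq. 4.2: AIC_k = -2 l(c_k) + 2 n_cp_k
    return -2.0 * log_lik + 2.0 * n_cp

# Three candidate models: the likelihood improves with more coefficients,
# but the penalty term can outweigh a marginal improvement.
candidates = [(-1500.0, 120), (-1480.0, 400), (-1478.0, 900)]  # (l(c_k), n_cp_k)
best_k = min(range(len(candidates)), key=lambda k: aic(*candidates[k]))
# Here model 0 wins: +20 in log-likelihood does not justify +280 coefficients.
```

The same selection rule applies to the BIC; only the weight of the penalty differs.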

Approach 2 with penalty term: Case 1

For case 1, we can assume that the likelihood will saturate after a given number of iterations, while the number of coefficients still increases strongly with each iteration step. It is likely that a minimum of the BIC and/or the AIC occurs, balancing the two quantities.

Approach 2 with penalty term: Case 2

For case 2, only two models are compared with each other, and the choice of the best model is easy if both \(AIC_2<AIC_1\) and \(BIC_2<BIC_1\): we can then conclude on the superiority of model 2 with respect to model 1. A tolerance of 0.005 is more optimal than 0.01 for approximating the data at hand, within the context of model selection with IC.

Potentially, the BIC and the AIC may come to two different conclusions, i.e., \(AIC_2<AIC_1\) and \(BIC_2>BIC_1\). For case 1, this could mean that the 3rd step is optimal for the BIC and the 5th for the AIC. It is often stated that the BIC underestimates the optimal number of parameters to estimate. On the contrary, the assumption behind the AIC is that the true model is unknown and unknowable. The AIC is asymptotically equivalent to cross-validation, whereas the BIC aims at consistent model selection. In case of disagreement between the two criteria, other measures of goodness of fit should be added within the context of surface fitting, such as the MAE, the maximum error \({Max}_{errk}\), or \(n_{outk}\).

In the following, we skip the subscript k for the sake of readability. We refer to Chap. 3 and recall that the following indicators will be used additionally to judge the goodness of fit:

  1. The mean absolute error (MAE), defined as \(MAE=\frac{1}{n_{obs}}\sum _{j=1}^{n_{obs}} |z_j - \hat{z_j}|\), with \(\textbf{z} = \{z_j\}_{j=1}^{n_{obs}}\) and \(\hat{\textbf{z}} = \{\hat{z}_j\}_{j=1}^{n_{obs}}\), where \(\hat{\textbf{z}}\) is the estimated z-component of the point cloud obtained after the kth iteration.

  2. The maximum error, given by \({Max}_{err}=\max_j \left| \hat{z}_j - z_j\right| \),

  3. The number of points outside a given tolerance: \(n_{out}\),

  4. The degree of freedom, or number of control points \(n_{cp}\), estimated for a given iteration step of the refinement,

  5. The computational time CT. We used a stationary desktop with 64 GB of DDR4-2666 RAM and an Intel i9-9900K CPU with 8 cores and 16 threads; a single-core implementation is used in the experiments.

These indicators are described in Chap. 3. Here we propose to highlight how they can be used in combination with the AIC to provide a weighted conclusion about the goodness of fit of the surface approximation.
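A direct transcription of the first three indicators (MAE, \({Max}_{err}\), \(n_{out}\)) for vectors of observed and estimated z-values might look as follows; this is a sketch, not the implementation used for the experiments:

```python
import numpy as np

def fit_indicators(z, z_hat, tolerance):
    """MAE, maximum error, and number of points outside tolerance."""
    err = np.abs(z_hat - z)
    mae = float(err.mean())               # mean absolute error
    max_err = float(err.max())            # maximum error
    n_out = int((err > tolerance).sum())  # points outside the tolerance
    return mae, max_err, n_out
```

For example, `fit_indicators(z, z_hat, 0.007)` evaluates the residuals against the tolerance used throughout this chapter.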

4.3 Improving Information Criterion for Surface Approximation

We consider the AIC an adequate criterion for model selection in the field of surface approximation, as the true underlying surface is unknown. The risk of underestimating the number of coefficients with the BIC should be avoided, as details may not be revealed properly. If the AIC does not have a minimum, deeper investigations may be needed by changing the setup of the surface approximation (bidegree of the splines, refinement strategy, see Chap. 3). From now on, we will only consider the AIC and search for its minimum when comparing k models.

4.3.1 The Challenge of Normality

The likelihood function is often taken to be the Gaussian one, assuming the residuals of the surface approximation to be normally distributed. Unfortunately, this strong assumption, when violated, can lead to a biased AIC. This compromises the correct and unambiguous determination of the AIC minimum and the choice of the most adequate model among a set of candidates. We propose to use the t-distribution (also called Student's distribution), which gives more probability to observations in the tails of the distribution than the standard normal distribution [McN06]. This allows giving different weights to points outside the tolerance in the surface approximation, such as outliers. The t-distribution is defined by three parameters: its mean \(\mu \), its variance \(\sigma \), and the degree of freedom of the distribution \(\nu _{t} \). The normal distribution is a special case of the t-distribution when the degree of freedom approaches infinity. The parameters of the t-distribution \(\mathbf {\theta }=[\mu ,\sigma ,\nu _t]\) cannot be estimated in closed form. A stable approach to estimate them is the iterative two-step EM (Expectation Maximization) algorithm. In that case, we assume the observations to be independently and identically distributed. The so-called E-step computes the expected value of the log-likelihood given the observed data, whereas the M-step maximizes this expectation over the parameters to estimate. In most cases, the algorithm converges to a local maximum [Liu95].

The log-likelihood of the density of \(\textbf{r}\), with \(\textbf{r} =\left[ r_{1} ,...,r_{n_{obs}} \right] ^{T} \) the residuals of the surface approximation, which are assumed to follow the t-distribution \(t\left( \mu ,\sigma ,\nu _{t} \right) \), is given by:

$$\begin{aligned} {\log \left( L\left( \boldsymbol{r}\right) \right) \ }= & {} n_{obs}\left( {\log \Gamma \ }\left( \frac{\nu _t+1}{2}\right) \right) -n_{obs}{\log \Gamma \ }\left( \frac{\nu _t}{2}\right) -\frac{1}{2}n_{obs}\log \sigma ^2 \nonumber \\{} & {} +\frac{1}{2}n_{obs}\nu _t{\log \left( \nu _t\right) \ }-\frac{\nu _t+1}{2}\sum ^{n_{obs}}_{i=1}{{\log \left( \nu _t+\delta \left( r_i; \mu , \sigma \right) \right) \ }} \end{aligned}$$
(4.3)

with \(\Gamma \) the Gamma function and \(\delta \left( r _i; \mu , \sigma \right) =\frac{{\left( r _i-\mu \right) }^2}{\sigma }\) the standardized squared Mahalanobis distance; refer to [Mah36] for more details.
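Eq. 4.3 and the EM estimation can be sketched as follows. For simplicity, the degree of freedom \(\nu_t\) is held fixed here; the full algorithm of [Liu95] also updates \(\nu_t\), which requires solving a one-dimensional equation in the M-step. The E-step weights used below, \(w_i=(\nu_t+1)/(\nu_t+\delta_i)\), are the standard ones for the t-distribution.

```python
import math

def t_log_likelihood(r, mu, sigma, nu):
    """Eq. 4.3, with delta(r_i; mu, sigma) = (r_i - mu)^2 / sigma."""
    n = len(r)
    ll = (n * math.lgamma((nu + 1) / 2.0)
          - n * math.lgamma(nu / 2.0)
          - 0.5 * n * math.log(sigma ** 2)
          + 0.5 * n * nu * math.log(nu))
    for ri in r:
        delta = (ri - mu) ** 2 / sigma
        ll -= (nu + 1) / 2.0 * math.log(nu + delta)
    return ll

def em_t_fit(r, nu=4.0, n_iter=100):
    """EM for mu and sigma of a t-distribution, nu held fixed (simplification)."""
    n = len(r)
    mu = sum(r) / n
    sigma = sum((ri - mu) ** 2 for ri in r) / n
    for _ in range(n_iter):
        # E-step: observations far from mu receive small weights (robustness)
        w = [(nu + 1) / (nu + (ri - mu) ** 2 / sigma) for ri in r]
        # M-step: weighted location and scale updates
        mu = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)
        sigma = sum(wi * (ri - mu) ** 2 for wi, ri in zip(w, r)) / n
    return mu, sigma
```

Run on a residual vector containing an outlier, the fitted \(\mu\) stays close to the bulk of the data, whereas the sample mean would be pulled toward the outlier.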

The proposed AIC depends on the statistical properties of the approximation error of the fitted LR surface: the size of the observation vector \(n_{obs} \), the number of coefficients \(n_{cp}\), the refinement strategy, the bidegree of the spline space, and the parameters of the t-distribution.

4.3.2 An Improved AIC for Surface Approximation

In real applications, the true model is unknown. It is easier to assess the potential of statistical criteria such as the AIC within the framework of simulations. We have chosen the two different point clouds presented in Chap. 3 to illustrate how the AIC can be used to judge the goodness of fit, as an alternative to the usual indicators (\(n_{cp}\), \({Max}_{err}\), \(n_{out}\), or MAE). We will investigate the optimal:

  1. number of iterations,

  2. refinement strategy,

  3. bidegree of the spline,

  4. tolerance with respect to the noise level.

4.4 The AIC to Choose the Settings for Surface Approximation of Scattered Data

The two reference surfaces used in the following correspond to (a) a smooth geometry and (b) a geometry with sharp edges. For each reference surface, we simulated \(n_{obs}=40{,}000\) scattered data points \(\left( x_i,y_i,z_i\right) \), \(i=1,\ldots ,n_{obs}\). Both are illustrated in Fig. 4.1. In the following, all values are given in m, if not specified differently in the text. The z-component of point cloud (a) is given by

$$\begin{aligned} z=\frac{\tanh {\left( 10y - 5x \right) }}{4} + \frac{1}{5 {\text {e}}^{(5x -2.5)^2 + (5y-2.5)^2}}. \end{aligned}$$
(4.4)

The point cloud (b) is generated by letting:

$$\begin{aligned} z = \frac{1}{3{\text {e}}^{\sqrt{(10x-3)^2+(10y-3)^2}}} + \frac{2}{3{\text {e}}^{\sqrt{(10x+3)^2+(10y+3)^2}}} + \frac{3}{3{\text {e}}^{\sqrt{(10x)^2+(10y)^2}}}. \end{aligned}$$
(4.5)

To mimic real data, we add a Gaussian noise of standard deviation 0.002 m in the z-direction.
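Generating such point clouds is straightforward; the sketch below transcribes Eqs. 4.4 and 4.5 with additive Gaussian noise. The parameter domains are not stated in the text and are assumptions here: \([0,1]^2\) for (a) and \([-1,1]^2\) for (b), the latter chosen so that the three peaks of Eq. 4.5 fall inside the domain.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 40_000

# Point cloud (a), Eq. 4.4 (domain assumed to be [0, 1]^2)
xa = rng.uniform(0.0, 1.0, n_obs)
ya = rng.uniform(0.0, 1.0, n_obs)
za = np.tanh(10*ya - 5*xa) / 4 + np.exp(-((5*xa - 2.5)**2 + (5*ya - 2.5)**2)) / 5

# Point cloud (b), Eq. 4.5 (domain assumed to be [-1, 1]^2)
xb = rng.uniform(-1.0, 1.0, n_obs)
yb = rng.uniform(-1.0, 1.0, n_obs)
zb = (np.exp(-np.sqrt((10*xb - 3)**2 + (10*yb - 3)**2)) / 3
      + 2 * np.exp(-np.sqrt((10*xb + 3)**2 + (10*yb + 3)**2)) / 3
      + np.exp(-np.sqrt((10*xb)**2 + (10*yb)**2)))

# Gaussian noise of standard deviation 0.002 m in the z-direction
za_noisy = za + rng.normal(0.0, 0.002, n_obs)
zb_noisy = zb + rng.normal(0.0, 0.002, n_obs)
```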

Fig. 4.1 Visualization of the generated point clouds. a A Gaussian bell with a dam-like jump. b Three peaks on flat ground

4.4.1 Number of Iteration Steps

Here we investigate the optimal level of refinement with the AIC. The results are presented in Table 4.1 for point cloud (a) and Table 4.2 for point cloud (b).

For point cloud (a), the number of points outside tolerance \(n_{out}\) was 0 after 15 iterations for a tolerance of 0.007 and a final MAE of 0.0016 m. The AIC has a local minimum at level 7 and a new minimum at level 11. The minimum at level 7 corresponds to the stage where the MAE saturates. This highlights the coherence of the different indicators for the smooth and homogeneous surface under consideration.

Table 4.1 Investigation on the AIC by varying the iteration level for a given tolerance of 0.007 for point cloud (a)

For point cloud (b), we found a minimum of the AIC at the 13th iteration level for the tolerance 0.007. Here there was no point outside tolerance after 14 iteration steps, and the MAE reaches 0.0016. We note that at the 8th iteration, there is a turning point and both the MAE and the AIC saturate. Increasing the number of iteration levels leads to more coefficients (1650 at the 8th iteration step versus 2292 at the 14th) but not to a strong improvement of the fit. However, the improvement seems significant enough that the AIC, which balances \(n_{cp}\) against the likelihood, has a weak minimum at the 13th iteration step. We link these findings with the challenging geometry of the point cloud with peaks. The results are presented in Table 4.2 together with the other indicators for the sake of comparison.

Table 4.2 Investigation on the AIC by varying the iteration level for a given tolerance of 0.007 for point cloud (b)

4.4.2 Refinement Strategy

In Chap. 3, we presented a set of refinement strategies that can be implemented with LR B-splines. We will here investigate two of them in the context of optimal surface fitting, using the AIC to judge the goodness of fit.

  1. FA, for which the refinement is performed alternately in one of the two parameter directions,

  2. FB, for which the refinement occurs in both parameter directions at each iteration level.

The potential number of new coefficients at each iteration level is much smaller for FA than for FB. For FA, more iterations are expected to reach an acceptable accuracy. However, Skytt et al. [Sky15] show that this reduced pace in the introduction of new coefficients leads to surfaces with fewer coefficients. Here the two refinement strategies can be considered as two models within the AIC framework as they are not equivalent, i.e., they lead to different residuals and likelihoods. In the following, we set the tolerance to 0.007, the bidegree of the spline to (3,3), and the maximum number of iterations to 20. We compare the FA and FB refinement strategies to highlight the flexibility of the setting.

Point cloud (a)

We found that FB has a minimum AIC at the 7th iteration step, but the AIC starts to saturate at the turning point from which \(n_{cp}\) begins to increase strongly (4th iteration step), see Fig. 4.2. For FA, \(n_{cp}\) increases at a slower pace compared to FB. The AIC has a weak minimum at the 15th iteration but saturates from the 6th one, as shown in Fig. 4.2.

The MAE for FA is 0.0016 after 20 iterations and a CT of 7.7 s. For FB, after 9 iterations and 3.8 s, the MAE reaches a comparable value of 0.00157. For both strategies, there is no point outside the tolerance at those iteration steps. Thus, FB is more favorable from a CT perspective. The computation times include computing the AIC; if the AIC is omitted, the times are 1.10 and 0.95 s for FA and FB, respectively. However, the number of coefficients \(n_{cp}\) is much higher for the FB strategy (7024 vs. 4655 at the optimal iteration step). To compare, 4932 coefficients had to be estimated at the 4th iteration step with the FB strategy, with 173 points still outside tolerance, versus 2 points and 4655 coefficients for FA.

We further note that the minimum of the AIC for FB at the (from the AIC perspective) optimal 7th iteration is higher than for FA (−416,573 vs. −419,125 for FA at the optimal 15th iteration). This difference indicates that FA is more optimal from a statistical criterion perspective than FB. This choice has to be weighted from a practitioner's perspective, i.e., answering the questions of whether more accuracy is needed, whether the CT is an important criterion, and taking into consideration the challenge of overfitting. There is no definitive answer as the truth does not exist; it is a question of interpretation.

Fig. 4.2 Results of the approximation for the two refinement strategies FA and FB, point cloud (a). a AIC. b \(n_{cp}\)

Point cloud (b)

For point cloud (b), we find that FB has a minimum AIC at the 7th iteration step. It starts to saturate at the turning point (5th iteration step) from which \(n_{cp}\) begins to increase strongly, see Fig. 4.3. Like the MAE, the AIC for FA and FB reaches only a weak minimum, which indicates a fit that can hardly be considered optimal. We found that the AIC for FA reaches a lower value than for FB, whereas the MAE behaves slightly differently: for the MAE we found 0.0016 versus 0.0015, and for the AIC −421,830 versus −411,504, for FA and FB, respectively. The AIC thus allows concluding on the superiority of the FA strategy for point cloud (b), but the results differ slightly if we only consider the MAE as a criterion to judge the goodness of fit. This highlights the importance of accounting for \(n_{cp}\) to balance the likelihood. However, the computational time for FA to reach the minimum is significantly higher than for FB: 4.523 s for FA versus 1.437 s for FB. The recorded computational time includes the computation of the AIC.

Fig. 4.3 Results of the approximation for the two strategies FA and FB, point cloud (b). a AIC. b \(n_{cp}\)

The previous results tend to indicate that FA is more optimal than FB for fitting point clouds with LR B-spline surfaces. This is partially true and has to be weighted against the CT. The two examples clearly highlight that the fitting with the FB strategy produces more coefficients than FA for a similar accuracy, while FA has a higher CT than FB. Still, this justifies our choice of using FA in the previous (and following) sections without loss of generality. It also highlights that a new criterion should be found that would additionally account for the CT when judging and balancing the goodness of fit.

4.4.3 Tolerance

A proper tolerance is important for surface fitting: a large tolerance makes the process faster but may lead to underfitting; a smaller tolerance increases the accuracy of the fit but costs more time, as the fitted surface will be more complex, not to speak of the risk of overfitting. Hence, we can use the AIC as a criterion to compare with the usual indicators and weight the number of parameters against the global accuracy. The fit with minimum AIC defines the optimal tolerance in a global sense.

In this section, we show the potential of the AIC for investigating the tolerance. Here the standard deviation of the noise is taken, as previously, to be 0.002 m in the z-direction. We vary the tolerance within a range from 0.005 to 0.011. We use the refinement strategy FA and polynomial bidegree (2,2) and focus on point cloud (a); similar conclusions could be drawn for point cloud (b) and are not presented here. Table 4.3 gives the AIC, as well as the iteration level with no point outside tolerance. For each tolerance, we set the maximum number of iteration steps to 20. For example, when the tolerance is 0.01, the approximation continues until the 14th iteration step, but the minimum AIC is reached at the 7th step. The AIC decreases with the tolerance and has a minimum for a tolerance of 0.007, which is illustrated in Fig. 4.4. This value was the optimal tolerance chosen for Table 4.1. We further note that the MAE stays around 0.0017 for all tolerances at the optimal number of iterations and has a weak minimum at a tolerance of 0.006. This result is compatible with the results given by the AIC.

Table 4.3 Investigation on the AIC by varying the tolerance
Fig. 4.4 AIC with respect to tolerance
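The tolerance study can be summarized as a sweep over candidate tolerances, keeping the one whose best iteration step yields the smallest AIC. This is an illustrative, non-runnable sketch: `fit_lr_surface` is a hypothetical wrapper around the adaptive fitting that returns the AIC of each iteration step.

```python
def optimal_tolerance(points, tolerances, max_iterations=20):
    best = None
    for tol in tolerances:
        # fit_lr_surface is hypothetical: AIC per iteration step for this tolerance
        aic_per_step = fit_lr_surface(points, tol, max_iterations)
        step = min(range(len(aic_per_step)), key=aic_per_step.__getitem__)
        if best is None or aic_per_step[step] < best[2]:
            best = (tol, step, aic_per_step[step])
    return best  # (tolerance, optimal iteration step, minimum AIC)

# e.g. optimal_tolerance(points, [0.005 + 0.001 * k for k in range(7)])
```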

4.4.4 Polynomial Bidegree of the Splines

In this section, we vary the bidegree of the splines from (2, 2) (biquadratic) to (3, 3) (bicubic), which are usual choices for performing surface fitting. This corresponds to two different models within a model selection framework. We consider point clouds (a) and (b) and use the FA strategy for refinement, as well as a tolerance of 0.007. For point cloud (a) and the optimal refinement level, we found that the biquadratic setting leads to a smaller minimum of the AIC than the bicubic one (−419,130 vs. −419,125). From the MAE perspective, we found a value of 0.0016 for both settings at the optimal iteration step for the AIC (the 11th for the biquadratic and the 15th for the bicubic, respectively). The MAE does not decrease significantly for higher iteration steps and, thus, does not allow concluding in favour of a biquadratic or bicubic surface. Furthermore, a low MAE can be risky, i.e., linked with overfitting. Here the AIC with its minimum, even if weak, has an evident advantage over the MAE for finding an optimal iteration level, by weighting the likelihood with the number of coefficients.

We reach the same conclusion for point cloud (b). Here the minimum of the AIC is smaller for the bidegree (2, 2) (−422,672 vs. −421,830), but the MAE is similar for both optimal iteration steps corresponding to the minimum of the AIC (the 17th for the bicubic and the 14th for the biquadratic).

Skytt et al. [Sky15] mention that in most cases a biquadratic surface will suffice, which is in accordance with our results. Thus, in most cases a higher polynomial bidegree does not contribute to a better accuracy when fitting LR B-spline surfaces for this type of data set and noise level.

4.4.5 Optimal Tolerance Versus Noise Level

Depending on the sensors and the conditions under which they are used, the noise level will vary. For a terrestrial laser scanner, the noise level of the range is known to depend on the intensity, i.e., the power of the backscattered laser signal recorded by the instrument after reflection. Atmospheric effects may also correlate the observations, i.e., decrease the effective number of observations [Ker20]. The noise is often characterized by its standard deviation, a quantity which can be provided by the manufacturer. We can conjecture that a high noise level leads to a point cloud that is more challenging to fit optimally, with a strong risk of overfitting. By overfitting we understand here "fitting the noise" instead of the true underlying surface. This effect is unwanted as it can yield surfaces with ripples and oscillations [Bra20]. A wise choice of the tolerance can avoid or strongly mitigate the risk of overfitting. Thus the tolerance is an important parameter, which is usually fixed rather empirically. Often, a low MAE is sought. Unfortunately, an artificially small error is not automatically linked with a high accuracy in fitting the underlying point cloud: in case of noise or outliers in the observations, even the contrary may happen.

We propose to investigate the choice of an optimal tolerance in the context of model selection, searching for a minimum of the AIC. To that end, we simulated different Gaussian noise vectors added to the reference point cloud. Their standard deviation was varied in a range between 0.001 m (low noise level) and 0.0045 m. The noisy point clouds were fitted with an LR B-spline surface. We chose the FA strategy and a biquadratic surface, following the results of the previous sections.

Here we vary the tolerance for a given noise level and search for the minimum AIC. Each AIC is computed at the optimal iteration step. We place ourselves in the framework of Monte Carlo simulations by simulating 100 noise vectors for each noise level and taking the mean over all indicators.

The results of the investigations for point cloud (a) are presented in Fig. 4.5.

Fig. 4.5 Performance indicators versus noise level. a Optimal tolerance (left axis) and optimal AIC (right axis) versus noise level (std in m). b MAE (m) versus noise level

Figure 4.5 highlights that the optimal tolerance found with the AIC depends on the standard deviation of the noise. As the noise level increases, the optimal tolerance increases, and so does the AIC. We found a linear dependency of the optimal tolerance with respect to the noise level, with a slope of 3 (left axis in Fig. 4.5a). This slope becomes slightly lower (close to 2) as the noise level increases. A similar result was found for point cloud (b) and is not presented here. The slope of 3 can be justified as corresponding to 3 times the standard deviation of the noise, i.e., the interval in which 99.7% of the measurements will fall, assuming their normal distribution. We found that the optimal number of iteration steps stays between 6 and 7 and decreases as the noise level increases. This is an important finding, as it is unnecessary, if not risky, to continue the adaptive refinement for noisy point clouds. This is what the AIC tells us. We further computed the MAE at the iteration step considered optimal from the AIC, see Fig. 4.5b. We found a linear dependency with a slope of 0.78. The latter is less predictable than the previous one regarding the noise level and will depend on the point cloud under consideration.

Following these results, we propose to choose the optimal tolerance as being 2.5 times the noise level. This is a good compromise when the noise of the sensor is unknown. Three times the noise level would be even more conservative and has to be weighted against a potential loss of accuracy.
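The resulting rule of thumb is a one-liner; the 0.002 m value below is the noise level simulated earlier in this chapter:

```python
def recommended_tolerance(noise_std, factor=2.5):
    """Tolerance = 2.5 x noise standard deviation; factor=3.0 gives the
    more conservative variant discussed above."""
    return factor * noise_std

tol = recommended_tolerance(0.002)  # 0.005 m for a noise std of 0.002 m
```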

4.5 Conclusion

In this chapter, we have introduced a statistical criterion as a new tool to judge the goodness of fit of a surface approximation. An information criterion is a weighted measure between the quality of the fit and the number of coefficients to be estimated. We showed how the AIC can come into play for determining the optimal level of refinement, the optimal bidegree of the spline, or the choice of the refinement strategy. For example, we found that a biquadratic surface is optimal for a smooth point cloud. The tolerance is often fixed empirically with the aim of obtaining a low RMSE or MAE. By investigating the AIC, we found that the optimal tolerance depends linearly on the noise level of the point cloud and can be fixed to 2.5 times the standard deviation of the noise of the observations. This information is often given by the manufacturers or can be guessed from the residuals of the approximation and/or previous investigations. Thus, we have provided an answer to the question of the optimal tolerance with respect to the data at hand. These results will be used in Chaps. 5 and 6.

The use of the AIC to judge the goodness of fit is beneficial when many coefficients are needed to fit a point cloud: it avoids unnecessary steps and possible overfitting. The AIC remains a global statistical quantity, which has to be combined with other indicators, depending on what "optimality" should mean for the application under consideration. For point clouds with high variability and local changes, the AIC only gives a global indication about the fit.