Introduction

As is well known, laser-scanning surveying is characterised by a fully automatic point cloud acquisition, while the subsequent processing phases of registration, classification and segmentation require some level of human intervention. As a contribution towards fully automatic laser data processing, a new reliable geometrical classification of point clouds is proposed in this paper. The work fits into recent research conducted by the authors, whose analytical aspects have mainly been presented to the statistics community (Crosilla et al. 2007) and whose laser-scanning applications have been shown at various International Society for Photogrammetry and Remote Sensing (ISPRS) meetings (Crosilla et al. 2004, 2005; Visintini et al. 2006; Beinat et al. 2007).

The automatic classification procedure proposed by the authors is fundamentally based on the local analysis of the Gaussian K and mean H curvatures, obtained by applying a non-parametric analytical model. In detail, the measured coordinate Z(X,Y) of each point is modelled as a Taylor's expansion with second-order terms in the local coordinates X,Y. The weighted least squares estimate of the unknown vector, collecting the differential terms, is obtained by considering a selected number of neighbouring points within a bandwidth radius and by applying a weighting function that takes into account their distance from the central point.

Since the instrumental noise worsens the data quality and the analytical modelling simplifies the true shape of the surface, the curvature values have to be statistically verified; that is, the variances of the estimated values also have to be taken into account, as recommended by Flynn and Jain as early as 1988 and more recently by Hesse and Kutterer (2005), the latter specifically for the form recognition of laser-scanned objects.

Neglecting for simplicity the presence of outliers among the neighbouring points, a chi-square ratio test between the estimated variance factor and the a priori measurement variance is applied to verify the fulfilment of the second-order Taylor's expansion model.

If the null hypothesis is accepted, the corresponding local Gaussian K and mean H curvature values, as well as the principal curvatures, are obtained from the locally estimated surface differential terms. As known, such curvature values are invariant to the reference frame.

A statistical analysis of the two-term vector, containing the Gaussian and the mean curvature values, is carried out by applying the variance–covariance propagation law, so as to compute their covariance matrix. A Fisher ratio test is subsequently applied to verify the significance of the obtained curvature vector. If the null hypothesis is accepted, the surface can be locally considered planar. If the null hypothesis is rejected, a ratio test for each of the K and H curvatures is carried out.

By simultaneously analysing the sign and the values of K and H, a classification of the whole point cloud is indeed achievable into the following basic surface types: hyperbolic (K < 0), parabolic (K = 0 but H ≠ 0), planar (K = H = 0) and elliptic (K > 0).

Furthermore, an empirical optimisation method for the size of the Taylor's expansion bandwidth is presented. This allows computing the minimal values of K and H that can be detected by the F test, once a first-kind error value is fixed.

As a possible development of the proposed Taylor's model, third- and fourth-order terms can be considered. As known from the literature (e.g. Cazals and Pouget 2007), third- and fourth-order series can be exploited to detect ridges, crest lines and their properties.

The paper continues with the parametric modelling of each recognised unit by estimating the corresponding surface analytical function, starting from raw clusters detected by a region growing method.

In this case, the segmentation of geometrical units can be indirectly obtained by means of 3D spatial intersections among the estimated surfaces.

On the other hand, the segmentation can be also directly done by detecting the discontinuity lines through the analysis of the coefficient values of the Taylor’s expansion third- and fourth-order terms.

Numerical testing of the proposed procedure has been carried out with satisfactory results on simulated laser data, also with noise, belonging to the OSU Range Image database (Ohio State University: http://sampl.ece.ohio-state.edu/data/3DDB/RID/index.htm).

Estimation of local surface parameters by a non-parametric regression model

Dealing with parameter estimation by regression models, the main advantage of a non-parametric approach lies in its full generality: in our case, i.e. the local estimation of the surface passing through the laser points, it means that neither a priori knowledge of the point geometry nor a fitting analytical function is required. Let us consider the following polynomial model with second-order terms (Cazals and Pouget 2003):

$$ Z_j = a_0 + a_1 u + a_2 v + \frac{1}{2}a_3 u^2 + a_4 uv + \frac{1}{2}a_5 v^2 + \varepsilon_j $$
(1)

where the coefficients and the parameters are locally related to a measured value Z j by a Taylor's expansion of the function Z = μ + ε about a point i, with j one of its neighbouring points, as:

$$ a_0 = Z_{{0_i }} \qquad a_1 = \left( {\frac{{\partial Z}}{{\partial X}}} \right)_i \qquad a_2 = \left( {\frac{{\partial Z}}{{\partial Y}}} \right)_i $$
$$ a_3 = \left( {\frac{{\partial^2 Z}}{{\partial X^2 }}} \right)_i \qquad a_4 = \left( {\frac{{\partial^2 Z}}{{\partial X\partial Y}}} \right)_i \qquad a_5 = \left( {\frac{{\partial^2 Z}}{{\partial Y^2 }}} \right)_i $$
$$ u = X_j - X_i \qquad v = Y_j - Y_i $$

with X i, Y i and X j, Y j the plane coordinates of points i and j. The parameter a 0 is the estimated function value \( Z_{{0_i }} \) at point i, while the parameters a s, with s > 0, are the first- and second-order partial derivatives along the X,Y directions at the i-th point of the best approximating local surface.

To apply the Taylor's expansion (1), the coordinate Z must be univocally defined by the X,Y coordinates. Sometimes it is necessary to apply a permutation among the X,Y,Z coordinates in order to assume, as the Z-axis for Eq.1, the direction whose values are best expressed as a function of the other two. In other words, points lying on surfaces almost orthogonal to the X,Y plane are not well modelled by Eq.1.

Rewriting model 1 in algebraic form as:

$$ z = X\beta + v $$
(2)

the unknown parameters are collected into the [6 × 1] vector:

$$ \beta = \left[ {a_0 \quad a_1 \quad a_2 \quad a_3 \quad a_4 \quad a_5 } \right]^T $$
(3)

while, considering the p neighbouring points j of point i, the coefficient matrix X has p rows of the form:

$$ X_j = \left[ {1\quad u\quad v\quad \frac{1}{2}u^2 \quad uv\quad \frac{1}{2}v^2 } \right]. $$
(4)

In order to weight the different acquired values Z j for the least squares estimation of vector β, a diagonal weight matrix W is assumed, considering a symmetric kernel function centred at the i-th point:

$$ w_{ij} = \left[ {1 - \left( {d_{ij} /b} \right)^3 } \right]^3 \quad {\text{for}}\;d_{ij} /b < 1 $$
$$ w_{ij} = 0 \quad {\text{for}}\;d_{ij} /b \ge 1 $$

where d ij is the distance between points i and j, and b is the radius (bandwidth) of the window encompassing the p points closest to i. The value of b, rather than the kernel function, is critical for the quality of the estimate of β: the greater the value of b, the smoother the resulting regression function, while the smaller the value of b, the larger the variance of the estimated value.

Finally, the weighted least squares estimate of the unknown vector β from p neighbour points results as:

$$ \hat{\beta } = \left( {X^T WX} \right)^{- 1} X^T Wz $$
(5)

The residual vector \( \widehat{v} \) for the p points within the bandwidth is simply given by \( \widehat{v} = z - X\widehat{\beta } \), which allows computing the a posteriori variance factor \( \widehat{\sigma }_0^2 \) at point i as:

$$ \hat{\sigma }_0^2 = \frac{{\hat{v}^T W\hat{v}}}{p - 6}. $$
(6)
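To make Eqs.1–6 concrete, the following Python sketch (our own illustrative code; names such as `local_quadric_fit` are not from the paper) computes the weighted least squares estimate (5) and the variance factor (6) at one point of an (n, 3) point array, using the tricube kernel introduced above; distances are taken in 3D, as done later in the paper.

```python
import numpy as np

def local_quadric_fit(P, i, b):
    """Weighted LS estimate of the Taylor terms (Eq. 5) and of the
    variance factor (Eq. 6) at point i of an (n, 3) point array P,
    with tricube weights inside the bandwidth radius b."""
    d = np.linalg.norm(P - P[i], axis=1)              # 3D distances d_ij
    nb = np.where(d / b < 1.0)[0]                     # p points inside the window
    w = (1.0 - (d[nb] / b) ** 3) ** 3                 # tricube kernel weights
    u = P[nb, 0] - P[i, 0]                            # u = X_j - X_i
    v = P[nb, 1] - P[i, 1]                            # v = Y_j - Y_i
    X = np.column_stack([np.ones_like(u), u, v,
                         0.5 * u**2, u * v, 0.5 * v**2])   # rows of Eq. 4
    N = X.T @ (w[:, None] * X)                        # normal matrix X^T W X
    beta = np.linalg.solve(N, X.T @ (w * P[nb, 2]))   # Eq. 5
    res = P[nb, 2] - X @ beta                         # residual vector
    p = len(nb)
    sigma02 = (res @ (w * res)) / (p - 6)             # Eq. 6
    return beta, sigma02, p
```

On a synthetic paraboloid Z = X² + Y², for instance, the fit at the origin recovers a 3 = a 5 = 2 (the true second derivatives) with a variance factor at machine-precision level.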

For each point i, this local value has to be suitably evaluated, as will be better explained in the following, in order to verify by a χ 2 test whether it is comparable to the measurement noise or whether it is also sensitive to a systematic effect, due to limitations in the order of the Taylor's expansion or to the presence of possible outliers or data slips.

Figure1 reports the simulated scan agpart-2, used as the example throughout the paper: it belongs to the OSU Range Image database (Ohio State University: http://sampl.ece.ohio-state.edu/data/3DDB/RID/index.htm). This synthetic object is composed of a cylinder with a circular cavity along its axis and a larger coaxial disk: the surfaces are thus cylindrical and planar. The simulated scan is oblique with respect to the object axis, as can be seen in Fig.1 at left, representing the view of the X,Y plane. The almost 30,000 points range along X from −2.30 to +1.68 and along Y from −1.10 to +1.76: they are coloured by the original Z i values from blue (+0.57) to red (+3.65). The coordinates of the dataset are simply real numbers with six digits to the right of the decimal point, and the unit of measurement is not defined; nevertheless, supposing that one unit corresponds to one decimetre, the cylinder of the agpart-2 model has a diameter of 22cm and a length of 33.5cm, while the larger disk has a diameter of 40cm. Points are defined on a rectangular X,Y grid with a size of ΔX = 0.0200 and ΔY = 0.0168: supposing again that one unit corresponds to a decimetre, the grid steps are ΔX = 2.00mm and ΔY = 1.68mm, so simulating a high-density laser acquisition (30 points per square centimetre). It can be considered an "error-free" point cloud, although some irregularities and missing data occur along the connections of the various unit surfaces.

Fig.1

Agpart-2 simulated model (OSU database): overall Z i values (at left) and along the yellow section (at right)

Figure1 at right depicts in the Y,Z plane the values of Z i along the yellow section defined by X = 5.2cm; as can be seen, decimetric data slips occur in correspondence with the occluded parts of the surfaces. This situation is very common in laser-scanning measurements. To avoid estimating a surface that simultaneously interpolates both discontinuous edges, the p closest points are found by considering for the distances d ij not only the X,Y planimetric coordinates but also, tridimensionally, the Z coordinates, by means of a 3D radius b centred at point i. In other words, the circular 2D window mentioned before becomes a spherical 3D volume in our procedure. Of course, the value of b has to be suitably chosen smaller than the data slip: in our case, the bandwidth radius b has been fixed equal to 2cm. In general, by applying a spherical encompassing volume, no smoothing effects in computing \( a_0 = Z_{{0_i }} \) are present in the data slip areas or along the border edges of the dataset.

Estimation by formulas 5 and 6 on the agpart-2 dataset leads to the values of \( \widehat{\sigma }_0^2 \) shown in Fig.2, where the points are coloured from blue (0.00mm²) to red (0.06mm²). Values of \( \widehat{\sigma }_0^2 \) different from zero arise along the discontinuity lines but not in correspondence with the decimetric data slip, thanks to the 2-cm radius spherical volume, as can also be noticed in Fig.2 at right, showing the variance factor along the yellow section.

Fig.2

Agpart-2 simulated model: overall \( \widehat{\sigma }_0^2 \) estimated values (at left) and along the yellow section (at right)

Statistical analysis of the non-parametric model applied

As mentioned before, for each laser point i, the estimated local value of the variance factor \( \widehat{\sigma }_0^2 \) is a quality index of the estimation of vector β. It is crucial to verify whether, within the encompassing bandwidth, the behaviour of the corresponding residuals \( \widehat{v} = z - X\widehat{\beta } \) is due to the noise of the laser measurements, to possible outliers, or rather to limitations of the non-parametric model. For this aim, the following chi-square test is applied, with null hypothesis H0: \( \widehat{\sigma }_0^2 = \sigma_{\text{ls}}^2 \) and alternative hypothesis H1: \( \widehat{\sigma }_0^2 \ne \sigma_{\text{ls}}^2 \):

$$ \frac{{\hat{\sigma }_0^2 }}{{\sigma_{\text{ls}}^2 }}\left( {p - 6} \right) \le \chi_{{\left( {p - 6} \right),\,1 - \alpha }}^2 $$
(7)

where:

  • \( \sigma_{\text{ls}}^2 \) is the variance of the laser scanning (ls) instrument employed for the data acquisition.

  • \( \chi^2_{{\left( {p - 6} \right),\,1 - \alpha }} \) is the value of the chi-square distribution for (p − 6) degrees of freedom, when a probability α of a first-kind error is assumed.

The results of the chi-square test 7 depend also on the local value of \( \widehat{\sigma }_0^2 \) and on the number of degrees of freedom (p − 6), with p the number of points encompassed within the spherical volume. The choice of the 3D distance d ij  < b is really important for defining the number p of points closest to point i: for points lying on an X,Y grid, p strongly depends on the local slope of the surface with respect to the X,Y plane.
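In code, test (7) is a one-line check once \( \widehat{\sigma }_0^2 \) and p are available. A minimal Python sketch follows (the function name is ours; SciPy is assumed for the chi-square quantile):

```python
from scipy.stats import chi2

def chi2_accepts(sigma02_hat, sigma_ls2, p, alpha=0.05):
    """Test (7): accept H0 (the local fit is at the instrument noise
    level) when (p - 6) * sigma02_hat / sigma_ls2 does not exceed the
    chi-square critical value with p - 6 degrees of freedom."""
    stat = (p - 6) * sigma02_hat / sigma_ls2
    return stat <= chi2.ppf(1.0 - alpha, df=p - 6)
```

Since only the ratio \( \widehat{\sigma }_0^2 / \sigma_{\text{ls}}^2 \) matters, units cancel: with p = 100, for example, `chi2_accepts(1.1, 1.0, 100)` is True while `chi2_accepts(2.0, 1.0, 100)` is False.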

For instance, for the agpart-2 dataset with b equal to 2cm, the number p drops from about 300, for encompassing areas with low slope, to less than 50 for discontinuity areas between planar and very steep cylindrical surfaces, as can be seen in Fig.3 at left.

Fig.3

Points coloured by (p − 6) values (at left) and by the results of the \( \chi^2 \) test (at right): green where H0, red where H1

Chi-square test results (Fig.3 at right) are not predictable by simply considering the \( \widehat{\sigma }_0^2 \) values (Fig.2 at left). First of all, such values have to be locally multiplied by the corresponding (p − 6) values (Fig.3 at left), divided by the measurement variance \( \sigma_{\text{ls}}^2 \) and finally compared to the critical value \( \chi^2_{{\left( {p - 6} \right),\,1 - \alpha }} \). The following analysis of the chi-square test results can be made, considering that, for most of the points, the H0 hypothesis is accepted (green colour):

  • H0 is accepted: a good local congruence between the laser measurements and a second-order Taylor's model is statistically proved. The values derived from vector \( \widehat{\beta } \), such as the Gaussian and mean curvatures, are statistically meaningful, and thus a curvature-based classification can be carried out in such zones.

  • H0 is rejected: the local congruence between the laser measurements and the Taylor's model is not statistically fulfilled, i.e. a significant difference between the acquired laser data and the second-order polynomial modelling is present. For this reason, the derived curvature values in such zones have to be interpreted with particular care.

In general, the values of \( \widehat{\sigma }_0^2 \) significantly differ from \( \sigma_{\text{ls}}^2 \) along the discontinuity lines of the scanned objects or along ridges or crest lines. This may be explained by an insufficient order of the Taylor's expansion or by an improper choice of the bandwidth radius.

Computation of local curvature values

For the local shape analysis of a laser point cloud, some fundamental quantities defined in differential geometry are considered. In particular, the local Gaussian, mean and principal curvature values are taken into account. All of these can be obtained from the so-called "Weingarten map" matrix A of the surface (e.g. Do Carmo 1976), which is given by:

$$ A = - \left[ {\begin{array}{*{20}c} e & f \\ f & g \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} E & F \\ F & G \\ \end{array} } \right]^{- 1} $$
(8)

where E, F, and G are the coefficients of the so-called “first fundamental form”, computable from a s (s ≠ 0) parameters as:

$$ E = 1 + a_1^2 $$
$$ F = a_1 a_2 $$
$$ G = 1 + a_2^2 $$

and e, f and g are the “second fundamental form” coefficients:

$$ e = {{a_3 }}\big/{{\sqrt {1 + a_1^2 + a_2^2 } }} $$
$$ f = {{a_4 }}\big/{{\sqrt {1 + a_1^2 + a_2^2 } }} $$
$$ g = {{a_5 }}\big/{{\sqrt {1 + a_1^2 + a_2^2 } }} $$

The Gaussian curvature K corresponds to the determinant of A:

$$ K = \frac{{eg - f^2 }}{{EG - F^2 }} $$
(9)

The mean curvature H can be instead obtained from:

$$ H = \frac{eG - 2fF + gE}{{2\left( {EG - F^2 } \right)}} $$
(10)

The principal curvatures k max and k min, corresponding to the eigenvalues of A, are instead given by the solutions of the equation \( k^2 - 2Hk + K = 0 \), i.e. by \( k_{{\max, \min }} = H \pm \sqrt {H^2 - K} \). Further usable relationships for the curvature values are K = k min k max and H = (k min + k max)/2.

Substituting the a s terms into formulas 9 and 10 (see e.g. Quek et al. 2003), the following expressions for the Gaussian K and the mean H curvatures can be obtained:

$$ K = \frac{{a_3 a_5 - a_4^2 }}{{\left( {a_1^2 + 1 + a_2^2 } \right)^2 }} $$
(11)
$$ H = \frac{{a_3 \left( {1 + a_2^2 } \right) + a_5 \left( {1 + a_1^2 } \right) - 2a_1 a_2 a_4 }}{{2\left( {a_1^2 + 1 + a_2^2 } \right)^{{{3 \mathord{\left/ {\vphantom {3 2}} \right. } 2}}} }} $$
(12)

Summarising, for each i-th laser point, four local curvature values K, H, k max and k min can be automatically obtained as functions of the terms of vector \( \widehat{\beta } \). Furthermore, such curvatures are invariant to the adopted reference frame, a very important property for analysing the surface shape.
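Equations 11 and 12 translate directly into code. A small sketch (function name ours) returning all four curvature values from the estimated Taylor terms:

```python
import numpy as np

def curvatures(beta):
    """Gaussian K (Eq. 11), mean H (Eq. 12) and principal curvatures
    k_max, k_min from beta = [a0, a1, a2, a3, a4, a5]."""
    a0, a1, a2, a3, a4, a5 = beta
    g = 1.0 + a1**2 + a2**2
    K = (a3 * a5 - a4**2) / g**2
    H = (a3 * (1 + a2**2) + a5 * (1 + a1**2) - 2 * a1 * a2 * a4) / (2 * g**1.5)
    disc = np.sqrt(max(H**2 - K, 0.0))   # clamp tiny negatives from round-off
    return K, H, H + disc, H - disc      # K, H, k_max, k_min
```

For a sphere-like patch with a1 = a2 = a4 = 0 and a3 = a5 = 1/R, this returns K = 1/R² and H = 1/R; for a cylinder (a5 = 0) it returns K = 0 and H = 1/(2R).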

Figure4 shows the estimated K curvature values for the agpart-2 scan, coloured from blue (−31.66dm−2) to red (+12.19dm−2): values equal to zero correctly occur in the central parts of the various unit surfaces (where the chi-square test is fulfilled), while very high variations of the Gaussian curvature take place in buffer areas along the ridges separating the various surfaces.

Fig.4

Points coloured by K estimated values (at left); values of K along the yellow section (at right)

Figure5 at left illustrates instead the estimated H curvature values coloured from blue (−7.67dm−1) to red (+6.94dm−1): H values are constant in the central part of the various unit surfaces. As can be seen in Fig.5 at right, along the yellow section, the following values of H have been computed:

  • +1.79dm−1 for the concave smaller cylinder

  • 0.00dm−1 for both planar surfaces

  • −0.45dm−1 for the two convex cylinders of equal radius

  • −0.25dm−1 for the convex larger cylinder (disk)

Fig.5

Points coloured by H estimated values (at left); values of H along the yellow section (at right)

For the cylindrical surfaces, where in fact the Gaussian curvature K is correctly equal to zero, the minimum principal curvature k min is zero, while the maximum one is k max = 2H, so that the corresponding curvature radius is r min = 1/(2H). The computed radii of the various cylinders are thus 2.8cm, 11cm and 20cm respectively, exactly their correct values.
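The radii quoted above follow directly from r min = 1/(2|H|); as a quick check (unit bookkeeping only, dm−1 to cm):

```python
# mean curvatures H (in dm^-1) read off the yellow section above
for H in (1.79, 0.45, 0.25):
    r_cm = 10.0 / (2.0 * abs(H))   # r_min = 1/(2|H|), converted dm -> cm
    print(round(r_cm, 1))          # 2.8, 11.1, 20.0
```

in agreement with the 2.8, 11 and 20 cm stated above.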

Significance analysis of the curvature values

For the statistical analysis of the estimated local Gaussian and mean curvature values, once the least squares solution of the differential terms is obtained by means of Eq.5, the variance–covariance matrix of the estimated parameters is also available. The variance–covariance propagation law can be applied to the estimated \( \widehat{\beta } \) terms to determine the [2 × 2] covariance matrix of the Gaussian and mean curvature values. To this end, let us rewrite \( \widehat{\beta } = \left[ {\widehat{z}_0 \;\;\widehat{a}_1 \;\;\widehat{a}_2 \;\;\widehat{a}_3 \;\;\widehat{a}_4 \;\;\widehat{a}_5 } \right]^T \) as a partitioned estimated vector \( \widehat{\beta } = \left[ {\widehat{z}_0 \quad \widehat{a}} \right]^T \) containing the estimated function value \( \widehat{z}_0 \) and the sub-vector \( \widehat{a} \) of the Taylor's expansion differential terms at point i. Let \( \Sigma_{{\beta \beta }} \) be the estimated variance–covariance matrix of the terms of vector \( \widehat{\beta } \); it can be partitioned as:

$$ \Sigma_{{\beta \beta }} = \left[ {\begin{array}{*{20}c} {\sigma_{{z_0 }}^2 } & {{\mathbf{\sigma }}_{{z_0 a}}^T } \\ {{\mathbf{\sigma }}_{{z_0 a}} } & {{\mathbf{\Sigma }}_{aa} } \\ \end{array} } \right] $$
(13)

where Σ aa is the variance–covariance matrix of the sub vector a containing the differential terms at point i. As known, the variance–covariance matrix \( \Sigma_{{\beta \beta }} \) can be expressed as:

$$ \Sigma_{{\beta \beta }} = \hat{\sigma }_0^2 N^{- 1} = \hat{\sigma }_0^2 \left[ {\begin{array}{*{20}c} {n_{{z_0 }} } & {n_{{z_0 a}}^T } \\ {n_{{z_0 a}} } & {N_{aa} } \\ \end{array} } \right]^{- 1} = \hat{\sigma }_0^2 Q_{{\beta \beta }} = \hat{\sigma }_0^2 \left[ {\begin{array}{*{20}c} {q_{{z_0 }} } & {q_{{z_0 a}}^T } \\ {q_{{z_0 a}} } & {Q_{aa} } \\ \end{array} } \right] $$
(14)

where \( Q_{{\beta \beta }} \) is the cofactor matrix of vector \( \widehat{\beta } \), while \( \widehat{\sigma }_0^2 \) is given by relationship 6.

Of course, the estimated Gaussian and mean curvature values are not independent, as can be seen by observing Eqs.9 and 10 or 11 and 12. In order to apply a significance test taking into account also the correlation between the curvature values K and H, the following [2 × 1] vector is introduced:

$$ \omega = \left[ {K\quad H} \right]^T $$
(15)

Applying the variance–covariance propagation law, the cofactor matrix of vector ω can be obtained as:

$$ Q_{{\omega \omega }} = F_{{\omega \omega }} Q_{aa} F_{{\omega \omega }}^T $$
(16)

where:

$$ F_{{\omega \omega }} = \left[ {\begin{array}{*{20}c} {\frac{{\partial K}}{{\partial a_1 }}} & {\frac{{\partial K}}{{\partial a_2 }}} & {\frac{{\partial K}}{{\partial a_3 }}} & {\frac{{\partial K}}{{\partial a_4 }}} & {\frac{{\partial K}}{{\partial a_5 }}} \\ {\frac{{\partial H}}{{\partial a_1 }}} & {\frac{{\partial H}}{{\partial a_2 }}} & {\frac{{\partial H}}{{\partial a_3 }}} & {\frac{{\partial H}}{{\partial a_4 }}} & {\frac{{\partial H}}{{\partial a_5 }}} \\ \end{array} } \right] $$

For the points where the null hypothesis of the chi-square test (7) is fulfilled (see Fig.6 at left), in order to verify whether the Gaussian and mean curvature vector ω is significantly different from zero, the alternative hypothesis of the following F ratio test must be satisfied (Pelzer 1971), with null hypothesis H0: E(ω) = 0 and alternative hypothesis H1: E(ω) ≠ 0:

$$ \frac{{\omega^T Q_{{\omega \omega }}^{- 1} \omega }}{{r\hat{\sigma }_0^2 }} > F_{{1 - \alpha, r,\infty }} $$
(17)

where:

  • r = rank(\( Q_{{\omega \omega }} \)) = 2,

  • \( F_{{1 - \alpha, r,\infty }} \) is the Fisher distribution value for r and ∞ degrees of freedom, with α the probability of a first-kind error.
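The propagation (16) and the global test (17) can be sketched numerically. In the fragment below (our own; function names and test values are illustrative), the Jacobian \( F_{{\omega \omega }} \) is obtained by central differences instead of the analytic derivatives of Eqs.11–12, and Q aa plays the role of the cofactor matrix of the differential terms:

```python
import numpy as np

def KH(a):
    """K and H (Eqs. 11-12) from the differential terms a = [a1..a5]."""
    a1, a2, a3, a4, a5 = a
    g = 1.0 + a1**2 + a2**2
    return np.array([(a3 * a5 - a4**2) / g**2,
                     (a3 * (1 + a2**2) + a5 * (1 + a1**2) - 2*a1*a2*a4)
                     / (2 * g**1.5)])

def jacobian_KH(a, eps=1e-7):
    """[2 x 5] Jacobian F_omega of (K, H) w.r.t. a, by central differences."""
    J = np.zeros((2, 5))
    for k in range(5):
        da = np.zeros(5)
        da[k] = eps
        J[:, k] = (KH(a + da) - KH(a - da)) / (2 * eps)
    return J

def f_test_significant(a, Q_aa, sigma02_hat, f_crit=3.0):
    """Global test (17): True when omega = (K, H) is significantly
    non-zero; f_crit = F_{0.95,2,inf}, about 3 as in the text."""
    omega = KH(a)
    F = jacobian_KH(a)
    Q_ww = F @ Q_aa @ F.T                                 # Eq. 16
    stat = omega @ np.linalg.solve(Q_ww, omega) / (2.0 * sigma02_hat)
    return stat > f_crit
```

For a clearly curved patch (e.g. a 3 = 1, a 5 = 0.5 and small cofactors) the test rejects H0. Note that at an exactly planar point both Jacobian rows vanish and Q ωω becomes singular in this numerical sketch, which is consistent with the planar outcome (H0 accepted) of the text.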

Fig.6

Points coloured by the results of the χ 2 test (at left) and the F ratio test (at right): green where H0, red where H1

The results of the F ratio test obtained for agpart-2 are depicted in Fig.6 at right: having fixed a probability α equal to 0.05, the critical value \( F_{{0.95,2,\infty }} \), equal to 3, is exceeded for curved surfaces (red colour) and not for planar ones (green colour).

Significance analysis of the curvature-based classification

If E(ω) ≠ 0, it is worthwhile to independently test the values of K and H in order to check whether both, or just one of them, are significantly different from zero. The null hypothesis is independently rejected for K and H, i.e. E(K) ≠ 0 and E(H) ≠ 0, if:

$$ \frac{{K^2 }}{{\hat{\sigma }_0^2 q_{kk} }} > F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} $$
(18.a)
$$ \frac{{H^2 }}{{\hat{\sigma }_0^2 q_{hh} }} > F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} $$
(18.b)

where:

  • q kk and q hh are the diagonal terms of matrix \( Q_{{\omega \omega }} \),

  • \( F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} \) is the Fisher distribution value for 1 and ∞ degrees of freedom, with a probability α/2 for each of the two tests, in order to satisfy a global first-kind error equal to α (Bonferroni correction).

By simultaneously analysing the sign and the values of K and H, a statistically proven classification of the whole point cloud is finally made possible. In fact, as known, each surface can be classified as one of the following types (see Table1): hyperbolic (if K < 0), parabolic (K = 0 but H ≠ 0), planar (K = H = 0) and elliptic (K > 0).

Table1 Classification of surfaces according to the values of Gaussian K and mean H curvatures (from Haala et al. 2004)

When only the null hypothesis H0: K = 0 is satisfied, the single-curvature surface can be classified as a concave parabolic valley if H > 0, or as a convex parabolic ridge if H < 0. Finally, when both null hypotheses are rejected, the surface is classifiable as a concave pit (if K > 0 and H > 0), as a convex peak (K > 0, H < 0), as a saddle valley (K < 0, H > 0) or as a saddle ridge (K < 0, H < 0).
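The decision logic of Table1 and of the paragraph above can be written as a tiny classifier (labels and function name ours), where K_sig and H_sig are the outcomes of tests 18.a and 18.b:

```python
def classify(K, H, K_sig, H_sig):
    """Surface type from the signs of K and H; K_sig / H_sig tell
    whether tests (18.a) / (18.b) rejected their null hypothesis."""
    if not K_sig and not H_sig:
        return "planar"
    if not K_sig:                          # K = 0, H != 0: parabolic
        return "parabolic valley" if H > 0 else "parabolic ridge"
    if K > 0:                              # elliptic
        return "pit" if H > 0 else "peak"
    # hyperbolic (the rare K < 0, H = 0 minimal saddle is lumped in here)
    return "saddle valley" if H > 0 else "saddle ridge"
```

For example, `classify(0.0, -0.45, False, True)` returns "parabolic ridge", matching the convex cylinders of the agpart-2 example.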

Summarising, this step allows not only classifying the various volumetric primitives but also defining a priori the kind of polynomial of the interpolating parametric model to be applied subsequently for a refined segmentation of the points, as will be explained later.

Optimization of the Taylor’s expansion bandwidth size

Formulas 18.a and 18.b are also useful to determine the minimal values of K and H that can be detected by the test, once a global significance level α is fixed:

$$ K > \hat{\sigma }_0 \sqrt {F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} \,q_{kk} } $$
(19.a)
$$ H > \hat{\sigma }_0 \sqrt {F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} \,q_{hh} } $$
(19.b)

Of course, the detectable K and H tend to diminish, i.e. the test becomes more sensitive, as \( \widehat{\sigma }_0 \), q kk and q hh become smaller; that is, if the precision of the laser measurements rises, if the curvature values increase, and if the number of selected points within a prefixed bandwidth becomes greater. This fact makes it possible to empirically optimise the size of the Taylor's expansion bandwidth in order to evidence curvature values. For instance, if the geometric characteristics of the surveyed object are approximately known and rough curvature values K 0 and H 0 can be defined a priori, once the class of instruments to be used is fixed and the corresponding measurement precision \( \sigma_{\text{ls}}^2 \) is known, a simulation procedure may be devised to find the minimal number of bandwidth points that satisfies the following inequalities:

$$ \frac{{K_0^2 }}{{\sigma_{\text{ls}}^2 \,F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} }} > q_{kk} $$
(20.a)
$$ \frac{{H_0^2 }}{{\sigma_{\text{ls}}^2 \,F_{{1 - {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. } 2},1,\infty }} }} > q_{hh} $$
(20.b)

The values q kk and q hh can be determined, for selected classes of bandwidth points, once approximate design parameters are fixed. The terms q kk and q hh correspond to the diagonal elements of matrix \( Q_{{\omega \omega }} \), computable from:

$$ Q_{{\omega \omega }} = F_{{\omega \omega }} Q_{aa} F_{{\omega \omega }}^T = F_{{\omega \omega }} \left( {N_{aa} - \frac{1}{{n_{{z_0 }} }}n_{{z_0 a}} n_{{z_0 a}}^T } \right)^{- 1} F_{{\omega \omega }}^T $$
(21)

In order to evidence particular K and H curvature values, the minimum bandwidth radius satisfying inequalities 20.a, 20.b and 7 will be chosen.
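Inequalities 20.a–20.b can drive a small simulation, as suggested above. The sketch below is our own and makes simplifying assumptions (a horizontal regular grid, tricube weights, design curvatures a 3, a 5 with a 1 = a 2 = a 4 = 0): it computes q kk from Q aa and the K-row of \( F_{{\omega \omega }} \) for growing bandwidths, so that the smallest b satisfying (20.a) can be picked.

```python
import numpy as np

def q_kk_for_bandwidth(b, step=0.002, a3=1.0, a5=1.0):
    """Cofactor q_kk (diagonal of Q_omega, Eq. 21) for a regular
    horizontal grid of spacing `step` inside the 2D window of radius b,
    with tricube weights and design curvatures a3, a5 (a1 = a2 = a4 = 0)."""
    xs = np.arange(-b, b + step / 2, step)
    U, V = np.meshgrid(xs, xs)
    d = np.hypot(U, V).ravel()
    m = d / b < 1.0                                    # points in the window
    u, v, w = U.ravel()[m], V.ravel()[m], (1 - (d[m] / b) ** 3) ** 3
    X = np.column_stack([np.ones_like(u), u, v, 0.5*u**2, u*v, 0.5*v**2])
    Q_aa = np.linalg.inv(X.T @ (w[:, None] * X))[1:, 1:]
    rowK = np.array([0.0, 0.0, a5, 0.0, a3])   # K-row of F_omega at a1=a2=a4=0
    return rowK @ Q_aa @ rowK, int(m.sum())

# smallest bandwidth whose q_kk satisfies inequality (20.a),
# for illustrative design values K0 = 1, sigma_ls = 0.001, F_{0.975,1,inf} ~ 5
K0, sigma_ls2, F_crit = 1.0, 1e-6, 5.0
for b in (0.005, 0.01, 0.02, 0.04):
    q_kk, p = q_kk_for_bandwidth(b)
    print(b, p, K0**2 / (sigma_ls2 * F_crit) > q_kk)
```

On a grid of fixed spacing, q kk drops roughly as b⁻⁶, so the minimal detectable curvature (19.a) shrinks quickly as the bandwidth grows, at the price of heavier smoothing.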

Estimating higher order terms of the Taylor’s expansion

As reported in the literature (e.g. Cazals and Pouget 2003), third- and fourth-order Taylor's terms make it possible to extract ridges and their properties. More precisely, ridges are curves along which one of the principal curvatures attains an extremum along its curvature line. As ridges represent points characterised by an extremal value of the principal curvatures, their location requires estimating differential quantities up to the third order, and actually up to the fourth order to decide whether the extremum is a maximum or a minimum. Furthermore, ridges, being curves of extremal curvature, can furnish fundamental information for laser point cloud segmentation, registration and matching procedures. As ridges are detected by analysing the principal curvature values, it is necessary to adopt, for each point where the Taylor's expansion is applied, a local reference system able to directly furnish the principal curvature values and their directional derivatives. This coordinate system is the so-called "Monge frame", in which the terms a 0, a 1, a 2 and a 4 are equal to zero. The local Taylor's expansion up to the fourth-order terms then assumes the following expression:

$$ \begin{array}{*{20}c} {Z_j = \frac{1}{2}\left( {a_3 u^2 + a_5 v^2 } \right) + \frac{1}{6}\left( {b_0 u^3 + 3b_1 u^2 v + 3b_2 uv^2 + b_3 v^3 } \right) + } \\ { + \frac{1}{24}\left( {c_0 u^4 + 4c_1 u^3 v + 6c_2 u^2 v^2 + 4c_3 uv^3 + c_4 v^4 } \right) + \varepsilon_j } \\ \end{array} $$
(22)

where:

  • a 3, a 5 correspond, in the Monge frame, to the principal curvatures.

  • b 0, b 3 are the directional derivatives of a 3, a 5 along their respective curvature lines.

  • b 1, b 2 are the directional derivatives of a 3, a 5 along the other curvature lines.

Points having an extremal value of b 0 or b 3 automatically identify ridges. Specific algorithms to estimate the differential terms in the Monge frame and to automatically extract ridges have recently been proposed in the literature (Cazals and Pouget 2007).

Analytical parametric modelling of the surface units

Within any kind of surface unit, classified as before thanks to the F ratio tests 18.a and 18.b, whose result is visible in Fig.7 at left, a region growing method is applied for a first raw cluster segmentation. Starting from a random point not belonging to any recognised cluster, the surrounding points, having a distance less than the bandwidth b, are analysed by evaluating the values of the difference \( Z_{{0_i }} - Z_i \) and the values of K and H. If the neighbouring points present difference values within a threshold, they are labelled as belonging to the same class and put into a list. The same algorithm is repeated for each list element until the list is fully processed. Afterwards, the procedure restarts from a new random point, ending when every point has been analysed.
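The region growing step just described can be sketched as a flood fill over the point cloud (illustrative code; the paper does not prescribe data structures). Here `feat` stacks per-point values such as \( Z_{{0_i }} - Z_i \), K and H, and `tol` holds their acceptance thresholds:

```python
import numpy as np
from collections import deque

def region_growing(P, feat, b, tol):
    """Raw cluster segmentation: grow clusters from random seeds,
    adding neighbours (3D distance < b) whose feature values differ
    from the current point's by less than tol."""
    n = len(P)
    label = np.full(n, -1)
    rng = np.random.default_rng(0)
    cluster = 0
    for seed in rng.permutation(n):
        if label[seed] != -1:
            continue                       # already assigned to a cluster
        label[seed] = cluster
        queue = deque([seed])
        while queue:                       # process the list until empty
            i = queue.popleft()
            d = np.linalg.norm(P - P[i], axis=1)
            near = np.where((d < b) & (label == -1))[0]
            ok = near[np.all(np.abs(feat[near] - feat[i]) < tol, axis=1)]
            label[ok] = cluster
            queue.extend(ok.tolist())
        cluster += 1
    return label
```

Two well-separated strips with identical features come out as two clusters. The brute-force distance scan is O(n²) per visit; in practice a k-d tree would replace it.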

Fig.7

Classification (at left) in planar (green) and cylindrical (red) units; initial raw cluster segmentation (at right)

A first raw segmentation of the whole dataset is so carried out (Fig.7 at right): each cluster represents an initial subset to be submitted to a refining segmentation. For this aim, we now suppose that the laser measurements can be rightfully represented by the following parametric Simultaneous AutoRegressive (SAR) model (Haining 1990):

$$ z - \rho Wz = A\theta + \varepsilon $$
(23)

where:

  • z is the vector of laser height/depth values, as for the non-parametric model (1).

  • ρ is a value that measures the mean spatial interaction between n neighbour points.

  • W is a spatial adjacency (binary) matrix, defined as w ij  = 1 if the points are neighbours, w ij  = 0 otherwise.

  • A is an r-column matrix with rows \( A_i = \left[ {\begin{array}{*{20}c} 1 & {X_i } & {Y_i } & {...} & {X_i^s } & {Y_i^s } \\ \end{array} } \right] \), where X i and Y i are the X,Y-coordinates of the points, approximated by an s-degree orthogonal polynomial.

  • \( \theta = \left[ {\begin{array}{*{20}c} {\theta_0 } & {\theta_1 } & {...} & {\theta_{r - 1} } \\ \end{array} } \right]^T \) is a [r × 1] vector of parameters.

  • ε is the vector of normally distributed noise, with mean 0 and variance \( \sigma_{\varepsilon }^2 \).
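For illustration, the ingredients z, W and A of model 23 can be assembled as in this minimal sketch; the toy coordinates are assumptions, and a plain power basis stands in for the orthogonal polynomial of the text:

```python
import numpy as np

def sar_matrices(xy, bandwidth, degree=2):
    """Assemble W and A of the SAR model z - rho*W*z = A*theta + eps
    (illustrative sketch, not the authors' implementation)."""
    n = len(xy)
    # Binary adjacency: w_ij = 1 if points i and j are neighbours
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.hypot(*(xy[i] - xy[j])) <= bandwidth:
                W[i, j] = 1.0
    # Design matrix A with rows [1, X_i, Y_i, ..., X_i^s, Y_i^s]
    cols = [np.ones(n)]
    for s in range(1, degree + 1):
        cols.append(xy[:, 0] ** s)
        cols.append(xy[:, 1] ** s)
    A = np.column_stack(cols)
    return W, A
```

With degree s, A has r = 1 + 2s columns, matching the row pattern given above.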

Comparing the non-parametric model 5 and the parametric model 23, applied to process the same laser points, we can observe that:

  • In model 5, the unknown parameters are the local differential terms (β) of an arbitrary (not explicitly estimated) function, while in model 23, they correspond to the polynomial parameters (θ) of the best interpolating global analytical function.

  • In both cases, the coefficient matrix involves the X and Y coordinates, expressed as relative values with respect to the local reference point in the non-parametric case, and as absolute values in the parametric one.

  • In both cases, the W weight matrices consider the distance among the laser points although with very different geometric and stochastic significance.

To solve Eq.23, a Maximum Likelihood (ML) estimation of the unknown parameters is carried out: in particular, the value ρ ML giving the maximum log-likelihood is assumed as the ML estimate \( \widehat{\rho } \) of ρ. In this way, the optimal estimate of the SAR unknowns is given by (Pace et al. 1998):

$$ \hat{\theta } = \left( {A^T A} \right)^{- 1} A^T \left( {I - \hat{\rho }W} \right)z $$
(24.a)
$$ \hat{\sigma }^2 = n^{- 1} \left( {z - \hat{\rho }Wz - A\hat{\theta }} \right)^T \left( {z - \hat{\rho }Wz - A\hat{\theta }} \right) $$
(24.b)

Within the z values, the individual departures from the fitted polynomial trend surface can be estimated by the vector \( e = \hat{\sigma }^{-1} \varepsilon \) of standardised residuals, computed from Eq.23 as:

$$ e = \hat{\sigma }^{- 1} \left[ {\left( {I - \hat{\rho }W} \right)z - A\hat{\theta }} \right] $$
(25)
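A minimal sketch of this ML solution, assuming a dense grid search on ρ and the concentrated log-likelihood of Pace et al. (1998); the function name and the simplifications are ours:

```python
import numpy as np

def sar_fit(z, W, A, rho_grid=None):
    """Grid search of rho maximising the concentrated log-likelihood,
    then closed forms 24.a, 24.b and the standardised residuals 25
    (a sketch, not the authors' implementation)."""
    n = len(z)
    if rho_grid is None:
        rho_grid = np.linspace(-0.9, 0.9, 181)
    best = None
    for rho in rho_grid:
        u = z - rho * (W @ z)
        theta, *_ = np.linalg.lstsq(A, u, rcond=None)       # eq. 24.a
        r = u - A @ theta
        sigma2 = (r @ r) / n                                # eq. 24.b
        _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # log|I - rho W|
        ll = logdet - 0.5 * n * np.log(sigma2)              # concentrated LL
        if best is None or ll > best[0]:
            best = (ll, rho, theta, sigma2)
    _, rho_hat, theta_hat, sigma2_hat = best
    e = (z - rho_hat * (W @ z) - A @ theta_hat) / np.sqrt(sigma2_hat)  # eq. 25
    return rho_hat, theta_hat, sigma2_hat, e
```

The least-squares step is algebraically the \( (A^T A)^{-1} A^T (I - \hat{\rho }W)z \) of eq. 24.a, written through a numerically safer solver.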

Afterwards, the elements of vector e are inferentially evaluated to find which measures do not fit the estimated trend surface. To this purpose, the so-called “Forward Search” (FS) algorithm (e.g. Cerioli and Riani 2003) is applied: starting from an initial partition of the dataset, it provides robust estimates \( \widehat{\rho } \) and \( \widehat{\theta } \) at each step of the search. The basic idea of the FS approach is to repeatedly fit the postulated model to subsets of increasing size, selecting at each new iteration the Z observations best fitting the previous subset, i.e. those having the minimum absolute value of e. Thanks to this growing strategy, the outliers can enter the subset only at the end of the FS process. To detect at which iteration i the outlier data enter the subset, an F test is continuously applied to the weighted Mahalanobis distance of the difference vector \( \widehat{\theta }_i - \widehat{\theta }_{i - 1} \) (Crosilla et al. 2004). Once the null hypothesis is rejected, every point included from then on is an outlier: thus, there is no reason to continue the FS iterations.
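The growing strategy of the FS can be sketched as follows; for brevity, the stopping F test on the θ updates is omitted and an ordinary least squares fit replaces the full SAR estimation (both simplifications are ours):

```python
import numpy as np

def forward_search(A, z, start_frac=0.5):
    """Forward Search sketch: refit on subsets of increasing size, adding at
    each step the observation outside the subset with the smallest absolute
    residual, so that outliers enter only at the very end."""
    n = len(z)
    # initial subset: the points best fitting a first overall LS estimate
    theta0, *_ = np.linalg.lstsq(A, z, rcond=None)
    res0 = np.abs(z - A @ theta0)
    subset = list(np.argsort(res0)[: max(A.shape[1], int(start_frac * n))])
    order = []                       # order in which observations enter
    while len(subset) < n:
        theta, *_ = np.linalg.lstsq(A[subset], z[subset], rcond=None)
        res = np.abs(z - A @ theta)  # residuals of ALL points vs current fit
        outside = [i for i in range(n) if i not in subset]
        nxt = min(outside, key=lambda i: res[i])   # best-fitting new point
        subset.append(nxt)
        order.append(nxt)
    return subset, order
```

On a line with a single gross outlier, the outlier is indeed the last point to enter.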

Therefore, once the analytical fitting function of each surface unit has been estimated through formulas 24.a and 24.b, the point segmentation is fulfilled. Figure8 shows, for the agpart-2 surfaces, the FS parametric modelling of the different units and the resulting refined segmentation, in the sequential order performed.

Fig.8
figure 8

Refined cluster segmentation by parametric modelling and Forward Search solution for each geometrical unit

Indirect segmentation of the surface units

The above parametric modelling also makes it possible to indirectly find the analytical ridges separating the detected surfaces, that is, to perform the unit segmentation: in fact, the ridges can be estimated by means of 3D spatial intersections. This parametric approach thus makes it possible to overcome the lack of data that occurs, in principle, along the discontinuity lines of every laser scan acquisition.
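As a hedged illustration of such a 3D spatial intersection, in the simplest case of two adjacent units fitted by planes the ridge is the classical plane–plane intersection line:

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Ridge line between two fitted planes n·x = d (planes only; an
    illustrative case of the 3D intersections mentioned in the text).
    Returns a point on the line and the line direction vector."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    direction = np.cross(n1, n2)          # line direction: normal to both
    if np.allclose(direction, 0.0):
        raise ValueError("parallel planes: no ridge line")
    # a point satisfying both plane equations, with zero offset along the line
    M = np.vstack([n1, n2, direction])
    point = np.linalg.solve(M, np.array([d1, d2, 0.0]))
    return point, direction
```

For the planes z = 0 and x = 1, the ridge passes through (1, 0, 0) with direction (0, 1, 0).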

In conclusion, starting from a raw segmentation based on the geometric values computed by a non-parametric model, the parametric modelling of each surface makes it possible not only to refine the raw segmentation by an iterative point-enlargement process but also to fit the estimated analytical surfaces to the acquired points, up to the detail of the analytical intersections between the surface units.

Direct segmentation of the surface units

As mentioned before, Taylor’s expansion third- and fourth-order terms make it possible to automatically determine the surface curves presenting an extremum value of the local principal curvature directional derivatives. These curves therefore represent potential ridges of the surface units, and their determination makes it possible to proceed automatically to a direct segmentation of the point cloud units. As written before, to directly estimate the local principal curvature derivative values, it is necessary to express the local point coordinates in a Monge basis. The estimation process is not simple since, according to Cazals and Pouget (2007), it requires a four-step algorithm:

  1. The first step performs a Principal Component Analysis (PCA) for each sampled point with respect to its surrounding ones. This analysis determines three orthogonal eigenvectors and the associated eigenvalues. If the surface is well sampled, PCA provides one small and two large eigenvalues; the eigenvector associated with the small one approximates the normal vector.

  2. At the second step, a change of coordinates moves the original values into a new system having as origin the point at which the estimation is performed. A polynomial fitting, extended to fourth-order terms, is then carried out.

  3. The third step determines the Monge basis by computing the normal direction and by diagonalizing the Weingarten matrix.

  4. Finally, the Monge coefficients are computed in the Monge frame by a new extended Taylor’s expansion.
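The first step, for instance, reduces to an eigen-decomposition of the local covariance matrix; a minimal sketch, where the neighbourhood below is an assumed toy sample:

```python
import numpy as np

def pca_normal(neighbours):
    """Step 1 of the four-step algorithm: PCA of a point neighbourhood.
    Returns the eigenvector of the covariance matrix associated with the
    smallest eigenvalue, i.e. the approximate surface normal (sketch only)."""
    P = np.asarray(neighbours, dtype=float)
    C = np.cov((P - P.mean(axis=0)).T)    # 3x3 covariance of centred points
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    return eigvecs[:, 0]                  # smallest-eigenvalue eigenvector
```

For a well-sampled neighbourhood lying on the plane z = 0, the returned vector is (0, 0, ±1), as expected.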

Numerical experiments

Some further numerical experiments of the proposed procedure have been carried out for the agpart-2 model. First of all, a random error of ±1.5mm has been added to the original Z values, so simulating a true laser scanning acquisition characterised by a \( \sigma_{\text{ls}}^2 \) variance equal to 2.25mm2. It must be noticed that the magnitude of this error, although small for current laser scanning systems, is fully comparable with the grid steps (ΔX = 2.00mm and ΔY = 1.68mm) at which the data are defined. Therefore, it can be considered a significant worsening of the original dataset. On the other hand, since a very high density laser acquisition is simulated, the simulated measurement error coherently has to be quite small.

Figure9 at left shows the estimated values of \( \widehat{\sigma }_0^2 \): the areas with variances different from zero are now larger than in the error-free example of Fig.3 at left. Figure9 at right depicts the quasi-constant value of about 2mm2 along the central part of the various edges.

Fig.9
figure 9

Agpart-2 with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \): overall \( \widehat{\sigma }_0^2 \) estimated values (at left) and along the yellow section (at right)

Despite these higher variance factor values \( \widehat{\sigma }_0^2 \), the significance areas of the non-parametric estimation, defined by the chi-square test, remain practically the same as those of the dataset without errors, as shown in Fig.10. Examining definition 7 of the chi-square test for an α probability equal to 0.05 and the critical values \( \chi_{{\left( {p - 6} \right)_{{0.95}} }}^2 \) for values of (p − 6) up to 300, one can notice that, for the H0 acceptance, \( \widehat{\sigma }_0^2 \) has to be less than \( 1.14\,\widehat{\sigma }_{\text{ls}}^2 \) ≅ 2.5mm2. This condition is practically fulfilled for most of the dataset, even though it is characterised by noisy measurements. Summarising, these inferential results numerically confirm the capability of the proposed method to correctly process noisy laser datasets.
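The 1.14 factor can be verified numerically: for p − 6 = 300 degrees of freedom, the 0.95 chi-square quantile divided by the degrees of freedom is about 1.14. The sketch below uses the Wilson–Hilferty approximation of the chi-square quantile; scipy.stats.chi2.ppf would give the exact value:

```python
import math

def chi2_ppf_wh(df, z95=1.6449):
    """Wilson-Hilferty approximation of the 0.95 chi-square quantile,
    hard-wired to probability 0.95 via its normal quantile z95."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z95 * math.sqrt(c)) ** 3

df = 300                         # p - 6 neighbour points
ratio = chi2_ppf_wh(df) / df     # acceptance factor, ~1.14
threshold = ratio * 2.25         # sigma_ls^2 = 2.25 mm^2, ~2.5 mm^2
```

The computed threshold of about 2.56mm2 matches the ≅2.5mm2 bound quoted above.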

Fig.10
figure 10

Agpart-2 with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \): results of the χ 2 test (at left, green H0) and along the yellow section (at right)

Also in this case, the estimated Gaussian K and mean H curvature values present a wider variability, but still around the theoretically correct values. In particular, as can be seen in Fig.11, the K curvature correctly assumes a quasi-zero value over the central parts of the planar and cylindrical surfaces.

Fig.11
figure 11

Agpart-2 with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \): overall K estimated values (at left) and along the yellow section (at right)

The addition of noise to the dataset seems to have a more significant effect on the values of the mean H curvature (see Fig.12), in particular over the central parts of the units rather than along the discontinuity edges.

Fig.12
figure 12

Agpart-2 with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \): overall H estimated values (at left) and along the yellow section (at right)

A second numerical experiment has been carried out for the same agpart-2 model with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \) noise, now considering a smaller bandwidth b equal to 1.5cm. Of course, the number of p points involved in the non-parametric estimations becomes smaller, about half, so reducing the statistical redundancy; on the other hand, the areas where \( \widehat{\sigma }_0^2 \) significantly differs from zero are reduced, as can be noticed in Fig.13 at left (compared with Fig.9 at left).

Fig.13
figure 13

Agpart-2 with \( \sigma_{\text{ls}}^2 = 2.25{\text{mm}}^2 \) and b = 1.5cm: estimated \( \widehat{\sigma }_0^2 \) values (at left) and χ 2 test results (at right): green H0

Figure13 at right reports the results of the chi-square test: the red areas of non-significant estimations are reduced with respect to the ones obtained with a 2-cm bandwidth radius. In particular, out of 30,105 points, the number for which a second-order Taylor’s expansion is adequate grows from 20,846, corresponding to 69.2% (see Fig.10 at left), to 24,533, corresponding to 81.5%. Obviously, the percentage of fully automatic point classification depends on the geometrical complexity of the point cloud under examination: in general, the choice of a suitably reduced bandwidth b, still allowing a satisfactory redundancy, makes reliable non-parametric estimations possible. The minimal value of b satisfying these properties depends on the curvature values, the data density and also the data accuracy. A strategy to find the optimal value of b is to decrease it until the chi-square test fails, rejecting points belonging to surfaces already recognised (classified) with a larger bandwidth b. In fact, for real noisy datasets, when a too-small bandwidth radius is applied, the residual v grows and the chi-square test fails. For instance, a value of b equal to 1cm for the agpart-2 model furnishes worse results than those obtained with a 1.5-cm radius.

Concluding, most of a laser point cloud can be directly classified by means of the proposed curvature-based procedure: for the unclassified points, where the non-parametric modelling results are not correct, the automatic classification is anyway accomplished by a robust parametric modelling.

The proposed procedure has also been tested on the column1 model of the OSU Range Image database: this is composed of 27,825 points simulating a cylindrical column over a parallelepiped base, closed at the top by a circular plane. The scan column1–5 simulates a pointing-down laser acquisition from a scanning position such that the vertex among three planes of the base appears at the right (see Figs.14 and 15).

Fig.14
figure 14

Experiments for the column1–5 model: \( \widehat{\sigma }_0^2 \) values (at left) and χ 2 test results (at right): green H0

Fig.15
figure 15

Estimated curvature values for column1–5 model: Gaussian K values (at left) and mean H values (at right)

Figure14 at left shows the estimated local values of \( \widehat{\sigma }_0^2 \), coloured again from blue (null values) to red (maximum values): as expected, most of the planar and cylindrical surfaces present a null variance factor, while along the edges such values dramatically increase. In spite of this, \( \widehat{\sigma }_0^2 \) is non-null also along some surface borders, where however the strong irregularity of the dataset must be stressed. The result of the chi-square test 7 is reported in Fig.14 at right: the red areas, where the test fails, should not be submitted to the successive F ratio test 17 and to the curvature-based classification process.

Figure15 shows the estimated local curvature values. Since only single-curvature surfaces are present, the value of the Gaussian K curvature should always be null, i.e. E(K) = 0. This condition is represented by the light blue coloured points in Fig.15 at left. The mean H curvature values are instead correctly less than zero for the points belonging to the cylindrical column, as can be seen in Fig.15 at right, where such negative H values are represented by a dark yellow colour.

Conclusions

The paper proposes a procedure, based on a statistical analysis, able to automatically detect reliable Gaussian and mean curvature values of laser point clouds, computed by applying a local non-parametric Taylor’s expansion of the surface. First, the fulfilment of the applied analytical model is verified by a chi-square comparison of the a priori and a posteriori variance factors. A second test considers the variance–covariance propagation law applied to the estimated Taylor’s terms, in order to compute the covariance matrix of the Gaussian and mean curvature values. If the null hypothesis of the applied F test is rejected, at least one curvature value is significantly different from zero, and the sign analysis allows the correct classification of the geometrical shape of each object surface unit. A parametric modelling method of the surface units is finally presented. The numerical experiments carried out confirm the capabilities of the proposed method.

This research has been partially presented by the authors in the paper “A statistically proven automatic curvature based classification procedure of laser points” at the XXIst ISPRS Congress in Beijing, 2008, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII, B5:469–475 (on DVD).