1 Introduction

The cross-cutting role of ALS (Airborne Laser Scanning) point clouds is nowadays unquestionable in large-scale geospatial applications such as agroforestry, archeology, 3D urban characterization or landslide recognition [1,2,3,4]. This versatility of LiDAR (Light Detection and Ranging) data has been made possible by the rapid development of 3D acquisition techniques and of data processing software (specialized tools for the extraction, manipulation and analysis of 3D data) [5, 6]. These technological developments, together with the multi-purpose character of ALS point clouds, have led many national geographical agencies to collect and deliver large 3D point cloud datasets covering wide areas. In Spain, the IGN (Instituto Geográfico Nacional) has gone from providing LiDAR data with a nominal point density of 0.5−2 points/m\(^2\) (second LiDAR coverage, 2016–2021) to delivering data with a nominal point density of 5 points/m\(^2\) (third LiDAR coverage, 2022–2025), although, depending on the region, higher densities may be available (14 points/m\(^2\) in the region of Navarra).

A major step in the use of LiDAR data is per-point classification, which better addresses the needs of the applications built on it, especially the differentiation of ground and non-ground points. By point classification we refer to the process of assigning a class to each point from a wider set of possible classes, not limited to ground or non-ground. This differentiation is a critical step in multiple applications since it makes it possible to clearly identify the ground level for further processing [7, 8].

The task of point classification has been around for years, and a large body of work has been devoted to improving the existing proposals [9], either to increase accuracy, defined as the ratio of points correctly classified in several reference or benchmark datasets, or performance, defined as the time required to classify a dataset. Improving the precision of the classifiers increases the quality of the results, which can benefit further processes. Increasing the performance allows the classification task to finish faster, or larger point clouds to be classified in reasonable amounts of time.

As a result of the longevity of this problem, there is a plethora of approaches to point cloud classification in the literature [10]. These proposals can be categorized in different ways, according to different features. We differentiate four types of classifiers: geometric classifiers, traditional machine learning classifiers, modern machine learning classifiers, and hybrid classifiers that combine geometric and machine learning techniques.

Geometric classifiers use purely heuristic algorithmic approaches to classify the points, using the data contained in the point cloud or derived from it. Multiple analyses can be performed on the geometric data in the point cloud, such as local point density, slope, elevation, normal vectors or coplanarity [11]. A critical step in most of these classifiers is segmentation, a process in which the point cloud is divided into multiple homogeneous regions; there are multiple techniques for it, from edge detection [12] to region growing [13], model fitting and unsupervised clustering [14].

The second type uses machine learning techniques from other fields, like image processing. A typical example is creating images and then applying image segmentation techniques, followed by assigning the class of each pixel to all of the points that the pixel represents. There are multiple examples of these classifiers, using approaches like support vector machines (SVM) [15], Markov Random Fields (MRF) [16] and Conditional Random Fields (CRF) [17]. There are also some established approaches: AdaBoost-based proposals are common [18], and CapsNet deserves a special mention since it is the basis of several other approaches [19].

The third type of classifiers fully exploits the recent advancements in machine learning, using new techniques and processing the point data directly instead of converting it to other types of data, a task that was long very challenging due to the size of the data. The surge of deep learning opened the way to exploit this path, with results competitive with other proposals. One of the most important classifiers of this type is PointNet [20], together with its derivatives such as PointNet++ [21]: it was one of the first to feed point data directly into a neural network, has been the basis for multiple improvements and extensions, and has become a de facto reference when benchmarking new proposals. Multiple proposals in this category exploit the convolution operation, some generating multiple images from different viewpoints to apply a more traditional convolution layer, for example SnapNet [22], or using a voxelized version of the data like VV-NET [23]. Some works focus on the application of machine learning to geospatial point clouds in order to build geospatial digital twins at several scales, from indoor models to virtual 3D city models [24].

The last category encompasses approaches that combine machine learning with geometric techniques, and as such it can cover vastly different proposals, although there are fewer examples. Some proposals are closer to geometric approaches, for example using traditional region growing to segment the point cloud and then an SVM to classify each region [25]. Other proposals use more complex steps, for example combining Morphological Profiles and Convolutional Neural Networks [26].

Ground point filtering, the differentiation of ground and non-ground points, has specific characteristics that can be exploited to improve the results. As a consequence, ground point filtering has dedicated research and algorithms, usually in the geometric category. In this field, hybrid methods usually refer to those that combine other methods in some way, regardless of the type of each method combined. In the literature, there are multiple classifiers based on morphological filters, such as the Simple Morphological Filter (SMRF) [27] used by the Point Data Abstraction Library (PDAL), the displacement segmentation filter [28] or the LIDAR2MDTPlus algorithm [29]. Other approaches are based on surface methods, such as the Cloth Simulation Filter (CSF) [30], expectation maximization [31], the Progressive Triangulated Irregular Network [32] used by LAStools, or the combination of multiple methods [33]. A review of current ground filters can be found in [34].

Another challenge in this process is that different classifiers are better suited to some types of landscape and data sources. An in-depth study of several ground point filtering algorithms, including some of those used in this work, is presented in [35] on data from airborne LiDAR as well as from UAV (Unmanned Aerial Vehicle) photogrammetry-based point clouds, while [36] studies the impact of different data characteristics on the performance of different classifiers. Most studies of point classification focus on one type of landscape, with urban landscapes being the most frequent. This reflects the fact that urban landscapes present more types of non-ground objects (buildings, vehicles, low and high vegetation, or road signaling) in higher concentrations in the same area: they are a more challenging landscape with more potential applications of the filtered point cloud.

On a larger scale, county-, province-, state- or even nation-wide, multiple types of landscape are present. These massive point clouds pose additional challenges for the ground point filtering task due to the sheer size of the data (excessively long execution times, around 11 days for the complete dataset used in this article) and the heterogeneity of the landscapes. Using a single classifier is not the best option in every case. In some landscapes, like mountainous regions with little vegetation, most classifiers offer worse precision than in other landscapes, and different classifiers produce wildly different results.

Our proposal consists of a multistage approach for ground filtering that uses the geometric data contained in the point cloud to identify the type of landscape present in each area. The classifier and configuration to use for each area are selected according to the type of landscape identified.

The sizes that modern datasets can reach, with 5 Terabytes (TB) of compressed data and over 130 billion points for the complete dataset used in this work, are large enough to exceed the computational capabilities of current computers. Some distributed computing proposals have been presented to exploit the computational power of distributed memory systems for point classification and other tasks [37,38,39]. Our proposal can make use of these distributed memory systems to increase performance, parallelizing the execution of the same stage for every area in the dataset. This is achieved through the Apache Spark framework running on a local cluster.

This article details our proposal, followed by an analysis of the results obtained. Section 2 introduces the dataset used and the specific areas of study. Section 3 describes in detail each of the stages of the proposal. Section 4 details the distributed memory implementation using Spark. Section 5 shows the analysis of the results obtained from different points of view and Sect. 6 closes the paper with some conclusions and future work.

2 Data sets

Considering that this research is conducted at the national level and seeks to maximize the use of the available compute resources, the PNOA LiDAR data over the region of Navarra [40] (Plan Nacional de Ortofotografía Aérea, the Spanish National Plan for Aerial Orthophotography, hereafter referred to as PNOA) is used as the benchmark and validation sets for the proposed methods (see Fig. 1). One of the most distinctive characteristics of these point clouds is their point density, 14 points/m\(^2\), while the point density is less than 4 points/m\(^2\) in the remaining regions of Spain.

Fig. 1 Study area and the location of each LiDAR data file

The LiDAR data used in this study were collected using a Single Photon LiDAR sensor (SPL100) mounted on a Beechcraft B200 King Air aircraft between September and November 2017, covering the region of Navarra (Spain). The point clouds correspond to the second round of nationwide ALS measurements, publicly available in Spain through PNOA. Square LiDAR blocks (files) of 1 km side, with a nominal point density of 14 points/m\(^2\), were obtained from [40]. Each file (a LASzip file, the compressed version of a LAS file, version 1.4 [41]) contains the points located in the 1 km\(^2\) area it covers. We will refer to these files as tiles.

The point cloud in each tile includes, in addition to the usual point coordinates, RGB (Red, Green and Blue color channels) and NIR (Near-InfraRed) attributes for each point, derived from a rapid orthoimage collected with an RCD30 camera on a joint flight with the LiDAR sensor. Although this orthoimage may be displaced several meters relative to the ALS point clouds, this limitation does not negatively affect the identification of the landscape. Furthermore, the data are already classified, using automatic machine learning classification followed by an adjustment process, before being made public as a product of the PNOA project.

Finally, the data used in this research are stored in 107 tiles, each covering a 1 km \(\times\) 1 km area, for a total of 75 Gigabytes (GB) of data containing over 2 billion points. Forty tiles, 10 for each type of landscape (agriculture, forest, urban, and mountain), were used during the development of the landscape classification process (benchmark set), while the other 67 tiles (validation set) were used to evaluate the process developed with the benchmark set. The type of landscape of each tile was manually identified. Table 1 lists the characteristics of the benchmark and validation data sets, and Fig. 1 shows the location of the tiles in the region of Navarra.

Table 1 Characteristics of the set of tiles used as benchmark and validation for each type of landscape

3 Overall structure

This section details how the proposed system works, including implementation details. Figure 2 shows the preprocessing phase and the multistage approach workflow. Before the execution of the multistage approach, a preprocessing phase prepares the LiDAR point cloud with the goal of avoiding problems caused by the size of the files. This preprocessing can keep the original LAS files or split them if they exceed a defined size (e.g., 1 million points). There are several reasons for this preprocessing. For example, some classifiers may have limitations on the number of points that they can process at a time, whether or not that limitation is intentional. More importantly, the size of the tiles plays an important role, since it has a very notable impact on the correct identification of the landscapes. On the one hand, tiles that cover too large an area have higher chances of containing multiple types of landscape in different parts of the tile, which results in less clear-cut metrics for the tile, hindering the identification of the predominant landscape, as well as forcing the use of a potentially suboptimal classifier on the less dominant types of landscape. On the other hand, tiles that cover too small an area create challenges for landscape identification, since they may not contain enough data for the metrics to have a strong basis, which can cause misclassifications of the landscape. The best scenario is to have tiles as large as possible containing only one type of landscape. This effect is explored in Sect. 5.2.

Fig. 2 Preprocessing phase and the multistage approach workflow

The stages in our proposal are as follows. In the first stage, all of the metrics needed by the second stage are calculated (see Sect. 3.1). The second stage is landscape identification and algorithm assignment, in which the extracted metrics are used to identify the type of landscape and to assign the algorithm or configuration (or several of them) to use for that tile (see Sect. 3.2). In this work, 4 types of landscape are differentiated: agriculture, urban, forest, and mountain areas. In the example shown in Fig. 2, the tile is identified as an urban landscape and therefore SMRF as implemented by PDAL is chosen for it, represented in the figure with a light blue highlight of the urban landscape and the appropriate filter. The third stage is the main stage, where the point filtering is performed using the selected algorithm for each tile (see Sect. 3.3).

3.1 Metric computation

Different datasets provide different types of data. Most modern datasets provide color information in the form of RGB data, while a few also provide NIR in addition to RGB. This variability is supported by the data format normally used to make the datasets available, the LAS data format. The dataset used in this work uses LAS version 1.4 with point record format 8, which stores both RGB and NIR data for each point; this allows computing some metrics that cannot be obtained from other datasets.

In this work, three metrics are used: the height histogram, the NDVI (normalized difference vegetation index) ratio, and the return number ratio. If NIR data are available, the NDVI can be calculated for each point. Alternatively, if the NIR field is not available, the intensity field can be used instead. NDVI is a widely used metric for measuring the presence of vegetation [42].

The height histogram is calculated by binning the points by their elevation over the minimum elevation (minZ) of the tile, using 1 m high bins. This bin height was selected because of the speed of computing the corresponding bin for each point and because it provides enough differentiation between the types of landscape to be detected. The bin for a point can be easily computed by subtracting minZ from the Z coordinate of the point, truncating the result to the closest lower integer, and increasing the count of the corresponding bin. This metric can differentiate between largely flat areas, which show notable peaks in the histogram, and areas with a wide range of elevations, such as areas with steep slopes.
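For illustration, this binning step can be sketched in Java as follows (a minimal sketch; the class name and the {x, y, z} array representation of points are ours, not taken from the actual implementation):

```java
import java.util.List;

final class HeightHistogram {
    // counts[i] = number of points whose elevation lies in
    // [minZ + i, minZ + i + 1) meters above the tile minimum.
    static long[] compute(List<double[]> points, double minZ, double maxZ) {
        long[] counts = new long[(int) Math.floor(maxZ - minZ) + 1];
        for (double[] p : points) {                  // p = {x, y, z}
            counts[(int) Math.floor(p[2] - minZ)]++; // truncate to lower integer
        }
        return counts;
    }
}
```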

NDVI is an indicator widely used to analyze remote sensing measurements, usually from satellite or aerial photogrammetry, and it can assess whether or not the target contains live green vegetation. With point-wise red and NIR data, the NDVI value can be calculated for each point. The NDVI ratio used in this metric is calculated by obtaining the NDVI for each point from the NIR and red channel data according to the following equation:

$$\begin{aligned} \textrm{NDVI} = \frac{\textrm{NIR} - R}{\textrm{NIR} + R} \end{aligned}$$
(1)

In the equation, NIR and R are the infrared and red color channels, respectively. The range of NDVI is divided into three categories: low (NDVI < 0.1), medium (0.1 \(\le\) NDVI \(\le\) 0.5) and high (NDVI > 0.5). A low NDVI value indicates no vegetation, a medium value indicates some types of soil or growing vegetation (for example, crops in their early stages), and high NDVI values indicate a strong presence of vegetation such as tree canopies or fully mature crops. The NDVI is calculated for each point and the counter of the corresponding category is increased. The end result is the number of points in each NDVI category.
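A minimal Java sketch of this per-point computation follows (the names and the {red, NIR} array representation are illustrative assumptions, and the zero-denominator guard is ours):

```java
import java.util.List;

final class NdviRatio {
    // Per-point NDVI (Eq. 1) and the three-category counts described
    // above: low (< 0.1), medium ([0.1, 0.5]) and high (> 0.5).
    static long[] categoryCounts(List<double[]> points) { // p = {red, nir}
        long[] counts = new long[3];                      // {low, medium, high}
        for (double[] p : points) {
            double red = p[0], nir = p[1];
            if (nir + red == 0) { counts[0]++; continue; } // guard: treat as low
            double ndvi = (nir - red) / (nir + red);
            if (ndvi < 0.1)       counts[0]++;
            else if (ndvi <= 0.5) counts[1]++;
            else                  counts[2]++;
        }
        return counts;
    }
}
```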

Additionally, and especially important for the cases where NIR is not available, return number ratios are also calculated. For this metric, we count the number of points with each return number. Although this information can be extracted from the LAS file header, to avoid using incorrect or outdated data our proposal computes this metric by checking each point while computing the other metrics.

3.2 Landscape identification

This stage performs two tightly linked tasks. The first is using the computed metrics to identify the landscape present in the tile, and the second is selecting which classifier to use for that landscape.

For the first task, a decision tree based on the categorization of the metrics obtained in the first stage is used. This decision tree is based on knowledge from experts in the field and a visual analysis of the computed metrics, with further tests of multiple configurations to fine-tune the values of each parameter involved. All of the metrics for each of the benchmark tiles (see Table 1) were analyzed to develop the conditions that identify each landscape type.

Figure 3 shows the height histograms for one example of each type of landscape: agriculture in Fig. 3a, urban in Fig. 3c, forest in Fig. 3e and mountain in Fig. 3g. The vertical axis displays the bins, whose values are the height over minZ of the tile, in meters. The horizontal axis shows the fraction of the total points that fall in each bin. If the bar for a bin reaches 0.55, 55% of the points in the tile lie within the same 1 m range of the vertical coordinate. Note that not all of the figures have the same horizontal range. In the graphs, one bar is drawn for each bin, and in high-slope tiles there are many more bins, so they blend together. The goal of the first part of this stage is to detect patterns in the metrics that allow their categorization and identify the predominant landscape present in the tile. Ten tiles of each type of landscape are analyzed to identify patterns that differentiate the needed categorizations.

Fig. 3 Example height histograms for a tile of each landscape type

The height histogram for agriculture shows a clear peak of points between 3 and 4 m above minZ (third bar from the bottom), containing more than 50% of the total points in the tile, and the total vertical range is small compared to the other histograms: all of the points are within 30 m of minZ. Figure 3b shows an image of one of the agriculture tiles. In the urban tile, the vertical range is larger and the points are more spread out, but there is still a bin with many more points than the rest, at 35 m above minZ, even though it only contains 13% of the total points. Figure 3d shows an image of one of the urban tiles.

In the forest tile, there is no outstanding peak: no bin reaches 2% of the total number of points, and the distribution of points is much smoother, which contrasts with the two previous examples. Figure 3f shows an image of one of the forest tiles. The mountain tile, defined as a high-slope area with little vegetation, shows a picture similar to the forest tile: very few bins reach barely over 1%, and there are even more bins, increasing the vertical spread. Figure 3h shows an image of one of the mountain tiles.

These height histograms can be divided into two categories based on the presence of peaks, that is, on whether there are large concentrations of points in small ranges of elevation. If peaks are present, the height histogram is classified as compact; when the points are more uniformly distributed, it is classified as sparse. Agriculture and urban areas have compact height histograms since they are flatter areas. Forest and mountain areas have sparse height histograms since they tend to have steeper slopes; in the case of forests on flat land, multiple returns inside the canopy spread the points along the height of the trees. The exact conditions and values used to classify a height histogram as compact or sparse are studied and tuned in Sect. 5.1.

The NDVI metric can be used to detect a relevant presence of vegetation. The top part of Fig. 4 shows the ratio of points in the low and medium NDVI ranges, aggregating the data of the 10 studied tiles for each type of landscape. There are not enough points in the high NDVI range to show in the graph. In the agriculture and forest areas, the ratio of points in the low NDVI range is noticeably lower, less so in the agriculture areas since the data were captured in autumn, when fewer crops give high NDVI values. This can be used to differentiate between tiles with and without vegetation. A tile is categorized as vegetation-suggesting NDVI when the ratio of points in the medium or high ranges is large enough, and as bare-ground-suggesting NDVI otherwise, when the vast majority of points fall into the low NDVI category. The exact ratios used to classify a tile as vegetation-suggesting NDVI are studied and tuned in Sect. 5.1.

Fig. 4 NDVI and return number ratios for the aggregation of 10 tiles of each landscape

When NDVI is not available, the ratio of points with each return number can be used. The bottom part of Fig. 4 shows the results aggregating the data of the 10 tiles of each type of landscape. The margins are much tighter with this metric; for example, mountain areas have more points with higher return numbers than forest areas. This metric is more sensitive than NDVI, and therefore we recommend using NDVI whenever it is available.

With all of these data at hand, a decision tree based on the possible categorizations is developed, shown in Fig. 5. This tree has two decisions: the first one uses the height histogram to separate the agriculture and urban tiles from the forest and mountain tiles. The second decision uses the NDVI to differentiate between agriculture and urban, or between forest and mountain, depending on the result of the first decision. A more detailed explanation of how this decision tree is used is provided in Sect. 5.1.

Fig. 5 Decision tree with the classifiers to use in each landscape

Once the type of landscape is identified, the classifier to use for the tile is selected using a direct mapping from landscape type to classifier. The mapping shown in the figure is the result of the testing performed in Sect. 5.2.

3.3 Ground point filtering and assessment

The third stage performs the ground point filtering. The actual inner workings of this stage depend heavily on the classifier in use, since this stage is tasked with getting the data of the tile ready to be filtered.

In this proposal, three different classifiers are used: LIDAR2MDTPlus [29], SMRF using PDAL [27] and lasground from LAStools [43]. These classifiers were selected because they could be integrated into the multistage pipeline and the Spark distributed implementation. Both LIDAR2MDTPlus and PDAL are based on morphological filters, exploiting the changes in the elevation of non-ground points after the morphological operation is performed. Lasground, on the other hand, uses an adaptive TIN method, with triangulation and densification steps that progressively include all ground points in the triangulated DTM (Digital Terrain Model). Two different configurations of LIDAR2MDTPlus are analyzed, as they behave notably differently in each landscape. More algorithms were considered, but they could not be integrated into the multistage pipeline for several reasons, for example the software ecosystem present in the cluster.

The need for RGB and NIR values makes it impossible to test on more common benchmark datasets such as the ISPRS one, which does not provide the data required to calculate the NDVI for each point.

We have access to the LIDAR2MDTPlus source code, and it is already written in Java, so it can be directly integrated into our proposal, loading the data into memory before performing the filtering. PDAL and LAStools are external tools, integrated into our proposal by invoking their command line tools from the Java Virtual Machine. The results with LAStools are obtained by dividing each tile so as not to violate the licensing restrictions.

To analyze the effectiveness, the results of each classifier are compared against the classification for each point provided by the PNOA project, checking whether each point has the correct classification and identifying true positives (TP), correctly classified ground points; true negatives (TN), correctly classified non-ground points; false positives (FP), non-ground points incorrectly classified as ground; false negatives (FN), ground points incorrectly classified as non-ground; and the total point count (TPC). With those statistics, the three commonly used errors are calculated: Type I error (omission error, ground points wrongly classified as non-ground, \(\frac{\textrm{FN}}{\textrm{FN}+\textrm{TP}}\)), Type II error (commission error, non-ground points wrongly classified as ground, \(\frac{\textrm{FP}}{\textrm{FP}+\textrm{TN}}\)) and total error (wrongly classified points, \(\frac{\textrm{FN}+\textrm{FP}}{\textrm{TPC}}\)) [44].
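These three rates follow directly from the confusion counts; a straightforward Java rendering (the class and method names are ours):

```java
final class ErrorRates {
    // Error rates computed from the confusion counts defined above.
    static double typeI(long fn, long tp) {           // omission error
        return (double) fn / (fn + tp);
    }
    static double typeII(long fp, long tn) {          // commission error
        return (double) fp / (fp + tn);
    }
    static double total(long fn, long fp, long tpc) { // overall error
        return (double) (fn + fp) / tpc;
    }
}
```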

Each application and use case of the filtered point cloud depends more strongly on low error rates of a different type; therefore, all of the error rates are shown so the reader can make an informed decision based on their particular case. Table 2 shows the error rates for the agriculture tiles used in the previous section, Table 3 for urban tiles, Table 4 for forest tiles, and Table 5 for mountain tiles. The best classifier in each case is indicated in italics.

Table 2 Classifier error rates on agriculture tiles
Table 3 Classifier error rates on urban tiles
Table 4 Classifier error rates on forest tiles
Table 5 Classifier error rates on mountain tiles

It is clear that some classifiers show better results in different types of landscape, with agriculture being somewhat more challenging for all classifiers and mountain being very challenging for all of them. Every classifier has one of the partial errors much higher than the other, but not all the classifiers agree on which type of error is low. With this information, the decision tree in Fig. 5 is completed to include which classifier should be used in each landscape type.

Some external tools produce results in a LAS format different from the input, which causes data loss in some cases. One example is input in LAS version 1.4 with output in LAS version 1.2, which does not support NIR data. Some postprocessing of the classifier results can be performed to avoid this loss: the point data are loaded from the input tile, the same point is identified in the output of the classifier, and the classification in the input tile data is overwritten with the class of the point taken from the output of the classifier.

The identification of the same point in both files is performed using the point coordinates, allowing a configurable margin of error, set to 1 cm for the dataset used in the tests, with its point density of 14 points/m\(^2\). Nonetheless, sometimes not all of the points are matched and some points are lost in the classification process, but this number is small enough not to be relevant. In order to keep it as low as possible, after all of the points have been tested for a match, a final match is attempted on every remaining point, increasing the margin to 10 cm.
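A sketch of this two-pass matching follows, assuming a simplified point type and a naive linear search (a real implementation would use a spatial index such as a grid or k-d tree rather than the quadratic scan shown here):

```java
import java.util.ArrayList;
import java.util.List;

final class ClassTransfer {
    // Simplified point representation for the sketch.
    static final class Point {
        double x, y, z;
        int classification;
    }

    // Two-pass class transfer: a 1 cm tolerance first, then 10 cm for
    // the points that remain unmatched.
    static void transfer(List<Point> input, List<Point> filtered) {
        List<Point> unmatched = new ArrayList<>();
        for (Point in : input) {
            if (!copyClass(in, filtered, 0.01)) unmatched.add(in);
        }
        for (Point in : unmatched) copyClass(in, filtered, 0.10);
    }

    // Copies the class from the first filtered point within tolerance.
    static boolean copyClass(Point in, List<Point> filtered, double tol) {
        for (Point out : filtered) {
            if (Math.abs(in.x - out.x) <= tol
                    && Math.abs(in.y - out.y) <= tol
                    && Math.abs(in.z - out.z) <= tol) {
                in.classification = out.classification;
                return true;
            }
        }
        return false;
    }
}
```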

4 Distributed computing execution

While all of the processes described to this point can be executed using traditional Java data structures like lists, the pipeline is designed to be executed in parallel through Apache Spark. Apache Spark is a parallel framework that allows the use of distributed memory systems to parallelize the execution of pieces of code on large amounts of data. Spark achieves most of its goals through the use of Resilient Distributed Datasets (RDDs), an abstraction similar to a list that represents a set of elements distributed among the available compute nodes. Operations on the elements of an RDD are executed in parallel on the compute nodes that hold the elements, each node performing the operation on the elements it stores. As long as the memory aggregated across all of the nodes is enough to store the data in the RDD, the data are kept in memory, accelerating the execution of the operations; data are persisted to and recovered from disk when the total size is too large, providing transparent out-of-core computation on very large datasets.

RDDs are also the basis of other benefits that Spark provides. All of the operations executed on an RDD are added to a graph, and when a task fails, Spark automatically recovers from the error by reconstructing the part that was lost using this graph, executing the necessary operations since the last persisted state of the lost part of the RDD. This makes Spark an excellent choice for parallelizing the processing of large amounts of data, as it provides fault tolerance with no extra effort.

Our multistage approach is designed to leverage Spark to parallelize each stage. This is done by creating an RDD with the paths of every input tile and performing each stage as a map operation on the RDD. Figure 6 shows how the RDD evolves during the entire pipeline. Each map and reduce operation (the transitions between RDDs) can be applied at the same time to all elements of the RDD, as long as there are enough resources available.
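The structure of the pipeline can be sketched as follows (a minimal Java/Spark skeleton; only the Spark API calls are real, while the class name and the stage functions are illustrative stubs, not the actual implementation):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class MultistagePipeline {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("multistage-ground-filtering");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // One RDD element per input tile path (passed as arguments here).
            JavaRDD<String> tiles = sc.parallelize(Arrays.asList(args));
            tiles.map(MultistagePipeline::computeMetrics)    // stage 1: metrics
                 .map(MultistagePipeline::identifyLandscape) // stage 2: landscape + classifier
                 .map(MultistagePipeline::filterTile)        // stage 3: ground filtering
                 .count();                                   // action that triggers the lazy maps
        }
    }

    // Hypothetical stage functions; each receives and returns a
    // serializable description of the tile (a path here, for simplicity).
    static String computeMetrics(String tilePath)   { return tilePath; }
    static String identifyLandscape(String metrics) { return metrics; }
    static String filterTile(String assignment)     { return assignment; }
}
```

Because Spark transformations are lazy, the three maps are only executed when the final action (`count` in this sketch) is invoked, and each tile flows through the stages independently on whichever node holds it.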

Fig. 6 Evolution of the RDD during the multistage approach execution

This parallel execution puts a lot of strain on the filesystem due to all of the read operations required by each of the stages, especially in the filtering stage with external tools, where writing is also involved. The input and output folders are required to be accessible by all compute nodes, using a network-accessible filesystem. This is required since the input tiles are read in several stages by the worker nodes and are also written to disk in the filtering stage in order to use external tools.

Spark follows the design principle of moving the computation to the location of the data. The distribution of the data in the RDDs is what enables this feature; however, it requires the data to be completely loaded into an RDD. In our proposal, we need to use external tools that cannot access the data in the RDDs from outside the processes managed by Spark, so those external tools are not aware of the locality of the data. Locality could still be exploited by exporting the data from the RDD to local disk before using the external tools, but this reintroduces the I/O overhead and negates most of the benefits.

Some measures were taken to reduce the I/O overhead. The proposal uses local storage on the compute node for the input and output data of external tools, avoiding the network I/O traffic caused by the filtering process. For each task that needs to use external tools, the input tile is written to local storage on the node that will execute the external tools, and the final result is moved, writing it to the shared filesystem that stores all of the filtered tiles resulting from the filtering stage. Replacing the external tools with other classifiers more tightly integrated into the pipeline helps to reduce the I/O overhead, since less data needs to be written to disk to be filtered; it can be processed directly in memory.
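The interaction with an external tool through node-local storage can be sketched as follows (the command name is a placeholder, not the actual PDAL or LAStools invocation, and the class and method names are ours):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ExternalFilter {
    // Copy the tile to node-local scratch space, run the external
    // classifier there, and move only the final result back to the
    // shared filesystem.
    static void run(Path sharedInput, Path sharedOutputDir)
            throws IOException, InterruptedException {
        Path workDir  = Files.createTempDirectory("tile-work"); // node-local storage
        Path localIn  = Files.copy(sharedInput, workDir.resolve("input.laz"));
        Path localOut = workDir.resolve("output.laz");
        Process p = new ProcessBuilder("external-filter",       // placeholder command
                localIn.toString(), localOut.toString()).inheritIO().start();
        if (p.waitFor() != 0) throw new IOException("external filter failed");
        Files.move(localOut, sharedOutputDir.resolve(sharedInput.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```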

All of the processing in the different stages, other than the execution of the classifiers themselves, is performed with an out-of-core strategy, reading and writing to disk in chunks to keep memory consumption low. The chunk size is defined as a fixed number of points read/written each time, and the exact number is adjusted to allow the concurrent processing of multiple tiles without exceeding the available memory when Spark executes multiple tiles on the same node.
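A sketch of this chunked processing pattern follows (the reader interface and the chunk size are illustrative assumptions, not taken from the implementation):

```java
final class ChunkedProcessing {
    static final int CHUNK_POINTS = 1_000_000; // illustrative chunk size

    // Hypothetical interface over the LAS/LAZ reader in use.
    interface PointReader {
        int read(double[][] buffer); // points read, 0 at end of file
    }

    // Reads points in fixed-size batches so that memory use stays
    // bounded regardless of the tile size.
    static void process(PointReader reader) {
        double[][] buffer = new double[CHUNK_POINTS][];
        int n;
        while ((n = reader.read(buffer)) > 0) {
            for (int i = 0; i < n; i++) {
                // update the histogram, NDVI and return-number counters here
            }
        }
    }
}
```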

5 Performance analysis and results

In this section, the performance of our multistage approach is analyzed from different points of view. In Sect. 5.1, the first two stages are evaluated through an analysis of the landscape identification success rate. In Sect. 5.2, the complete multistage approach is assessed according to the accuracy of the ground filtering process. To close the section, the Spark implementation is evaluated in Sect. 5.3 by measuring the time required to perform the ground filtering on all of the tiles used in this work using different amounts of resources.

Throughout this section, and in view of the characteristics of the LiDAR data (see Sect. 2), we compare the results of our multistage approach with the classification in the original data to measure the classification errors. This dataset includes overlap points, sensor noise and other problematic data, but these are clearly labeled using custom LAS classes, so it is simple to remove those noisy points by discarding the points with certain classes in the preprocessing phase.

5.1 Landscape identification

The first set of tests focuses on the first two stages of our proposal. The goal is to evaluate the capabilities of the metrics and the decision tree used. The testing is performed on 10 tiles of each type of landscape as benchmark, manually selected and classified using satellite imagery. Table 1 shows the characteristics of the set of tiles used for each type of landscape, with the number of points and memory requirements. As indicated above, each tile covers an area of 1 km\(^2\), so 40 km\(^2\) are being processed. It should be noted that, in general, the tiles are located in different zones of Navarra (see Fig. 1). These tests have a double purpose: fine-tuning the values of the parameters and conditions used in the decision tree, and showing the capabilities of the second stage in detecting the type of landscape present, both with and without optimized parameter values.

At each step of the decision tree, there are several conditions, based on the metrics presented in Sect. 3.1, that classify the metric toward one type of landscape. Different values for the parameters of those conditions are tested, and the success rate of the second stage, the one that performs the landscape detection, is evaluated. More than 60 different configurations were tested; only the most representative are shown in this work. The conditions used at each step of the decision tree, as well as the terminology used to reference each parameter, are described in the next paragraphs, while the values used in each tested configuration are shown in Table 6.

For the first step of the decision tree, the one that categorizes the height histogram, a couple of definitions are needed. The histogram peak is defined as the bin with the largest number of points. A close-to-the-peak bin is defined as a bin that contains more than a set percentage of the points of the peak bin, that percentage being named the histogram peak percentage (hpp). A minor bin is defined as a bin with fewer points than a set percentage of the total, named the minor bin limit (mbl). With those terms defined, the conditions checked to categorize the height histogram are as follows, and are also sketched in code below. A histogram is directly categorized as compact if there are fewer than 30 bins, once the bins with fewer than 5 points are discarded to eliminate outliers that disrupt the height histogram. A histogram is directly categorized as sparse if the peak contains less than a set percentage of the total points, referred to as the peak minimum percentage (pmp). A histogram is directly categorized as sparse if the minor bins accumulate more than a set percentage of the total points, referred to as the minor bins aggregated ratio (mbar). Lastly, a histogram is categorized as compact if the percentage of bins close to the peak exceeds a set amount named the percentage of peak bins (ppb), and it is categorized as sparse otherwise.
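Putting these conditions together, the categorization can be sketched as follows (a Java sketch following the conditions as written above, with all parameters expressed as ratios in [0, 1]; the literal reading of the last condition, counting close-to-the-peak bins over the total number of bins, is our interpretation):

```java
final class HistogramCategorizer {
    // bins[] is assumed to already exclude the bins with fewer than
    // 5 points, as described in the text.
    static boolean isCompact(long[] bins, long totalPoints,
                             double hpp, double mbl, double pmp,
                             double mbar, double ppb) {
        if (bins.length < 30) return true;                     // few bins: compact
        long peak = 0;
        for (long b : bins) peak = Math.max(peak, b);
        if ((double) peak / totalPoints < pmp) return false;   // weak peak: sparse
        long minorPoints = 0;
        int closeToPeak = 0;
        for (long b : bins) {
            if (b < mbl * totalPoints) minorPoints += b;       // minor bins
            if (b > hpp * peak) closeToPeak++;                 // close-to-the-peak bins
        }
        if ((double) minorPoints / totalPoints > mbar) return false; // sparse
        return (double) closeToPeak / bins.length > ppb;       // compact if enough peak bins
    }
}
```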

For the second step in the decision tree, the one that uses the NDVI or return number metrics, one condition for each metric is applied. If the NDVI is used, a tile is categorized as vegetation-suggesting NDVI if the sum of points with medium or high NDVI values exceeds a set percentage of the total points, named \(ndvi\_veg\). When the return number is used, a tile is classified as vegetation if the points of the second return exceed a set percentage of the total, named nr2, or the points of the third return exceed a different set percentage of the total points, named nr3.

The classification of each tile proceeds as follows: the height histogram and NDVI ratios for the tile are calculated. The tile is then categorized as having a compact or a sparse height histogram using the corresponding parameter values: hpp, ppb, pmp, mbar and mbl. The NDVI ratios are used to categorize the tile as vegetation-suggesting NDVI or not using the \(ndvi\_veg\) parameter if the NDVI data are present; otherwise, the return number ratios are checked using the nr2 and nr3 parameters. The tile is then classified following the decision tree of Fig. 5: the type of height histogram is checked and, if it is compact, the tile will be either agriculture or urban. If it has a vegetation-suggesting NDVI, it is categorized as agriculture; otherwise, it is categorized as urban. If the height histogram is sparse, the tile can be either forest or mountain. If it has a vegetation-suggesting NDVI, it is categorized as forest; otherwise, it is categorized as mountain.
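The final decision then reduces to two nested branches (a direct Java transcription of the tree in Fig. 5; the enum and method names are ours):

```java
final class LandscapeDecisionTree {
    enum Landscape { AGRICULTURE, URBAN, FOREST, MOUNTAIN }

    // First split: height histogram (compact vs sparse).
    // Second split: vegetation indicator (NDVI if available,
    // return-number ratios otherwise).
    static Landscape classify(boolean compactHistogram, boolean vegetation) {
        if (compactHistogram) {
            return vegetation ? Landscape.AGRICULTURE : Landscape.URBAN;
        }
        return vegetation ? Landscape.FOREST : Landscape.MOUNTAIN;
    }
}
```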

Table 6 shows the exact values used for these parameters in each configuration. Table 7 shows the landscape identification results as a confusion matrix, using overall accuracy as the metric on which to base the decision of which configuration is better. The table shows the number of tiles of each type of landscape detected for each set of tiles, for the configurations tested. For each landscape type shown in the first row of Table 7, it displays the number of tiles classified as agriculture (A), urban (U), forest (F) and mountain (M). The first column of both tables contains an identification number used to cross-reference the configuration between the tables and the text. The last three columns show the correct identification rate at the two steps of the decision tree (S1 indicates the success rate of correctly categorizing the height histogram as compact or sparse, and S2 the success rate of correctly categorizing the NDVI ratios or return number ratios as vegetation-suggesting), as well as the final identification rate in the last column (S3).

Table 6 Values, in percentages, used for the categorization of the metrics in the decision tree
Table 7 Landscape identification for each landscape type

Of all of the combinations tested, the one that shows the best results is the fifth one, which checks whether or not any bin of the tile exceeds 5% of the total points. This condition differentiates the flat tiles from the inclined tiles with an accuracy of 100% on the 40 tiles tested.

For the second step of the decision tree, NDVI shows better results, with a correct identification rate on the second step of up to 92.5%. Agriculture and mountain regions are more challenging. The only misclassified agriculture tile is caused by the presence of crops with lower NDVI values at the time of year of the data acquisition. Figure 7 shows the tile in question using the RGB data in the point cloud. While it is evident that it is an agriculture area, the vegetation present does not reach the threshold for medium NDVI values, the fields around the center of the image being a good example. In the case of the misclassified mountain tiles, the cause is the presence of enough vegetation to reach the NDVI threshold, even though the slope is high enough for the tile to be classified as mountain. These two cases, agriculture tiles with non-green types of crops and mountain tiles with forest present, would be subclassifications of the landscapes used in this paper. However, we have kept those tiles in our analysis in order to test our proposal on some borderline cases. Figure 8 shows one of these tiles using the RGB data in the point cloud. The configuration in row 5 of Table 7 is the one used in the rest of the tests.

Fig. 7 Misclassified agriculture tile, classified as urban, shown using the RGB data of each point

Fig. 8 One of the misclassified mountain tiles, classified as forest, shown using the RGB data of each point

5.2 Classification results

In this section, the accuracy of the whole multistage approach is analyzed on the final results of the ground filtering. All of the stages are performed on each set of tiles: extracting the metrics, identifying the landscape and running the best classifier for the detected landscape, for each tile. The results are measured by comparing the classification produced by our multistage approach with the original classification provided by the dataset provider.

Type I and Type II reference errors are calculated, as well as the total error. Applications that use ground filtering as a step typically prefer a filter that minimizes one of these errors, even if the other error increases. For example, in power line detection applications, ground points are removed and only non-ground points are processed, so low Type II errors are critical to avoid removing points that need to be processed. In applications for railway infrastructure, however, low Type I errors are desired, as the relevant points are the ground points, used to analyze the condition of the terrain along the track.

Tables 8, 9, 10, 11 and 12 show the tests performed on the same set of tiles used in the previous section, called benchmark tiles, with 10 different tiles of each type of landscape. LAStools is not considered in these tests since it was not found to be the best in any of the landscapes. The best classifier, according to the total error, is indicated in italics in its row, and the best total error for each case is highlighted in bold.

These tables show that our multistage approach obtains the same result as the best classifier in each case, except for agriculture regions. In agriculture tiles, the misclassified tile causes the multistage approach to use PDAL on that tile, which shifts the errors toward the values achieved by PDAL.

Table 8 Error rates of the different classifiers and the multistage approach on agriculture landscape using the benchmark tiles
Table 9 Error rates of the different classifiers and the multistage approach on urban landscape using the benchmark tiles
Table 10 Error rates of the different classifiers and the multistage approach on forest landscape using the benchmark tiles
Table 11 Error rates of the different classifiers and the multistage approach on mountain landscape using the benchmark tiles
Table 12 Error rates of the different classifiers and the multistage approach aggregating all four landscapes using the benchmark tiles

The results for mountain tiles show high error rates in some of the error types checked, and therefore it is more difficult to name one classifier as the best. Since the misclassified mountain tiles are identified as forest, which uses the same filtering algorithm (PDAL), our multistage approach achieves the same error rates as PDAL. When all 40 tiles are aggregated, the results for the multistage approach are better than using only one classifier on all of the tiles, which is the main goal of the proposal.

To evaluate the selected metrics and conditions, a set of validation tiles is incorporated for each type of landscape (see Table 1), tiles that were not used during the development of the metrics. Table 13 shows the type of landscape detected, both for benchmark and validation tiles, while Tables 14, 15, 16, 17 and 18 show the error ratios when these tiles are incorporated. The best configuration from Table 7 is able to detect the correct type of landscape in an automated way with a success rate of 91% on these additional tiles, and an overall success rate of 91.5% when both benchmark and validation tiles are considered.

Table 13 Landscape identification success rate
Table 14 Error rates of the different classifiers and the multistage approach on agriculture landscape using the benchmark and validation tiles
Table 15 Error rates of the different classifiers and the multistage approach on urban landscape using the benchmark and validation tiles
Table 16 Error rates of the different classifiers and the multistage approach on forest landscape using the benchmark and validation tiles
Table 17 Error rates of the different classifiers and the multistage approach on mountain landscape using the benchmark and validation tiles
Table 18 Error rates of the different classifiers and the multistage approach aggregating all four landscapes using the benchmark and validation tiles

The results when the validation tiles are incorporated remain similar to the results with only the benchmark tiles, although the results for forest tiles deserve further explanation. With the additional tiles, the best classifier according to the total error for the whole set of tiles is LIDAR2MDTPlus-1.0CS, while the best one was PDAL when only the benchmark tiles were considered. However, even if LIDAR2MDTPlus-1.0CS achieves a lower total error, its Type I error is notably higher than PDAL's Type II error, exceeding 12%, while PDAL keeps both partial errors below 5%. For this reason, we consider PDAL a better balanced classifier, preferable for general use of the filtered point cloud. Since the classifier used by the multistage approach on forest tiles is still PDAL, the results achieved by the multistage approach are very similar to those achieved by PDAL, as it is the classifier selected for all but one tile, as shown in Table 13. The 10 benchmark tiles and 19 out of 20 validation tiles are classified as forest and use PDAL, while the remaining tile is classified as agriculture and LIDAR2MDTPlus-1.0CS is used on it, causing the difference in Type I and Type II errors, in the direction of the values achieved by LIDAR2MDTPlus-1.0CS. Both the PDAL and Multistage rows are highlighted in a lighter gray to indicate that, while not achieving the best result on this particular set of tiles, PDAL is the reference for forest tiles.

Once again, our multistage approach achieves the best total error when all of the tiles are aggregated. The proposed system is capable of matching the designated best classifier in each type of landscape using an automated process, with a maximum deviation in the total error of 0.27%.

5.3 Distributed computing performance

To close this section, the performance achieved by Spark is analyzed. For this analysis, we use a cluster with 13 nodes, each one equipped with two Intel Xeon E5-2660 Sandy Bridge-EP processors with 8 cores each at 3.0 GHz, for a total of 16 cores, 64 GB of memory and one 1 TB local hard disk drive. Multiple sets of tiles are used to obtain a more comprehensive analysis. The files are hosted on a shared filesystem accessible by all of the compute nodes.

Table 19 shows the execution time of the complete multistage pipeline in hours (T), the speedup (SP) achieved by the parallelization for different numbers of compute nodes, reserving one node for the Spark driver to reduce interference with the nodes performing the classification, and the performance in millions of points processed per hour (pt/hr). Each configuration is named with the number of nodes used (N) and the total number of cores (C) available for performing the filtering. The sequential configuration performs the processing without using Spark, using a single processing thread that performs each stage for all of the tiles before progressing to the next stage. The B test uses the benchmark tiles, the 40 tiles described in Table 1. The B+V test uses the benchmark+validation tiles as input data, also described in Table 1, for a total of 98 tiles. The 4-split B test shows the results when each tile of the benchmark set is divided into 4 by halving each axis, totaling 160 tiles. With sufficient computational resources to perform the parallel classification of every tile, the total execution time of a test is the processing time of the slowest tile. To explore the impact of this time on parallelization performance, each tile was split into 4 smaller tiles to reduce the number of points per tile, with the goal of reducing the execution time of the slowest tiles.

Table 19 Spark implementation execution times and speedups on Plutón. T shows the time in hours and SP shows the speedup compared to sequential

The speedup achieved in the B test stagnates around 8.5x using 4 nodes, with little improvement when going to 8 nodes, since with only 40 tiles, 4 nodes already have enough cores to dedicate one to each tile. The speedup is therefore dictated by the time required to complete the slowest tile.

To analyze these results, Fig. 9 shows the time required to process each of the tiles in each set. These tiles are not balanced in terms of the run time required to filter the ground points. With few tiles, the slowest tile quickly becomes the bottleneck, while in cases with a larger number of tiles, the slowest tiles can be grouped with the fastest, spreading the heavier workloads among different nodes.

When the size of the tiles is reduced, the execution time required to process each one is also reduced. In the 4-split B test, the capability of our proposal is demonstrated, reaching speedups of 34x. This is because no single tile dominates the execution time. If a large enough number of tiles is processed, no individual tile represents a large part of the total processing time, and more balanced workloads become possible.

Fig. 9 Times required to filter each tile in each set

6 Conclusions and future work

A multistage approach for ground point filtering with automatic territory-level classification and no human intervention is presented in this paper, requiring low execution times thanks to parallelization with Spark. Our proposal uses several stages before performing the ground point filtering to identify the type of landscape of each tile and to use a classifier well suited to it. The first stage extracts several metrics from the point cloud data, metrics that are used in a second stage to identify the type of landscape present. A third stage performs the ground point filtering using the classifier configured for the type of landscape found. A Spark implementation is also presented, allowing the use of distributed memory systems to reduce the time required to perform the ground point filtering. This is achieved by implementing each stage as a map operation in Spark, using RDDs that transfer the relevant data between the different stages. Spark executes each stage in parallel on multiple tiles.

The results achieved in the landscape detection stage show a 92.5% success rate on the benchmark tiles, and an overall 91.5% correct landscape identification when additional unseen tiles are incorporated into the dataset. The results achieved in the ground point filtering when using the multistage approach are similar to those of the best classifier in each type of landscape. While, due to its nature, the presented multistage approach cannot improve on the results of the best classifier when looking at only one type of landscape, it allows the automatic use of the best classifier for the landscape present in large-scale datasets where multiple types of landscape coexist, improving the results for the entire dataset.

The Spark implementation presented speeds up the entire multistage approach by up to 34 times on a distributed memory system using 12 compute nodes. The performance ceiling is highly dependent on the processing time required by the slowest tile in the set.

The work ahead begins with exploring more metrics and classifiers, in order to increase the diversity of landscapes that can be detected, as well as to increase the accuracy of the landscape identification, which is critical to achieve the best classification results. Some metrics that will be studied include horizontal coplanarity, normal vector diversity, local point density, slope and elevation, although the study will not be limited to those. With regard to classifiers, our focus will be on incorporating classifiers that use different algorithms, as well as looking for more specialized classifiers, as long as the landscapes they specialize in can be detected.

There are more open research paths to be explored later on. New stages can be incorporated into the pipeline, for example to automatically divide and group tiles to fit the best size for the metrics and classifiers used. The multistage approach can also be adapted to other application domains, like indoor scans or mobile mapping data, by changing the situations that can be identified in the second stage and the classifiers used in the third stage.