Introduction

In the preservation of architectural heritage, structural health is assessed by measuring and monitoring deformed structural elements, cracks and fissures. Mapping these features is still often performed as a direct sketch based on visual observation, while crack openings are measured with traditional contact tools. Such tools depend on accessibility and provide only discrete point measurements rather than a continuous record of damage dimensions across the area of concern [1, 2]. Recent developments in photogrammetry and LiDAR provide an opportunity for cost-effective, reliable detection and measurement of structural details [3].

In photogrammetry, a digital camera produces photographs acquired with adequate overlap. With automatic matching processes based on structure from motion, the image-based approach has become quite common in the 3D heritage community [4, 5]. Its products are primarily useful for navigating objects. Nevertheless, the image-based approach may have limitations where large and complex objects must be captured with precise, high-resolution modeling [6]. For such practical projects, the reliability of the active optical sensor (laser scanner) workflow is still much higher, although it is time-consuming and costly. The popularity of laser scanning in the documentation of heritage sites results from its ability to acquire millions of 3D points accurately and within a short time. The scanner technology also benefits from a high level of automation in data measurement, including geometry and texture information [7]. However, there are primary challenges in using LiDAR data for any potential application. The dataset consists of millions of points with geometric and radiometric attributes; the generated points are a collection of discrete records and carry no semantic information. The LiDAR system also acquires noise in the scene, particularly in outdoor applications [8]. Processing such big datasets is difficult for applications that require a high level of abstraction, such as object recognition, façade classification and feature extraction. For such implementations, 3D models need to be segmented semantically so that the extracted features can be allocated, visualized, managed and handled. Modeling these features can further reduce the data volume, allowing for further analysis [9].

This study aims to evaluate the feasibility of combining images and LiDAR data for façade feature detection and measurement, in particular the 3D representation of crack propagation and geometric formation. The approach acquires the LiDAR data and the 2D images independently, with the images collected at the optimal position and time for capturing surface details. The transformation of 3D LiDAR point clouds into 2D structured depth images enables the use of existing computer vision algorithms developed for 2D color images. A depth map is produced for every 2D image, adding a D channel to the color (RGB) image channels. The algorithm is applied to experimental data collected from the Treasury of the ancient city of Petra in Jordan.

In brief, this paper provides the following key contributions:

  • A novel image-based feature extraction approach for point clouds that robustly defines linear surface features and significantly reduces the amount of data to be viewed and fluently interacted with in 3D.

  • The approach provides better interpretation of spatial growth of weathering forms and severe façade cracks.

  • The suggested solution is flexible as it acquires point clouds and images separately, allowing the time and position of image capture to be optimized for high visual quality of the scene features.

  • The proposed algorithm simplifies data manipulation and primitive extraction, since the processing of 2D depth and color images requires minimal computational power.

The paper is structured as follows: a brief overview of relevant work is presented in the “Related works” section. The data collection, pre-processing and problem statement are discussed in the “Data collection and pre-processing” section. The methodology of the suggested solution is described in the “Methodological work” section. The experimental results are discussed in the “Discussion” section. The “Conclusion” section addresses the conclusions and future work.

Related works

Prior work on LiDAR segmentation is based primarily on geometric characteristics of the point cloud data, such as local surface normals or curvature. The identification of local attributes depends on whether the point cloud is defined as a structured depth image or an unstructured point set. The massive size of unstructured 3D point clouds and the high cost of neighborhood identification pose a challenge to the segmentation process [10]. A depth image is equivalent to the point cloud, but its pixels encode distance or depth coordinates. Working with depth images simplifies neighborhood searches, which can significantly reduce computational complexity [11].

In general, 3D point segmentation approaches in the literature are mainly categorized as region-growing, model-based, edge-detection, and image-based approaches. The selection of a suitable segmentation method relies on the type of image and the application [12]. Region growing is the most common because it builds a high-level understanding of the image components. In practice, two issues affect the region-growing technique. The first is seed selection: the seed point is the reference for expanding regions, and its selection is critical for segmentation success. The second is the choice of similarity criteria, where a fixed rule is required to admit neighboring pixels into the growing region and suitable constraints are needed to stop the growth process [13]. Habib and Lin [14] introduced a region-growing method that uses the kd-tree data structure to distinguish point and pole-like surface features in point clouds. The segmentation method of Dimitrov and Golparvar-Fard [15] used a multi-scale scheme to distinguish features; curvatures are estimated at each 3D point, followed by seed selection and region-growing processes. Che and Olsen [16] proposed a multi-scale technique in which a Normal Variation Analysis procedure first identifies edge points; to achieve better segmentation performance, the remaining points are then clustered into smooth surfaces using a region-growing process. Vo et al. [17] suggested point cloud segmentation using adaptive octree-based region growing. The algorithm has two steps following a coarse-to-fine strategy: the major segments are first extracted using an octree-based voxelized representation, and the results are then passed through a refinement process. Challenges appear in complex 3D scenes with irregular point sampling and different object types. Although the outcomes of region-growing techniques are reasonable, they share common problems of over-segmentation, complexity and expensive computation. Further challenges are the choice of the initial seed points and of the number of object surfaces [18].

Model-based approaches, by contrast, use well-known geometric primitives for grouping, such as planes, spheres and other pre-defined shapes. Points with the same mathematical representation are grouped together as one segment. Random Sample Consensus (RANSAC) and the Hough Transform (HT) are two widely used algorithms. RANSAC arbitrarily and iteratively selects minimal point sets to identify the parameters that best fit the candidate mathematical model; the parameters are then tested to determine the best-fitting model agreement. A modified RANSAC segmentation algorithm was proposed by Li et al. [19]. The 3D Hough transform is an extension of the well-known 2D Hough transform used to detect lines in images; its principle is to transform the 3D points into a parameter space before detecting planes, cylinders or spherical primitives [20]. The main limitation of these approaches is that, in the structural context of complex shapes, details cannot always be modeled by easily recognizable geometric forms [21].

Compared with region-growing methods, relatively few works in the literature present edge-detection techniques for 3D point clouds. Lin et al. [22] developed a LiDAR segmentation strategy applied directly to the acquired point clouds. The point clouds are first split into facets by sorting local k-means clusters around carefully selected seeds; the extracted facets provide sufficient information to determine the linear characteristics of the local planar region. Huang and Brenner [23] used curvature values to extract borders in irregular meshes and, for objects of complex shape, proposed a multi-level scheme to enhance the results. Many other edge-detection algorithms convert the LiDAR point cloud into a depth-image representation in order to structure it. The pixels of the image denote the depth of the object from the LiDAR sensor and are stored as real-number values. Generally, depth-image edge-detection techniques distinguish three basic primitive edge types: step, crease, and roof edges. Step edges correspond to depth discontinuities, crease edges correspond to surface-normal discontinuities, and roof edges are characterized by curvature discontinuities. Miyazaki et al. [24] proposed a line-based approach to extract planar regions from anisotropically distributed points; the approach splits the input point cloud into scan lines and then selects the segments that best represent the point sequence of each scan line. Generally, most edge-detection algorithms are constrained to high-quality images; others are complex, have numerous parameters, and cannot guarantee closed boundaries [25].

Many approaches turn to the image color information provided by the scanner for LiDAR data segmentation. The image-based approaches have two key benefits. First, 2D image processing is computationally more effective than processing objects in 3D. Second, important imaging techniques such as edge detection can be applied to the 2D color images before projection back onto the TLS data [26, 27]. Awadallah et al. [28] used high-resolution satellite images to segment and extract surfaces from sparse, noisy point clouds. Awrangjeb et al. [29] used multispectral orthoimagery for the automatic extraction of different roof surfaces from LiDAR data. Zhan et al. [30] presented an algorithm for façade-detail segmentation using colorimetric similarity together with spatial information; the main challenges are measurement noise, scaling issues, and coordinate-system definition. Nex and Rinaudo [31] proposed a segmentation and feature extraction approach using LiDAR data and multi-image matching. The approach starts from a reference image acquired from the same position as the LiDAR in order to avoid occlusion problems. After edges are extracted with the Canny operator, the dominant points of each image are projected onto the LiDAR data and then back-projected into each collected image; a multi-image matching algorithm using the SIFT operator is then used to reconstruct the edges in 3D space. Mahmoudabadi et al. [32] applied an image segmentation algorithm to different input layers including color and intensity data; the output segments are projected back onto the point cloud to make modeling more effective. Dinç et al. [33] proposed segmentation methods for RGB and depth images based on the Kinect; because the Kinect camera captures color and depth images at the same time, they are normally expected to overlap perfectly without registration. Most of the previous methods use images already registered to the point cloud. Although most TLS devices have an internal camera, ideal photographic conditions may not coincide with the LiDAR’s location. In addition, in real outdoor scenes photographic images suffer from variations in lighting, which leads to image disturbances that affect segmentation results. Moreover, RGBD sensors are mostly used in indoor environments as they are constrained by their close range, and plane segmentation is the main task addressed with RGBD data [34, 35].

Data collection and pre-processing

Petra treasury

The data used in our investigation were collected from the ancient city of Petra in Jordan. Petra was the capital of the Nabataean empire between 400 B.C. and A.D. 106. The Nabataeans were originally a tribe of Arabian nomads who settled in the Shara mountains, at a crossroads of trade routes. Petra has many of the most fascinating monuments of the ancient world, of outstanding architectural quality. Petra’s temples, theaters, tombs, and other monuments extend over 45 km2 and were carved into rose-colored sandstone cliffs. The ancient city has been a UNESCO World Heritage Site since 1986 and was selected as one of the New Seven Wonders of the World in 2007. However, in recent years the majority of Petra’s structures have degraded at a rapid pace, and the World Monuments Fund has therefore placed Petra on its list of the one hundred most endangered monument assemblies in the world. The city suffers from weathering and erosion problems arising from the monuments’ highly porous inorganic materials and their uncontrolled environment. This weathering is attributed mainly to salt damage as the principal factor acting on the monuments’ stone structure [36]. The annual rainfall of Petra is very low but falls within a very short period of time, so water erosion is a very active agent in this area. Hydrological structures in the city indicate that the Nabataeans were aware of the erosion problem: they installed ceramic pipes along the basements and monumental faces to protect them from floodwater, and horizontal surfaces were covered with several mortar strata to minimize the impact of running water [37]. Unfortunately, the Nabataean water system at the site is now the major cause of water erosion. For most of the sites in ancient Petra there has been no effective weathering monitoring, because of the lack of referential archival information on past weathering damage [38].

After a traveler enters Petra through Al-Siq, an impressive 2-km crack in the mountain, the first façade to be seen is the Treasury, depicted in Fig. 1. The Treasury, or “Al-Khazneh” as it is commonly called, is the most recognized monument of Petra. Its original purpose is unknown; the name Al-Khazneh, as it is called by the Arabs, means the tax house, and others have suggested it was probably a tomb for one of the Nabataean kings. The Treasury façade is remarkably well preserved and stands 40 m high. Its location in a confined space between the mountains has protected it from weathering and erosion. The façade has different classical elements including a pediment, columns with Corinthian capitals, friezes, and an entablature. It consists of one primary chamber and three antechambers, all carved out of the rock. Recent environmental monitoring of Al-Khazneh indicates serious recession and weathering processes on the chamber walls, as depicted in Fig. 1; this is due to many weathering factors including interior relative humidity, salt content, large numbers of visitors, and insufficient care, management and conservation. A large-scale survey study is needed to provide comprehensive data analysis for monitoring and assessing the monument’s status [39].

Fig. 1
figure 1

Treasury monument of Petra city, the second image illustrates the chamber wall recessions

Field investigation

For our investigation, a terrestrial laser scanner (Mensi GS100) was used to collect the surface point cloud. The scanner’s measurement range is between 2 and 100 m, with an accuracy of 3 mm at a 50 m range. This time-of-flight scanner has an acquisition speed of 5000 points per second and a calibrated camera with 768 × 576 pixel resolution, used for mapping color onto the corresponding point measurements. Since such complex 3D structures cannot be completely covered from a single station without occlusions, different points of view are needed. Choosing the viewpoint positions for such a monument is an important phase of the survey, as the mountainous environment surrounding Al-Khazneh limits the possible sensor stations. Altogether, five scans yielded nearly 5 million points. In addition to the external survey, a 360° scan was taken from a station inside Al-Khazneh, resulting in 19 million points. The collected scans contain enough overlapping regions to allow subsequent registration. A non-redundant surface representation is constructed after registration, in which each part of the measured object is identified only once. The scanned data acquired for the Treasury façade and the left chamber of the monument are depicted in Figs. 2 and 3. The model has 3.3 million points with a 2 mm ground resolution. Additional close-up images were collected with a Fuji S1 Pro camera, providing a resolution of 1536 × 2034 pixels at a focal length of 20 mm.

Fig. 2
figure 2

3D point cloud and meshed model of the treasury facade

Fig. 3
figure 3

3D point cloud and meshed model of the left Treasury chamber

Problem statement

Although a large number of surface points and triangles are identified in the 3D model created by the laser scanner, fine outlines such as edges and cracks that lie beyond the resolution of the available laser data are lost. Modeling these features can further reduce data sizes, allowing advanced analysis of simplified models instead of bulkier point clouds. While many current point cloud segmentation approaches have been shown to segment TLS data effectively, implementations on complex real scenes still have significant shortcomings and challenges. Existing segmentation methods require curvature and normal estimation before data analysis and grouping. Despite a number of solutions for adaptive neighborhood definition, curvature approximation on edges or rough surfaces, such as those of a historic building, can still be unreliable [40]. To illustrate this problem on our data, we applied the mean curvature segmentation algorithm proposed by Alshawabkeh et al. [41]. The algorithm efficiently estimates the mean curvature value at each sampled pixel by convolving windows of distinct sizes across the image in a single direction, and classifies edge points based on selected threshold values of the mean curvature. Using multi-scale masks allows reliable curvature estimation in the presence of noise, particularly in real-scene environments. The findings of the experiment are shown in Fig. 4. Various mask sizes and threshold values were tested, but small surface features are still missed; only the distinct edges are detected.

Fig. 4
figure 4

Depth image segmentation with different mean curvature threshold values, small surface features such as cracks and roughness are not identified correctly
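To make the idea concrete, the following C++ sketch shows a simplified mean-curvature test on a depth image, using central finite differences and a single fixed threshold. This is only an illustration of the underlying principle, not the multi-scale directional-mask implementation of Alshawabkeh et al. [41]; the function name and threshold parameter are ours.

```cpp
#include <cmath>
#include <vector>

// Classify edge pixels of a depth image by thresholding the mean curvature
// H = ((1+fy^2) fxx - 2 fx fy fxy + (1+fx^2) fyy) / (2 (1+fx^2+fy^2)^(3/2)),
// with the partial derivatives approximated by central differences.
std::vector<unsigned char> curvatureEdges(const std::vector<double>& z,
                                          int width, int height,
                                          double threshold /* tuned per data set */)
{
    std::vector<unsigned char> edge(width * height, 0);
    auto at = [&](int x, int y) { return z[y * width + x]; };

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const double fx  = (at(x + 1, y) - at(x - 1, y)) / 2.0;
            const double fy  = (at(x, y + 1) - at(x, y - 1)) / 2.0;
            const double fxx =  at(x + 1, y) - 2.0 * at(x, y) + at(x - 1, y);
            const double fyy =  at(x, y + 1) - 2.0 * at(x, y) + at(x, y - 1);
            const double fxy = (at(x + 1, y + 1) - at(x + 1, y - 1)
                              - at(x - 1, y + 1) + at(x - 1, y - 1)) / 4.0;

            const double g = 1.0 + fx * fx + fy * fy;
            const double H = ((1.0 + fy * fy) * fxx - 2.0 * fx * fy * fxy
                            + (1.0 + fx * fx) * fyy) / (2.0 * std::pow(g, 1.5));

            // Large curvature magnitude marks a candidate edge pixel.
            if (std::fabs(H) > threshold) edge[y * width + x] = 255;
        }
    }
    return edge;
}
```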

Theoretically, photographs provide high-resolution radiometric information and allow better interpretation and extraction of linear features than range data. The use of multiple data sources makes it possible to create different levels of detail, and integrating the data at an early stage of the processing chain helps make information extraction more efficient. LiDAR devices typically have a built-in camera, but ideal image conditions may not coincide with the LiDAR’s location; such images are therefore frequently insufficient for detecting the surface details and weathering decay required for heritage applications. In practice, large time intervals between scans lead to different lighting conditions and shadows in outdoor work, resulting in poor homogeneity and color jumps. These problems can interfere with image-based segmentation approaches, which depend mainly on image quality. High-quality images that contain the important details will help the process; the images can therefore be collected at different times and positions from the laser scanning in order to obtain the best coloring required for segmentation.

Methodological work

The paper presents a new approach to image-based feature extraction from point clouds. The technique begins with the use of separate images to capture the surface features at the optimal time and location. The unstructured 3D point cloud is then converted into a series of 2D depth maps. Each depth map is a discrete grid structure with the same pixel size as the corresponding RGB image. For this purpose, camera configuration (translation and rotation) parameters are defined between the origin of the camera and the origin of the scanner. Because the images are not taken from the same perspective as the scanner, model occlusions must be removed; visibility detection is used in our method to ensure that only visible points are projected into the produced depth grid. Lastly, after edges are extracted in the color image with the Canny operator, each segment’s pixels are mapped back to the LiDAR data to obtain their 3D coordinates.

Data structuring

To efficiently process and segment large 3D point clouds, it is necessary to structure the irregular LiDAR data and define the relationships between points. LiDAR sensors provide raw data organized by the number of rows and columns and by the sensor’s horizontal resolution, but the spatial resolution of the depth component is considerably lower than that of the RGB component. Since LiDAR sensors do not reach camera resolution, projecting the RGB image onto the 3D data would be a down-sampling operation and would lose detail. Instead, our approach maps depth values onto the RGB image, since a high correlation between the depth image and the color image is needed to obtain good geometric estimates at edges and in smooth regions. The proposed algorithm interpolates between projected points to up-sample the lower-resolution depth image.

The two data sources should refer to the same spatial position in order to generate the depth map in the given RGB view. The co-alignment parameters can be obtained using corresponding tie points in the overlap region. The relative transformation between the camera’s projection center and the scanner center consists of translation and rotation parameters (Tx, Ty, Tz, ω, φ, κ). These parameters describe the relationship between the ground and image coordinate systems, as shown in Fig. 5. Registration is usually performed in one of two ways, manual pair-wise or automated pair-wise; automated methods are usually based on feature matching [42]. However, in realistic projects involving large and complex datasets, registration is still a manually driven process. In our method, homologous points are manually defined between the 3D data and the 2D image.

Fig. 5
figure 5

Pose estimation problem
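As an illustration of this registration step, the exterior orientation can be estimated from a handful of manually measured 3D–2D tie points with OpenCV's solvePnP. The sketch below is a minimal example; the tie-point coordinates and calibration values are placeholders, not the values used for the Treasury data.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

int main() {
    // Manually measured tie points: 3D scanner coordinates (metres) and their
    // homologous 2D image coordinates (pixels). Placeholder values only.
    std::vector<cv::Point3f> objectPoints = {
        { 1.20f, 0.35f, 10.4f }, { 2.85f, 0.40f, 10.1f },
        { 2.90f, 2.10f, 10.3f }, { 1.15f, 2.05f, 10.6f },
        { 2.00f, 1.20f,  9.8f }, { 0.50f, 1.60f, 10.9f }
    };
    std::vector<cv::Point2f> imagePoints = {
        { 410.f, 1620.f }, { 980.f, 1605.f }, { 995.f,  910.f },
        { 400.f,  930.f }, { 700.f, 1270.f }, { 210.f, 1120.f }
    };

    // Interior orientation: focal length and principal point in pixels
    // (assumed values, not the actual Fuji S1 Pro calibration).
    cv::Mat K = (cv::Mat_<double>(3, 3) <<
        3600.0,    0.0,  768.0,
           0.0, 3600.0, 1017.0,
           0.0,    0.0,    1.0);
    cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);  // assume no distortion

    // Solve for the exterior orientation (rotation + translation).
    cv::Mat rvec, tvec;
    bool ok = cv::solvePnP(objectPoints, imagePoints, K, distCoeffs, rvec, tvec);

    // Convert the Rodrigues vector to the 3x3 rotation matrix r11..r33
    // used in the collinearity equations.
    cv::Mat R;
    cv::Rodrigues(rvec, R);

    return ok ? 0 : 1;
}
```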

The camera’s configuration parameters are used to translate the point cloud into a structured depth image using a central perspective transformation. The collinearity equations (Eq. 1) are used in our approach to project the LiDAR points into the image-plane grid; the depth grid has the same pixel size as the RGB image. The parameters in the equations are: image coordinates \(x_{a}\), \(y_{a}\); object coordinates \(X_{A}\), \(Y_{A}\), \(Z_{A}\); exterior orientation parameters \(X_{o}\), \(Y_{o}\), \(Z_{o}\), \(r_{11}\)–\(r_{33}\); and interior orientation parameters \(x_{p}\), \(y_{p}\), \(c\).

$$x_{a} = x_{p} - c\frac{{r_{11} \left( {X_{A} - X_{o} } \right) + r_{21} \left( {Y_{A} - Y_{o} } \right) + r_{31} \left( {Z_{A} - Z_{o} } \right)}}{{r_{13} \left( {X_{A} - X_{o} } \right) + r_{23} \left( {Y_{A} - Y_{o} } \right) + r_{33} \left( {Z_{A} - Z_{o} } \right)}}$$
(1)
$$y_{a} = y_{p} - c\frac{{r_{12} \left( {X_{A} - X_{o} } \right) + r_{22} \left( {Y_{A} - Y_{o} } \right) + r_{32} \left( {Z_{A} - Z_{o} } \right)}}{{r_{13} \left( {X_{A} - X_{o} } \right) + r_{23} \left( {Y_{A} - Y_{o} } \right) + r_{33} \left( {Z_{A} - Z_{o} } \right)}}$$
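A direct C++ transcription of Eq. 1 might look as follows; the struct layout and function name are our own illustration, not part of the published implementation.

```cpp
#include <cmath>

// Exterior and interior orientation of one image (Eq. 1 notation).
struct Camera {
    double Xo, Yo, Zo;   // projection centre in object space
    double r[3][3];      // rotation matrix r11..r33
    double xp, yp, c;    // principal point and principal distance
};

// Project an object point (XA, YA, ZA) into image coordinates (xa, ya)
// with the collinearity equations; also return the denominator term,
// which measures depth along the camera axis.
bool projectPoint(const Camera& cam, double XA, double YA, double ZA,
                  double& xa, double& ya, double& depth)
{
    const double dX = XA - cam.Xo;
    const double dY = YA - cam.Yo;
    const double dZ = ZA - cam.Zo;

    const double num_x = cam.r[0][0]*dX + cam.r[1][0]*dY + cam.r[2][0]*dZ;
    const double num_y = cam.r[0][1]*dX + cam.r[1][1]*dY + cam.r[2][1]*dZ;
    const double den   = cam.r[0][2]*dX + cam.r[1][2]*dY + cam.r[2][2]*dZ;

    if (std::fabs(den) < 1e-12) return false;   // degenerate configuration

    xa    = cam.xp - cam.c * num_x / den;
    ya    = cam.yp - cam.c * num_y / den;
    depth = den;                                 // distance along the viewing axis
    return true;
}
```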

Because the laser and image data are acquired from different positions, the depth images will contain occlusion areas where an object, or parts of it, is not visible because another object closer to the camera blocks the view. The concept of the occlusion problem is shown in Fig. 6. A visibility test is used in our approach to filter out the hidden LiDAR points. The algorithm assumes that the visible point is the one closest to the image perspective center, while the other overlapping points are considered occluded in the current image view. The estimated depth value of each LiDAR point is compared with the value already stored in the depth buffer, and only the points visible in the camera’s field of view are processed and projected into the final grid.

Fig. 6
figure 6

Filtering hidden points to draw the visible surfaces
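The visibility filter amounts to a z-buffer test: for each pixel of the depth grid, only the projected point nearest to the perspective center is kept. A minimal sketch, assuming the pixel coordinates and depths of all projected points have already been computed (for instance with the projection function above):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Keep, for every pixel, only the LiDAR point closest to the camera.
// u, v   : projected pixel coordinates of each point (same length as depth)
// depth  : depth of each point along the viewing axis
// width, height : size of the depth grid (same pixel size as the RGB image)
// Returns a grid of point indices; -1 marks cells with no visible point.
std::vector<int> visibilityFilter(const std::vector<int>& u,
                                  const std::vector<int>& v,
                                  const std::vector<double>& depth,
                                  int width, int height)
{
    std::vector<int>    indexGrid(width * height, -1);
    std::vector<double> zBuffer(width * height,
                                std::numeric_limits<double>::infinity());

    for (std::size_t i = 0; i < depth.size(); ++i) {
        if (u[i] < 0 || u[i] >= width || v[i] < 0 || v[i] >= height) continue;
        const int cell = v[i] * width + u[i];
        if (depth[i] < zBuffer[cell]) {        // the nearer point wins the pixel
            zBuffer[cell]   = depth[i];
            indexGrid[cell] = static_cast<int>(i);
        }
    }
    return indexGrid;
}
```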

In general, the following steps briefly describe the RGB-D algorithm:

  1. A corresponding empty grid matrix with the same pixel size is identified for each RGB image.

  2. The depth value associated with each cell of the matrix is computed using the collinearity equations.

  3. Hidden points are filtered so that only the visible points are drawn in the grid.

  4. To estimate values in the empty cells, nearest-neighbor interpolation with a specific threshold is used (a sketch of this step is given after the list).

  5. The indexed grid structure and the color image together form an RGBD image.
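Step 4 can be realized with a bounded nearest-neighbor search: empty cells take the depth of the closest filled cell, but only if it lies within a chosen pixel radius, so that gaps caused by genuine occlusions are not filled in. The following is a simplified sketch of this idea; the radius value is an assumption, not a parameter reported in the paper.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Fill empty cells of a sparse depth grid by nearest-neighbor interpolation
// within a maximum pixel radius. Empty cells are marked with NaN.
void fillDepthGrid(std::vector<double>& depth, int width, int height,
                   int maxRadius /* e.g. 3 pixels, an assumed threshold */)
{
    std::vector<double> filled = depth;   // write into a copy, read the original

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (!std::isnan(depth[y * width + x])) continue;   // already filled

            double best     = std::numeric_limits<double>::quiet_NaN();
            double bestDist = std::numeric_limits<double>::infinity();

            // Search the neighborhood for the closest valid depth sample.
            for (int dy = -maxRadius; dy <= maxRadius; ++dy) {
                for (int dx = -maxRadius; dx <= maxRadius; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    const double d = depth[ny * width + nx];
                    if (std::isnan(d)) continue;
                    const double dist = std::hypot(double(dx), double(dy));
                    if (dist < bestDist) { bestDist = dist; best = d; }
                }
            }
            if (bestDist <= maxRadius) filled[y * width + x] = best;
        }
    }
    depth.swap(filled);
}
```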

The accuracy of the depth values depends on the transformation parameters and the ground sampling distance of the laser scan. The LiDAR data were collected with an average resolution of 2 mm for our test scene, and different RGB close-up images were collected for the Treasury monument’s chamber. The transformation of the point cloud into the RGB image domain and the depth channels of two different images are shown in Figs. 7 and 8.

Fig. 7
figure 7

Mapping 3D point cloud on the corresponding image

Fig. 8
figure 8

RGB-D channels for different pose images

Feature extraction

The routine uses one of the most popular and reliable edge-detection algorithms in 2D image processing, Canny edge detection [43], because its results are well suited to 2D contour computation. Different lower threshold values were examined to detect the façade details, and the most appropriate threshold values were selected. Through its index value, each pixel in the segmented binary image obtains its 3D coordinates from the corresponding depth image. The procedure was implemented and evaluated on two color images taken from two different positions. From the results shown in Fig. 9, it can be seen that almost all façade details are detected in the 3D representation.

Fig. 9
figure 9

3D linear features extracted from two different pose images
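Putting the pieces together, the 2D-to-3D step reduces to running the Canny detector on the color image and, for each edge pixel, reading the LiDAR point stored in the same cell of the index grid. A minimal OpenCV-based sketch is given below; the function name and threshold values are illustrative and not the values tuned for the Treasury images.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

struct Point3D { double x, y, z; };

// Extract 3D linear features: detect edges in the RGB image, then use the
// per-pixel point index produced by the depth-grid stage to recover the
// 3D coordinates of every edge pixel.
std::vector<Point3D> extractEdgePoints(const cv::Mat& rgb,
                                       const std::vector<int>& indexGrid,
                                       const std::vector<Point3D>& cloud,
                                       double lowThresh  = 40.0,   // illustrative
                                       double highThresh = 120.0)  // illustrative
{
    cv::Mat gray, edges;
    cv::cvtColor(rgb, gray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.4);   // suppress noise
    cv::Canny(gray, edges, lowThresh, highThresh);

    std::vector<Point3D> features;
    for (int y = 0; y < edges.rows; ++y) {
        for (int x = 0; x < edges.cols; ++x) {
            if (edges.at<uchar>(y, x) == 0) continue;        // not an edge pixel
            const int idx = indexGrid[y * edges.cols + x];   // point index, -1 = empty
            if (idx >= 0) features.push_back(cloud[idx]);
        }
    }
    return features;
}
```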

Discussion

Model sizes are increasing at a higher rate than developments in computer hardware and software, which reduces the potential for easy interaction with a huge 3D model, especially with the growing demand for 3D content sharing. The ideal solution should produce fully 3D geometries of the relevant features with simple protocols that allow non-technical users to handle such processing methods. In this case, it is appropriate to segment 3D models so that the information to be depicted, visualized and queried can be allocated to each extracted feature. In this article, we proposed a solution based on RGBD data that combines color and depth information for 3D feature extraction. The generated RGBD images are rich, colored geometric datasets that allow various levels of detail to be developed. Compared with other 3D point algorithms, our algorithm simplifies data manipulation and primitive extraction, since minimal computational power is needed for the 2D images. The aforementioned methods [22,23,24,25] incur high computation costs in the neighborhood search for any large-scale LiDAR application. Those methods require curvature and normal estimation, but for small surface features such as those on a historic building façade, curvature approximations are still inaccurate.

On the other hand, existing methods that use color information during data processing depend on a camera fixed at the same viewpoint as the 3D system in order to capture color and depth images simultaneously [26, 31,32,33]. These photographic images may suffer from scene-wide variations in lighting, such as the shadows that are common throughout outdoor scenes. Furthermore, the scanner’s location and distance from the scene may not allow the camera to capture the required fine surface information. The proposed method solves these challenges by collecting the point clouds and images separately, allowing the optimal time and position to be chosen for high visual quality of the scene features. The results provide satisfactory 3D contour points that represent the location of the façade edges and linear features, as depicted in Fig. 10. The 3D edge features are accurately mapped onto the corresponding 2D images. Figure 11 shows the flexible mapping of the extracted 3D features onto different 2D images and onto the 3D mesh through the reverse central projective transformation. The results allow better data understanding and quantification of weathering forms. Automatic detection of the continuous extent of material displacements with digital measurements will reduce the cost of field inspections and increase safety. In addition, the method significantly reduces the amount of information to be displayed and interacted with in the obtained 3D model, from 3.3 million points to 148 thousand.

Fig. 10
figure 10

The 3D contour points of the extracted features are projected in the corresponding image

Fig. 11
figure 11

Mapping the 3D features on different RGB images and the meshed laser model

Conclusion

The integration of photogrammetric and LiDAR data has shown significant promise in extracting surface features from dense 3D point cloud data of a real-scene façade. In heritage applications, automatic detection of the continuous extent of material displacements with digital measurements will reduce the cost of field inspections and increase safety. The presented algorithm uses the intensity values of the color images together with the LiDAR data to automatically detect and quantify façade linear features. Given an unstructured point cloud as input, a structured depth channel is sampled and projected onto the color channels to compute RGBD layers. The linear surface features are initially extracted from the optical 2D imagery and, subsequently, each pixel of the linear features is projected directly into 3D space. The proposed solution is flexible as it acquires point clouds and images separately, allowing the time and position of image capture to be optimized for high visual quality of the scene features. Experimental results from real data are used to evaluate the performance of the proposed methodology. The approach robustly defines façade features and provides a better interpretation of the spatial growth of weathering forms and severe cracks. Additionally, modeling these features significantly reduces the amount of data to be viewed and fluently interacted with in 3D, allowing surface analysis with simpler models. It is expected that the extracted features can be used in future research to evaluate and monitor the structure of architectural buildings. Although the current C++ implementation does not yet run in real time, a real-time implementation is expected with further optimization.