Introduction

Modern measurement technologies such as terrestrial laser scanning (TLS) are commonly applied to register, preserve, protect and monitor engineering objects [1], perform structural health monitoring [2,3,4,5], assist with construction management [6], carry out three-dimensional (3D) model reconstruction [7], monitor deformation of structures [8,9,10,11,12,13] and support the preservation and safeguarding of cultural heritage objects and sites [14,15,16,17,18,19,20,21]. This is owed to their accurate data acquisition and processing, which is required to generate documentation such as 3D models, vector drawings or other architectural documentation [22,23,24,25,26]. The acquisition and processing of point clouds from terrestrial laser scanners is a multi-step process consisting of (1) survey planning, (2) field operation, (3) data preparation, (4) data registration, (5) data processing, and (6) quality control and delivery [27]. Planning of the optimal TLS positions and target locations depends on the surveyed area and the design considerations of the project. Depending on the adopted data orientation method, these targets may be natural points detected in the point cloud or signalised points in the form of black-and-white chessboards, retroreflective targets, or spheres with a known radius (Fig. 1). Since TLS point clouds are collected in a local reference system, a registration step (the first step of the TLS point cloud processing methodology) is required to transform the point clouds into the assumed reference system [28].

Fig. 1
figure 1

a The example of the artificial targets, b registration between two scanned positions [27]

For large and complex objects and sites, data must be obtained from multiple TLS positions and transformed into the defined reference system, as a single position will not provide sufficient data for accurate model generation. The transformation into the defined reference system relies on detecting corresponding points, shapes or features in at least two point clouds, from which the exterior orientation parameters are obtained for each scan. These parameters determine the spatial location of the central point of the scanner system in the assumed reference system together with three rotation angles, which are then used to transform the point cloud [29].

In the literature, many investigations address the problem of TLS point cloud registration in the context of the effectiveness, efficiency and robustness of this process [30,31,32,33,34] and divide the methods into two main groups depending on the amount of input data—pairwise or multiview registration [2]. Most of these algorithms follow a coarse-to-fine strategy [35, 36], in which (1) in the first step, the translation and rotation parameters are approximated [28], and (2) in the final step, fine registration is performed by algorithms such as the normal distribution transform (NDT) algorithm and its variants [37,38,39] or the Iterative Closest Point (ICP) algorithm and its variants [38, 40]. A review of the commonly used methods for TLS registration can be found in [41].

Several challenges are encountered during data registration in Terrestrial Laser Scanning (TLS) point cloud processing. These challenges pertain to ensuring the accurate spatial distribution of data, addressing control point identification, enhancing automation in the process, and conducting robustness analysis. This becomes especially critical when examining extensive and intricate heritage sites where the deployment of marked control points is unfeasible. Furthermore, in the case of multi-temporal data alignment, the issue of establishing correspondences between reference points also arises. Consequently, automatic tie-point detection methods are necessary to mitigate these challenges effectively.

This paper aims to present the possibility of using the TLS-SfM method for the orientation of point clouds from terrestrial laser scanning of the interiors of historic and public buildings. This research compares the utilisation of selected 2D hand-crafted and learned methods for finding tie points. This article presents the effectiveness of different algorithms (AFAST, ASIFT, ASURF, LoFTR, SuperGlue and KeyNet with AffNet and HardNet) in the point detection step, with an extended quality and robustness analysis based on a reliability assessment. The interiors of historical 17th-century basements at the Royal Castle in Warsaw without decorative structure (Test Sites I and II), the Museum of King Jan III's Palace at Wilanów with decorative elements, ornaments, and materials on walls (Test Site III) and flat frescos (Test Site IV), a narrow office (Test Site V) and a shopping mall (Test Site VI) were selected for this study. For such objects, the distribution of signalised points for the data registration process may not be possible owing to the inability to place targets on historical wall fragments, the deployment of tripods that would obscure the objects under development, and the spatial distribution of points (constrained by the complex shapes of the objects), which would affect the accuracy of registration and error detection according to robustness theory.

The method for point cloud registration is based on intensity rasters (together with a depth map) and an extended Structure-from-Motion (TLS-SfM) approach. The advantage of this method over the target-based method is that more automatically detected tie points with a better spatial distribution are used for orientation, together with robust outlier detection based on reliability theory. The Iterative Closest Points (ICP) method, based on the point-to-point and point-to-plane approaches, requires point clouds to be pre-oriented before they are connected in order to guarantee the correctness of the final registration. In the TLS-SfM approach, such a condition is unnecessary, since tie points are selected and eliminated in a two-step manner through descriptor matching and geometrical verification based on the RANSAC algorithm.

This article is divided into five main sections. Sect. “Principle of work” presents the fundamental principles of the hand-crafted and learned feature detectors and descriptors. Sect. “Methodology” contains a description of the test sites and the approach used. Sect. “Results and discussion” presents the results of the detector assessments, and Sect. “Conclusion” concludes the proposed study, highlighting the advantages and limitations of using different affine 2D detectors and future work approaches.

Principle of work

TLS point cloud registration

Several methods of TLS data registration exist, which may be generally divided (following the definitions proposed by Vosselman and Maas [42]) into target-based and feature-based [16, 43,44,45,46,47,48,49]. The TLS data registration methods are generally based on corresponding features between two or more datasets; the main differences lie in how these corresponding points are determined and matched. Regardless of the approach used to determine tie points, the relationship between the local instrument system and the global reference system is defined by Eq. (1):

$$ \begin{gathered} \left[ \begin{array}{c} X_{i} \\ Y_{i} \\ Z_{i} \end{array} \right] = M_{ij}\left[ \begin{array}{c} x_{ij} \\ y_{ij} \\ z_{ij} \end{array} \right] + \left[ \begin{array}{c} X_{j}^{c} \\ Y_{j}^{c} \\ Z_{j}^{c} \end{array} \right] \\ M_{ij} = \left[ \begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} \right] = \left[ \begin{array}{ccc} \cos\varphi \cos\kappa & -\cos\varphi \sin\kappa & \sin\varphi \\ \cos\omega \sin\kappa + \sin\omega \sin\varphi \cos\kappa & \cos\omega \cos\kappa - \sin\omega \sin\varphi \sin\kappa & -\sin\omega \cos\varphi \\ \sin\omega \sin\kappa - \cos\omega \sin\varphi \cos\kappa & \sin\omega \cos\kappa + \cos\omega \sin\varphi \sin\kappa & \cos\omega \cos\varphi \end{array} \right] \end{gathered} $$
(1)

In this equation, the vector \({\left(\begin{array}{ccc}{X}_{i}& {Y}_{i}& {Z}_{i}\end{array}\right)}^{T}\) contains the coordinates of the object (reference) points, the vector \({\left(\begin{array}{ccc}{x}_{ij}& {y}_{ij}& {z}_{ij}\end{array}\right)}^{T}\) contains the coordinates of the points in the local (scanner) coordinate system, \({\left(\begin{array}{ccc}{X}_{j}^{c}& {Y}_{j}^{c}& {Z}_{j}^{c}\end{array}\right)}^{T}\) is the scanner position, and \({M}_{ij}\) is the scanner rotation matrix constructed from the three Euler angles \(\omega , \varphi , \kappa \).
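For illustration, a minimal NumPy sketch of Eq. (1)—building the rotation matrix from the Euler angles and transforming scanner coordinates into the object system—might look as follows (angles in radians; the function names are hypothetical):

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """Rotation matrix M_ij of Eq. (1) from the Euler angles (in radians)."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    return np.array([
        [cp * ck,                -cp * sk,                 sp],
        [co * sk + so * sp * ck,  co * ck - so * sp * sk, -so * cp],
        [so * sk - co * sp * ck,  so * ck + co * sp * sk,  co * cp],
    ])

def scanner_to_object(points_local, omega, phi, kappa, scanner_origin):
    """Apply Eq. (1): X_i = M_ij x_ij + (X_c, Y_c, Z_c)^T to an (N, 3) array of points."""
    M = rotation_matrix(omega, phi, kappa)
    return points_local @ M.T + np.asarray(scanner_origin)
```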

Least-squares estimation is required to determine the exterior orientation parameters of the oriented point cloud. Teunissen [50] used the well-known Gauss-Markov linear model (a linearised form of the nonlinear input relationships), which is also used in the TLS/photogrammetric bundle adjustment process [51]. The normal equation matrix and vector are determined by least-squares adjustment with the following analytic form (Eqs. 2, 3, 4):

$$ \begin{gathered} y + e = Ax; e \sim \left( {0, C_{e} } \right) \hfill \\ A = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {x_{1} } & {y_{1} } & {z_{1} } \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ {x_{1} } & {y_{1} } & {z_{1} } \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ 0 & 0 & 0 \\ {x_{1} } & {y_{1} } & {z_{1} } \\ \end{array} } & {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \\ {\begin{array}{*{20}c} {x_{2} } & {y_{2} } & {z_{2} } \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ {x_{2} } & {y_{2} } & {z_{2} } \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ 0 & 0 & 0 \\ {x_{2} } & {y_{2} } & {z_{2} } \\ \end{array} } & {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \\ \vdots & \vdots & \vdots & \vdots \\ {\begin{array}{*{20}c} {x_{m} } & {y_{m} } & {z_{m} } \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ {x_{m} } & {y_{m} } & {z_{m} } \\ 0 & 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} 0 & 0 & 0 \\ 0 & 0 & 0 \\ {x_{m} } & {y_{m} } & {z_{m} } \\ \end{array} } & {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \\ \end{array} } \right]\,\,\,\,\,x = \left[ {\begin{array}{*{20}c} {a_{11} } \\ {a_{12} } \\ {a_{13} } \\ {a_{21} } \\ {a_{22} } \\ {a_{23} } \\ {a_{31} } \\ {a_{32} } \\ {a_{33} } \\ {X^{c} } \\ {Y^{c} } \\ {Z^{c} } \\ \end{array} } \right]\,\,y = \left[ {\begin{array}{*{20}c} {X_{1} } \\ {Y_{1} } \\ {Z_{1} } \\ {X_{2} } \\ {Y_{2} } \\ {Z_{2} } \\ \vdots \\ {X_{m} } \\ {Y_{m} } \\ {Z_{m} } \\ \end{array} } \right] \,\,\,\,e = \left[ {\begin{array}{*{20}c} {e_{{X_{1} }} } \\ {e_{{Y_{1} }} } \\ {e_{{Z_{1} }} } \\ {e_{{X_{2} }} } \\ {e_{{Y_{2} }} } \\ {e_{{Z_{2} }} } \\ \vdots \\ {e_{{X_{m} }} } \\ {e_{{Y_{m} }} } \\ {e_{{Z_{m} }} } \\ \end{array} } \right] \hfill \\ \end{gathered} $$
(2)
$${A}^{T}PAx= {A}^{T}Py$$
(3)
$$P= {C}_{y}^{-1}$$
(4)

where: \(A\)—coefficient matrix (m × n), with m the number of observational equations and \(n\) the number of unknowns, and \(rank(A)=u\) (full rank); \(x\)—parameter vector (n × 1); \(y\)—observation vector (m × 1) (uncorrelated observations); \({C}_{e}\)—observation error covariance matrix (m × m) (positive definite), which is also the observation covariance matrix, i.e., Ce ≡ Cy.
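A compact sketch of the least-squares solution of Eqs. (2)–(4) in NumPy could look as follows; it assumes the weight matrix P is formed directly from the (uncorrelated) observation covariance matrix and uses hypothetical function names:

```python
import numpy as np

def weighted_least_squares(A, y, C_y):
    """Solve the normal equations A^T P A x = A^T P y (Eqs. 2-4) with P = C_y^{-1}."""
    P = np.linalg.inv(C_y)              # weight matrix (Eq. 4)
    N = A.T @ P @ A                     # normal-equation matrix
    x_hat = np.linalg.solve(N, A.T @ P @ y)
    residuals = A @ x_hat - y           # estimated corrections to the observations
    C_x = np.linalg.inv(N)              # covariance matrix of the estimated parameters
    return x_hat, residuals, C_x
```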

The selection and arrangement of tie points play a crucial role in the point cloud orientation process. When considering the use of tie points in the data orientation process, it is essential to consider both their contribution to accuracy and their role in detecting, locating, and eliminating outliers that may occur during the adjustment process. Reliability theory deals with diagnosing outliers in observations and datasets used in the adjustment process [52,53,54,55,56,57].

In this article, the proposed reliability approach complements the orientation quality assessment with local reliability criteria, which enable determining whether a pair of tie points is correctly matched. The proposed quality assessment method focuses on the RMSE evaluated on control and check points and considers the spatial distribution of the points. Based on the least-squares method (Eqs. 2 and 3), the formula for the local reliability criteria is derived (Eq. 5); it is known as the "disturbance-response" relation and is one of the basic elements of reliability theory:

$$ \begin{gathered} v = - Ry \hfill \\ R = I - A\left( {A^{T} A} \right)^{ - 1} A^{T} \hfill \\ \end{gathered} $$
(5)

where: \(R\)—reliability matrix of the tie points; \(I\)—identity matrix, \(A\)—coefficient matrix based on the tie points.

The analysis of the internal reliability factors is based on the diagonal values of the matrix R (an orthogonal projection operator), which lie in the interval [0, 1]. If: (1) \({\{R\}}_{ii}=0\), the tie point is uncontrolled by other points; (2) \({\{R\}}_{ii}=1\), the tie point is fully controlled by other points; (3) \({\{R\}}_{ii}>0.5\), the tie point (in relation to other points) is well distributed in terms of reliability theory. This method is very useful for the automatic analysis and selection of tie points in TLS registration [58].
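The reliability diagnostics of Eq. (5) reduce to inspecting the diagonal of R; a minimal sketch for the unweighted case (hypothetical function names, not the implementation used in this study) is:

```python
import numpy as np

def reliability_diagonal(A):
    """Diagonal of R = I - A (A^T A)^{-1} A^T (Eq. 5), one value per observation."""
    m = A.shape[0]
    R = np.eye(m) - A @ np.linalg.inv(A.T @ A) @ A.T
    return np.diag(R)

def classify_tie_points(A, threshold=0.5):
    """Flag observations whose diagonal reliability value exceeds the 0.5 threshold
    (i.e., tie points that are well controlled in the sense of reliability theory)."""
    r = reliability_diagonal(A)
    return r, r > threshold
```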

TLS point featured-based cloud registration

The current state-of-the-art approaches for TLS data registration fall into two main groups: (1) target-based methods (using provided control points/markers) and (2) feature-based methods [42]. One of the feature-based methods is Structure-from-Motion (SfM), which is carried out in the following steps: (1) feature extraction; (2) feature matching; (3) geometric verification; (4) reconstruction initialisation; (5) image registration; (6) triangulation, and (7) bundle adjustment (Fig. 2). In general terms, the SfM approach can be divided into two main parts: the correspondence search phase (steps 1–3) and the iterative reconstruction phase (steps 4–7) [46, 59,60,61,62].

Fig. 2
figure 2

Incremental SfM methodology [59]

The classical SfM uses a group of collected images. In the case of TLS registration, however, the point cloud should first be converted into a spherical raster based on cartographic equations (Eqs. 6, 7, 8). Referring to Fig. 3, a TLS with a panoramic architecture acquires spherical coordinate observations defined as ρ—the measured distance between the object and the scan position, θ—the horizontal direction, and α—the vertical (elevation) angle. These values can be expressed with respect to the Euclidean coordinate system (Eqs. 6, 7, 8):

Fig. 3
figure 3

Relation between spherical coordinates and coordinates on spherical photographs a Graphical representation of the relation between polar coordinates measured and the raster image in spherical projection [63], b formula for recalculation of polar coordinates to spherical projection, and c formula for recalculation of x,y spherical projection onto polar coordinates

$${\rho }_{ij}= \sqrt{{x}_{ij}^{2}+{y}_{ij}^{2}+{z}_{ij}^{2}}$$
(6)
$${\theta }_{ij}=\mathrm{arctan}\left(\frac{{y}_{ij}}{{x}_{ij}}\right)$$
(7)
$${\alpha }_{ij}=\mathrm{arctan}\left(\frac{{z}_{ij}}{\sqrt{{x}_{ij}^{2}+{y}_{ij}^{2}}}\right)$$
(8)

A spherical image (in which the raster grey-level value encodes the laser beam reflectance intensity) is used, together with a depth map (i.e., the distance to the analysed object), for TLS data orientation. This point cloud representation is applied and implemented in many commercial software tools [46, 64,65,66,67]. The main advantage of this data representation is the possibility of using raw data at the highest resolution and without interpolating new pixel coordinate values. It is also possible to generate an intensity raster of any resolution by recalculating new pixel values based on the formulas shown in Fig. 3.
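As an illustration of Eqs. (6)–(8) and of the raster generation described above, a simplified NumPy sketch might look as follows; the raster resolution, the nearest-cell assignment and the use of arctan2 (a quadrant-aware form of Eq. 7) are assumptions of this sketch rather than details of the implementation used here:

```python
import numpy as np

def cartesian_to_spherical(xyz):
    """Eqs. (6)-(8): range rho, horizontal direction theta and elevation angle alpha."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(y, x)                      # horizontal direction
    alpha = np.arctan2(z, np.sqrt(x**2 + y**2))   # vertical (elevation) angle
    return rho, theta, alpha

def spherical_intensity_raster(xyz, intensity, width, height):
    """Rasterise intensity on a theta/alpha grid (nearest cell; later points overwrite earlier ones)."""
    _, theta, alpha = cartesian_to_spherical(xyz)
    col = ((theta + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    row = ((np.pi / 2 - alpha) / np.pi * (height - 1)).astype(int)
    raster = np.zeros((height, width), dtype=np.float32)
    raster[row, col] = intensity
    return raster
```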

To compare points in different rasters, it is necessary to determine invariant features. The detection and description of features for each characteristic point are essential for finding homologous points, because the final recognition of points as tie points is carried out by matching their descriptors during data orientation. Two approaches are usually applied: (1) Approximate Nearest Neighbour-Based Point Matching [68] and (2) Brute-Force matching [69].

The fundamental principles of the 2D feature

Feature detection (also called extraction) is the first and most essential step in the SfM methodology and relies on detectors. The principle of feature extraction is to identify, in each raster (image from the group of processed images), a set of characteristic points (also called keypoints) based on the local characteristics of the intensity. For feature extraction, different methods and algorithms can be used, such as point detectors [70], line detectors [71] or blob detectors [72], which affects the robustness of the detected features and the efficiency of the matching method.

Those features should have the following properties that allow to determine the characteristics of the detector [73]: "(1) Repeatability—the possibility to detect a high percent of the features possible to recognise the scene part visible in both images taken under different viewing conditions; (2) Distinctiveness/informativeness—the intensity patterns used for detecting points should show a lot of variations; (3) Locality—the neighbourhood used to determine the point should be local in order to reduce the probability of occlusions and invariant of the photometric and geometric deformations; (4) Quantity—a number of the detected features that should be sufficiently large and allow to detect features even on small objects (however, number of keypoints depends directly on the application); (5) Accuracy—definition of the quality and possibility of feature localisation in regards to the scale-space and photometric and geometrical distortions; (6) Efficiency—determination of the required time for feature detection (important in the time-critical applications)".

At present, there are two distinct approaches for detecting keypoints in images. The first involves hand-crafted algorithms, such as the Scale-Invariant Feature Transform (SIFT) introduced by Lowe [74] and Speeded Up Robust Features (SURF) proposed by Bay and Ess [75]. The second, learned-based feature extraction, employs methods such as SuperGlue or LoFTR. Hand-crafted detectors operate by detecting keypoints based on grayscale gradient values in the local neighbourhood, using either blob detectors like SIFT, SURF, or CenSurE, or corner detectors like FAST introduced by Rosten and Drummond [76] and BRISK proposed by Leutenegger et al. [77], which compare grayscale differences of the neighbourhood with the analysed pixel. Point and blob detectors have found wide application in the orientation of point clouds from terrestrial laser scanning [63]. The advantages of using point and blob detectors are (1) the speed of detection and matching of tie points—they can be extracted very efficiently, (2) the accuracy of localisation and scale invariance, (3) stability over varying viewpoints and (4) the accuracy of TLS data registration [73, 78]. One of the significant limitations of these detectors is that they were designed for images in central projection, for which standard image deformations can be expected. For this reason, using spherical rasters from point cloud conversions can result in significant deformations that contribute to problems with the unambiguous identification and matching of keypoints [19, 46, 63, 79]. This problem can be solved in two ways: (1) using different mapping representations (i.e., "virtual image", orthoimage or Mercator representation) [60, 63] or (2) adding an affine component to the detectors [80].

In recent years, novel learning-based solutions have been developed to overcome the limitations of hand-crafted methods. These solutions encompass various approaches. The first approach, known as "detect-then-describe," involves using a learned detector and descriptor, which can be either fully learned or a combination of hand-crafted and learning-based methods. Notable works in this domain include Barroso-Laguna et al. [81] and Verdie et al. [82] for the detector, and Ebel et al. [83] and Mishchuk et al. [84] for the descriptor.

The second approach, “end-to-end,” aims to jointly optimise the entire pipeline to extract sparse image correspondences. Examples of end-to-end methods include SuperPoint, introduced by DeTone et al. [85]; SuperGlue, proposed by Sarlin et al. [86]; and DISK, presented by Tyszkiewicz et al. [87]. These end-to-end methods have been utilised to enhance both the repeatability and reliability of keypoints, leading to improved success rates in image matching and more accurate pose estimation, as demonstrated by Remondino [88].

More recently, researchers such as Choy et al. [89], Rocco et al. [90], and Li et al. [91] introduced a new approach, “end-to-end detector-free local feature matching methods.” These methods eliminate the feature detector phase and directly generate dense descriptors or dense feature matches. Notably, Sun et al. [92] introduced the LoFTR approach, which builds upon the Transformer architecture proposed by Vaswani et al. [93]. In contrast to the sequential process of image feature detection, description, and matching, LoFTR establishes pixel-wise dense matches at a coarse level and subsequently refines these matches at a fine level.

The feature description, matching and images registration

To match characteristic points in several photographs, it is necessary to describe their features based on their neighbourhood [72]. This is carried out by descriptors, which enable the determination of the invariant features that form the basis for comparing points in different photographs. The descriptions of characteristic points can be unified by using one descriptor for each detector; for that purpose, the SIFT descriptor was utilised [72]. The operation of the SIFT descriptor consists of two stages: (1) calculation of the gradient (scale) and orientation of each point within the neighbourhood of a keypoint and (2) determination of a 128-element feature vector (the descriptor). The Gaussian images are used to determine the orientation of keypoints, which corresponds to the scale of a given keypoint. For each image point, the gradient module and orientation are calculated. The keypoint features are measured in relation to the determined orientation, which makes the description independent of rotation. The SIFT algorithm considers the gradient module and orientation within a 16 × 16 neighbourhood of a given keypoint. This area is then divided into 4 × 4 regions, in which the resultant orientation histograms are created. The resultant gradient module for eight orientations is determined within each region from the modules of the particular points. Thus, the point feature descriptor is a vector of 4 × 4 × 8 = 128 elements. The vector is normalised to reduce the influence of illumination. The next stage of considering points as tie points in image data orientation is their relative matching; in this article, Approximate Nearest Neighbourhood-Based Point Matching [60] was used. Finally, the iterative bundle adjustment process relies on the methodology described in the subsection "TLS point cloud registration".
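A minimal OpenCV sketch of this detect–describe–match chain (SIFT keypoints and descriptors, approximate nearest-neighbour matching via FLANN with Lowe's ratio test) might look as follows; the ratio threshold and FLANN parameters are illustrative assumptions, not the settings used in this study:

```python
import cv2

def detect_and_match(raster_a, raster_b, ratio=0.8):
    """SIFT keypoints/descriptors on two 8-bit intensity rasters and FLANN-based
    approximate nearest-neighbour matching with the ratio test."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(raster_a, None)
    kp_b, des_b = sift.detectAndCompute(raster_b, None)

    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 64})
    knn = flann.knnMatch(des_a, des_b, k=2)

    good = []
    for pair in knn:
        # Keep a match only if it is clearly better than the second-best candidate.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return kp_a, kp_b, good
```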

Methodology

Selected test site

The proposed method for automatic Terrestrial Laser Scanning data registration, which involves the use of TLS-SfM with hand-crafted and learned features to detect non-signalised tie points on point clouds, was tested at six different sites, namely the historic 17th-century basements at the Royal Castle in Warsaw without decorative structure (Test Sites I and II), the Museum of King Jan III's Palace at Wilanów with decorative elements, ornaments, and materials on walls (Test Site III) and flat frescos (Test Site IV), a narrow office (Test Site V) and a shopping mall (Test Site VI).

Test Sites I and II are constructed of bricks filled with mortar. They have an irregular shape with ceilings in the form of arches, with a maximum height of approximately 3.2 m and a minimum of about 2.1 m. Due to their historical character and the prevailing humidity, parts of the rooms have damp walls and crumbling brick fragments, making it impossible to place signalised control points on the object. On the other hand, it was impossible to place points on tripods because of the size and dimensions of the individual rooms. If the target-based methodology were implemented, it would increase the number of required scanner positions, leading to inaccurate point cloud registration.

Both Test Sites were marked with check points (not used for the determination of orientation parameters, but used for independent quality assessment), which were placed at different heights. All points were measured with a Leica TCRP 1202 total station with an angular accuracy of 2″ and a linear accuracy of 2 mm + 2 ppm. The TLS data used in this work were acquired with the phase-shift scanners Z + F 5006h (Test Sites I, II and IV–VI) and Z + F 5003 (Test Site III) from different positions and heights, with a field of view of \(360^\circ /320^\circ \) and a point resolution of 6.3 mm/10 m. Figure 4 presents the floor plan with marked dimensions for Test Sites I and II, including the Terrestrial Laser Scanning (TLS) positions and marked reference points.

Fig. 4
figure 4

a The floor plan with marked dimensions and Terrestrial Laser Scanning (TLS) positions (red dots). Each laser scanner position name contains the name of the selected test site (I or II) and a specific id (1, 2, 3, 4, etc.); the height (h) of each TLS position is also given. b A spherical map of the point clouds for each Test Site

Test Site I is a regular-shaped room with dimensions of approx. 5.6 m × 5.1 m. A ventilation pipe used to dehumidify the room runs through the centre of the room (halfway up its height), which limits the placement of the scanner stations. It was therefore necessary to increase the number of scanner positions used for a full inventory of the Test Site and the number of marked control points. Test Site II has dimensions of 7.4 m × 5.1 m and is divided by curves at 1/3 and 2/3 of its length. In addition, it has recesses and long windowpanes. Therefore, it was necessary to increase the number of signalised points and scanner positions, which resulted in some points not being visible on all scans.

Test Sites III and IV are two decorated historical chambers at the Museum of King Jan III's Palace at Wilanów. Test Site III, "The Queen's Bedroom", is characterised by geometric complexity in the form of rich ornaments, bas-reliefs, and facets. Moreover, mirrors in golden frames, decorative fireplaces, fabrics, etc., hang on the walls (Fig. 5). The dimensions of Test Site III are approximately 6.4 m × 7.3 m × 5.3 m.

Fig. 5
figure 5

The point cloud in the spherical projection of Test Site III with marked points (red circles) [63]

Figure 5 presents the distribution of scanner positions and the scanning distances. Five of the six scans were acquired over only a selected fragment of the chamber (an incomplete extent). The seventh scan (acquired with the full angular resolution) was used as the reference scan. Sixteen marked points, considered as check points in further analyses, were distributed over the test site and used for the TLS data orientation.

Test Site IV, "The Chamber with a Parrot", is characterised by a small number of ornaments and the lack of bas-reliefs, facets, or fabrics on the walls. In this Test Site, the walls were painted with patterns imitating spatial effects. Figure 6 presents the distribution of scanner positions and scanning distances; the first scan was considered the reference scan. Due to the restriction on placing marked points on historical surfaces, automatically detected points defined as check points were used for the accuracy analysis. The dimensions of Test Site IV are approximately 4.2 m × 4.2 m × 2.6 m.

Fig. 6
figure 6

The point cloud in the spherical projection of Test Site IV without marked points [63]

Test Site V is an office room in the Main Hall of the Warsaw University of Technology. The Test Site is characterised by smooth walls without texture; lamps and power wires were on the ceiling, and the floor was covered with dark carpet. Figure 7 presents the distribution of scanner stations and scanning distances. The dimensions of the office room are approximately 7.4 m × 5.9 m × 4.5 m.

Fig. 7
figure 7

The point cloud example in the spherical projection of Test Site V with marked check points (red circles) [63]

Test Site VI is an empty shopping mall. The walls of the room were smooth, without texture; lamps, electric wires, and an air-conditioning system were on the ceiling, and the floor was concrete. Figure 8 presents the distribution of scanner stations and scanning distances. Scan three was used as the reference scan, and eight marked points, considered as check points in further analyses, were distributed over the test site and used for the TLS data orientation. The dimensions of Test Site VI are approximately 21.5 m × 7.1 m × 6.3 m.

Fig. 8
figure 8

The point cloud example in the spherical projection of test site VI with marked check points (red circles) [63]

The TLS-SfM approach

An approach based on a modified SfM algorithm was used to register the TLS-derived point clouds. Figure 9 shows a schematic of the data processing using the TLS-SfM method.

Fig. 9
figure 9

Workflow of the proposed TLS-SfM point cloud registration approach

The TLS-SfM method is a multi-stage approach that consists of the following steps:

  1) Conversion of point clouds to raster form (3D–2D)

    To convert point clouds to raster form, unprocessed raw data were selected to generate rasters with the maximum possible resolution (for each raster), without interpolating coordinate values for pixels. The mathematical relationship between Cartesian and spherical coordinates was described by Fangi [44]. The conversion from 3D to 2D consists of transforming the point coordinates from Cartesian to spherical based on Eqs. 6, 7, 8. The x and y coordinates in the raster correspond to the values of the vertical and horizontal angles, respectively, while the intensity of the laser beam reflection and the X, Y and Z coordinates of the points are used to assign the grey-level values of the new rasters. As a result of this step, four rasters are generated for each point cloud.

  2) Correspondence search

    In the proposed TLS-SfM method, the process of finding tie points (feature detection and description) has been implemented using detect-then-describe, detect-and-describe (end-to-end) and describe-to-detect (end-to-end detector-free local feature matching) approaches. The detect-then-describe approach uses a two-stage data transformation based on affine-based feature point detection and feature description using a descriptor. In both cases, hand-crafted as well as learned-based algorithms were used. A detailed description of the algorithms used is presented in the subsection "Overview of the investigated algorithms and evaluated criteria". This step is performed for all possible pairs of rasters. The number of pairs is given by the number of combinations without repetition:

    $$\left(\begin{array}{c}n\\ k\end{array}\right)=\frac{n!}{k!\left(n-k\right)!}$$
    (9)

    where: k = 2 (a pair of scans), n—the number of all scans.

    Descriptor matching (for detect-then-describe and end-to-end methods) is performed using the Approximate Nearest Neighbourhood-Based Point Matching algorithm with the L2 distance metric.

  3) Tie point XYZ determination

    The 2D coordinates of the pre-matched tie points detected on the intensity rasters were used to interpolate the XYZ coordinates of the points. The X, Y and Z rasters generated in the first data processing step were used for this purpose. Bilinear interpolation was used as the interpolation method.

  4) Tie point geometrical verification

    The geometrical verification of the detected tie points (based on their 3D coordinates) is performed in an iterative process (RANSAC method) with the following assumptions: full registration (the accuracy on control and check points does not exceed 5 mm and the covariance factors are higher than 0.5), initial registration to be used as the starting point for final ICP-based registration (threshold 10 mm), and no registration (values on control and check points higher than 10 mm). A minimal sketch of the RANSAC-based verification is given after this list. The output of this data processing step is (1) the set of correct tie points, (2) the linear RMSE value of the scan pair match, (3) the number of tie points and (4) the approximate transformation parameters.

  5) Incremental reconstruction

    The incremental reconstruction process starts with selecting the reference scan to which the other point clouds will be registered. To do this, the pair of point clouds with the highest number of detected tie points is selected first. From this pair, the point cloud with more connections to the other scans is chosen. The remaining pairs of scans are matched iteratively according to the following steps:

    (a) Localise a new pair of scans to the currently pre-registered point clouds,

    (b) Compute the approximate point cloud registration parameters,

    (c) Find correspondence points on multiple point clouds,

    (d) Repeat steps a–c until all pairs of scans have been added.

    The result of this stage is an approximation of the mutual orientation parameters and all possible connections between point clouds.

  6) Final bundle adjustment

    The final bundle adjustment is based on the earlier iterative matching of the point clouds to the reference scan. It involves determining the orientation elements of the point clouds with simultaneous filtering of outlier observations based on RMSE values and reliability coefficients. In addition, based on the measured control points, it is possible to orient the point clouds to the reference coordinate system. As a result of the TLS-SfM process, the point cloud orientation elements are obtained in the adopted reference system.
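As referenced in step 4 above, the following is a minimal NumPy sketch of RANSAC-based geometrical verification of candidate tie points. It assumes a rigid-body (rotation plus translation) model estimated with the Kabsch/SVD method, a 5 mm inlier threshold as used for full registration, and hypothetical function names; it is not the exact implementation used in this study.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src -> dst, both (N, 3) arrays
    (Kabsch/SVD solution)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

def ransac_verify(src, dst, threshold=0.005, iterations=1000, seed=0):
    """RANSAC over minimal 3-point samples; pairs closer than `threshold` metres after
    transformation are treated as correct tie points (inliers)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis (needs at least 3 inliers).
    R, t = rigid_transform(src[best_inliers], dst[best_inliers])
    err = np.linalg.norm(src[best_inliers] @ R.T + t - dst[best_inliers], axis=1)
    rmse = np.sqrt(np.mean(err ** 2))
    return R, t, best_inliers, rmse
```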

Overview of the investigated algorithms and evaluated criteria

This study investigates the quality improvement and completeness of the TLS registration process using 2D raster data and affine-detectors. To compare and verify the results of the point cloud registration, based on the selected hand-crafted and learned features, the multi-stage TLS-SfM registration methodology was followed.

  (1) Hand-crafted affine detectors, namely a corner detector (AFAST) and blob detectors (ASURF and ASIFT), were tested. The use of affine simulation in feature point detection involves two steps: (a) generation of multiple virtual images (including skew, tilt and rotation) to simulate the influence of affine deformations, and (b) application of the detector to each virtual image (a minimal sketch of this affine-simulation scheme is given after this list):

  • FAST (Features from Accelerated Segment Test) [76] detects corner keypoints in images by comparing the brightness intensities of pixels in a circular neighbourhood around each pixel of interest. A pixel is classified as a corner depending on the brightness and the number of contiguous pixels in the neighbourhood relative to the central pixel, using a threshold value. The FAST corner detector is based on a decision tree structure that allows for quick evaluation of the pixel intensities, making it suitable for real-time applications.

  • SIFT (Scale-Invariant Feature Transform) [74]—the purpose of SIFT is to detect and describe distinctive image keypoints. The advantage of this technique is its invariance to scale changes, rotations, and changes in illumination, which makes it robust to variations in image conditions. The working principle of the SIFT algorithm is to identify stable keypoints using a scale-space representation of the image and a Difference of Gaussians (DoG) operator that detects local extrema. These keypoints are then described based on their surrounding gradient orientations, resulting in highly distinctive and invariant feature descriptors.

  • SURF (Speeded-Up Robust Features) [75] offers faster computation. It provides robustness against image transformations by utilising integral images to efficiently calculate various image filters, such as the Haar wavelet responses, which capture both local intensity and orientation information. SURF detects keypoints by identifying locations with extreme responses in scale-space and orientation.

  (2) The authors also implemented the following learned-based features:

  • SuperGlue [94] establishes reliable correspondences between keypoints across different images. Unlike traditional hand-crafted methods, SuperGlue predicts the matching likelihood and establishes matches directly from the input data. It consists of two main components: (1) a learned embedding network and (2) a geometric verification module. The embedding network maps keypoints from two images into a shared feature space, where their similarity is measured. The geometric verification module uses the learned embeddings to estimate a geometric transformation between the keypoints and refine the matches. SuperGlue can leverage rich contextual information and handle challenging scenarios such as occlusions and viewpoint changes owing to jointly learning the feature representation and the matching process.

  • LoFTR (Local Feature Transformer) is an end-to-end detector-free local feature-matching method introduced by Sun et al. [92]. LoFTR creates dense pixel-wise correspondences between images using a Transformer-based architecture and directly predicts them without needing a feature detector, unlike traditional approaches that require separate stages for feature detection, description, and matching. It operates in two steps: (1) coarse matching and (2) fine matching. In the coarse matching stage, LoFTR employs a self-attention mechanism that allows each pixel to attend to its neighbours and capture their contextual information, creating a pixel-wise dense matching that provides the initial estimation of correspondences. In the fine matching stage, LoFTR uses a hierarchical refinement network that takes the initial correspondences and iteratively refines them by considering local spatial relationships and context, improving the accuracy and reliability of the correspondences. LoFTR's Transformer-based architecture captures long-range dependencies and global contextual information, enhancing the quality of the dense correspondences. This approach eliminates the need for explicit feature detection and produces dense descriptors directly, leading to improved matching performance.

  • KeyNet detector + AffNet + HardNet descriptor (later called KeyNetAffine)—a combination of hand-crafted and learned methods to detect and describe features. KeyNet is a state-of-the-art keypoint detector [81] that leverages deep learning techniques to detect distinctive image keypoints. KeyNet utilises a convolutional neural network (CNN) architecture trained on large-scale datasets with annotated keypoints. To maximise detection accuracy and robustness, KeyNet identifies salient and repeatable keypoints, which allows the network parameters to be optimised. This detector is highly adaptable to diverse image conditions due to its excellent handling of variations in scale, rotation, and illumination, demonstrating outstanding performance in keypoint-based applications, namely image matching, object recognition, and visual tracking. HardNet is a feature descriptor used in computer vision applications, particularly for matching and recognition tasks. The HardNet descriptor [84] is designed to capture and encode distinctive information from image patches, making it robust to variations in scale, rotation, and lighting conditions. The descriptor is computed by extracting local patches around keypoints and encoding them into fixed-length feature vectors. HardNet can handle challenging scenarios, such as significant viewpoint changes and occlusions, owing to its focus on the most informative and discriminative patches. HardNet utilises a Siamese neural network architecture that learns to optimise the feature representation for improved matching accuracy. During training, pairs of matching and non-matching patches are used to learn discriminative feature embeddings.
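To make the affine-simulation idea in (1) concrete, the following is a simplified OpenCV sketch of an AFAST-style detector: it warps the intensity raster with a small set of assumed tilt/rotation combinations, runs FAST on each warped view, and maps the keypoints back to the original raster. The tilt sampling is a coarse approximation of the full ASIFT scheme, and all parameter values are illustrative only.

```python
import cv2
import numpy as np

def affine_simulations(image, tilts=(1.0, 1.41, 2.0), angles_deg=range(0, 180, 36)):
    """Generate affine-warped copies of the raster (ASIFT-style tilt/rotation simulation)."""
    h, w = image.shape[:2]
    for tilt in tilts:
        for angle in angles_deg:
            M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            M[0, :] /= tilt                         # crude simulation of the camera tilt along x
            warped = cv2.warpAffine(image, M, (w, h))
            yield warped, M

def afast_keypoints(image, fast_threshold=20):
    """Detect FAST corners on each simulated view and map them back to the original raster."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold)
    keypoints = []
    for warped, M in affine_simulations(image):
        Minv = cv2.invertAffineTransform(M)         # warped -> original coordinates
        for kp in fast.detect(warped, None):
            x, y = kp.pt
            x0 = Minv[0, 0] * x + Minv[0, 1] * y + Minv[0, 2]
            y0 = Minv[1, 0] * x + Minv[1, 1] * y + Minv[1, 2]
            keypoints.append(cv2.KeyPoint(x0, y0, kp.size))
    return keypoints
```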

To evaluate the accuracy of TLS point cloud registration with learned-based methods, it was decided to use approaches trained on images depicting historical buildings and architectural objects (MegaDepth for LoFTR, and PhotoTourism for SuperGlue and KeyNetAffine, respectively). The pretrained learned-based descriptors were chosen because of the desire to test ready-made solutions and compare them with hand-crafted methods. The quality improvement and completeness of the TLS registration process were compared against several metrics presented in Table 1.

Table 1 Metrics for evaluating the hand-crafted and learned-based features

Results and discussion

Automatic pairwise point cloud registration—accuracy evaluation

To assess the applicability of the detectors and affine-detectors in the TLS registration process, the accuracy of the orientation of all possible overlapping pairs of scans from different heights and distances from the scanned surfaces was analysed. The results are presented in Table 2 and marked in colour: (1) green—complete registration, with the RMSE of X, Y and Z ≤ 0.005 m and a covariance factor > 0.5; (2) orange—preliminary orientation; the obtained parameters should be treated as initial parameters for Iterative Closest Point (ICP) registration; and (3) red—no registration, because the points were not well distributed and/or the RMSE > 0.01 m and/or the covariance factor < 0.5. Additionally, because point clouds of wall fragments (rather than the entire room) were processed on Test Site III, pairs of scans that do not overlap were marked with an "x".

Table 2 The accuracy of the TLS registration for detectors and a-detectors

The results in Table 2 show that only AFAST (point detector) and ASIFT (blob detector) allow for correct registration of all pairs of scans for all test sites. The remaining algorithms should be analysed individually for each test site. The LoFTR approach obtained the worst results: for Test Site I, only 1 of 6; Test Site II, 0 of 15; Test Site III, 0 of 9; Test Site IV 6 of 6; Test Site V, 0 of 28 and Test Site VI 0 of 20 pairs of scans were correctly oriented (full orientation). For the other learned-based approaches for point detection, significantly better results were obtained. In the case of the SuperGlue detector for Test Site I, 2 of 6; Test Site II, 11 of 15; Test Site III, 8 of 9; Test Site IV, 6 of 6; Test Site V, 24 of 28 and Test Site VI 6 of 21 pairs of scans were correctly registered. With the KeyNetAffine, it was possible to register all pairs of scans from Test Site IV, 5 of 6 pairs of scans for Test Site I, 12 of 15 for Test Site II, 1 of 9 for Test Site III, 16 out of 28 for Test Site V and 3 of 21 for Test Site VI.

When multi-position TLS point clouds are registered, not only the percentage of correctly aligned scan pairs is important, but also the possibility of a global registration of all point clouds. Full registration (based on the results of full and preliminary pairwise orientation) was achieved for Test Sites I, II, III, IV and V. For Test Site VI, it was impossible to perform the multi-position registration. The incompleteness of pairwise scan registrations for Test Sites I and IV might affect the robustness of the global adjustment and the approximately equivalent redundancy of the tie points on the point clouds.

The hand-crafted detectors are a potential solution to the problems mentioned above. Table 2 shows that the full multi-stage registration was achieved for Test Sites I–V. The worst results were obtained for Test Site VI, for which full registration was only possible with the ASIFT and AFAST detectors.

The analyses of the performance of point/blob detectors and affine-detectors on test fields characterised by different textures, structures, numbers of decorations, and scanner positions at varying distances from the walls and varying heights demonstrated that:

  • Using the LoFTR approach, it was not possible to correctly register point clouds obtained from scanner positions for which corresponding fragments were measured at significantly different angles to the surface normal vector (i.e., at acute angles to the normal) and at significantly different distances from the scanner position. This caused significant "distortions" in the spherical projection resulting from the cartographic conversion of the 3D data to the 2D form.

  • Hand-crafted algorithms allow more robust tie points to be detected, which translates into more correctly oriented scan pairs. The SIFT and SURF algorithms are based on greyscale gradients, making them scale-invariant and more robust. The performance difference stems from the filter used (CenSurE and SIFT use Laplacian centre-surround and Difference of Gaussians operators, respectively) and the Hessian-based approach (SURF with its Difference of Boxes detector). For this reason, with these detectors it was possible to detect a higher number of correctly matched keypoints, which resulted in a higher number of correctly registered pairs of scans.

  • Applying affine simulation significantly improved the quality of the TLS point cloud pairwise and multi-station registration. The use of ASIFT and AFAST allowed the orientation of the point cloud pairs necessary for final multi-position registration for all Test Sites. This is also noticeable for the KeyNetAffine approach: compared to other learned-based methods, it was possible to orient more pairs of scans with a wide baseline (Test Sites I, II and VI). For the orientation of short-baseline pairs of scans characterised by high distortion (Test Sites III and V), significantly better results were obtained with the SuperGlue approach.

The number of detected and matched keypoints after the final bundle adjustment

The number of tie points obtained after the full bundle adjustment process was analysed to assess the influence of the hand-crafted and learned features in the TLS registration process and the selection of the appropriate features. Table 3 presents the number of all tie points used in the full bundle adjustment and points for cases for which full bundle adjustment was impossible (marked with a cross).

Table 3 The number of all tie points used in the full bundle adjustment and points for cases for which full bundle adjustment was impossible (marked with a cross)

The number of tie points presented in Table 3 indicates that hand-crafted detectors recorded the highest number of keypoints for all Test Sites apart from Test Site VI, for which the SuperGlue approach detected the most points. Considering the ratio of the number of points detected by the hand-crafted versus learned-based approaches, 26 times more points were detected for Test Site I (AFAST vs. KeyNetAffine), 91 for Test Site II (ASIFT vs. LoFTR), 2.8 for Test Site III (AFAST vs. KeyNetAffine), 21 for Test Site IV (AFAST vs. KeyNetAffine), and 5 for Test Site V (AFAST vs. KeyNetAffine). Due to the lack of a full bundle adjustment of all scans using learned features, it was impossible to calculate the point ratio for the two approaches for Test Site VI.

The analyses presented in Table 3 also show that, on average, the most tie points were detected for AFAST and the fewest for KeyNetAffine. The significant difference in the number of points detected by the two approaches for Test Sites I, II and IV is due to the characteristics of the sites. Test Sites I and II are historic brick cellars with arched ceilings, and Test Site IV is a room with paintings imitating a spatial effect. For this reason, hand-crafted detectors, notably the AFAST detector (due to its mode of operation), detect significantly more points than on the other Test Sites, which are characterised by fewer such unambiguous details.

Their spatial distribution should also be considered when assessing the quality of the tie points used in the bundle adjustment process. This is crucial, as it impacts the quality of registration and the accuracy of the entire process. Figure 10 shows the distribution of points used in full bundle adjustment and points for cases for which full bundle adjustment was impossible (marked with a cross).

Fig. 10
figure 10

The tie points distribution used for TLS point cloud registration for each method

The analysis shows that despite the lower number of tie points detected by Learned-based methods compared to Hand-crafted detectors, their placement guarantees a correct point cloud registration. As with the number of points analysed, the distribution of points should be assessed independently for each Test Site:

  • Test Site I—The points detected by all the hand-crafted detectors have a similar spatial distribution. Noticeably, the points are clustered in the lower part of the room and in the middle of the ceiling. The points detected using the LoFTR algorithm are characterised by an uneven distribution, with a noticeably increased density of points on wall sections. For KeyNetAffine, the points are evenly distributed and, unlike for LoFTR, there are no areas with a significantly higher point density. When analysing the results for SuperGlue, there is a significant density of points in one part of the basement, due to the inability to detect tie points on the minimum number of scan pairs required for full bundle adjustment.

  • Test Site II—The distribution of points for all methods is similar to that for Test Site I. For the hand-crafted algorithms, the most points (highest density) were detected and used on the two walls visible on all scans. Significantly fewer points are on the ceiling, and the highest density was obtained in the central part of the basement. The best results were obtained for the ASURF, AFAST and ASIFT algorithms. For the learned-based algorithms, the best distribution of points (points both on the ceiling and on the walls), while maintaining a similar density over the entire basement, was obtained for KeyNetAffine, and the worst for LoFTR, for which points were mainly distributed on the walls in groups of varying density. For the SuperGlue method, most points were distributed on the walls mapped on all scans and a small number on the ceiling. However, it should be emphasised that the number and distribution of points detected by the learned-based methods allowed the correct registration of all point clouds.

  • Test Site III—For Test Site III, which contains rich ornaments, bas-reliefs, and facets, the distribution of tie points was similar for all hand-crafted and learned-based methods except for the LoFTR algorithm. In summary, it can be concluded that the best distribution was obtained for points detected using the SuperGlue approach.

  • Test Site IV—As for Test Sites I and II, a higher point density is observed for points detected by hand-crafted methods. For this type of algorithm, a higher point density is noticeable in areas with a more significant change in grey-level gradients. For this reason, these points are not evenly distributed throughout the study area. For the learned-based methods (SuperGlue and KeyNetAffine), the distribution of points is more even than for the hand-crafted methods. As for the previous Test Sites, within the learned-based algorithm group the most points were detected using the SuperGlue approach and the fewest using LoFTR.

  • Test Site V—In the case of an office room test field characterised by a lack of diverse texture and equipped with furniture and office equipment, the number, density, and distribution of tie points were similar for the AFAST, ASIFT, ASURF, SuperGlue and KeyNetAffine algorithms. As for the previous Test Sites, the worst results were obtained for the LoFTR-based approach, for which all point clouds could not be registered.

  • Test Site VI—An analysis of the distribution of tie points detected on the empty shopping mall scans shows that only the hand-crafted ASIFT and AFAST detectors could orient all point clouds. This was due to the conversion of the 3D data to 2D and the presence of significant distortion in the resulting images. Considering that points were searched on wide-baseline point clouds, applying the abovementioned methods allowed the detection of an adequate number of points evenly distributed over the entire study area. Comparing the results for points detected on rasters generated from pairs of scans with a smaller baseline between point clouds and less distortion, the use of learned-based methods allowed the detection of a larger number of correctly detected tie points. For this reason, when planning a survey of this type of object, it is crucial to decide whether to acquire fewer point clouds and use affine-detector-based hand-crafted methods, or to add several scanner stations to reduce the baseline between point clouds and use learned-based algorithms.

The comparison with the current state-of-the-art methods

To assess the accuracy and correctness of the presented approach for point cloud orientation based on affine-detectors and point clouds converted to raster form, it was decided to compare point clouds with the commonly used approach based on signalised control points (target-based registration) implemented in Z + F LaserControl software [47] and the Iterative Closest Points (ICP) method implemented in the open-source CloudCompare [48].

The target-based method

The target-based method relies on the marked points and is commonly applied for TLS point cloud registration. These points should be evenly distributed across the investigated object. To compare results from the feature-based registration method with “normal” and affine detectors, the obtained results were compared with the TLS target-based registration from Z + F LaserControl software. To automatically analyse the influence of the geometrical point distribution with reliability assessment, the values of the covariance factors were compared. Results are shown in Table 4.

Table 4 Comparison of results of TLS joint/full registration method for all scans and the target-based registration method with reliability assessment for all Test Sites

Results presented in Table 4 show that the differences between the RMSE values on marked check points (obtained from multi-position TLS registration) depend on Test Sites.

  • For Test Site I, significantly higher accuracy of the full bundle adjustment can be observed on points detected with the ASIFT detector compared to the commonly used Target-based approach: the linear RMSE value was 2 times lower (1.8 mm). For the other algorithms, the linear RMSE values were similar to those of the Target-based approach: AFAST—3.4 mm, ASURF—3.7 mm, KeyNetAffine—3.6 mm and Target-based—3.5 mm, respectively. For the LoFTR method, the RMSE value was 4.2 mm (0.7 mm higher than the Target-based approach). The significant impact of using a hand-crafted detector can be seen by analysing the minimum covariance factors. This contributed to fulfilling the network's controllability condition and improving the geometric distribution of tie points in terms of the minimum values (above 0.5, which is the threshold value). There is a noticeable increase in values from 0.35 for Target-based to 0.94 for AFAST, 0.98 for ASIFT, 0.97 for ASURF, 0.76 for LoFTR and 0.51 for KeyNetAffine.

  • For Test Site II, varying linear RMSE values are evident. The best results were obtained on points detected with ASIFT and KeyNetAffine—linear RMSE values of 2.3 mm, i.e., 2 times lower than for Target-based. For AFAST and SuperGlue, the linear RMSE values are also lower than for Target-based; only ASURF (0.6 mm higher than Target-based) and LoFTR (6.1 mm) performed worse. Analysing the minimum reliability indices, as for Test Site I, a significant increase in their values (which translates into a better geometric distribution and resistance to the influence of outliers) is observed for all methods except SuperGlue.

  • In the case of Test Site III, the RMSE values for the Hand-crafted detectors are approximately 2 times lower than for the Target-based method (5.7 mm), while those for the Learned-based approaches are similar to (but still lower than) the Target-based result. The covariance factor for the Hand-crafted methods is in the range of 0.58–0.98, for the Learned-based methods in the range of 0.51–0.86, and for the target-based method it is 0.22. As mentioned, full registration of all scans with the LoFTR algorithm was impossible.

  • For Test Site IV, both Hand-crafted and Learned features provided comparable results; the mean RMSE values for the detectors and the target-based method are similar, so it is difficult to justify the need for the Learned-based methods. The minimum covariance factor values (about 0.98) are about 4.5 times better than for the target-based method (0.23).

  • For Test Site V, similar results were obtained for the Hand-crafted (2.5 mm–2.8 mm) and Learned-based methods (1.9 mm–2.4 mm), both slightly worse than the Target-based method (1.3 mm). The minimum covariance factor for both groups of methods is in the range of 0.65–0.94, while for the target-based method it is 0.29. In this case, orienting the point clouds using LoFTR-detected points was also impossible.

  • Completing the multi-station registration of the scans for Test Site VI was impossible for all Hand-crafted and Learned methods except ASIFT and AFAST, due to the challenge of finding corresponding points. Comparing the RMSE values, similar values can be seen for the ASIFT and target-based methods, whereas the AFAST detector performed approximately 2–2.5 times worse. The minimum covariance factors for the AFAST, ASIFT and target-based methods were 0.10, 0.60 and 0.28, respectively.

Iterative closest points (ICP)

To assess the accuracy of TLS data registration using affine detectors, the results were compared with the point-to-point ICP method implemented in the open-source CloudCompare software, which is commonly used in point cloud registration. The quality of point cloud matching was assessed by analysing the linear distances between pairs of point clouds. Point cloud resampling was performed with a fixed distance (1 mm) between points. Figures 11, 12, 13, 14, 15, 16 show the worst-case scenario for each Test Site. Each figure contains 8 histograms showing the probability density function of linear deviations between point clouds for the target-based method, the ICP point-to-point method, the Hand-crafted detectors (AFAST, ASIFT, ASURF) and the Learned-based features (SuperGlue, LoFTR and KeyNetAffine).
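CloudCompare computes these cloud-to-cloud statistics internally; for reference, a minimal numpy/scipy sketch of the same point-to-point check (nearest-neighbour distances, their empirical probability density and the 95th-percentile deviation used in the discussion below) is given here. The file names and the ASCII XYZ export format are assumptions, not part of the original processing chain.

```python
import numpy as np
from scipy.spatial import cKDTree
import matplotlib.pyplot as plt

def cloud_to_cloud_distances(reference, test):
    """Nearest-neighbour (point-to-point) distance from every point of
    `test` to the `reference` cloud, both given as (N, 3) arrays."""
    tree = cKDTree(reference)
    d, _ = tree.query(test, k=1)
    return d

# hypothetical clouds exported as ASCII XYZ after registration and 1 mm resampling
ref = np.loadtxt("scan_a_resampled_1mm.xyz")   # assumed file name
tst = np.loadtxt("scan_b_registered.xyz")      # assumed file name

d = cloud_to_cloud_distances(ref[:, :3], tst[:, :3])
print("95th percentile of linear deviations [m]:", np.percentile(d, 95))

plt.hist(d, bins=100, density=True)            # empirical probability density
plt.xlabel("linear deviation [m]")
plt.ylabel("probability density")
plt.show()
```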

Fig. 11
figure 11

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site I: a target-based method, b ICP point-to-point, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Fig. 12
figure 12

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site II: a target-based method, b ICP point-to-point, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Fig. 13
figure 13

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site III: a target-based method, b ICP point-to-point, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Fig. 14
figure 14

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site IV: a target-based method, b ICP point-to-point, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Fig. 15
figure 15

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site V: a target-based method, b ICP point-to-point, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Fig. 16
figure 16

The probability density histogram of linear deviations between the worst oriented pair of scans for Test Site VI: a target-based method, b ICP point-to-point—CloudCompare, c AFAST, d ASIFT, e ASURF, f SuperGlue, g LoFTR, h KeyNetAffine

Based on the analysis of the results for Test Site I (Fig. 11), it can be seen that the results obtained from the ASIFT, ASURF, SuperGlue, LoFTR, Target-based and ICP methods follow a distribution similar to chi-square; still, better results are obtained from the detector-based approach.

For Test Site II (Fig. 12), all histogram shapes except those of the Target-based and LoFTR methods are similar to a chi-square distribution. For the Target-based method, the distance for 95% of the points does not exceed 6 mm. The peak of the LoFTR histogram for the worst oriented pair of scans shows deviations higher than 10 mm, indicating that the registration was performed incorrectly.

For Test Site III, the best point cloud matching results were obtained with the ICP-based approach (Fig. 13b). The results obtained from the Hand-crafted detectors (Fig. 13c–e) are similar to those obtained from target-based registration (Fig. 13a), with histogram peaks at approximately 2 mm. The histograms of linear deviations for the Learned-based approaches (Fig. 13f–h) are flatter, indicating larger deviations between point clouds than for the Hand-crafted methods.

Based on the analysis of the results for Test Site IV (Fig. 14), the distributions obtained from all methods except KeyNetAffine are similar to the chi-square-like distributions obtained by the Target-based and ICP point-to-point approaches. Although the KeyNetAffine histogram does not follow a chi-square distribution, the scans should be considered correctly oriented, as the distance for 95% of the points does not exceed 4 mm, which is below the scanning resolution of 6 mm/10 m.

The results obtained for Test Site V (Fig. 15) show that the point clouds were oriented correctly using the algorithms based on Hand-crafted detectors. In particular, for the ASIFT detector, the distribution of values takes the shape of a chi-square distribution and coincides with the histograms obtained for the target-based and ICP methods. As for Test Site IV, the Learned-based approaches do not yield chi-square-shaped distributions, but the deviations of 95% of the points do not exceed 6 mm, which is within the scanning resolution of 6 mm/10 m.

The worst results in the point cloud distance comparison were obtained for the empty shopping mall using the Target-based method (Fig. 16a). This was due to the 12 mm/10 m scanning resolution, which limited the point cloud density and the ability to identify signalised points. For this reason, it is recommended to use the ICP method, which allows for the correct orientation of the data. Even so, the probability density histogram of linear deviations for the worst oriented pair of scans shows that the distances between clouds do not exceed the adopted scanning resolution of 12 mm/10 m, which can be considered an acceptable registration result.

In summary, the presented affine-detector-based data orientation allows robust registration, and choosing the ASIFT detector enables complete data registration.

Conclusion

This article evaluated the improvement in quality and completeness of the TLS registration process using 2D raster data from spherical images and Hand-crafted and Learned features in multi-stage TLS point cloud registration. To compare and verify the detectors and A-detectors, the following interiors were used: the Royal Castle in Warsaw without decorative structure (Test Sites I and II), the Museum of King Jan III's Palace at Wilanow with decorative elements, ornaments and varied wall materials (Test Site III) and flat frescos (Test Site IV), a narrow office (Test Site V) and a shopping mall (Test Site VI). The performed experiments demonstrated that:

  • The proposed TLS point cloud registration approach is a fully automatic solution independent of the object's interior type.

  • The selection of a suitable detector should depend on the test site being measured. In the case of cultural heritage interiors (characterised by good texture and numerous ornaments), it is possible to use both the Hand-crafted detectors AFAST, ASURF and ASIFT and the Learned-based SuperGlue and LoFTR. For point cloud registration of public buildings, it is recommended to use detectors such as AFAST or ASIFT. Moreover, the ASIFT detector allowed point cloud registration regardless of the geometric dependencies between individual scans and the test site being surveyed.

  • It is recommended to use the ASIFT or AFAST detector for TLS point cloud registration, because these detectors could perform the multi-station registration at all Test Sites. Another solution might be to increase the number of scanner stations to minimise large deviations on the spherical images and use the Learned methods, namely SuperGlue and KeyNetAffine.

  • The use of the affine hand-crafted detectors allows a high number of tie points to be detected, improving the accuracy and completeness of the TLS registration process compared to the learning-based approach. The number of detected tie points increased by 21–91 times for the cultural heritage sites and by about 2.8–5 times for the public objects.

  • When analysing the accuracy of point cloud orientation on signalised check points, two cases should be considered separately, i.e., decorated rooms and public facilities. For decorated sites, the linear RMSE errors for hand-crafted features are approximately 2 times smaller than those obtained by the Target-based approach, while the Learned-based approach yields values similar to Target-based. For public interiors, accuracies similar to the target-based method were obtained for both hand-crafted and learned-based features (where it was possible to register all scans). This demonstrates that using a-detectors for point cloud orientation is correct and reasonable.

  • Low internal reliability indices mean relatively low controllability of observations and thus poor detection of outliers at the reference points. An important consideration is the number of points and their geometric distribution. In the target-based method, it is challenging and sometimes impossible to distribute many points, while in the feature-based approach a large number of points is detected automatically. A large number of points distributed over the entire surveyed object allows for mutual control of points and the correct removal of outliers.

  • The analysis of the internal reliability indices shows that using a-detectors increases the controllability of points and the detection of outliers in the dataset, fulfilling the network's controllability condition with 0.5 as the acceptable threshold value. Comparing the results obtained from Hand-crafted and Learned features with the values obtained for the points detected with the Target-based method: for Test Site I, the minimum value is 0.51–0.97, while for the target-based method the minimum is 0.35. For Test Site II, the minimum is between 0.59 and 0.98 (only SuperGlue is lower, at 0.28), while for the target-based method the average is 0.20. For Test Site III, the average minimum covariance factor (0.71) is about 3.2 times better than for the target-based method (0.22); for Test Site IV, the minimum covariance factor for the target-based method is 0.23, about 4 times worse than for the detector-based methods. In the case of Test Site V, the minimum covariance factor for the detector-based methods is in the range of 0.65–0.94, while for the target-based method it is 0.29, and for Test Site VI, the minimum covariance factors are 0.10, 0.60 and 0.28 for AFAST, ASIFT and target-based, respectively.

  • The proposed robust method for point cloud registration based on intensity rasters (together with a depth map) and affine detectors yields results comparable to the commonly used target-based and Iterative Closest Points methods. The advantage of the proposed approach over the Target-based method is that more automatically detected tie points are used for orientation, with a better spatial distribution and robust outlier detection grounded in reliability theory. When registering point clouds with the ICP method, the clouds must be pre-oriented to guarantee the correctness of the final registration. In the affine-detector approach such a condition is not required, since the selection and elimination of tie points are performed in two steps: descriptor matching followed by geometrical verification based on the RANSAC algorithm (a minimal, illustrative sketch of this two-step filtering is given after this list).

  • The obtained TLS registration results based on learned methods (using models trained on images by the authors of those solutions) attest to their high performance and applicability in data orientation. To further improve the accuracy and completeness of data orientation on objects with poorer texture and less ornamentation (Test Sites V and VI), the authors plan to prepare a test dataset of intensity rasters derived from TLS point clouds.
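As an illustration of the two-step tie-point selection referred to above (descriptor matching followed by RANSAC-based geometric verification), the sketch below uses standard OpenCV SIFT matching between two intensity rasters. It is not the exact pipeline of the presented method, which relies on affine detectors and spherical rasters; the file names are hypothetical, and the planar homography is used here only as a rough geometric consistency check.

```python
import cv2
import numpy as np

# assumed input: two intensity rasters exported from the point clouds
img1 = cv2.imread("raster_scan_a.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file
img2 = cv2.imread("raster_scan_b.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# step 1: descriptor matching with Lowe's ratio test
matcher = cv2.BFMatcher(cv2.NORM_L2)
raw = matcher.knnMatch(des1, des2, k=2)
good = [m[0] for m in raw if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]

# step 2: geometric verification with RANSAC (needs at least 4 candidate matches)
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    tie_points = [m for m, ok in zip(good, inliers.ravel()) if ok]
    print(f"{len(tie_points)} verified tie points out of {len(good)} candidate matches")
```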