Sampling locally, hypothesis globally: accurate 3D point cloud registration with a RANSAC variant

Correspondence-based six-degree-of-freedom (6-DoF) pose estimation remains a mainstream solution for 3D point cloud registration. However, heavy outliers pose great challenges to this problem. In this paper, we propose a random sample consensus (RANSAC) variant based on sampling locally and hypothesis globally (SLHG) for 6-DoF pose estimation and 3D point cloud registration. The key novelties are efficient sampling, achieved by guiding the sampling process locally, and accurate pose estimation, achieved by generating hypotheses with global information. SLHG first generates a correspondence subset via compatibility clustering on the initial set. Second, locally guided graph sampling is performed. Third, 6-DoF hypotheses are generated by incorporating global information with a voting scheme. The second and third steps are repeated, and the best hypothesis serves as the estimation result. Extensive experiments on four popular datasets and comparisons with state-of-the-art methods confirm that SLHG manages to 1) achieve accurate registrations with a few iterations, and 2) yield better accuracy than most competitors.


Introduction
Registration of 3D point clouds is a fundamental problem in 3D computer vision, which contributes significantly to 3D reconstruction [1], 3D object recognition [2], remote sensing [3] and odometry [4]. The objective of a 3D registration method is to obtain the relative six-degree-of-freedom (6-DoF) pose between the source point cloud and the target point cloud to achieve alignment.
To perform accurate and robust point cloud registration, two different types of methods have been investigated: correspondence-free and correspondence-based methods. The 6-DoF estimation of correspondence-free methods does not rely on local feature correspondences computed between point cloud pairs; instead, these methods tend to first design an error function and then optimize it in an iterative manner. One of the most representative methods is the iterative closest point (ICP) method [5]. It performs registration by iteratively searching for the closest points of the source point cloud in the target one and optimizing the point-to-point distance error. Many variants of ICP have been proposed with different error functions and closest point definitions [6][7][8][9][10][11]. Other typical correspondence-free methods include those based on branch-and-bound (BnB) [12,13] and the normal distributions transform (NDT) [14]. However, common drawbacks of correspondence-free methods include high computational complexity and the risk of being trapped in local minima.
For correspondence-based methods, random sample consensus (RANSAC) [15] and its variants [16][17][18][19][20] are popular, because RANSAC-fashion methods are technically simple and robust to common nuisances. With abundant correspondences between point cloud pairs, RANSAC randomly samples three correspondences in each iteration, employs singular value decomposition (SVD) to generate a hypothesis, and applies a hypothesis evaluation step to determine the optimal registration solution. In challenging registration cases with sparse correct correspondences, RANSAC usually requires a huge number of iterations and the accuracy still may not be guaranteed. Some methods attempt to resolve this issue by devising either more efficient sampling techniques [21][22][23][24] or more reliable performance evaluation metrics [16,25,26]. However, most of them still fail to achieve accurate 3D registration with a few iterations.
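To make the cost of sparse inliers concrete, the following sketch evaluates the textbook RANSAC iteration bound for uniform random sampling. It is an illustration only: the function name `ransac_iterations` and the 0.99 confidence level are assumptions introduced here, not values taken from the paper.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int = 3,
                      confidence: float = 0.99) -> int:
    """Number of uniform random samples needed so that, with probability
    `confidence`, at least one sample consists of inliers only."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must lie in (0, 1]")
    # Probability that one sample of `sample_size` correspondences is all-inlier.
    p_good = inlier_ratio ** sample_size
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


if __name__ == "__main__":
    for w in (0.5, 0.2, 0.05):
        print(f"inlier ratio {w:.2f}: {ransac_iterations(w)} iterations")
```

Because the all-inlier probability is the cube of the inlier ratio, the bound grows to tens of thousands of iterations once the inlier ratio drops to a few percent, which is the regime that guided sampling aims to avoid.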
To this end, we propose a RANSAC variant based on sampling locally and hypothesis globally (SLHG) to achieve accurate 3D point cloud registration with a few iterations.
The key insight of our algorithm is that we perform locally guided sampling in a compatibility graph to ensure efficiency, and generate hypotheses globally to improve accuracy. SLHG first constructs several clusters among the raw correspondences based on the compatibility features, and the maximum cluster is modeled as a graph. Second, guided local sampling is performed in the graph based on a novel metric to evaluate the confidence of samples. Third, samples vote for each other to mine global information for hypothesis generation. The second and third steps are repeated to find the best hypothesis. Experiments are performed on four datasets, i.e., U3M [27], 3DMatch, 3DLoMatch [28], and RESSO [29], which incorporate different application contexts and data modalities, to validate the efficacy of our proposed method.
In summary, this paper presents two main contributions: 1) A new metric is proposed for guided sampling in a compatibility graph, which allows the mining of good samples in an early stage. 2) A RANSAC variant called SLHG is proposed for 6-DoF pose estimation and 3D point cloud registration, which samples locally and hypothesizes globally. It is able to achieve accurate registrations with a few iterations.
The rest of the paper is structured as follows. Section 2 provides a review of correspondence-based and correspondence-free registration methods. Section 3 details our proposed SLHG method. Section 4 presents the experiments deployed to validate the effectiveness of our method with necessary discussions. Finally, Sect. 5 draws the conclusions and presents potential future research directions.

Related work
In this section, we present an overview of correspondence-free and correspondence-based 3D point cloud registration methods.

Correspondence-free registration
The current representative correspondence-free 3D registration methods are primarily based on the iterative closest point (ICP), branch and bound (BnB), and normal distributions transform (NDT).
ICP registration. ICP is one of the earliest 3D registration methods, proposed by Besl and McKay [5]. It employs a weighted Euclidean distance error function with an initial 6-DoF pose matrix, and iteratively solves for the optimal 6-DoF pose. Some works focus on improving the closest point searching phase. Masuda et al. [30] proposed randomly sampling points in each iteration of ICP. Weik [31] introduced the selection of points with high intensity gradients in the convergence process of ICP. Turk [32] proposed removing boundary points in point selection to improve the accuracy. Some works improved ICP in the point matching stage. For instance, Godin et al. [33] proposed adding surface normal information to the Euclidean distance in closest match point searching. There are also several variants of ICP that define different error functions such as point-to-plane [6] and plane-to-plane [8] distance functions.
BnB registration. The BnB [34] method was applied to 3D point cloud registration by Olsson et al. [12]. BnB registration depends on point-to-point, point-to-line, and point-to-plane correspondences to constrain the upper and lower bounds of the registration error function. One typical work is Go-ICP [13], which transfers the BnB method into ICP registration to enhance its efficiency. Liu et al. [35] proposed increasing the efficiency of BnB-based registration methods by decoupling translation and rotation, which performs better than Go-ICP. Brown et al. [36] introduced deterministic nested BnB (BnB-D) and probabilistic nested BnB (BnB-P), which use an inner BnB to compute the upper and lower bounds that guide rotation searching, achieving faster convergence and the globally optimal solution. However, BnB-based methods usually suffer from high computational complexity.
NDT registration. Magnusson et al. [37] proposed adapting NDT [38] to the field of 3D point cloud registration. For 3D registration in NDT, an objective function is set to measure the probability density of each target point in the source point cloud, and the Newton optimization method is applied iteratively to obtain the optimal solution. Since the NDT method is not limited by local feature correspondence computation, it is efficient for registration. Many subsequent approaches have been proposed, such as Fast-NDT [39], which uses a global description method to increase reliability and achieve real-time performance. Stoyanov et al. [40] presented the L2 distance-based 3D-NDT method to formulate the objective function in NDT-based registration, which improves the registration accuracy and efficiency. Shi et al. [41] combined the ICP and NDT methods into the NDT-ICP registration method, which employs ICP fine registration to achieve better accuracy.
Correspondence-free methods usually require a good initialization and have limited efficiency, because the solution search space SE(3) is large without point-to-point correspondences.

Correspondence-based registration
Correspondence-based methods aim to minimize a registration error metric by correspondence searching and transformation estimation.
RANSAC registration. The RANSAC registration method remains the mainstream solution among various correspondence-based methods. Fischler et al. [15] proposed RANSAC to address model fitting problems, and it shows significant effectiveness in the 3D registration task. However, in order to overcome the insufficiency of RANSAC in terms of efficiency and accuracy, a number of RANSAC variants [16][17][18][19][20] for 3D point cloud registration have been proposed. Rusu et al. [42] introduced the sample consensus-based initial alignment (SAC-IA) method, which employs a distance threshold to constrain the correspondence sample and uses the Huber penalty function for hypothesis evaluation. Yang et al. [43] developed a global point cloud distance metric for assessing the accuracy of the estimated 6-DoF pose hypothesis. Chen et al. [44] designed a second-order spatial compatibility measurement (SC²) to enhance the outlier rejection performance in correspondence sampling. To reduce RANSAC iterations, Yang et al. [22] proposed a guided correspondence sampling method that employs compatibility triangles as samples to speed up the sampling process and improve the registration efficiency and accuracy. Yang et al. [45] developed a mutual voting mechanism for high-confidence correspondence ranking that involves graph construction among correspondences and greatly improves inlier sampling performance. There are also two-point and one-point based sampling consensus methods. For instance, Yang et al. [46] introduced a two-point based sample consensus approach with global constraint (2SAC-GC). Quan et al. [21] presented the compatibility-guided sampling consensus (CG-SAC) approach, which involves point normal information for hypothesis generation. For one-point based RANSAC, Quan et al. [47] proposed a globally constrained one-point based sample consensus method that samples one correspondence with local reference frames per iteration. A comprehensive survey of RANSAC methods for 3D registration can be found in [48]. Nonetheless, we will experimentally show that most existing RANSAC methods still fail to achieve accurate registrations efficiently.
Learning-based registration. With the rapid development of deep learning, several attempts have been made to perform registration with deep neural networks. Aoki et al. [26] proposed the PointNetLK registration method, which employs PointNet [49] for local feature extraction. Bai et al. [24] designed PointDSC, which leverages a deep neural network based on spatial consistency for outlier rejection. Qin et al. [50] developed GeoTransformer, which can learn the geometric features of points and implement robust correspondence searching in low-overlap registration scenarios. Pais et al. [51] established a learning-based registration architecture called 3DRegNet, which includes an inlier-outlier classification block and a pose regression block for registration. Jiang et al. [52] formulated outlier rejection as a classification problem and proposed involving the variational Bayesian long-range dependency property in distinguishing inlier and outlier correspondences. Lombardi et al. [53] proposed a variant of 3DRegNet named 3DReg-i-Net to achieve efficient registration with a lightweight model. Although learning-based registration methods have achieved promising results, the lack of sufficient training data may limit their efficacy in many practical applications.
Other methods. Aiger et al. [54] proposed the 4PCS method, which uses congruent co-planar four-point sets between two point clouds to generate the pose hypothesis. Drost et al. [55] proposed a method to model the description globally based on all oriented point pair features and used local correspondences to obtain the 6-DoF pose hypothesis. Guo et al. [56] proposed a cluster-based method that estimates the 6-DoF pose by projecting the rotation and the translation spaces independently. Buch et al. [57] proposed using one correspondence to vote in a rotational sub-group of 6-DoF, and the final 6-DoF pose is obtained by combining the rotation sub-group with the highest kernel density obtained from the voting. Tombari et al. [58] proposed a voting-based method to generate the best hypothesis by obtaining voters from a 3D Hough space. These correspondence-based methods typically rely on point orientations such as normals and local reference frames, which are shown to be sensitive to noise and limited overlap [48].
Consequently, we focus our research on the geometric RANSAC-fashion method. We propose a novel RANSAC-based SLHG estimator. It constructs a graph based on a novel metric during correspondence sampling and incorporates mutual voting and global hypothesis generation in 6-DoF pose estimation, producing more accurate registrations with a few iterations.

Method
The pipeline of our method is presented in Fig. 1. It consists of four main steps: cluster construction, sampling locally, hypothesis globally, and hypothesis evaluation, where the last three steps are performed iteratively. These four stages play the following roles in our method.
1) Cluster construction. SLHG performs clustering for each correspondence based on the compatibility of each pair of correspondences, and the cluster with the largest size is retained. Processing this cluster is expected to be more efficient than dealing with the initial correspondence set.
2) Sampling locally. The retained cluster is modeled as a graph, in which local ternary loops serve as samples. In this step, we propose a new metric to rank ternary loops and guide the sampling process in order to reduce iterations.
3) Hypothesis globally. For each ternary loop sample, we perform voting in the graph to mine more stable global information for hypothesis generation. After voting, the singular value decomposition (SVD) algorithm is used to generate the 6-DoF pose.
4) Hypothesis evaluation. Finally, we use the mean absolute error (MAE) metric [59] to quantitatively evaluate the hypotheses and find the best one as the output.
To improve readability, the necessary notations are listed in Table 1. The input of our method is the correspondence set $C$ between the source point cloud $P^s$ and the target point cloud $P^t$, while the output is a 6-DoF pose matrix $M$ that aligns $P^s$ to $P^t$.

Cluster construction
Given a set of initial feature correspondences $C$, we process it in two steps to generate the maximum cluster, as shown in Fig. 2. First, we quantitatively analyze the compatibility between every two correspondences. Given two correspondences $c_i$ and $c_j$, the compatibility score between them is defined by Eq. (1) in terms of the distance discrepancy $d(c_i, c_j)$ and a distance threshold $d_{thresh}$, where
$$d(c_i, c_j) = \left|\, \| p_i^s - p_j^s \| - \| p_i^t - p_j^t \| \,\right|.$$
We empirically set $d_{thresh}$ to 5 pr. Here, the unit 'pr' denotes the point cloud resolution, i.e., the average distance from each point in a point cloud to its closest neighbor.
Figure 2 Illustration of the maximum cluster construction. First, each correspondence finds its compatible neighbors in the correspondence set and forms a cluster. Second, the cluster with the maximum number of correspondences is retained
Second, we assume that inliers are only compatible with inliers, so that inliers are expected to form a cluster. Therefore, we establish a cluster $s(c)$ for each correspondence $c$ such that every element in the cluster is compatible with $c$. Then, we rank all clusters by size and keep the largest one, $s_{\max}(c)$. This cluster is expected to be smaller than $C$, while its inlier ratio is usually greater (verified in Sect. 4).
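The two steps above can be sketched as follows. This is a minimal brute-force illustration, not the PCL/C++ implementation: all function names are hypothetical, and compatibility is reduced to the binary test $d(c_i, c_j) < d_{thresh}$, which is an assumption made here because the exact form of the compatibility score is not reproduced.

```python
import math


def point_cloud_resolution(points):
    """Average distance from each point to its nearest neighbour (O(n^2) search)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


def distance_discrepancy(ci, cj):
    """d(ci, cj): difference between the source-side and target-side distances.
    Each correspondence is a (source_point, target_point) pair."""
    (ps_i, pt_i), (ps_j, pt_j) = ci, cj
    return abs(math.dist(ps_i, ps_j) - math.dist(pt_i, pt_j))


def is_compatible(ci, cj, d_thresh):
    # ASSUMPTION: a hard threshold stands in for the compatibility score.
    return distance_discrepancy(ci, cj) < d_thresh


def max_cluster(corrs, d_thresh):
    """Build one cluster per correspondence and return the indices of the largest."""
    best = []
    for i, ci in enumerate(corrs):
        cluster = [j for j, cj in enumerate(corrs)
                   if j == i or is_compatible(ci, cj, d_thresh)]
        if len(cluster) > len(best):
            best = cluster
    return best


if __name__ == "__main__":
    source = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    # Four inliers related by a pure translation, followed by two outliers.
    corrs = [(p, (p[0] + 5, p[1] + 5, p[2] + 5)) for p in source]
    corrs += [((2, 2, 2), (50, 0, 0)), ((3, 1, 0), (-40, 7, 9))]
    pr = point_cloud_resolution(source)
    print(max_cluster(corrs, d_thresh=5 * pr))
```

A rigid motion preserves pairwise distances, so two inliers always have a small discrepancy, whereas an outlier is unlikely to be consistent with many other correspondences; this is why the largest cluster tends to have a higher inlier ratio than the initial set.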

Sampling locally
In this step, we perform local sampling as follows. First, the maximum cluster $s_{\max}(c)$ is modeled as an undirected graph $G$. Specifically, in $G$, nodes denote correspondences and edges link compatible nodes based on Eq. (1). Second, local ternary loops in the graph serve as samples. In particular, we can mine the following four metrics to evaluate the priority of a ternary loop in $G$, as illustrated in Fig. 3.
1) Degree metric: the sum of vertex degrees of a ternary loop.
2) Compatibility metric: the sum of the compatibility scores of all edges in a ternary loop.
3) Congruence metric: the degree of congruence between the source and the target spatial triangles formed by a ternary loop in the two point clouds.
4) Area metric: the area difference between the source and the target spatial triangles formed by a ternary loop in the two point clouds.
Here, we propose a spatial consensus metric (SCM) that combines the congruence metric and the compatibility metric. Intuitively, SCM combines constraints from the compatibility space (graph $G$) and the spatial space (the source and target point clouds), and achieves a balance between them. Moreover, when the four metrics are checked independently, the compatibility metric and the congruence metric prove to be more stable. Then, we rank all ternary loops in $G$ to achieve guided sampling.
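As an illustration of how ternary loops can be enumerated and ranked, the sketch below lists all triangles of an undirected graph stored as adjacency sets and orders them by a scoring function. The degree metric is used as the score only because it is fully specified in the text; SCM itself is not implemented here, and the function names are hypothetical.

```python
from itertools import combinations


def ternary_loops(adj):
    """Return all triangles (i, j, k) with i < j < k in an undirected graph.
    `adj` maps each node to the set of its neighbours."""
    loops = []
    for i in sorted(adj):
        higher = sorted(n for n in adj[i] if n > i)
        for j, k in combinations(higher, 2):
            if k in adj[j]:          # edge (j, k) closes the loop i-j-k
                loops.append((i, j, k))
    return loops


def degree_metric(loop, adj):
    """Sum of the vertex degrees of a ternary loop."""
    return sum(len(adj[v]) for v in loop)


def rank_loops(loops, score):
    """Order loops by descending score; ties keep their enumeration order."""
    return sorted(loops, key=score, reverse=True)


if __name__ == "__main__":
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {0, 2, 4}, 4: {2, 3}}
    loops = ternary_loops(adj)
    for loop in rank_loops(loops, lambda l: degree_metric(l, adj)):
        print(loop, degree_metric(loop, adj))
```

Any other loop score, such as one built from the compatibility and congruence metrics, can be passed to `rank_loops` in place of the degree metric; guided sampling then simply consumes the ranked list from the top.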

Hypothesis globally
A ternary loop only contains local information. In some scenarios, we find that correct ternary loops (consisting of three inliers) do not always yield correct 6-DoF poses. In particular, we sample approximately 4000 correct ternary loops from the UWA 3D modeling dataset [27] and generate hypotheses based on these ternary loops to perform registration. As shown in Fig. 4(a), most of them are not able to achieve accurate registration, and more than 20% of the correct ternary loops fail to achieve successful registrations even with a loose judging threshold, i.e., a threshold of 10 pr. This can be explained by the fact that small-scale and nearly co-planar ternary loops, even when they consist of inliers, often fail to generate correct hypotheses. This motivates us to gather global information for more accurate hypothesis generation.
Specifically, global information is gathered via a voting process. Let the ternary loop $l_{ijk} = (c_i, c_j, c_k)$ be a candidate; the voters for $l_{ijk}$ are those ternary loops sharing at least one common node with $l_{ijk}$. As such, each candidate has a voter set $L_{ijk}$, and the voters in $L_{ijk}$ are sorted in descending order of SCM score. Next, the correspondence set $C_{ijk}$ incorporating global information is generated for $l_{ijk}$ as $C_{ijk} = \{c \mid c \in l,\ l \in L_{ijk}\}$. A confidence score $s(C_{ijk})$ is then computed for $C_{ijk}$; it involves $n = \max\{|L_{ijk}|, V_{thresh}\}$, where $V_{thresh}$ is a threshold that limits the number of voters so that bottom-ranked voters are not involved. After voting, we rank all $C_{ijk}$ by their confidence scores in descending order and compute the hypotheses via the SVD algorithm.
Notably, in low inlier ratio cases, we find that even top-ranked voters may turn out to be unreliable. Therefore, we rely on a threshold $T_{thresh}$ on the number of ternary loops in the graph $G$ to address this issue. More specifically, when the number of ternary loops in $G$ is smaller than $T_{thresh}$, we judge it as a low inlier ratio case and compute the confidence score of $C_{ijk}$ as $s(C_{ijk}) = \mathrm{SCM}(l_{ijk})$.
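The voter set and the globally informed correspondence set can be sketched as below. Several points are assumptions made for illustration: the candidate is counted among its own voters, the loop scores are supplied by the caller as a stand-in for SCM, and the optional cap `max_voters` is a hypothetical argument; the way $V_{thresh}$ and $T_{thresh}$ enter the confidence score is not reproduced.

```python
def voter_set(candidate, loops, score, max_voters=None):
    """Loops sharing at least one node with `candidate`, in descending score order.
    ASSUMPTION: the candidate itself is included among the voters."""
    nodes = set(candidate)
    voters = [loop for loop in loops if nodes & set(loop)]
    voters.sort(key=score, reverse=True)
    # ASSUMPTION: `max_voters` is a hypothetical cap on the number of voters.
    return voters if max_voters is None else voters[:max_voters]


def global_correspondences(voters):
    """C_ijk: every correspondence (node) that appears in at least one voter."""
    return sorted({c for loop in voters for c in loop})


if __name__ == "__main__":
    loops = [(0, 1, 2), (0, 2, 3), (2, 3, 4), (5, 6, 7)]
    scores = {(0, 1, 2): 0.9, (0, 2, 3): 0.5, (2, 3, 4): 0.7, (5, 6, 7): 1.0}
    voters = voter_set((0, 1, 2), loops, scores.get)
    print(voters)
    print(global_correspondences(voters))
```

The resulting set usually contains many more than three correspondences and spans a larger region of the point clouds, which is what makes the subsequent least-squares pose estimate less sensitive to small or nearly co-planar samples.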

Hypothesis evaluation
We employ the efficient mean absolute error (MAE) metric [59] to evaluate all hypotheses. For the $i$-th hypothesis $(R_i, t_i)$, the metric is computed from the transformation error of each correspondence $c_j$, given by $e(c_j) = \| R_i p_j^s + t_i - p_j^t \|$. After $N_{iter}$ iterations, the hypothesis yielding the highest MAE score is retained to perform registration.
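The evaluation step can be sketched as follows. The transformation error follows the definition of $e(c_j)$ above; the score, however, is only an assumed stand-in: a truncated linear weighting in which a correspondence contributes more the smaller its error is. It should not be read as the exact MAE formula of [59], and `e_thresh` is a hypothetical parameter.

```python
import math


def apply_pose(R, t, p):
    """Return R * p + t for a 3x3 rotation (nested lists) and a 3-vector t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def transformation_error(R, t, corr):
    """e(c) = || R p_s + t - p_t || for a correspondence c = (p_s, p_t)."""
    p_s, p_t = corr
    return math.dist(apply_pose(R, t, p_s), p_t)


def mae_like_score(R, t, corrs, e_thresh):
    # ASSUMPTION: truncated linear weighting, used here only for illustration.
    score = 0.0
    for corr in corrs:
        e = transformation_error(R, t, corr)
        if e < e_thresh:
            score += (e_thresh - e) / e_thresh
    return score


def best_hypothesis(hypotheses, corrs, e_thresh):
    """Keep the (R, t) hypothesis with the highest score."""
    return max(hypotheses, key=lambda h: mae_like_score(h[0], h[1], corrs, e_thresh))


if __name__ == "__main__":
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    corrs = [((0, 0, 0), (1, 0, 0)), ((1, 1, 1), (2, 1, 1)), ((0, 0, 0), (9, 9, 9))]
    hyps = [(I, (0, 0, 0)), (I, (1, 0, 0))]
    print(best_hypothesis(hyps, corrs, e_thresh=0.5)[1])
```

Under this kind of scoring, the outlier correspondence contributes nothing to either hypothesis, so the ranking is decided by how well each pose explains the remaining correspondences.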
In summary, our method is expected to be efficient and accurate (verified in Sect. 4) for the following reasons: 1) a cluster is extracted from the initial set to reduce the input size while increasing the inlier ratio; 2) guided local sampling reduces the iteration count; and 3) global hypothesis generation gathers more convincing global information.

Experimental results
This section demonstrates the effectiveness and superiority of our proposed SLHG method in detail through various experiments.

Figure 5 Exemplar views of point cloud pairs in four experimental datasets
Datasets. Four datasets are considered: U3M [27] for object registration, 3DMatch [28] and 3DLoMatch [60] for indoor scene registration (3DLoMatch only includes point cloud pairs with less than 30% overlap), and the Real-world Scans with Small Overlap (RESSO-LO) [29] for large outdoor scene registration. The main properties of these four datasets are shown in Table 2, and exemplar views of point cloud pairs in these datasets are displayed in Fig. 5. One can see that the experimental datasets cover a variety of data modalities and application scenarios.
Metrics. We evaluate the performance of our method by computing the registration accuracy (RA). In our experiments, the root mean square error (RMSE), measured in units of the point cloud resolution (pr), is used for evaluation on the U3M and RESSO-LO datasets as in [22], and the rotation error (RE) and translation error (TE) are used for evaluation on the 3DMatch and 3DLoMatch datasets according to [24]. For the RMSE error metric, a registration error is defined for each pair of corresponding points $c_i = (p^s, p^t)$, and the RMSE is calculated from these point-wise errors over $C_{gt}$. Here, $R_{gt}$ and $t_{gt}$ represent the ground-truth rotation matrix and translation vector, respectively, and $C_{gt}$ represents the ground-truth set of corresponding points between two point clouds. Under this error metric, we define a registration result as correct when its RMSE is lower than a judging threshold $t_{rmse}$.
For the rotation error and the translation error, we compute them with respect to $R_{gt}$ and $t_{gt}$, where $R_{est}$ and $t_{est}$ represent the estimated rotation matrix and translation vector, respectively.
RA is then given as the ratio of successfully registered point clouds in a dataset.
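For concreteness, the sketch below computes these quantities under commonly used definitions, which are assumptions here rather than formulas quoted from the paper: the RMSE is taken over the distances between pose-transformed source points and their ground-truth counterparts, the rotation error is the geodesic angle $\arccos\big((\mathrm{tr}(R_{est}^{\top} R_{gt}) - 1)/2\big)$, and the translation error is $\|t_{est} - t_{gt}\|$. All function names are hypothetical.

```python
import math


def apply_pose(R, t, p):
    """Return R * p + t for a 3x3 rotation (nested lists) and a 3-vector t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def rmse(R, t, gt_corrs):
    """Root mean square of point-wise errors over ground-truth correspondences."""
    sq = [math.dist(apply_pose(R, t, p_s), p_t) ** 2 for p_s, p_t in gt_corrs]
    return math.sqrt(sum(sq) / len(sq))


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two rotations, in degrees."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_est, t_gt)


def registration_accuracy(errors, re_thresh_deg, te_thresh):
    """Fraction of pairs whose (RE, TE) both fall within the thresholds."""
    ok = sum(1 for re, te in errors if re <= re_thresh_deg and te <= te_thresh)
    return ok / len(errors)


if __name__ == "__main__":
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    Rz90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
    print(rotation_error_deg(Rz90, I))
    print(translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    # (RE in degrees, TE in cm) for four hypothetical point cloud pairs.
    print(registration_accuracy([(5, 10), (20, 10), (5, 50), (1, 1)], 15, 30))
```

The clamp before `acos` matters in practice: numerical round-off can push the cosine slightly outside [-1, 1] when the two rotations are nearly identical.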
Implementation details. Our method is implemented in C++ based on the Point Cloud Library (PCL). For U3M and RESSO-LO, we use the Harris3D keypoint (H3D) detector and the signatures of histograms of orientations (SHOT) descriptor for initial correspondence generation as in [22]. For the 3DMatch and 3DLoMatch datasets, we use the fast point feature histograms (FPFH) descriptor.

Method analysis
We conduct a series of experiments to verify the rationality of each stage of SLHG and analyze key parameters.These experiments are deployed on the U3M dataset.

The benefit of cluster construction
To validate the necessity of cluster construction in the first stage of SLHG, we statistically calculate the correspondence counts and inlier ratios for each point cloud pair with and without cluster construction on the U3M dataset in Fig. 6.

Figure 8 The registration accuracy results achieved by five ternary loop scoring metrics
As seen from Fig. 6, the size of the correspondence set decreases significantly on each pair of point clouds, while the inlier ratio increases dramatically on most point cloud pairs. Overall, the average number of correspondences decreases from 1877 to 390 and the average inlier ratio rises from 0.07 to 0.22. This indicates that cluster construction can reduce the input size and increase the inlier ratios simultaneously, which is beneficial for efficient and accurate 3D registration.
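To give a rough sense of what this change in inlier ratio means for sampling, the standard iteration bound for plain uniform three-point RANSAC can be evaluated at the two average ratios. This is only an illustrative calculation: the confidence level $p = 0.99$ is an assumption, average ratios are used in place of per-pair values, and SLHG itself relies on guided rather than uniform sampling.

```latex
\begin{align*}
N(w) &= \frac{\ln(1 - p)}{\ln\!\left(1 - w^{3}\right)}, \qquad p = 0.99, \\
N(0.07) &\approx \frac{-4.605}{-3.43 \times 10^{-4}} \approx 1.3 \times 10^{4}, \\
N(0.22) &\approx \frac{-4.605}{-1.07 \times 10^{-2}} \approx 4.3 \times 10^{2}.
\end{align*}
```

Under these assumptions, raising the inlier ratio from 0.07 to 0.22 lowers the number of uniform samples required by more than an order of magnitude, before any guidance is applied.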

Ternary loop scoring metric analysis
Guided local sampling is based on the ternary loop scoring metric. We first check the performance of the four metrics introduced in Sect. 3.2, i.e., "compatibility", "degree", "congruence" and "area", and then compare SCM with them to demonstrate its rationality. The performance is checked in two respects: 1) the correct rate of the top-$N_{iter}$ ternary loops ranked by the examined metrics; 2) the registration accuracy with the top-$N_{iter}$ ternary loops ranked by the examined metrics. Here, a ternary loop is defined as correct if it consists of three correct correspondences.
For the correct ternary loop sampling results shown in Fig. 7, it can be observed that SCM is not the best performer. However, regarding the registration performance shown in Fig. 8, SCM achieves the best overall performance under different iterations and RMSE thresholds. The results indicate two points. First, correct ternary loops are not guaranteed to achieve correct registrations, as illustrated previously in Fig. 4. Second, SCM effectively combines constraints from the compatibility space and the spatial space, resulting in reliable guided sampling results.

The rationality of hypothesis globally
To validate the rationality of hypothesis globally, we conduct a comparative experiment between hypothesis locally and hypothesis globally. Here, hypothesis locally indicates computing the hypothesis with a single ternary loop. The two parameters in the hypothesis globally stage are also discussed. The registration results are presented in Table 3.
The results suggest that hypothesis globally can clearly align more point cloud pairs, especially under strict RMSE thresholds. This demonstrates that hypothesis globally can achieve more accurate registrations. In addition, $T_{thresh} = 1000$ and $V_{thresh} = 20$ achieve the best overall performance, and we keep this as the default parameter setting.

Varying iteration count
To demonstrate that SLHG can achieve good registration performance with a few iterations, we vary the number of iterations $N_{iter}$ to check the performance variation. The results are presented in Table 4.
According to the results, our method can achieve a remarkable registration accuracy with only 5 iterations. In addition, by observing the registration results under iteration counts varying from 20 to 200, it can be seen that SLHG achieves satisfactory performance with 20 iterations and that the performance gain becomes marginal as $N_{iter}$ further increases. Thus, we set $N_{iter}$ to 20 for SLHG by default.

Registration accuracy results
To further verify the effectiveness and accuracy of SLHG, we test it on four experimental datasets addressing different application scenarios. Only the 3DMatch and 3DLoMatch datasets have benchmark records of deep learning methods, and we will compare SLHG with learning-based methods on these two datasets.
Figure 9 The registration accuracy achieved by eight methods is compared in 20 iterations with a 5 pr RMSE error

U3M
The registration accuracy results under different RMSE thresholds on the object-scale U3M dataset are shown in Fig. 9.
It can be seen from Fig. 9 that SLHG and SAC-COT are the two top-ranked methods on this dataset. The two methods outperform most competitors by a clear margin. In particular, when the RMSE threshold is smaller than 2 pr, our SLHG method still outperforms SAC-COT, indicating that SLHG can achieve very accurate registrations.

3DMatch & 3DLoMatch
On the 3DMatch and 3DLoMatch datasets, we use FPFH descriptors to generate initial correspondences at a scale of approximately 1k. In some scenarios, outlier rejection is employed before 6-DoF pose estimation to perform registration. As such, we test two cases for SLHG: 1) the initial correspondences are directly fed to SLHG; 2) outlier rejection is performed prior to SLHG (we employ PointDSC for outlier rejection). In addition, we test methods under two thresholding conditions: 1) loose threshold, i.e., the rotation error is within 15° and the translation error is within 30 cm; 2) tight threshold, i.e., the rotation error is within 10° and the translation error is within 10 cm. The results on the two datasets are shown in Table 5, Table 6, Table 7 and Table 8.
On the 3DMatch dataset, as can be seen from Table 5 and Table 6, SLHG + PointDSC achieves the best registration performance under both loose and tight thresholds. Impressively, even without the outlier rejection module, SLHG outperforms a state-of-the-art learning method, i.e., PointDSC, under a tight threshold. Moreover, we find that our method with only 20 iterations surpasses RANSAC with 10k iterations by a significant margin.
On the 3DLoMatch dataset, as suggested by Table 7 and Table 8, the performance of all methods drops significantly. However, PointDSC + SLHG still achieves the best performance. This also demonstrates that our method can effectively boost the performance of existing correspondence-based methods.
To demonstrate that our method is capable of achieving very accurate results, we present more detailed results in Table 9 and Table 10. The results suggest that our method can achieve smaller RE and TE than the compared methods. This can be explained by the fact that hypothesis globally incorporates more stable global information and thus yields more accurate pose hypotheses.

RESSO-LO
Finally, on the large-scale outdoor RESSO-LO dataset, we follow [22] and generate correspondences with different inlier rates as inputs for the tested methods. Note that some methods, such as OSAC, Go-ICP and PPF, are extremely time-consuming on this dataset, and they are not included for comparison here. The results are shown in Fig. 10. It can be seen from Fig. 10 that SLHG is the best performer under all tested conditions. The superiority of SLHG remains clear even with an inlier rate of only 5%.

Visualization
In Fig. 11, we visualize the registration results of the proposed SLHG method and two competitors.
Fig. 11 suggests that the registration results achieved by SLHG are clearly more accurate than those achieved by GC1SAC and RANSAC.The results are consistent across different datasets, indicating the good generalization ability of SLHG.

Efficiency
In Fig. 12, we report the efficiency results of different methods with different scales of input correspondences. Learning-based methods are not compared here because our method and the compared methods are implemented on CPU only, without parallel computing.
First, one can see that SLHG is less efficient than RANSAC, SAC-IA, CG-SAC and 1PRANSAC. This is because SLHG additionally performs cluster construction and voting during the hypothesis generation stage. However, SLHG achieves significantly better registration results. Second, SLHG can still finish 6-DoF pose estimation in around 10 ms given 100 correspondences, which can already satisfy the requirement of real-time applications. Overall, SLHG strikes a good balance in terms of efficiency, accuracy, and robustness.

Conclusions
In this paper, we present a RANSAC variant called SLHG for 6-DoF pose estimation and point cloud registration.The key idea is performing guided local sampling to reduce iterations while hypothesizing globally to increase registration accuracy.Experiments on the object, indoor, and outdoor datasets as well as comparison with both geometry-based and learning-based methods verify the overall superiority of our method.
In the future, we plan to explore a more stable voting scheme to further reduce the impact of heavy outliers, so as to improve the performance in cases with extremely low inlier ratios.

Figure 1 The pipeline of the proposed SLHG method. First, a cluster with the maximum number of correspondences that are compatible with each other is constructed. Second, a graph is constructed for the cluster and local guided sampling is performed in the graph. Third, for each local sample, we perform voting to mine globally consistent information and generate hypotheses from a global perspective. Fourth, each hypothesis is evaluated to determine the best one, which is the output of SLHG

Figure 3 Illustration of four metrics for evaluating a ternary loop in a graph

Figure 4 Illustration of the limitation of hypothesis locally. (a) The registration accuracy achieved by 6-DoF poses generated using correct local samples; (b) and (c) present two exemplar cases in which local samples result in inaccurate registrations

Figure 6 Comparison of the initial correspondence set and the max cluster in terms of correspondence count and inlier ratio

Figure 10 Results on the RESSO-LO dataset with different inlier rates

Table 1 Notations
$p^s$: $p^s \in P^s$, a point in the source point cloud
$p^t$: $p^t \in P^t$, a point in the target point cloud
$c$: $c_i \in \mathbb{R}^6$, a correspondence pair of $p^s$ and $p^t$

Table 2
Experimental datasets and their properties

Table 3
Comparison of the numbers of successfully registered point clouds with and without hypotheses globally on the U3M dataset

Table 4
The numbers of successfully registered point cloud pairs by SLHG with different iterations on the U3M dataset

Table 5
Registration accuracy performance (%) with loose thresholds on the 3DMatch dataset (the best and the second best results are shown in bold and underlined, respectively)

Table 6
Registration accuracy performance (%) with tight thresholds on the 3DMatch dataset

Table 7
Registration accuracy performance (%) with loose thresholds on the 3DLoMatch dataset

Table 8
Registration accuracy performance (%) with tight thresholds on the 3DLoMatch dataset

Table 9
Registration performance with loose thresholds on the 3DMatch dataset

Table 10
Registration performance with loose thresholds on the 3DLoMatch dataset