Modeling spatial uncertainty of point features in feature-based RGB-D SLAM
Abstract
This paper deals with the problem of modeling spatial uncertainty of point features in feature-based RGB-D SLAM. Although the feature-based approach to SLAM is very popular, explicit uncertainty modeling is largely neglected in implementations of systems that use RGB-D data. Therefore, we investigate the influence of the uncertainty models of point features on the accuracy of the estimated trajectory and map. We focus on the recent SLAM formulation employing factor graph optimization. Unlike some visual SLAM systems employing factor graph optimization that minimize the reprojection errors of features, we explicitly use depth measurements and minimize the errors in the 3D space. The paper analyzes the impact of the information matrices used in factor graph optimization on the achieved accuracy. We introduce three different models of point feature spatial uncertainty. Then, applying the simplest model, we demonstrate in simulations how strongly the spatial uncertainty model influences the graph optimization results in an idealized SLAM system with perfect feature matching. A novel software tool allows us to visualize the statistical behavior of the features over time in a real SLAM system. This enables the analysis of the distribution of feature measurements employing synthetic RGB-D data processed in an actual SLAM pipeline. Finally, we show on publicly available real RGB-D datasets how an uncertainty model, which reflects the properties of the RGB-D sensor and the image processing pipeline, improves the accuracy of sensor trajectory estimation.
Keywords
SLAM, Uncertainty model, Bundle adjustment, Factor graph optimization
1 Introduction
1.1 Motivation
The compact and affordable RGB-D sensors based on structured light, such as PrimeSense Carmine, Microsoft Kinect and Asus Xtion, fostered progress in 3D visual odometry (VO) and simultaneous localization and mapping (SLAM). Visual odometry computes the sensor motion between selected frames of the RGB-D input and recovers the trajectory [37]. However, the trajectory recovered using the frame-to-frame motion estimation has an unavoidable drift, as there are no constraints that enforce the global consistency of sensor motion. Therefore, the VO pipeline treated as a frontend for RGB-D data processing is often paired with an optimization engine (called backend) to form a SLAM system, which yields globally consistent trajectories. Typically, the backend post-processes a pose-graph, whose vertices correspond to the sensor poses, whereas its edges represent motion constraints between these poses. Point features are often employed for frame-to-frame motion estimation in pose-based RGB-D SLAM and VO systems [9, 18].
The performance of such a system depends on the configuration of the image processing module (frontend) and the constraints management strategy (backend) [1]. The pose-based approach keeps only the relative pose-to-pose constraints, marginalizing out the actual 3D point features; hence, it cannot improve the estimation of motion using the large number of feature-to-pose correspondences established by the frontend. On the other hand, keeping the point features and solving the Structure from Motion (SfM) problem defined as nonlinear least squares optimization allows obtaining very precise sensor trajectories [28]. Feature-to-pose measurements can be used to find coordinates of features and relative sensor motion using the Bundle Adjustment (BA) approach [42]. This approach is applied in the most successful visual SLAM systems, PTAM [21], ORB-SLAM [29] and ORB-SLAM2 [30], to obtain accurate trajectories of the camera. RGB-D SLAM systems also use the graph-based optimization framework to integrate all feature-to-pose measurements [12, 41].
In this research, we also employ factor graph optimization, but instead of minimizing the reprojection error onto images, as in PTAM [21] and ORB-SLAM [29], we directly define the feature position error in the 3D space. Our approach to RGB-D SLAM exploits the depth measurements to a greater extent, and thus it seems to be a better vehicle to demonstrate the role of modeling the uncertainty of RGB-D measurements than other systems of similar architecture, such as ORB-SLAM2 [30], which employs triangulation of feature positions and uses reprojection errors even for RGB-D data containing dense depth frames. Most of the published RGB-D graph-based SLAM research neglects the role of the measurement uncertainty model. The information matrices used to represent uncertainty in graph-based optimization are commonly set to identity, which means an equal importance of each measurement and isotropic spatial uncertainty [9, 12, 30]. From the literature, we know that the depth measurement accuracy depends on the measured distance [13, 20]. Thus, the simplified approach, which assumes the same uncertainty for all measurements, is not justified by the underlying physics of the measurements in RGB-D cameras. Therefore, in this work, we investigate how to improve the accuracy of sensor trajectory estimation by explicitly modeling spatial uncertainty of the point features.
1.2 Problem statement
In the SLAM problem, we consider an RGB-D sensor moving freely in the environment and measuring the position of each detected point feature. The factor graph representation consists of vertices representing both the 3D features and the sensor poses. Edges represent measurements between the poses and the features, or between two poses. The quality (or importance) of each measurement is represented by an information matrix. The information matrix can be computed by inverting the covariance matrix of the measurement. Thus, more accurate measurements produce stronger constraints in graph optimization [24].
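The relation between covariance, information matrix, and constraint strength can be illustrated with a minimal Python sketch. The noise coefficients `K_DEPTH` and `SIGMA_LATERAL`, the quadratic growth of depth noise with distance, and all function names are illustrative assumptions, not values or code from this paper.

```python
import numpy as np

# Hypothetical noise coefficients (assumptions, not taken from the paper).
K_DEPTH = 1.5e-3        # depth std. dev. assumed to grow as K_DEPTH * z^2 (m)
SIGMA_LATERAL = 2e-3    # assumed constant lateral std. dev. (m)

def measurement_covariance(z):
    """Diagonal covariance of a point measured at depth z (m), sensor frame."""
    sigma_z = K_DEPTH * z ** 2
    return np.diag([SIGMA_LATERAL ** 2, SIGMA_LATERAL ** 2, sigma_z ** 2])

def information_matrix(cov):
    """The information matrix is the inverse of the measurement covariance."""
    return np.linalg.inv(cov)

def weighted_squared_error(e, omega):
    """Contribution e^T * Omega * e of one edge to the optimization cost."""
    return float(e @ omega @ e)

# The same 1 cm depth residual, observed at 1 m and at 4 m.
error = np.array([0.0, 0.0, 0.01])
cost_near = weighted_squared_error(error, information_matrix(measurement_covariance(1.0)))
cost_far = weighted_squared_error(error, information_matrix(measurement_covariance(4.0)))
```

Under these assumptions the residual observed at 1 m is penalized far more than the same residual at 4 m, which is how a more accurate measurement becomes a stronger constraint.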
This paper presents methods to compute the information matrices on the basis of the measurement model of the RGB-D sensor, taking into account the additional uncertainty introduced by keypoint detection on the RGB images and dependencies between the geometric structure of the scene and the resulting spatial uncertainty of the features. Hence, the elaborated uncertainty models we introduce try to capture the spatial uncertainty of point features resulting from the whole processing pipeline in the SLAM frontend.
1.3 Contribution

The contributions of this paper are as follows:
we introduce a new uncertainty analysis methodology based on software tools that allow us to simulate RGB-D SLAM systems and to analyze the behavior of point features; these tools, in turn, make it possible to understand the nature of the spatial uncertainty of the point features in RGB-D SLAM;
we propose mathematical uncertainty models for point features in RGB-D SLAM, which are based upon the investigations using the new methodology;
we verify the suitability of the proposed uncertainty models on real RGB-D benchmark data.
As we argued in [1], there is a broad diversity of SLAM architectures and implementation details, even if we consider only feature-based systems. Therefore, it is impossible to address feature uncertainty modeling for a generic RGB-D SLAM architecture, as such an architecture hardly exists. We focus on the architectures employing the BA concept, with the structure of the factor graph (in the backend) based on the feature-to-pose constraints (Fig. 1). In the recent literature [30], this approach is considered superior to the (perhaps more popular) pose-based SLAM architecture [9]. Hence, the contributed methodology should be helpful for researchers who want to improve the performance of recent SLAM architectures. PUT SLAM, which we use here, belongs to this family of BA-based systems, but was designed for Kinect-like RGB-D sensors, and its backend optimization procedure minimizes the Euclidean distance errors in feature positions. This approach, different from the more commonly used feature reprojection error in the image plane, is motivated by the fact that the spatial uncertainty of RGB-D sensors can be modeled in the Euclidean space regardless of the depth measurement principle (i.e., active stereo vision or time-of-flight), which potentially makes the methodology proposed in this article universal with respect to the first and second generation of RGB-D sensors [23].
The remainder of this paper is organized as follows: Sect. 2 presents the most relevant previous work in the areas of uncertainty modeling for RGB-D sensors and the uncertainty models in feature-based SLAM, while Sect. 3 details our PUT SLAM system, used throughout the paper as a reference architecture and a tool to investigate the uncertainty of features. The three approaches to modeling the uncertainty of point features in RGB-D SLAM are introduced in Sect. 4, followed by the description of the simulation and visualization tools in Sect. 5. These tools are used to investigate the behavior of the point features depending on the uncertainty model and the noise characteristics of the RGB-D data. A quantitative evaluation of the results of applying the proposed uncertainty models in PUT SLAM on synthetic RGB-D data is provided in Sect. 6, while Sect. 7 demonstrates how the selected best model influences the accuracy of sensor trajectory estimation in PUT SLAM tested on real RGB-D data from benchmark sequences. Section 8 concludes the paper and outlines future research directions.
2 Related work
The uncertainty of depth and combined RGB-depth measurements from sensors based on the PrimeSense structured light technology was investigated in a number of recent papers. Khoshelham and Elberink [20] studied the accuracy and resolution of depth data from the Kinect sensor. Gonzalez-Jorge et al. [14] demonstrated that the metrological characteristics in terms of accuracy and precision are almost independent of the type of sensor (Kinect or Xtion) due to the use of the same PrimeSense depth camera. Recently, research on the noise characteristics in Kinect v1 depth data was surveyed in [25]. The first-generation RGB-D sensors have also been compared to the Microsoft Kinect v2 based on the time-of-flight principle [13]. This research revealed that the correlation used in the PrimeSense technology to compare the observed pattern of “speckles” to a reference pattern creates a dependency between pixels in the depth images, which in turn causes errors in the range measurements. The papers dealing with uncertainty in RGB-D sensors focus mostly on applications outside of VO/SLAM, but Park et al. [36] proposed a mathematical uncertainty model for the Kinect v1 sensor and RGB-D sparse point features. However, the approach of [36] was demonstrated without an application in real SLAM or VO. Although the structured light depth measurement principle in the most common RGB-D sensors may be considered as active stereo vision, the existing literature on uncertainty modeling in stereo vision considers mainly systems in which the stereo matching is applied to discrete features and no dense depth map is created [7, 27]. An exception is the older work of Matthies and Shafer [26], who applied 3D Gaussians to model the measurement errors in the discrete digital images, and demonstrated that propagating the uncertainty in the form of covariance matrices enables a reduction in uncertainty in the localization task.
In spite of these results, up to now there has been little work on the utilization of physical characteristics of RGB-D sensors in VO/SLAM. In contrast, feature uncertainty modeling is widely used in SLAM research employing the extended Kalman filter. In these frameworks, the covariance from the sensor measurement model is propagated to the feature model for both 2D laser sensors [39] and 3D vision [6]. In the work of Oskiper et al. [34] the uncertainty of the observed features (called “landmarks”) is explicitly modeled taking into account the stereo imagery processing pipeline and using the method from [26]. However, as far as we know, only Dryanovski et al. [8] formulated an uncertainty model of point features used to register a Kinect sensor pose in relation to a map of features which are estimated using a Kalman filter. This model, based on the Gaussian mixture, was motivated by experimental assessment of the Kinect sensor depth measurement uncertainty. In graph-based SLAM, Endres et al. [9] applied the depth measurement model from [20] in the motion estimates verification procedure for their feature-based pose-graph RGB-D SLAM. The possibility of using the Mahalanobis distance, which takes the uncertainty into account, instead of the Euclidean distance in the pose-to-pose motion estimate computation is also mentioned in [9], but the paper provides no clear description of the uncertainty model being used. Conversely, the feature-based RGB-D visual odometry system presented in [18] minimizes feature reprojection error in the image space in order to compute the pose-to-pose motion estimates. This approach implicitly takes into account the fact that the uncertainty increases with range. Nguyen et al. [31] applied a depth uncertainty model of Kinect in a VO system that employs dense depth data. Although the improved accuracy of the recovered trajectories was shown in [31], and the dense depth-based approach generally achieves impressive results in terms of environment map reconstruction [44], it cannot model uncertainty of all the individual range measurements and then propagate this uncertainty to the dense map.
The RGB-D SLAM formulation used in this paper is similar to the structure from motion problem in computer vision, which is commonly solved by applying Bundle Adjustment (BA) [42]. Advanced, keyframe-based BA variants, such as the one implemented in PTAM [21], may be used for online mapping and motion estimation in robotics [4]. However, while there were some efforts to incorporate the uncertainty of feature points in BA, much of the computer vision literature simply assumes one-pixel Gaussian noise in the location of features [17]. For example, Konolige and Agrawal [22] use such an uncertainty model in pose-based SLAM utilizing visual imagery. Ozog and Eustice [35] demonstrate that accounting for the uncertainty of the relative-pose transformation, computed using Haralick’s method [16], in the two-view sparse BA improves the relative motion estimation between two image frames. Our approach can be considered similar to BA because the trajectory of the sensor and the feature positions are simultaneously optimized over sensor measurements. Such an approach is not only superior to the pose-based RGB-D SLAM formulation [9], but also gives better accuracy and robustness than dense/direct methods in visual SLAM [11] and RGB-D SLAM [19]. The direct methods are more prone to image distortions and artifacts due to such factors as the rolling shutter or automatic white balance because they need to model the whole image acquisition process that influences the pixel intensities [10]. Conversely, the feature-based approach employed in PUT SLAM enables this system to take great advantage of uncertainty modeling, because the uncertainty of features directly influences the strength of the constraints in optimization. So far, other feature-based RGB-D SLAM systems of similar architecture, such as [12] and [41], did not attempt to model this uncertainty.
3 RGB-D SLAM with a map of features
We consider spatial uncertainty models of point features treating the SLAM algorithm itself as a “black box” that processes the measurements (constraints) and depends on the provided description of the “importance” of these measurements in the form of an uncertainty model. Although the general structure of the BA-based SLAM algorithms is similar with respect to the main data processing components, there is no standard, generic SLAM architecture. Therefore, we describe here the architecture of our PUT SLAM system [2, 3]. This brief description should make it easier to understand some of the mechanisms that are responsible for the spatial uncertainty of the features (e.g., matching and RANSAC), but is also necessary to introduce the software used to investigate the behavior of features, which is based on PUT SLAM (see Sect. 5).
The accuracy of each feature-to-pose measurement is represented in the factor graph by an information matrix \({\varvec{\Omega }}_{i,j}^{t}\). The information matrix can be obtained by inverting the covariance matrix of the measurement. The information matrix \({ \varvec{\Omega }}_{i,j}^{o}\) for the pose-to-pose edge in the factor graph is set to an identity matrix, as in [9].
3.1 SLAM frontend
We implemented a frontend of the SLAM algorithm to verify our method on real-life RGB-D sequences. We use standard procedures for feature detection, description [38], matching, and tracking [1]. The frontend starts from detection of salient features on the RGB-D frame. The set of detected features is used to estimate frame-to-frame sensor displacement. The PCG solver used in the backend requires a good initial guess. Therefore, we provide a reliable sensor displacement guess from the VO pipeline. We implemented a fast VO algorithm [33], which is independent of the map structure. In our investigations, we consider two configurations of the VO pipeline. In the first one, the associations between two consecutive RGB frames are found using SURF descriptors. However, the SURF descriptors are slow to compute and match [38]. Thus, we alternatively implemented fast sparse optical flow tracking with the Kanade–Lucas–Tomasi (KLT) algorithm [33]. In this case, the ORB keypoint detector is used. Regardless of the method used to establish associations between features belonging to two frames in the sequence, the SE(3) transformation is computed from the paired 3D points. To estimate the camera motion from this set of paired features, we apply the Umeyama algorithm [43] and preemptive RANSAC to remove outliers. The camera pose estimated from the VO pipeline is used as an initial guess for the camera pose in the graph optimization. The constraints between the new camera pose and features in the map are obtained by matching features from the last frame with features projected from the map. Again, we use RANSAC to determine the set of inliers. The features that are observed but cannot be associated with the features projected from the map are added to the map, extending the environment model to newly discovered areas.
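The motion estimation step described above can be sketched as follows. This is a simplified illustration, not the PUT SLAM implementation: it uses the closed-form least-squares rigid alignment in the style of Umeyama (without scale estimation) inside a plain RANSAC loop rather than the preemptive variant, and the function names, threshold, and iteration count are assumptions.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t (no scale estimation)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (dst - mu_d).T @ (src - mu_s)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = mu_d - R @ mu_s
    return R, t

def ransac_rigid(src, dst, threshold=0.02, iterations=200, seed=0):
    """Plain RANSAC over 3-point samples with a Euclidean inlier threshold (m)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        residuals = np.linalg.norm(dst - (src @ R.T + t), axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("not enough inliers to estimate the motion")
    # Refit on all inliers of the best hypothesis.
    R, t = rigid_transform(src[best], dst[best])
    return R, t, best
```

Here `src` and `dst` are N x 3 arrays of paired 3D feature positions from two frames. Because the inlier test uses a Euclidean distance, every accepted measurement lies within a sphere of radius `threshold` around its predicted position.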
4 Spatial uncertainty modeling
For instance, the distribution of feature measurements is significantly influenced by the RANSAC procedure, which is used in the frontend to remove outlier matches. The influence of RANSAC on the spatial uncertainty of measurements is presented in Fig. 4. The set of features is observed from two different sensor poses (Fig. 4a, b). The uncertainty of measurements is modeled using an anisotropic model. However, if RANSAC is used to remove outliers, as in the frontend of PUT SLAM, the distribution of measurements is changed (Fig. 4c). In RANSAC, the Umeyama algorithm [43] is used to find a transformation between two sensor poses. Outlier measurements which are not consistent with the found transformation are removed from the set of inliers (feature \(\mathbf{f}_3\) in Fig. 4c). The RANSAC outlier threshold is defined here as the Euclidean distance between the expected and measured position of the feature. Thus, all the inlier measurements are inside a sphere defined by the RANSAC outlier threshold. In this case, the uncertainty can be modeled also as a sphere, and the information matrix for each feature should be set to the identity matrix in the graph.
In these investigations, we use the synthetic ICL-NUIM dataset [15], as we need perfect ground truth sensor trajectories to isolate the spatial uncertainty introduced by processing the RGB-D frames in the frontend. Moreover, ICL-NUIM offers an insight into the nature of the noise introduced into the sensory data. Hence, using this dataset it is possible to draw clear conclusions as to the dependencies between the characteristics of the sensory data uncertainty and the behavior of the uncertainty model.
4.1 Uncertainty propagation from the sensor model
4.2 Normal-based uncertainty model
4.3 Gradient-based uncertainty model
The \(\mathbf{C}_n\) uncertainty model leverages the role of visual localization of the point features in the RGB frames. However, the uncertainty of the feature location depends not only on the performance of the feature detector implemented in the SLAM frontend pipeline and the quality of the RGB images, but also on the local structure of the scene. During visual analysis of the distribution of point features, we observed that features located in the vicinity of strong intensity gradients (photometric edges) slide along lines defined by these gradients. Thus, we propose another uncertainty model, which captures the observed behavior. In the proposed uncertainty model, the major axis of the ellipsoid is located along a photometric edge defined by a strong intensity gradient in the RGB image (Fig. 7b; compare Fig. 10b). To compute the uncertainty matrix, we detect the RGB edge using a 3\(\times \)3 Scharr kernel. The direction of the edge in 3D space is computed using the depth data. Then, the procedure is similar to the procedure presented for the normal-based uncertainty model. We construct the rotation matrix \(\mathbf{R}\) representing the local coordinate system. The z axis is related to the RGB gradient vector (Fig. 7a). The x axis of the coordinate system is located on the RGB edge (Fig. 7a and b). The RGB edge might be related to the edge of an object or a photometric edge on a flat surface. The covariance matrix \(\mathbf{C}_g\) is computed using (6). We scale the x, y and z axes of the ellipsoid using the scaling matrix \(\mathbf{S}\). Gradient-based uncertainty ellipsoids (according to the \(\mathbf{C}_g\)-model) for selected 3D point features are presented in Fig. 7c.
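The construction of an edge-aligned covariance can be sketched in Python as shown here. Since Eq. (6) is not reproduced in this text, the sketch assumes the common form C = R S R^T with a diagonal S holding squared standard deviations; the function names and the example standard deviations are hypothetical, and the 3D edge and gradient directions are assumed to have already been obtained from the image gradient and the depth data.

```python
import numpy as np

# 3x3 Scharr kernels for horizontal and vertical image derivatives.
SCHARR_X = np.array([[-3.0, 0.0, 3.0], [-10.0, 0.0, 10.0], [-3.0, 0.0, 3.0]])
SCHARR_Y = SCHARR_X.T

def scharr_gradient(patch):
    """Image gradient (gx, gy) at the centre of a 3x3 grayscale patch."""
    return float((SCHARR_X * patch).sum()), float((SCHARR_Y * patch).sum())

def gradient_covariance(edge_dir, grad_dir, sigmas):
    """Assumed form C = R S R^T: x axis along the edge, z axis along the gradient."""
    x = np.asarray(edge_dir, dtype=float)
    x = x / np.linalg.norm(x)
    g = np.asarray(grad_dir, dtype=float)
    z = g - (g @ x) * x                 # remove the component along the edge
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                  # completes a right-handed frame
    R = np.column_stack([x, y, z])
    S = np.diag(np.square(np.asarray(sigmas, dtype=float)))
    return R @ S @ R.T

# Hypothetical standard deviations (m): largest along the edge (x axis).
C_g = gradient_covariance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.03, 0.01, 0.005])
```

With the largest standard deviation assigned to the x axis, the major axis of the resulting ellipsoid lies along the photometric edge, which matches the sliding behavior described above.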
5 Tools that give insight into the uncertainty of features
5.1 Experiments in simulation
Table 1 ATE and RPE for the front-view sensor experiment
Table 2 Numerical results (ATE and positional RPE) obtained in the ICL-NUIM office/kt1 environment
An experiment in simulation allows us not only to assess the accuracy of the trajectories but also to compare the trajectories and feature-based maps obtained with and without the uncertainty model. We use simulations to confirm the hypothesis that modeling of spatial uncertainty in feature-based PUT SLAM improves the accuracy. To obtain statistics, we run the simulation 100 times for each investigated configuration. The quantitative results are computed according to the methodology introduced in [41], using the Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) metrics. The distance between corresponding points of the estimated and ground truth trajectories is measured by ATE, whereas the RPE metric reveals the local drift of the trajectory. We can obtain the translational or rotational RPE, taking, respectively, the translational or rotational part of the homogeneous matrix computed when comparing the sensor poses along the trajectories. As ATE compares absolute distances between two rigid sets of points representing the estimated and the ground truth trajectory, it is always expressed as a metric distance. With the uncertainty model, the ATE is 4 times smaller, while the RPE is about 4.5 times smaller (Table 1). Details of the simpler localization systems included in Table 1 for comparison (VO and pose-based SLAM) have been described in [1], while the approach used to run these methods in the simulation was the same as for PUT SLAM.
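The two metrics can be sketched in Python as follows. This is a simplified illustration that assumes the estimated and ground truth trajectories are already time-associated and expressed in a common frame (the benchmark methodology additionally aligns the trajectories before computing ATE); the function names are illustrative.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of distances between corresponding trajectory points (N x 3 arrays)."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def rpe_translational_rmse(est_poses, gt_poses, delta=1):
    """RMSE of the translational part of the relative pose error.

    Poses are 4x4 homogeneous matrices; delta is the frame offset over
    which the relative motion is compared.
    """
    errs = []
    for i in range(len(est_poses) - delta):
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        E = np.linalg.inv(gt_rel) @ est_rel      # relative pose error
        errs.append(np.linalg.norm(E[:3, 3]))
    return float(np.sqrt(np.mean(np.square(errs))))
```

A trajectory shifted by a constant offset has a nonzero ATE but zero RPE, which illustrates why ATE reflects global consistency while RPE reveals local drift.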
It is worth noting that even though the error is smaller for PUT SLAM with the uncertainty model than for PUT SLAM without uncertainty modeling, the standard deviation is considerably higher. In the series of 100 trials, one of the optimization runs yields an incorrect solution. In that run, one camera pose from the whole trajectory optimized by g\(^2\)o is incorrect, and the mean ATE error of the trajectory is significantly higher [5]. We notice this behavior when the camera observes only a flat wall and the number of pose-to-features measurements is small. In this case, the graph-based backend sometimes finds a solution that is incorrect but satisfies the measurements from the given camera pose. This situation is difficult to detect in practice. Thus, whenever the number of measurements is smaller than a threshold, an additional pose-to-pose constraint is added to the graph to stabilize the optimization process in the backend.
In the next simulation, we use the kt1 sensor trajectory from the ICL-NUIM dataset (Fig. 9) to confirm the results in a more realistic scenario. The positions of features are obtained offline from the PUT SLAM frontend processing the ICL-NUIM office sequence of RGB-D frames. The 86 extracted features are fixed in the simulator and augmented by unique IDs that enable the simulator to perfectly associate the observed features to the map. The results summarized in Table 2 confirm that the feature-based PUT SLAM yields a more accurate trajectory estimate than the pose-based SLAM implemented as in [1]. Moreover, the anisotropic feature uncertainty model allows PUT SLAM to achieve a much smaller ATE than was possible using identity matrices.
5.2 Reverse SLAM: looking closer at the features
We demonstrated in the simulation that an anisotropic spatial uncertainty model of features significantly improves the accuracy of estimation in factor graph-based SLAM. However, we found it difficult to apply the approach to uncertainty modeling proposed by Park et al. [36] in the actual PUT SLAM working with real data. When the \(\mathbf{C}_p\)-model is used, the optimization in g\(^2\)o becomes unstable or returns worse results than the optimization with identity matrices. This suggests that the \(\mathbf{C}_p\) covariance matrix does not correctly represent the uncertainty of real measurements.
In practice, the uncertainty of features is also influenced by the tracking/matching algorithm [33]. Therefore, in order to investigate how the proposed uncertainty model fits the distribution of real measurements, we have developed the reverse SLAM tool. In the reverse SLAM tool, we are looking for the distribution of feature measurements that is obtained in the frontend when SLAM returns a perfect trajectory. Thus, we move the sensor along the known trajectory and run the frontend of the SLAM. At each frame, we provide the real position of the camera instead of its estimate. As a result, the sensor noise and image processing errors accumulate in the representation of features, not in the trajectory error. Assuming that an accurate ground truth trajectory is available, a distribution of measurements (3D positions of features) for each feature in the map can be computed. Unfortunately, datasets obtained using real RGB-D sensors and external motion capture techniques typically suffer from synchronization issues between the RGB-D data and the ground truth trajectory. For instance, in the TUM RGB-D Benchmark [41], data from the Vicon motion capture system and data from the Kinect sensor are not well synchronized due to the different sampling frequencies [23]. This dataset cannot be used in the reverse SLAM tool, because these errors, which usually manifest themselves only when computing benchmark values for a particular trajectory [30], may in reverse SLAM significantly change the distribution of the features, leading to wrong conclusions about the uncertainty model. Therefore, we use the synthetic ICL-NUIM dataset [15] in experiments with reverse SLAM. This dataset provides not only perfect ground truth trajectories of the sensor, but also perfect synchronization between the RGB-D frames and trajectory points.
The output from the reverse SLAM tool gives us the information about the real distribution of feature measurements in PUT SLAM. The uncertainty model obtained from the reverse SLAM tool contains not only information about the sensor noise but also about the accuracy of image processing and RANSAC-based outlier rejection in the frontend.
Using the reverse SLAM tool, we can analyze the distribution of measurements in the 3D space. Example distributions of measurements are presented in Fig. 10. Using the visualization tool, we can analyze the distribution of measurements for each feature in the map. We noticed that some measurements are located on the surfaces of the objects (Fig. 10a). We can also compute statistical properties of the distribution. The standard deviation of measurements along the z axis (normal to the surface) is 0.005 m; the standard deviations along the y and x axes are larger by 62% and 130%, respectively. Another type of distribution is presented in Fig. 10b. In this case, the measured positions of the selected feature are located along the computer cable. The standard deviation of measurements along the y axis is 39% larger than along the x axis and 96% larger than along the z axis.
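The statistics computed by such a tool can be sketched in Python as follows. This is an illustration of the idea, not the reverse SLAM implementation: each measurement of a feature, given in the sensor frame, is moved to the world frame using the ground truth pose of its frame, and the sample mean, covariance, and per-axis standard deviations are then computed. The function name and the data layout are assumptions.

```python
import numpy as np

def feature_statistics(measurements_cam, gt_poses):
    """Statistics of one feature observed from several known sensor poses.

    measurements_cam: N x 3 positions of the feature in the sensor frame.
    gt_poses: N ground truth sensor poses as 4x4 homogeneous matrices.
    Returns the mean, the sample covariance and the per-axis standard
    deviations of the measurements expressed in the world frame.
    """
    world = np.array([T[:3, :3] @ p + T[:3, 3]
                      for p, T in zip(measurements_cam, gt_poses)])
    mean = world.mean(axis=0)
    cov = np.cov(world, rowvar=False)     # unbiased sample covariance
    return mean, cov, np.sqrt(np.diag(cov))
```

The eigenvectors of the returned covariance give the axes of the empirical uncertainty ellipsoid, which can be compared with the ellipsoid predicted by a given uncertainty model.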
Table 4 Statistical parameters of feature measurements in the ICL-NUIM office/kt1 sequence
Config.  Model  \(\sigma _x\) (m)  \(\sigma _y\) (m)  \(\sigma _z\) (m)  \(d_\varepsilon \)
Matching no noise  \(\mathbf{C}_n\)  0.0137  0.0125  0.0081  0.38
Matching no noise  \(\mathbf{C}_g\)  0.0107  0.0106  0.0129  0.21
Tracking no noise  \(\mathbf{C}_n\)  0.0097  0.0100  0.0080  0.19
Tracking no noise  \(\mathbf{C}_g\)  0.0085  0.0092  0.0101  0.14
Matching with noise  \(\mathbf{C}_n\)  0.0138  0.0134  0.0118  0.13
Matching with noise  \(\mathbf{C}_g\)  0.0124  0.0121  0.0138  0.13
Tracking with noise  \(\mathbf{C}_n\)  0.0112  0.0119  0.0101  0.13
Tracking with noise  \(\mathbf{C}_g\)  0.0111  0.0102  0.0127  0.19
6 Analysis of quantitative results
6.1 Evaluation of the uncertainty models
Table 4 gives example numerical results of applying the reverse SLAM tool to the ICL-NUIM office environment with the kt1 trajectory. Uncertainty ellipsoids were computed for the \(\mathbf{C}_n\)-model and the \(\mathbf{C}_g\)-model. The ICL-NUIM dataset contains two variants of the depth data: noiseless (ideal) and with simulated Kinect-like noise. The results obtained from our reverse SLAM tool were considerably different for these two variants of the RGB-D data. For the noiseless depth data, the distributions of feature points were best captured by the \(\mathbf{C}_n\)-model. This is explained by the fact that in the absence of depth errors the spatial uncertainty of point features is caused mainly by errors in the location of keypoints. In contrast, the \(\mathbf{C}_g\)-model was better for the ICL-NUIM data with synthetic noise in depth measurements. Apparently, in this case, the spatial uncertainty of features depends also on the depth noise. The fact that the gradient-based uncertainty model fits best in this case is explained by the increased depth errors on edges of objects. In a real sensor based on the PrimeSense technology, this effect is related to the dependency between pixels in the depth image [13].
6.2 Impact of the uncertainty models on the SLAM accuracy
Table 5 Improvement of SLAM accuracy (ATE RMSE) by using proposed uncertainty models (ICL-NUIM office/kt1 sequence)
Config.  Identity mat.  Model  Uncert. mod.  \(\eta \) (%)
Matching no noise  0.025±0.015  \(\mathbf{C}_n\)  0.012±0.001  54.5
Matching no noise  0.025±0.015  \(\mathbf{C}_g\)  0.019±0.002  26.0
Tracking no noise  0.030±0.014  \(\mathbf{C}_n\)  0.020±0.003  35.7
Tracking no noise  0.030±0.014  \(\mathbf{C}_g\)  0.024±0.007  21.6
Matching with noise  0.027±0.003  \(\mathbf{C}_n\)  0.026±0.002  4.3
Matching with noise  0.027±0.003  \(\mathbf{C}_g\)  0.026±0.002  5.2
Tracking with noise  0.057±0.007  \(\mathbf{C}_n\)  0.058±0.012  \(-1.4\)
Tracking with noise  0.057±0.007  \(\mathbf{C}_g\)  0.046±0.008  19.0
Results obtained for the ICL-NUIM sequence are provided in Table 5. For both variants of PUT SLAM (the matching-VO and tracking-VO), the ATE RMSE is decreased significantly by using the \(\mathbf{C}_n\)-model on the sequence with noiseless depth data. Moreover, the standard deviation of the resulting ATE metric decreased when the uncertainty model was applied. However, when a sequence with simulated depth noise was used, the achieved accuracy improvement was much smaller. Apparently, the \(\mathbf{C}_n\)-model is inadequate in this case. The application of the \(\mathbf{C}_g\)-model resulted in an improvement of the ATE metric for the tracking-VO variant, while the accuracy improvement for the matching-VO PUT SLAM was rather insignificant. These results are explainable in light of our observations made using the reverse SLAM tool. When noise is present in the depth data, the dominant source of errors in the location of point features is sliding of these points along edges. The tracking-VO variant uses ORB keypoints, which are more prone to dislocation along edges because detection in ORB is less repeatable than for the SURF keypoints used in the matching-VO variant [38]. Hence, the sliding effect is more pronounced in the tracking-based version and can be to some extent compensated by a proper model of the anisotropic uncertainty.
7 Application-oriented evaluation
Table 6 RPE RMSE value [mm] improvement by application of uncertainty models in factor graph optimization (TUM RGBD benchmark)
Sequence  Config.  Identity mat.  \(\mathbf{C}_g\)-model  \(\eta \) (%)
fr1_xyz  Matching  6.90±0.12  5.85±0.08  15.2
fr1_xyz  Tracking  34.97±1.14  28.34±0.72  19.0
fr1_desk  Matching  14.50±0.48  11.92±0.42  17.8
fr1_desk  Tracking  21.08±2.09  17.80±0.71  15.6
fr1_desk2  Matching  14.52±0.46  12.79±0.36  11.9
fr1_desk2  Tracking  26.21±3.02  18.67±1.02  28.8
fr2_desk  Matching  17.97±0.54  5.41±0.06  69.9
fr2_desk  Tracking  24.52±2.58  11.36±0.96  53.6
fr3_long_office  Matching  10.69±0.30  7.89±0.13  26.1
fr3_long_office  Tracking  14.51±2.46  13.81±2.25  4.83
Table 6 summarizes the mean values and standard deviations of the RPE RMSE metric yielded by the SLAM system for five sequences from the TUM dataset. Results obtained using the \(\mathbf{C}_g\)-model are compared to the accuracy achieved by using the default identity matrices in g\(^2\)o. It is worth noting that the literature often reports only the best results, without any statistics. However, because the SLAM frontend employs RANSAC and is therefore not fully deterministic, we computed the statistics over 100 trials in order to provide convincing numerical results.
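As a worked example of how the entries of Table 6 relate to each other, the sketch below recomputes the improvement \(\eta \) from the tabulated means. It assumes that \(\eta \) is the relative reduction of the mean error with respect to the identity-matrix baseline; this assumed definition reproduces the tabulated values up to rounding. The per-trial list is hypothetical and only shows how a "mean ± standard deviation" entry is formed from repeated runs.

```python
from statistics import mean, stdev

def relative_improvement(err_identity, err_model):
    """Percentage reduction of an error metric relative to the
    identity-matrix baseline (assumed definition of eta)."""
    return 100.0 * (err_identity - err_model) / err_identity

# Mean RPE RMSE [mm] for fr2_desk, matching variant (Table 6).
eta = relative_improvement(17.97, 5.41)
print(f"eta = {eta:.1f} %")   # eta = 69.9 %

# Hypothetical RPE RMSE values [mm] from five repeated runs; the paper
# uses 100 trials because RANSAC makes each run slightly different.
trials = [5.35, 5.47, 5.41, 5.38, 5.44]
print(f"{mean(trials):.2f} ± {stdev(trials):.2f} mm")
```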
To show the improvement of the trajectory estimation using uncertainty modeling, we enlarge a region of the trajectories obtained on the fr2_desk sequence (Fig. 13). Without uncertainty modeling, the noisy measurements of the feature positions propagate to the sensor position. As a result, the camera trajectory is noisy. When the proposed uncertainty model is applied, the optimization-based backend can rely on measurements with low uncertainties and largely ignore uncertain measurements. The obtained trajectory is smooth for both the tracking-based and the matching-based PUT SLAM (Fig. 13a and b).
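The weighting effect described above can be illustrated on the smallest possible problem: fusing two measurements of one 3D point by information-weighted least squares. This is only a sketch under stated assumptions, not the PUT SLAM backend; the function `fuse` and all numerical values are hypothetical. The information matrix of each measurement is taken as the inverse of its covariance.

```python
import numpy as np

def fuse(measurements, covariances):
    """Information-weighted least-squares estimate of a single 3D point
    from several measurements with known covariance matrices."""
    infos = [np.linalg.inv(C) for C in covariances]  # information = inverse covariance
    info_sum = sum(infos)
    weighted = sum(W @ z for W, z in zip(infos, measurements))
    return np.linalg.solve(info_sum, weighted)

z1 = np.array([1.00, 0.50, 2.00])   # precise measurement
z2 = np.array([1.04, 0.50, 2.20])   # measurement with poor depth (z) accuracy

C1 = np.diag([1e-4, 1e-4, 1e-4])    # isotropic, 10 mm standard deviation
C2 = np.diag([1e-4, 1e-4, 1e-2])    # anisotropic, 100 mm standard deviation in z

x_identity = fuse([z1, z2], [np.eye(3), np.eye(3)])  # identity matrices: plain average
x_weighted = fuse([z1, z2], [C1, C2])                # uncertain depth is down-weighted

print("identity matrices :", x_identity)
print("uncertainty model :", x_weighted)
```

With identity matrices the estimate is the plain average of both measurements, whereas with the anisotropic covariances the depth coordinate stays close to the precise measurement while the lateral coordinates are still averaged.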
Whereas we focused on the local improvement of the trajectory, PUT SLAM with the proposed uncertainty model \(\mathbf{C}_g\) also provides globally consistent trajectories, as evidenced by the small ATE RMSE values obtained using the TUM RGBD benchmark. We do not attempt to compare the trajectory estimation accuracy of PUT SLAM against other SLAM approaches, as the ATE values computed for the entire benchmark sequences depend on a number of factors not related to the uncertainty model of the features (or to a lack of such a model). Among these factors, the loop closure mechanism seems to be the most important. As we have recently shown elsewhere [32], the loop closure and relocalization function of a SLAM system can compensate for the trajectory drift that accumulated because of inaccurate matching of the point features. Hence, we only show on a few example sequences from the TUM RGBD Benchmark that PUT SLAM, despite its rather simple architecture, yields globally consistent and accurate trajectories. The best ATE RMSE for the fr3_long_office sequence is 0.022, which is better than ORB-SLAM2 [30] and SlamDunkSIFT [12], and the best ATE RMSE for the fr1_desk2 sequence is 0.030 (compared to 0.031 for SubMap BA [41] and 0.048 for ElasticFusion [45]).
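For reference, the ATE RMSE quoted above is the root mean square of the translational differences between corresponding estimated and ground truth sensor positions. The sketch below assumes that the two trajectories have already been associated by timestamp and rigidly aligned, which the benchmark tools do before computing the metric; the function `ate_rmse` and the sample positions are hypothetical.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of the translational error between two trajectories given as
    equally long sequences of (x, y, z) positions.

    Assumes the trajectories are already time-associated and aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical positions [m]: every estimate is 0.05 m away from the truth.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.03, 0.04, 0.0), (1.0, 0.03, 0.04), (2.04, 0.0, 0.03)]
print(f"ATE RMSE = {ate_rmse(est, gt):.3f} m")
```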
Moreover, in order to demonstrate that the self-localization accuracy achieved by PUT SLAM using the new uncertainty model makes accurate reconstruction of a dense environment map possible, we produced a surfel-based visualization using the FastFusion algorithm [40] (Fig. 14). The mapping algorithm is integrated with our software and can run in real time along with PUT SLAM, as demonstrated for the fr3_long_office sequence in the video material accompanying this paper.
8 Conclusions
The main contributions of this work are:
the reverse SLAM tool concept and implementation; this tool is useful to analyze the distribution of feature measurements in both the 3D and image space and to identify the most suitable uncertainty model;
new uncertainty models of point features, which incorporate not only the axial and lateral spatial uncertainty caused by the PrimeSense technology RGBD sensors, but also take into account image processing uncertainties in the whole SLAM frontend pipeline;
an in-depth analysis of the application of feature uncertainty models in the factor graph-based formulation of SLAM, which demonstrates accuracy gains on real RGBD data sequences due to using more elaborate spatial uncertainty models.
Finally, we demonstrated experimentally on the TUM RGBD Benchmark that the proposed gradient-based model is appropriate for real RGBD data. The accuracy improvement is higher for the trackingVO variant of the PUT SLAM frontend. This improvement is important, as the tracking-based variant is much faster (20 to 40 frames per second) than the matching-based variant, but was less accurate in most tests. As the two variants of our frontend differ in the type of point feature detector and in the details of processing, we were able to demonstrate that the internal architecture of the SLAM frontend indeed influences the uncertainty of features. The open-source PUT SLAM developed in our earlier research served here only as our research vehicle and as the basis for the reverse SLAM tool, but the developed models and research methodology should be suitable for other factor graph-based RGBD SLAM systems, as long as they rely on depth data accuracy and use a similar backend architecture.
As the available RGBD sensors based on structured light are very similar with respect to performance and uncertainty characteristics [14], the presented uncertainty models should suit all of them: Kinect, Xtion, Carmine, and the recent RealSense. Although some characteristics differ for RGBD sensors using the time-of-flight principle, which may influence the uncertainty of features (e.g., smaller depth noise along sharp edges [23]), the methodology of developing the uncertainty model with the use of the reverse SLAM tool is still valid for those sensors. Hence, we believe that the simulation/experimental framework based on the reverse SLAM tool will enable us to quickly develop uncertainty models suitable for the next generation of RGBD sensors.
Our further research directions also include work on uncertainty models for other types of constraints, including pose-to-pose and pose-to-plane measurements in factor graph optimization. We are also interested in developing a procedure that determines a suitable uncertainty model for each individual feature according to the local characteristics of the RGBD data.
Footnotes
1. Source code is at https://github.com/LRMPUT/PUTSLAM/tree/release.
2. Video is available at https://www.youtube.com/watch?v=tP5l5IDBYLw.
References
1. Belter, D., Nowicki, M., Skrzypczyński, P.: On the performance of pose-based RGBD visual navigation systems. In: Cremers, D., Reid, I., Saito, I., Yang, M.H. (eds.) Computer Vision—ACCV 2014, Volume 9004 of the Series Lecture Notes in Computer Science, pp. 407–423. Springer, Cham (2015)
2. Belter, D., Nowicki, M., Skrzypczyński, P.: Accurate map-based RGBD SLAM for mobile robots. In: Reis, L.P., Moreira, A.P., Lima, P.U., Montano, L., Muñoz-Martinez, V. (eds.) Robot 2015: Second Iberian Robotics Conference, Advances in Robotics, vol. 2, pp. 533–545. Springer, Cham (2016)
3. Belter, D., Nowicki, M., Skrzypczyński, P.: Improving accuracy of feature-based RGBD SLAM by modeling spatial uncertainty of point features. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1279–1284. Stockholm, Sweden (2016)
4. Belter, D., Skrzypczyński, P.: Precise self-localization of a walking robot on rough terrain using parallel tracking and mapping. Ind. Robot Int. J. 40(3), 229–237 (2013)
5. Belter, D., Skrzypczyński, P.: The importance of measurement uncertainty modeling in the feature-based RGBD SLAM. In: Proceedings of the 10th International Workshop on Robot Motion and Control, pp. 308–313. Poznań, Poland (2015)
6. Davison, A.J., Reid, I., Molton, N., Stasse, O.: MonoSLAM: real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)
7. Di Leo, G., Liguori, C., Paolillo, A.: Covariance propagation for the uncertainty estimation in stereo vision. IEEE Trans. Instrum. Meas. 60(5), 1664–1673 (2011)
8. Dryanovski, I., Valenti, R., Xiao, J.: Fast visual odometry and mapping from RGBD data. In: Proceedings of IEEE International Conference on Robotics & Automation, pp. 2305–2310. Karlsruhe, Germany (2013)
9. Endres, F., Hess, J., Sturm, J., Cremers, D., Burgard, W.: 3D mapping with an RGBD camera. IEEE Trans. Robot. 30(1), 177–187 (2014)
10. Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 611–625 (2018)
11. Engel, J., Schöps, T., Cremers, D.: LSD-SLAM: large-scale direct monocular SLAM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision—ECCV 2014, pp. 834–849. Springer, Cham (2014)
12. Fioraio, N., Di Stefano, L.: SlamDunk: affordable real-time RGBD SLAM. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) Computer Vision—ECCV 2014 Workshops, pp. 401–414. Springer, Cham (2015)
13. Gesto Diaz, M., Tombari, F., Rodriguez-Gonzalez, P., Gonzalez-Aguilera, D.: Analysis and evaluation between the first and the second generation of RGBD sensors. IEEE Sens. J. 15(11), 6507–6516 (2015)
14. Gonzalez-Jorge, H., Riveiro, B., Vazquez-Fernandez, E., Martínez-Sánchez, J., Arias, P.: Metrological evaluation of Microsoft Kinect and Asus Xtion sensors. Measurement 46(6), 1800–1806 (2013)
15. Handa, A., Whelan, T., McDonald, J.B., Davison, A.J.: A benchmark for RGBD visual odometry, 3D reconstruction and SLAM. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 1524–1531. Hong Kong (2014)
16. Haralick, R.: Propagating covariance in computer vision. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 493–498. Jerusalem, Israel (1994)
17. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
18. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGBD mapping: using kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31(5), 647–663 (2012)
19. Kerl, C., Sturm, J., Cremers, D.: Robust odometry estimation for RGBD cameras. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3748–3754. Karlsruhe, Germany (2013)
20. Khoshelham, K., Elberink, S.: Accuracy and resolution of kinect depth data for indoor mapping applications. Sensors 12(2), 1437–1454 (2012)
21. Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proceedings of the International Symposium on Mixed and Augmented Reality, pp. 225–234. Nara, Japan (2007)
22. Konolige, K., Agrawal, M.: FrameSLAM: from bundle adjustment to real-time visual mapping. IEEE Trans. Robot. 24(5), 1066–1077 (2008)
23. Kraft, M., Nowicki, M., Schmidt, A., Fularz, M., Skrzypczyński, P.: Toward evaluation of visual navigation algorithms on rgbd data from the first and second-generation kinect. Mach. Vis. Appl. 28(1), 61–74 (2017)
24. Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: g\(^2\)o: a general framework for graph optimization. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3607–3613. Shanghai, China (2011)
25. Mallick, T., Das, P., Majumdar, A.: Characterizations of noise in kinect depth images: a review. IEEE Sens. J. 14(6), 1731–1740 (2014)
26. Matthies, L., Shafer, S.A.: Error modeling in stereo navigation. IEEE J. Robot. Autom. 3(3), 1255–1262 (1987)
27. Miura, J., Shirai, Y.: An uncertainty model of stereo vision and its applications to vision-motion planning of robot. In: 13th International Joint Conference on Artificial Intelligence, pp. 1618–1623. Chambery (1993)
28. Mouragnon, E., Lhuillier, M., Dhome, M., Dekeyser, F., Sayd, P.: Generic and real-time structure from motion using local bundle adjustment. Image Vis. Comput. 27(8), 1178–1193 (2014)
29. Mur-Artal, R., Montiel, J.M.M., Tardós, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (2015)
30. Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGBD cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017)
31. Nguyen, C., Izadi, S., Lovell, D.: Modeling kinect sensor noise for improved 3D reconstruction and tracking. In: Proceedings of the Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 524–530. Zürich, Switzerland (2012)
32. Nowicki, M., Belter, D., Kostusiak, A., Čížek, P., Faigl, J., Skrzypczyński, P.: An experimental study on feature-based SLAM for multi-legged robots with RGBD sensors. Ind. Robot Int. J. 44(4), 428–441 (2017)
33. Nowicki, M., Skrzypczyński, P.: Combining photometric and depth data for lightweight and robust visual odometry. In: Proceedings of the European Conference on Mobile Robots, pp. 125–130. Barcelona, Spain (2013)
34. Oskiper, T., Chiu, H.P., Zhu, Z., Samaresekera, S., Kumar, R.: Stable vision-aided navigation for large-area augmented reality. In: Proceedings of the IEEE Virtual Reality Conference, pp. 63–70. Singapore (2011)
35. Ozog, P., Eustice, R.: On the importance of modeling camera calibration uncertainty in visual SLAM. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3777–3784. Karlsruhe, Germany (2013)
36. Park, J.H., Shin, Y.D., Bae, J.H., Baeg, M.H.: Spatial uncertainty model for visual features using a kinect sensor. Sensors 12(7), 8640–8662 (2012)
37. Scaramuzza, D., Fraundorfer, F.: Visual odometry: Part I the first 30 years and fundamentals. IEEE Robot. Autom. Mag. 18(4), 80–92 (2011)
38. Schmidt, A., Kraft, M., Fularz, M., Domagala, Z.: Comparative assessment of point feature detectors and descriptors in the context of robot navigation. J. Autom. Mobile Robot. Intell. Syst. 7(1), 11–20 (2013)
39. Skrzypczyński, P.: Spatial uncertainty management for simultaneous localization and mapping. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 4050–4055. Rome, Italy (2007)
40. Steinbrücker, F., Sturm, J., Cremers, D.: Volumetric 3D mapping in real-time on a CPU. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 2021–2028. Hong Kong (2014)
41. Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of RGBD SLAM systems. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots & Systems, pp. 573–580. Vilamoura, Portugal (2012)
42. Triggs, B., McLauchlan, P., Hartley, R., Fitzgibbon, A.: Bundle adjustment—a modern synthesis. In: Triggs, B., Zisserman, S., Szeliski, R. (eds.) Vision Algorithms: Theory and Practice. Lecture Notes in Computer Science, vol. 1883, pp. 298–372. Springer, Cham (2000)
43. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 80–92 (1991)
44. Whelan, T., Johannsson, H., Kaess, M., Leonard, J., McDonald, J.: Robust real-time visual odometry for dense RGBD mapping. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 5704–5711. Karlsruhe, Germany (2013)
45. Whelan, T., Salas-Moreno, R.F., Glocker, B., Davison, A.J., Leutenegger, S.: ElasticFusion: real-time dense slam and light source estimation. Int. J. Robot. Res. 35(14), 1697–1716 (2016)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.