Abstract
This paper deals with the problem of modeling spatial uncertainty of point features in feature-based RGBD SLAM. Although the feature-based approach to SLAM is very popular, in the case of systems using RGBD data the problem of explicit uncertainty modeling is largely neglected in the implementations. Therefore, we investigate the influence of the uncertainty models of point features on the accuracy of the estimated trajectory and map. We focus on the recent SLAM formulation employing factor graph optimization. Unlike some visual SLAM systems employing factor graph optimization that minimize the reprojection errors of features, we explicitly use depth measurements and minimize the errors in the 3D space. The paper analyzes the impact of the information matrices used in factor graph optimization on the achieved accuracy. We introduce three different models of point feature spatial uncertainty. Then, applying the simplest model, we demonstrate in simulations how strongly the spatial uncertainty model influences the graph optimization results in an idealized SLAM system with perfect feature matching. A novel software tool allows us to visualize the statistical behavior of the features over time in a real SLAM system. This enables the analysis of the distribution of feature measurements employing synthetic RGBD data processed in an actual SLAM pipeline. Finally, we show on publicly available real RGBD datasets how an uncertainty model, which reflects the properties of the RGBD sensor and the image processing pipeline, improves the accuracy of sensor trajectory estimation.
Introduction
Motivation
The compact and affordable RGBD sensors based on structured light, such as PrimeSense Carmine, Microsoft Kinect and Asus Xtion, fostered the progress in 3D visual odometry (VO) and simultaneous localization and mapping (SLAM). Visual odometry computes the sensor motion between selected frames of the RGBD input and recovers the trajectory [37]. However, the trajectory recovered using the frame-to-frame motion estimation has an unavoidable drift, as there are no constraints that enforce the global consistency of sensor motion. Therefore, the VO pipeline treated as a frontend for RGBD data processing is often paired with an optimization engine (called backend) to form a SLAM system, which yields globally consistent trajectories. Typically, the backend post-processes a pose graph, whose vertices correspond to the sensor poses, whereas its edges represent motion constraints between these poses. Point features are often employed for frame-to-frame motion estimation in pose-based RGBD SLAM and VO systems [9, 18].
Performance depends on the configuration of the image processing module (frontend) and constraints management strategy (backend) [1]. The pose-based approach keeps only the relative pose-to-pose constraints, marginalizing out the actual 3D point features; hence, it cannot improve the estimation of motion using the large number of feature-to-pose correspondences established by the frontend. On the other hand, keeping the point features and solving the Structure from Motion (SfM) problem defined as nonlinear least squares optimization allows obtaining very precise sensor trajectories [28]. Feature-to-pose measurements can be used to find coordinates of features and relative sensor motion using the Bundle Adjustment (BA) approach [42]. This approach is applied in the most successful visual SLAM systems, PTAM [21], ORB-SLAM [29] and ORB-SLAM2 [30], to obtain accurate trajectories of the camera. RGBD SLAM systems also use the graph-based optimization framework to integrate all feature-to-pose measurements [12, 41].
In this research, we also employ factor graph optimization, but instead of minimizing the reprojection error onto images, as in PTAM [21] and ORB-SLAM [29], we directly define the feature position error in the 3D space. Our approach to RGBD SLAM exploits the depth measurements to a greater extent, and thus, it seems to be a better vehicle to demonstrate the role of modeling the uncertainty of RGBD measurements than other systems of similar architecture, such as ORB-SLAM2 [30], which employs triangulation of feature positions and uses reprojection errors even for RGBD data containing dense depth frames. Most of the published research on graph-based RGBD SLAM neglects the role of the measurement uncertainty model. The information matrices used to represent uncertainty in graph-based optimization are commonly set to identity, which means an equal importance of each measurement and isotropic spatial uncertainty [9, 12, 30]. From the literature, we know that the depth measurement accuracy depends on the measured distance [13, 20]. Thus, the simplified approach that assigns the same isotropic uncertainty to every measurement is not justified by the underlying physics of the measurements in RGBD cameras. Therefore, in this work, we investigate how to improve the accuracy of sensor trajectory estimation by explicitly modeling spatial uncertainty of the point features.
Problem statement
In the SLAM problem, we consider an RGBD sensor moving freely in the environment and measuring the position of each detected point feature. The factor graph representation consists of vertices representing both the 3D features and the sensor poses. Edges represent measurements between the poses and the features, or between two poses. The quality (or importance) of each measurement is represented by an information matrix. The information matrix can be computed by inverting the covariance matrix of the measurement. Thus, more accurate measurements produce stronger constraints in graph optimization [24].
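The relation between covariance and information matrices can be illustrated with a minimal sketch. The covariance values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def information_matrix(covariance):
    """Invert a measurement covariance matrix to obtain its information matrix."""
    return np.linalg.inv(np.asarray(covariance, dtype=float))

# Hypothetical covariances (in m^2) of two feature measurements:
# the second one is much less certain along the third (depth) axis.
cov_accurate = np.diag([1e-4, 1e-4, 4e-4])
cov_noisy = np.diag([1e-4, 1e-4, 1e-2])

omega_accurate = information_matrix(cov_accurate)
omega_noisy = information_matrix(cov_noisy)

# The more accurate measurement receives the larger weight along depth.
print(omega_accurate[2, 2], omega_noisy[2, 2])
```

A smaller variance along an axis translates into a larger entry of the information matrix, and therefore into a stronger constraint along that axis during optimization.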
This paper presents methods to compute the information matrices on the basis of the measurement model of the RGBD sensor, taking into account the additional uncertainty introduced by keypoint detection on the RGB images and the dependencies between the geometric structure of the scene and the resulting spatial uncertainty of the features. Hence, the elaborated uncertainty models we introduce try to capture the spatial uncertainty of point features resulting from the whole processing pipeline in the SLAM frontend.
Contribution
The contribution of this paper with respect to the state of the art is threefold:

we introduce a new uncertainty analysis methodology based on software tools that allow us to simulate RGBD SLAM systems and to analyze the behavior of point features; these tools, in turn, make it possible to understand the nature of the spatial uncertainty of the point features in RGBD SLAM;

we propose mathematical uncertainty models for point features in RGBD SLAM, which are based upon the investigations using the new methodology;

we verify the suitability of the proposed uncertainty models on real RGBD benchmark data.
Basic aspects of the new research methodology concerning the spatial uncertainty modeling of RGBD point features have been introduced in two earlier papers: the workshop paper [5] demonstrated the importance of using the anisotropic spatial uncertainty models in BA-based SLAM exclusively on simulated examples, whereas the conference paper [3] introduced two new uncertainty models and demonstrated their feasibility mostly on synthetic RGBD data. The aim of this journal article is to cover all aspects of our approach to spatial uncertainty modeling in a unified manner, to analyze in more detail some data processing steps that influence the uncertainty (such as RANSAC), and to extend the analysis of the behavior of features in the map on real RGBD benchmark sequences (section “Application-oriented Evaluation”). Our ultimate aim is to demonstrate that considering the anisotropic uncertainty of features in factor graph optimization improves the accuracy of sensor trajectory estimation in applications of RGBD SLAM. Compared to the conference paper that introduced the new uncertainty models [3], we demonstrate here, using idealized simulations, how large the improvement in trajectory accuracy due to proper models of the uncertainty in the feature-to-pose constraints could be in a BA-based SLAM system. This provides a motivation for investigating such models for a real SLAM system, and the journal paper demonstrates how particular building blocks of the processing pipeline in the SLAM frontend influence the uncertainty. In particular, we demonstrate this for the RANSAC procedure, which is commonly used in image processing, but whose influence on the spatial uncertainty of the produced point features has, as far as we know, not been shown before.
The journal article makes it possible to demonstrate, with all the necessary details, that the spatial uncertainty of features should account not only for errors in RGBD sensor measurements, but also for the inaccuracy introduced in the whole process of feature extraction using the RGB images and depth images.
Although we also extend in this article the presentation of the PUT SLAM architecture, we do not consider this particular implementation of RGBD SLAM a contribution, as it has been presented in detail in [2]. However, we use our own implementation of RGBD SLAM as a convenient research platform, upon which we can implement the software tools necessary for our current work on uncertainty, such as simulations and the reverse SLAM tool, while retaining control over the implementation details.
As we argued in [1], there is a broad diversity of SLAM architectures and implementation details, even if we consider only feature-based systems. Therefore, it is impossible to address feature uncertainty modeling for a generic RGBD SLAM architecture, as such an architecture hardly exists. We focus on the architectures employing the BA concept, with the structure of the factor graph (in the backend) based on the feature-to-pose constraints (Fig. 1). In the recent literature [30], this approach is considered superior to the (perhaps more popular) pose-based SLAM architecture [9]. Hence, the contributed methodology should be helpful for researchers who want to improve the performance of the recent SLAM architectures. The PUT SLAM, which we use here, belongs to this family of BA-based systems, but was designed for Kinect-like RGBD sensors, and its backend optimization procedure minimizes the Euclidean distance errors in feature positions. This approach, different from the more commonly used feature reprojection error in the image plane, is motivated by the fact that the spatial uncertainty of RGBD sensors can be modeled in the Euclidean space regardless of the depth measurement principle (i.e., active stereo vision or time-of-flight), which potentially makes the methodology proposed in this article universal with respect to the first and second generation of the RGBD sensors [23].
The remainder of this paper is organized as follows: Section 2 presents the most relevant previous work in the areas of uncertainty modeling for RGBD sensors and the uncertainty models in feature-based SLAM, while Sect. 3 details our PUT SLAM system, used throughout the paper as a reference architecture and a tool to investigate the uncertainty of features. The three approaches to modeling the uncertainty of point features in RGBD SLAM are introduced in Sect. 4, followed by the description of the simulation and visualization tools in Sect. 5. These tools are used to investigate the behavior of the point features depending on the uncertainty model and the noise characteristics of the RGBD data. A quantitative evaluation of the results of applying the proposed uncertainty models in PUT SLAM on synthetic RGBD data is provided in Sect. 6, while Sect. 7 demonstrates how the selected best model influences the accuracy of sensor trajectory estimation in PUT SLAM tested on real RGBD data from benchmark sequences. Section 8 concludes the paper and outlines future research directions.
Related work
The uncertainty of depth and combined RGB-depth measurements from sensors based on the PrimeSense structured light technology was investigated in a number of recent papers. Khoshelham and Elberink [20] studied the accuracy and resolution of depth data from the Kinect sensor. Gonzalez-Jorge et al. [14] demonstrated that the metrological characteristics in terms of accuracy and precision are almost independent of the type of sensor (Kinect or Xtion) due to the use of the same PrimeSense depth camera. Recently, research on the noise characteristics in Kinect v1 depth data was surveyed in [25]. The first-generation RGBD sensors have also been compared to the Microsoft Kinect v2 based on the time-of-flight principle [13]. This research revealed that the correlation used in the PrimeSense technology to compare the observed pattern of “speckles” to a reference pattern creates a dependency between pixels in the depth images, which in turn causes errors in the range measurements. The papers dealing with uncertainty in RGBD sensors focus mostly on applications outside of VO/SLAM, but Park et al. [36] proposed a mathematical uncertainty model for the Kinect v1 sensor and sparse RGBD point features. However, the approach of [36] was demonstrated without an application in real SLAM or VO. Although the structured light depth measurement principle in the most common RGBD sensors may be considered as active stereo vision, the existing literature on uncertainty modeling in stereo vision considers mainly systems in which the stereo matching is applied to discrete features, and no dense depth map is created [7, 27]. An exception is the older work of Matthies and Shafer [26], who applied 3D Gaussians to model the measurement errors in the discrete digital images, and demonstrated that propagating the uncertainty in the form of covariance matrices enables a reduction in uncertainty in the localization task.
In spite of these results, up to now little work has been done on the utilization of physical characteristics of RGBD sensors in VO/SLAM. In contrast, feature uncertainty modeling is widely used in SLAM research employing the extended Kalman filter. In these frameworks, the covariance from the sensor measurement model is propagated to the feature model for both 2D laser sensors [39] and 3D vision [6]. In the work of Oskiper et al. [34] the uncertainty of the observed features (called “landmarks”) is explicitly modeled taking into account the stereo imagery processing pipeline and using the method from [26]. However, as far as we know, only Dryanovski et al. [8] formulated an uncertainty model of point features used to register a Kinect sensor pose in relation to a map of features that are estimated using a Kalman filter. This model, based on the Gaussian mixture, was motivated by an experimental assessment of the uncertainty of Kinect depth measurements. In graph-based SLAM, Endres et al. [9] applied the depth measurement model from [20] in the motion estimate verification procedure for their feature-based pose-graph RGBD SLAM. The possibility of using the Mahalanobis distance, which takes the uncertainty into account, instead of the Euclidean distance in the pose-to-pose motion estimate computation is also mentioned in [9], but the paper provides no clear description of the uncertainty model being used. Conversely, the feature-based RGBD visual odometry system presented in [18] minimizes the feature reprojection error in the image space in order to compute the pose-to-pose motion estimates. This approach implicitly takes into account the fact that the uncertainty increases with range. Nguyen et al. [31] applied a depth uncertainty model of Kinect in a VO system that employs dense depth data.
Although the improved accuracy of the recovered trajectories was shown in [31], and the dense depth-based approach generally achieves impressive results in terms of environment map reconstruction [44], it cannot model uncertainty of all the individual range measurements and then propagate this uncertainty to the dense map.
The RGBD SLAM formulation used in this paper is similar to the structure from motion problem in computer vision, which is commonly solved by applying the Bundle Adjustment (BA) [42]. Advanced, keyframe-based BA variants, such as the one implemented in PTAM [21], may be used for online mapping and motion estimation in robotics [4]. However, while there were some efforts to incorporate the uncertainty of feature points in the BA, much of the computer vision literature simply assumes one-pixel Gaussian noise in the location of features [17]. For example, Konolige and Agrawal [22] use such an uncertainty model in pose-based SLAM utilizing visual imagery. Ozog and Eustice [35] demonstrate that accounting for the uncertainty of the relative-pose transformation, computed using Haralick’s method [16], in the two-view sparse BA improves the relative motion estimation between two image frames. Our approach can be considered similar to BA because the trajectory of the sensor and the feature positions are simultaneously optimized over sensor measurements. Such an approach is not only superior to the pose-based RGBD SLAM formulation [9], but also gives better accuracy and robustness than dense/direct methods in visual SLAM [11] and RGBD SLAM [19]. The direct methods are more prone to image distortions and artifacts due to such factors as the rolling shutter or automatic white balance because they need to model the whole image acquisition process that influences the pixel intensities [10]. Conversely, the feature-based approach employed in PUT SLAM enables this system to take great advantage of uncertainty modeling, because the uncertainty of features directly influences the strength of the constraints in optimization. So far, other feature-based RGBD SLAM systems of similar architecture, such as [12] and [41], have not attempted to model this uncertainty.
RGBD SLAM with a map of features
We consider spatial uncertainty models of point features treating the SLAM algorithm itself as a “black box” that processes the measurements (constraints) and depends on the provided description of the “importance” of these measurements in the form of an uncertainty model. Although the general structure of the BA-based SLAM algorithms is similar with respect to the main data processing components, there is no standard, generic SLAM architecture. Therefore, we describe here the architecture of our PUT SLAM system [2, 3]. This brief description should make it easier to understand some of the mechanisms that are responsible for the spatial uncertainty of the features (e.g., matching and RANSAC), but is also necessary to introduce the software used to investigate the behavior of features, which is based on PUT SLAM (see Sect. 5).
The architecture of the PUT SLAM system is presented in Fig. 2. PUT SLAM is a BA-based RGBD SLAM that maintains a map of the environment. The map consists of a sequence of camera poses and a set of 3D features. The so-called factor graph used for optimization and camera state estimation is formulated from the map data. The vertices in the graph are camera poses and observed features. These vertices are connected by measurements (feature-to-pose constraints). The graph optimization runs in a separate thread. The map and factor graph are synchronized after each iteration of the optimization process. This architecture requires more care in implementation, but allows efficient use of a multi-core CPU.
The point features \(\mathbf{f}_j\) and camera poses \(\mathbf{c}_i\) are represented by vertices in the factor graph. The constraints (measurements) are represented by edges in the graph. The measurement \(\mathbf{m}_{ij}\) between the ith pose and the jth feature is represented by the 3D edge \(\mathbf{t}_{ij}\in {\mathbb {R}}^3\). The rigid body transformation (odometry) between two camera poses denoted as i and k is represented by the edge \(\mathbf{o}_{ik}\in \mathrm{\mathbf{SE}(3)}\), where \(\mathbf{SE}(3)\) is the special Euclidean group that defines rotation and translation of a rigid body with respect to six degrees of freedom. To find the optimal sequence of the sensor poses \(\mathbf{c}_1,\ldots ,\mathbf{c}_n\) and feature positions \(\mathbf{f}_1,\ldots ,\mathbf{f}_m\) the following cost function is minimized:
where \(\mathbf{e}(\mathbf{c}_i,\mathbf{f}_j,\mathbf{m}_{ij})\) is an error function computed for the estimated and measured pose of the vertex. The measurement \(\mathbf{m}_{ij}\) is a 3D transformation \(\mathbf{t}_{ij}\) for feature-to-pose constraints or an SE(3) transformation \(\mathbf{o}_{ij}\) for pose-to-pose constraints. However, the pose-to-pose constraints are added to the graph only if the number of matches between map features and features from the current frame is below a given threshold [2]. Pose-to-pose constraints stabilize the factor graph optimization process in case of an insufficient number of feature-to-pose constraints. The g\(^2\)o general graph optimization library [24] with the implementation of the Preconditioned Conjugate Gradient (PCG) method is used to solve (1).
The accuracy of each feature-to-pose measurement is represented in the factor graph by an information matrix \({\varvec{\Omega }}_{i,j}^{t}\). The information matrix can be obtained by inverting the covariance matrix of the measurement. The information matrix \({ \varvec{\Omega }}_{i,j}^{o}\) for the pose-to-pose edge in the factor graph is set to an identity matrix, as in [9].
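With these information matrices, the cost function (1), whose formula is not reproduced in this text, can be expected to have the standard weighted least-squares form used by g\(^2\)o [24]. The expression below is a sketch under that assumption, not the authors' exact notation; \(\mathcal{E}\) is a symbol introduced here for the set of graph edges, and \({\varvec{\Omega }}_{ij}\) stands for the information matrix of the edge between vertices i and j:

\[
\mathop{\mathrm{argmin}}_{\mathbf{c}_1,\ldots ,\mathbf{c}_n,\,\mathbf{f}_1,\ldots ,\mathbf{f}_m}\; \sum_{(i,j)\in \mathcal{E}} \mathbf{e}(\mathbf{c}_i,\mathbf{f}_j,\mathbf{m}_{ij})^{T}\,{\varvec{\Omega }}_{ij}\,\mathbf{e}(\mathbf{c}_i,\mathbf{f}_j,\mathbf{m}_{ij})
\]

With \({\varvec{\Omega }}_{ij}\) equal to the identity, each term reduces to a squared Euclidean error; with \({\varvec{\Omega }}_{ij}\) equal to an inverse covariance matrix, each term is a squared Mahalanobis distance.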
SLAM frontend
We implemented a frontend of the SLAM algorithm to verify our method on real-life RGBD sequences. We use standard procedures for feature detection, description [38], matching, and tracking [1]. The frontend starts from the detection of salient features in the RGBD frame. The set of detected features is used to estimate the frame-to-frame sensor displacement. The PCG solver used in the backend requires a good initial guess. Therefore, we provide a reliable sensor displacement guess from the VO pipeline. We implemented a fast VO algorithm [33], which is independent of the map structure. In our investigations, we consider two configurations of the VO pipeline. In the first one, the associations between two consecutive RGB frames are found using SURF descriptors. However, the SURF descriptors are slow to compute and match [38]. Thus, we alternatively implemented fast sparse optical flow tracking with the Kanade–Lucas–Tomasi (KLT) algorithm [33]. In this case, the ORB keypoint detector is used. Regardless of the method used to establish associations between features belonging to two frames in the sequence, the SE(3) transformation is computed from the paired 3D points. To estimate the camera motion from this set of paired features, we apply the Umeyama algorithm [43] and preemptive RANSAC to remove outliers. The camera pose estimated from the VO pipeline is used as an initial guess for the camera pose in the graph optimization. The constraints between the new camera pose and features in the map are obtained by matching features from the last frame with features projected from the map. Again, we use RANSAC to determine the set of inliers. The features that are observed, but cannot be associated with the features projected from the map, are added to the map, extending the environment model to newly discovered areas.
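The motion estimation step from paired 3D points can be sketched as follows. This is a minimal illustration, not the PUT SLAM implementation: it uses the SVD-based least-squares alignment of Umeyama without the scale term, and a plain RANSAC loop instead of the preemptive variant. The function names, the threshold, the iteration count and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares R, t with dst ~ R @ src + t (Umeyama-style, no scale)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (dst - dst_mean).T @ (src - src_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = dst_mean - R @ src_mean
    return R, t

def ransac_rigid(src, dst, threshold=0.02, iterations=200, seed=0):
    """Plain RANSAC; inliers are pairs whose Euclidean residual is below threshold."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = estimate_rigid_transform(src[sample], dst[sample])
        residuals = np.linalg.norm(dst - (src @ R.T + t), axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC did not find a consistent set of matches")
    # Refit the transformation on all inliers of the best hypothesis.
    R, t = estimate_rigid_transform(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers

# Synthetic example: 30 paired points, 5 of them corrupted (wrong matches).
rng = np.random.default_rng(1)
src = rng.uniform(-1.0, 1.0, size=(30, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.05])
dst = src @ R_true.T + t_true
dst[:5] += 1.0
R_est, t_est, inliers = ransac_rigid(src, dst)
```

Note that the inlier test uses a Euclidean distance threshold, so all accepted measurements lie inside a sphere around their expected positions.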
Spatial uncertainty modeling
In the BA-based approach to RGBD SLAM, the importance of constraints defined by the measurements of feature positions is determined by their information matrices, which are directly related to the spatial uncertainty of these features. To show the advantages of using an anisotropic uncertainty model in the BA-based formulation of SLAM, we consider a simplified 2D case presented in Fig. 3. The position of a single feature is measured from two different positions of the camera. For each measurement, we draw the real uncertainty ellipsoid (filled in with a gradient color). The measured position of the feature is inside the error ellipse for each measurement. To find the position of features from noisy measurements \(\mathbf{t}_{11}\) and \(\mathbf{t}_{21}\), we construct a graph. The information matrices of the links in the graph \({\varvec{\Omega }}_{11}^t\) and \({\varvec{\Omega }}_{21}^t\) are computed from the inverse of the covariance matrix. The computed uncertainty (dashed line ellipses) differs from the real uncertainty of measurements. However, the optimization with the anisotropic uncertainty model gives results at least one order of magnitude better than minimization with identity information matrices [5]. In the first case, the Mahalanobis distance is minimized (1), while in the second case the distance in (1) degrades to the Euclidean distance, which is then minimized.
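The effect described above can be illustrated with a small numerical sketch of the 2D case. It assumes that both measurements are already expressed in a common frame with known camera poses, so the optimal feature position has a closed form. All numbers (noise levels, viewing directions, measurement errors) are hypothetical and do not reproduce the experiment from [5].

```python
import numpy as np

def fuse(measurements, informations):
    """Minimise sum_k (x - t_k)^T Omega_k (x - t_k) over x in closed form."""
    omega_sum = sum(informations)
    weighted = sum(omega @ t for omega, t in zip(informations, measurements))
    return np.linalg.solve(omega_sum, weighted)

def rotated_covariance(angle, sigma_range, sigma_lateral):
    """2D covariance with the large (range) axis along the viewing direction."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([sigma_range**2, sigma_lateral**2]) @ R.T

true_position = np.array([1.0, 1.0])
# Camera 1 looks along x, camera 2 along y; range noise dominates in both.
t1 = true_position + np.array([0.04, 0.001])
t2 = true_position + np.array([-0.002, -0.03])
cov1 = rotated_covariance(0.0, 0.05, 0.005)
cov2 = rotated_covariance(np.pi / 2, 0.05, 0.005)

# Anisotropic model: Mahalanobis distance is minimised.
estimate_aniso = fuse([t1, t2], [np.linalg.inv(cov1), np.linalg.inv(cov2)])
# Identity information matrices: Euclidean distance, i.e. the plain mean.
estimate_iso = fuse([t1, t2], [np.eye(2), np.eye(2)])

error_aniso = np.linalg.norm(estimate_aniso - true_position)
error_iso = np.linalg.norm(estimate_iso - true_position)
print(error_aniso, error_iso)
```

Each measurement is trusted mostly along its accurate (lateral) direction, so the two views complement each other, whereas the identity weighting averages the large range errors into the estimate.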
The classic approach to capture the spatial uncertainty in features used by SLAM systems is to propagate the uncertainty of sensor measurements through the data processing pipeline into the covariance matrices of the features [39]. This approach was also used with RGBD sensors [8] and was the first one we attempted to use in our BA-based PUT SLAM to test whether an anisotropic uncertainty model improves the accuracy of estimation. However, realizing that this approach is insufficient in a system that uses RGBD data processed in a complicated and non-deterministic (due to RANSAC) pipeline, we propose to develop new uncertainty models that accumulate the spatial uncertainty introduced in the frontend. The new models are implemented by analytically modeling the distribution of features observed a posteriori, rather than by propagating the uncertainty from the sensor model. This new idea allows us to develop feasible uncertainty models even without the knowledge of the processing pipeline details. We also do not need to approximate the uncertainty propagation through non-differentiable or even non-analytical processing stages, as in the classic method [16].
For instance, the distribution of feature measurements is significantly influenced by the RANSAC procedure, which is used in the frontend to remove outlier matches. The influence of RANSAC on the spatial uncertainty of measurements is presented in Fig. 4. The set of features is observed from two different sensor poses (Fig. 4a, b). The uncertainty of measurements is modeled using an anisotropic model. However, if RANSAC is used to remove outliers, as in the frontend of PUT SLAM, the distribution of measurements is changed (Fig. 4c). In RANSAC, the Umeyama algorithm [43] is used to find a transformation between two sensor poses. Outlier measurements which are not consistent with the found transformation are removed from the set of inliers (feature \(\mathbf{f}_3\) in Fig. 4c). The RANSAC outlier threshold is defined here as the Euclidean distance between the expected and measured position of the feature. Thus, all the inlier measurements are inside a sphere defined by the RANSAC outlier threshold. In this case, the uncertainty can be modeled also as a sphere, and the information matrix for each feature should be set to the identity matrix in the graph.
In these investigations, we use the synthetic ICL-NUIM data set [15], as we need perfect ground truth sensor trajectories to isolate the spatial uncertainty introduced by processing the RGBD frames in the frontend. Moreover, the ICL-NUIM data set offers an insight into the nature of the noise introduced into the sensory data. Hence, using this dataset it is possible to draw clear conclusions as to the dependencies between the characteristics of the sensory data uncertainty and the behavior of the uncertainty model.
Uncertainty propagation from the sensor model
The error of depth measurements based on the PrimeSense technology increases with the distance to the observed object (axial noise) [20]. The uncertainty of measurements is also influenced by the lateral resolution of the sensor (lateral noise) [25, 31]. The uncertainty model of depth measurements can be generalized for both Kinect and Xtion devices [14]. Moreover, the spatial uncertainty in the location of the RGBD point feature depends on the inaccuracy introduced by the keypoint detector used in the frontend of the SLAM system [36]. Thus, the spatial uncertainty model of measurements is anisotropic and cannot be captured by the identity information matrices commonly applied for graph optimization in the backend of SLAM systems.
To compute the information matrix \({\varvec{\Omega }}_{ij}^{t}\) of a feature-to-pose constraint, we have to determine the covariance matrix of the RGBD point feature. The feature uncertainty model that propagates the uncertainty from the sensor measurements follows the approach proposed by Park et al. [36]. The sensor model defines the relation between the position of a feature in the image and the position of this feature in the 3D space:
where \(x_c\), \(y_c\) define the position of the optical axis on the image plane, and \(f_x\), \(f_y\) are focal lengths of the camera. The covariance matrix \(\mathbf{C}_{p_{(3\times 3)}}\) of each discovered feature is computed as follows:
where \(\mathbf{J}_{p_{(3\times 3)}}\) is the Jacobian of (2) computed with respect to the intrinsic coordinates of the point feature u, v and d, while \(\mathbf{C}_{k_{(3\times 3)}}\) is the covariance matrix of the feature measurement u, v and depth d. Following the approach common in computer vision, we assume that the measurements of the u and v coordinates in the camera are independent. This assumption and the fact that the uncertainty in d is caused by a physically different measurement channel allow us to write \(\mathbf{C}_{k}\) as a diagonal matrix:
As demonstrated by Park et al. [36], the variance values \(\sigma _u\), \(\sigma _v\) can be considered constant for the given camera parameters and the chosen feature detection algorithm. However, the variance \(\sigma _d\) of depth d measurement increases with the distance from the camera. We use the approximation of \(\sigma _d\) found experimentally by Khoshelham and Elberink [20]:
where \(k_1,\ldots ,k_4\) are constants (\(k_1=0.57\), \(k_2=0.89\), \(k_3=0.42\), \(k_4=0.96\)). The components given by equations (3), (4) and (5) sum up to a simple \(\mathbf{C}_p\) model of the spatial uncertainty of a point feature. This model is view-dependent, i.e., the shape of the uncertainty ellipsoid for a given feature changes when it is seen by the sensor from a different viewpoint. A 3D visualization of the \(\mathbf{C}_p\) model is presented in Fig. 5.
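The propagation described in this section can be sketched numerically. The sketch assumes the standard pinhole back-projection \(x=(u-x_c)d/f_x\), \(y=(v-y_c)d/f_y\), \(z=d\) for the sensor model and the first-order propagation \(\mathbf{C}_p=\mathbf{J}_p\mathbf{C}_k\mathbf{J}_p^T\); the formulas (2) to (5) themselves are not reproduced here, so these forms are assumptions. The depth noise \(\sigma_d\) is passed in by the caller rather than computed from the polynomial approximation, \(\sigma_u\), \(\sigma_v\), \(\sigma_d\) are treated as standard deviations, and the intrinsic parameters are hypothetical values.

```python
import numpy as np

def feature_covariance(u, v, d, fx, fy, xc, yc, sigma_u, sigma_v, sigma_d):
    """First-order propagation of (u, v, d) noise to a 3D point covariance."""
    # Jacobian of the assumed pinhole back-projection with respect to (u, v, d).
    J = np.array([
        [d / fx, 0.0, (u - xc) / fx],
        [0.0, d / fy, (v - yc) / fy],
        [0.0, 0.0, 1.0],
    ])
    # Independent noise in u, v and d gives a diagonal covariance.
    C_k = np.diag([sigma_u**2, sigma_v**2, sigma_d**2])
    return J @ C_k @ J.T

# Hypothetical intrinsics and noise levels, for illustration only.
fx, fy, xc, yc = 525.0, 525.0, 319.5, 239.5
C_p = feature_covariance(u=100.0, v=400.0, d=2.0,
                         fx=fx, fy=fy, xc=xc, yc=yc,
                         sigma_u=1.0, sigma_v=1.0, sigma_d=0.01)
omega = np.linalg.inv(C_p)  # information matrix of the feature-to-pose edge
```

Because the Jacobian depends on u, v and d, the same physical feature receives a different covariance when observed from another viewpoint, which is the view dependence mentioned above.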
Normal-based uncertainty model
Realizing that propagating the uncertainty of the sensor measurements to the uncertainty of features is not enough to capture the uncertainty sources in the SLAM frontend, we analyzed the distribution of point features produced by PUT SLAM using the reverse SLAM tool. In order to isolate the effects of imperfect detection of point features, we started with the ICL-NUIM sequences without depth noise. We noticed that most measurements of the feature positions are located on the object surfaces and form flat ellipsoids (Fig. 6b and compare Fig. 10a). The minor axis of the ellipsoids is correlated with the normal vector to the surface. To capture this distribution of measurements, we propose the normal-based uncertainty model \(\mathbf{C}_n\). The normal-based uncertainty model is view-independent. This means that the global shape of the uncertainty ellipsoid does not depend on the position of the camera.
To determine the ellipsoid of the uncertainty model, we compute a normal vector to the surface at the point located at the feature coordinates \(f_{u,v}\). To this end, we use the depth image only [3]. Then, we compute the rotation matrix \(\mathbf{R}\). The z axis of the coordinate system defined by \(\mathbf{R}\) coincides with the surface normal n (Fig. 6a). The x and y axes are selected to form a right-handed coordinate system (Fig. 6b). The covariance matrix \(\mathbf{C}_n\) is defined as:
where \(\mathbf{S}\) is a scaling matrix. In this model, we scale the minor axis of the ellipsoid z, which is related to the surface normal n. The scaling coefficient \(S_{\mathrm{z}}\) is in the range (0,1). The diagonal elements \(S_{\mathrm{x}}\) and \(S_\mathrm{y}\) of the matrix are set to 1. Example uncertainty ellipsoids of the normal-based uncertainty model \(\mathbf{C}_n\) for selected 3D point features are presented in Fig. 6c.
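A minimal sketch of this construction is given below. It assumes that (6) has the form \(\mathbf{C}_n=\mathbf{R}\mathbf{S}\mathbf{R}^T\) with \(\mathbf{S}=\mathrm{diag}(S_x,S_y,S_z)\) acting as the covariance in the local frame; the way the tangent axes are chosen and the value \(S_z=0.1\) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def rotation_from_normal(normal):
    """Right-handed frame whose z axis coincides with the surface normal."""
    z = np.asarray(normal, dtype=float)
    z = z / np.linalg.norm(z)
    # Pick a helper axis that is not (nearly) parallel to the normal.
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)  # completes the right-handed triad (x, y, z)
    return np.column_stack([x, y, z])

def normal_based_covariance(normal, s_z=0.1):
    """Flat ellipsoid: unit variance in the surface plane, s_z along the normal."""
    R = rotation_from_normal(normal)
    S = np.diag([1.0, 1.0, s_z])
    return R @ S @ R.T

normal = np.array([1.0, 1.0, 1.0])
R = rotation_from_normal(normal)
C_n = normal_based_covariance(normal, s_z=0.1)
omega_n = np.linalg.inv(C_n)  # information matrix for the graph edge
```

Since \(S_z<1\), the resulting information matrix penalizes errors along the surface normal more strongly than errors within the surface.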
Gradientbased uncertainty model
The \(\mathbf{C}_n\) uncertainty model leverages the role of visual localization of the point features in the RGB frames. However, the uncertainty of the feature location depends not only on the performance of the features detector implemented in the SLAM frontend pipeline and the quality of the RGB images, but also on the local structure of the scene. During visual analysis of the distribution of point features, we observed that features located in the vicinity of strong intensity gradients (photometric edges) slide along lines defined by these gradients. Thus, we propose another uncertainty model, which includes the observed behavior. In the proposed uncertainty model, the major axis of the ellipsoid is located along a photometric edge defined by strong intensity gradient in the RGB image (Fig. 7b and compare Fig. 10b). To compute the uncertainty matrix, we detect the RGB edge using a 3\(\times \)3 Scharr kernel. The direction of the edge in 3D space is computed using the depth data. Then, the procedure is similar to the procedure presented for the normalbased uncertainty model. We construct the rotation matrix \(\mathbf{R}\) representing the local coordinate system. The z axis is related to the RGB gradient vector (Fig. 7a). The x axis of the coordinate system is located on the RGB edge (Fig. 7a and b). The RGB edge might be related to the edge of an object or a photometric edge on a flat surface. The covariance matrix \(\mathbf{C}_g\) is computed using (6). We scale the x, y and z axes of the ellipsoid using the scaling matrix \(\mathbf{S}\). Gradientbased uncertainty ellipsoids (according to the \(\mathbf{C}_g\)model) for selected 3D point features are presented in Fig. 7c.
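The image-plane part of this procedure can be sketched as follows. The kernel coefficients are the standard 3\(\times \)3 Scharr weights; the function names and the list-of-rows image representation are assumptions made for this example. Lifting the edge direction to 3D with the depth data, as done in the actual pipeline, is not shown.

```python
import math

# Standard 3x3 Scharr kernels for the horizontal (x) and vertical (y) derivative.
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3], [0, 0, 0], [3, 10, 3]]

def scharr_gradient(gray, u, v):
    """Return (gx, gy) at column u, row v of a grayscale image (list of rows)."""
    gx = gy = 0.0
    for dv in (-1, 0, 1):
        for du in (-1, 0, 1):
            pixel = gray[v + dv][u + du]
            gx += SCHARR_X[dv + 1][du + 1] * pixel
            gy += SCHARR_Y[dv + 1][du + 1] * pixel
    return gx, gy

def edge_direction(gx, gy):
    """Unit vector along the photometric edge, perpendicular to the gradient."""
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return None  # flat neighbourhood: no dominant edge
    return (-gy / norm, gx / norm)
```

For a vertical intensity step the gradient points along the image u axis and the returned edge direction points along the v axis, which is the direction along which such a feature would be expected to slide.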
Tools that give insight into the uncertainty of features
Experiments in Simulation
This section describes two software environments (programs) that make it possible to observe and measure variables and quantities that are hard, or even impossible, to observe and measure in a typical SLAM system. The common idea behind these tools is to replace particular components of the investigated system (here PUT SLAM) by equivalent modules that use synthetic or groundtruthbased measurements. This idea allows us to investigate particular aspects of the spatial uncertainty in SLAM while ignoring those aspects that are normally out of control in fully experimental work. As demonstrated in our previous work [5], simulationbased experiments allow us to isolate errors introduced by wrong feature correspondences or duplicated features, and to focus on the analysis of spatial uncertainty introduced by RGBD measurements. The simulation environment replaces the real PUT SLAM frontend, providing the backend with features and measurements (constraints) that are free from qualitative errors introduced by a real frontend [2]. In the simulator, we define the environment with a set of 3D point features described by unique identifiers. The identifiers are used in the matching of the features observed from various viewpoints. The position measurement of a selected feature is generated by randomly drawing a point from the uncertainty distribution ellipsoid defined around the nominal position of a feature according to the chosen uncertainty model. To generate a sequence of measurements for graph optimization, we simulate the motion of the sensor along the given trajectory. Then, we compute the position of each feature in the camera frame \({}^c\mathbf{f}_j\):
where \(\mathbf{c}_i\) is the global sensor pose and \(\mathbf{f}_j\) is the global position of the jth feature. To check whether the feature is within the range of the sensor, we compute the projection of the 3D feature on the image plane \([u,v,d]^T\):
If the projection of the 3D point lies on the image plane (640\(\times \)480) and within the range of the sensor, the covariance matrix is computed using (3). In the simulation experiments, we use the \(\mathbf{C}_p\)model to define the uncertainty.
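The visibility test described above can be sketched as follows, assuming that the sensor pose \(\mathbf{c}_i\) is given as a rotation matrix and a translation vector, and that the projection uses a standard pinhole model. The intrinsic parameters and the depth range limits are placeholder values chosen for illustration, not the values used in the simulator.

```python
def to_camera_frame(R, t, f):
    """Express the global point f in the camera frame of pose (R, t): R^T (f - t)."""
    d = [f[k] - t[k] for k in range(3)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]

def project(p, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Pinhole projection of a camera-frame point to [u, v, d] (assumed intrinsics)."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy, z)

def is_visible(R, t, f, width=640, height=480, d_min=0.5, d_max=6.0):
    """True if f projects inside the image and lies within the assumed depth range."""
    p = to_camera_frame(R, t, f)
    if not (d_min <= p[2] <= d_max):
        return False  # behind the camera, too close, or too far
    u, v, _ = project(p)
    return 0 <= u < width and 0 <= v < height
```

The depth check is performed before the projection so that points behind the camera, for which the division by z would yield a meaningless pixel position, are rejected first.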
In the first experiment, we simulate a boxlike environment, which represents a room (5.5\(\times \)5.5\(\times \)5.5 m). We randomly generate 1000 point features on the surface of each wall. The camera moves along a rectangular reference trajectory inside the room (Fig. 8). At each corner of the room, the sensor is rotated by \(90^\circ \) in small increments so the optical axis of the sensor (z axis) is in the direction of forward motion. This configuration of the sensor is the most common, but at the same time, it is the most challenging one for a SLAM system [5]. This is caused by the fact that the usually large axial uncertainty of RGBD sensor measurements is, in this case, directed most of the time along the estimated trajectory of the robot. Moreover, when the robot rotates, it observes significantly fewer common features in consecutive images. As we have observed in [5], a ceilingfacing camera is more advantageous, if appropriate features are available. We have presented a more thorough analysis of the feature observation scenarios in the earlier research [5].
The trajectories obtained in simulation in the room environment are shown in Fig. 8. When the VO method is used, the position drift is not canceled by a loop closure procedure. The error accumulates and the estimated trajectory is far from the true trajectory of the sensor. The SLAM paradigm cancels the drift through loop closure when the features from the beginning of the experiment are reobserved.
An experiment in simulation allows us not only to assess the accuracy of the trajectories but also to compare the trajectories and featurebased maps obtained with and without the uncertainty model. We use simulations to confirm the hypothesis, that modeling of spatial uncertainty in featurebased PUT SLAM improves the accuracy. To obtain statistics, we run the simulation 100 times for each investigated configuration. The quantitative results are computed according to the methodology introduced in [41], using the Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) metrics. The distance between corresponding points of the estimated and ground truth trajectories is measured by ATE, whereas the RPE metric reveals the local drift of the trajectory. We can obtain the translational or rotational RPE, taking, respectively, the translational or rotational part of the homogeneous matrix computed when comparing the sensor poses along the trajectories. As ATE compares absolute distances between two rigid sets of points representing the estimated and the ground truth trajectory, it is always expressed as a metric distance. With the uncertainty model, the ATE error is 4 times smaller while the RPE error is about 4.5 times smaller (Table 1). Details of the simpler localization systems: VO and posebased SLAM included in Table 1 for comparison have been described in [1], while the approach used to run these methods in the simulation was the same as for PUT SLAM.
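To make the two metrics concrete, a simplified sketch is given here. It is not the benchmark implementation of [41]: it assumes that the two trajectories are already time-associated and expressed in the same frame (the alignment step of ATE is omitted), and the RPE variant uses positions only, which matches the translational RPE only when the estimated and ground truth orientations coincide. The function names are chosen for this example.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of the distances between corresponding, already aligned positions."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [sum((e - g) ** 2 for e, g in zip(p, q))
               for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

def rpe_translation_rmse(estimated, ground_truth, delta=1):
    """RMSE of the difference between estimated and true displacements over delta steps."""
    if len(estimated) != len(ground_truth) or len(estimated) <= delta:
        raise ValueError("need equal-length trajectories longer than delta")
    squared = []
    for i in range(len(estimated) - delta):
        est_step = [b - a for a, b in zip(estimated[i], estimated[i + delta])]
        gt_step = [b - a for a, b in zip(ground_truth[i], ground_truth[i + delta])]
        squared.append(sum((e - g) ** 2 for e, g in zip(est_step, gt_step)))
    return math.sqrt(sum(squared) / len(squared))
```

A trajectory that is offset from the ground truth by a constant vector has a nonzero ATE under this simplification but zero local drift, which illustrates why the two metrics answer different questions.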
It is worth noting that even though the error is smaller for PUT SLAM with the uncertainty model than for PUT SLAM without uncertainty modeling, the standard deviation is much higher. In the series of 100 trials, one of the 100 optimization runs returns an incorrect solution: one camera pose in the whole trajectory optimized by g\(^2\)o is incorrect, and the mean ATE error of the trajectory is significantly higher [5]. We notice this behavior when the camera observes only a flat wall and the number of posetofeatures measurements is small. In this case, the graphbased backend sometimes finds a solution that is incorrect but satisfies the measurements from the given camera pose. This situation is difficult to detect in practice. Thus, whenever the number of measurements is smaller than a threshold, an additional posetopose constraint is added to the graph to stabilize the optimization process in the backend.
In the next simulation, we use the kt1 sensor trajectory from the ICLNUIM dataset (Fig. 9) to confirm the results in a more realistic scenario. The positions of features are obtained offline from the PUT SLAM frontend processing the ICLNUIM office sequence of RGBD frames. The 86 extracted features are fixed in the simulator and augmented by unique IDs that enable the simulator to perfectly associate the observed features to the map. The results summarized in Table 2 confirm that the featurebased PUT SLAM yields more accurate trajectory estimate than the posebased SLAM implemented as in [1]. Moreover, the anisotropic feature uncertainty model allows PUT SLAM to achieve much smaller ATE than it was possible using identity matrices.
The simulation experiments give us also the possibility to verify the accuracy of the obtained map (positions of features). Unfortunately, the available benchmark datasets produced using real RGBD sensors contain ground truth sensor trajectories, but no ground truth for the map. Hence, a quantitative comparison of the map is not possible, and to demonstrate the accuracy gains in the location of map features, we also use the synthetic ICLNUIM dataset. In the simulation, we compute the mean squared error (MSE) for (known and fixed) features in the map. The results are presented in Table 3. VO and posebased SLAM give similar results. The MSE value is greater than the noise introduced by individual measurements. This comes from the fact that a feature position is measured in the current sensor frame and the sensor pose error accumulates. Because the sensor pose drifts, also the estimated positions of features differ significantly from their real positions. However, in the BAbased PUT SLAM, the positions of features are jointly optimized with positions of the sensor. This allows us to reduce the MSE errors for features and improve the quality of the map. Hence, by introducing the spatial uncertainty model of features the featurebased PUT SLAM achieves much smaller estimation errors not only for the sensor trajectory but also for the map of features.
Reverse SLAM: looking closer at the features
We demonstrated in the simulation that an anisotropic spatial uncertainty model of features improves significantly the accuracy of estimation in factor graphbased SLAM. However, we found it difficult to apply the approach to uncertainty modeling proposed by Park et al. [36] in the actual PUT SLAM working with real data. When the \(\mathbf{C}_p\)model is used, the optimization in g\(^2\)o becomes unstable or returns worse results than the optimization with identity matrices. This suggests that the \(\mathbf{C}_p\) covariance matrix does not represent correctly the uncertainty of real measurements.
In practice, the uncertainty of features is also influenced by the tracking/matching algorithm [33]. Therefore, in order to investigate how the proposed uncertainty model fits the distribution of real measurements, we have developed the reverse SLAM tool (see Note 2). In the reverse SLAM tool, we are looking for the distribution of feature measurements that is obtained in the frontend when SLAM returns a perfect trajectory. Thus, we move the sensor along the known trajectory and run the frontend of the SLAM. At each frame, we provide the real position of the camera instead of its estimate. As a result, the sensor noise and image processing errors accumulate in the representation of features, not in the trajectory error. Assuming that an accurate ground truth trajectory is available, a distribution of measurements (3D positions of features) for each feature in the map can be computed. Unfortunately, datasets obtained using real RGBD sensors and external motion capture techniques typically suffer from synchronization issues between the RGBD data and the ground truth trajectory. For instance, in the TUM RGBD Benchmark [41], data from the Vicon motion capture system and data from the Kinect sensor are not well synchronized due to the different sampling frequencies [23]. This dataset cannot be used in the reverse SLAM tool, because errors that usually manifest themselves only when computing benchmarking values for the particular trajectory [30] may, in reverse SLAM, significantly change the distribution of the features, hence leading to wrong conclusions as to the uncertainty model. Therefore, we use the synthetic ICLNUIM dataset [15] in experiments with reverse SLAM. This dataset provides not only perfect ground truth trajectories of the sensor, but also perfect synchronization between the RGBD frames and trajectory points.
The output from the reverse SLAM tool gives us the information about the real distribution of feature measurements in PUT SLAM. The uncertainty model obtained from the reverse SLAM tool contains not only information about the sensor noise but also about the accuracy of image processing and RANSACbased outliers rejection in the frontend.
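The core bookkeeping of this idea can be sketched in a few lines, under the assumption that the frontend delivers, for every frame, the ground truth pose together with the 3D measurements of the matched features in the camera frame. The data layout and function names are hypothetical and do not reflect the actual PUT SLAM code.

```python
def to_global(R, t, p):
    """Map point p from the camera frame of ground-truth pose (R, t) to the global frame."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

def collect_measurements(frames):
    """Group global-frame measurements by feature identifier.

    frames: iterable of (R, t, {feature_id: point_in_camera_frame}).
    """
    per_feature = {}
    for R, t, observations in frames:
        for feature_id, p in observations.items():
            per_feature.setdefault(feature_id, []).append(to_global(R, t, p))
    return per_feature

def mean_and_covariance(points):
    """Sample mean and unbiased sample covariance of a list of 3D points."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(3)] for i in range(3)]
    return mean, cov
```

Because the poses are exact, the spread of each per-feature point cloud reflects only the sensor noise and the frontend processing, which is the quantity the uncertainty model is meant to describe.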
Using the reverse SLAM tool, we can analyze the distribution of measurements in the 3D space. The example distributions of measurements are presented in Fig. 10. Using the visualization tool, we can analyze the distribution of measurements for each feature in the map. We noticed that some measurements are located on the surfaces of the objects (Fig. 10a). We can also compute statistical properties of the distribution. The standard deviation of measurements in z axis (normal to the surface) is 0.005 m which is 62% smaller than in y axis and 130% smaller than in x axis. Another type of distribution is presented in Fig. 10b. In this case, the measured positions of the selected feature are located along the computer cable. The standard deviation of measurements along the y axis is 39% larger than in x axis and 96% larger than in the z axis.
It is also possible to verify the distribution visually by projecting the measurements on the image plane. In Fig. 11, we present the distribution of measurements projected on the first image from the ICLNUIM office/kt1 sequence. It is visible that the measurements slide along RGB (photometric) edges, forming small clusters. This suggests that despite the reshaping of the spatial uncertainty by the RANSAC outlier rejection mechanism, we can still find a beneficial anisotropic uncertainty model of the point features.
Analysis of quantitative results
Evaluation of the Uncertainty Models
The reverse SLAM tool that we have developed allows us to compute the distributions of point features depending on the assumed uncertainty model. The reverse SLAM computes a spatial uncertainty ellipsoid from the actual distribution for each feature included in the map. To facilitate the analysis of these ellipsoids, we introduce the \(d_\varepsilon \) coefficient:
where \(\sigma _x\), \(\sigma _y\), and \(\sigma _z\) stand for the standard deviations of measurements along the local x, y and z axis of the uncertainty model, respectively. The coefficient defined by (9) gives the relation between the length of the uncertainty ellipsoid major axis and the length of its minor axes. Larger positive values of \(d_\varepsilon \) indicate that the uncertainty is anisotropic and should be properly captured by the Mahalanobis distance used in (1). However, if \(d_\varepsilon \) approaches 0 the uncertainty is almost isotropic. Thus, it can be sufficiently modeled as a sphere, which is then represented by an identity matrix in (1). A negative value of \(d_\varepsilon \) means that the expected major axis of the uncertainty ellipsoid is shorter than its minor axes.
Table 4 gives example numerical results of applying the reverse SLAM tool to the ICLNUIM office environment with the kt1 trajectory. Uncertainty ellipsoids were computed for the \(\mathbf{C}_n\)model and the \(\mathbf{C}_g\)model. The ICLNUIM dataset contains two variants of the depth data: noiseless (ideal) and with simulated Kinectlike noise. The results obtained from our reverse SLAM tool were considerably different for these two variants of the RGBD data. For the noiseless depth data, the distributions of feature points were best captured by the \(\mathbf{C}_n\)model. This is explained by the fact that in the absence of depth errors the spatial uncertainty of point features is caused mainly by errors in the location of keypoints. In contrast, the \(\mathbf{C}_g\)model was better for the ICLNUIM data with synthetic noise in the depth measurements. Apparently, in this case, the spatial uncertainty of features depends also on the depth noise. The fact that the gradientbased uncertainty model fits best in this case is explained by the increased depth errors on the edges of objects. In a real sensor based on the PrimeSense technology, this effect is related to the dependency between pixels in the depth image [13].
Impact of the uncertainty models on the SLAM accuracy
The analysis of the spread of point features using the reverse SLAM tool allowed us to gain some intuition about the usefulness of our uncertainty models. However, this intuition had to be verified by investigating how the \(\mathbf{C}_n\)model and \(\mathbf{C}_g\)model behave in real SLAM. To this end, we tested these two uncertainty models in the regular PUT SLAM software, which computed the ATE and RPE metrics that allowed us to assess the impact of the uncertainty models on SLAM accuracy. Both variants of PUT SLAM were used in these tests—the one with the matchingbased VO, and the other one based on tracking for VO implementation.
To facilitate the analysis of the ATE and RPEbased numerical results, we introduce another coefficient, which indicates the relative trajectory accuracy improvement due to the proposed model:
where \(\mathrm{E}_I\) and \(\mathrm{E}_C\) are the error values (ATE RMSE or RPE RMSE) for the identity matrices and the uncertaintybased matrices, respectively.
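For illustration, a relative improvement of this kind can be computed as below. The sketch assumes that the coefficient is the percentage reduction of the error with respect to the identity-matrix baseline, \((\mathrm{E}_I - \mathrm{E}_C)/\mathrm{E}_I \cdot 100\%\), which matches the percentage values reported later in the text; the exact form of the original formula and the function name are assumptions.

```python
def relative_improvement(e_identity, e_model):
    """Percentage reduction of the error relative to the identity-matrix baseline."""
    if e_identity <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (e_identity - e_model) / e_identity
```

With this convention, a positive value means that the uncertainty model reduced the error, and a negative value means that the identity matrices performed better.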
Results obtained for the ICLNUIM sequence are provided in Table 5. For both variants of PUT SLAM (the matchingVO and trackingVO), the ATE RMSE is decreased significantly by using the \(\mathbf{C}_n\)model on the sequence with noiseless depth data. Moreover, the standard deviation of the resulting ATE metric decreased when the uncertainty model was applied. However, when a sequence with simulated depth noise was used, the achieved accuracy improvement was much smaller. Apparently, the \(\mathbf{C}_n\)model is inadequate in this case. The application of the \(\mathbf{C}_g\)model resulted in an improvement of the ATE metric for the trackingVO variant, while the accuracy improvement for the matchingVO PUT SLAM was rather insignificant. These results are explainable in light of our observations made using the reverse SLAM tool. When noise is present in the depth data, the dominant source of errors in the location of point features is the sliding of these points along edges. The trackingVO variant uses ORB keypoints that are more prone to dislocation along edges because the detection in ORB is less repeatable than in the SURF keypoints used in the matchingVO variant [38]. Hence, the sliding effect is more pronounced in the trackingbased version and can be to some extent compensated by a proper model of the anisotropic uncertainty.
Applicationoriented evaluation
Finally, we have investigated how the proposed approach to uncertainty modeling influences the accuracy of SLAM trajectory estimation using five real RGBD data sequences from the TUM RGBD Benchmark, recorded with Kinect or Xtion sensors. As mentioned before, the reverse SLAM tool cannot be used to determine the spatial uncertainty of measurements from such a dataset. Therefore, the \(\mathbf{C}_g\)model was applied in these tests, as the previous results strongly suggested that it is the only uncertainty model that provides beneficial improvements in g\(^2\)o optimization in the presence of depth measurements noise, which is unavoidable in real Kinect/Xtion data. However, this model should be properly parametrized for the new data type. To this end, we found the relation between the parameters of the uncertainty model and the mean accuracy achieved in a series of experiments with PUT SLAM. We modified the standard deviations along local axes of the \(\mathbf{C}_g\) uncertainty model and registered errors of the obtained sensor trajectories. For each parameter value, we performed 100 experiments to compute statistics (Fig. 12).
Moreover, we extended the applied uncertainty model considering the fact that the variance of Kinect depth measurements increases with the distance from the camera. Thus, we scale the \(\mathbf{C}_g\) uncertainty ellipsoid by the factor \(d_{\mathrm{scale}}\), which depends on the distance from the camera d (depth measurement) and a scaling parameter \(S_u\):
Finding optimal parameters for such an uncertainty model is extremely timeconsuming. Hence, we did not perform an exhaustive search of the uncertainty model parameters. Instead, we applied a simplified, hierarchical procedure. At first, we found the optimal value of the scale along the RGB gradient of the model (\(S_z\)) (cf. Fig. 12a and b). Then, we modified the scale along the y axis (\(S_y\)) to find the best value of the parameter. The \(S_x\) value is set to 1. Finally, the dependence between the \(S_u\) scale factor and the SLAM RPE trajectory accuracy was determined (Fig. 12e and f). We searched for the best parameters of the model using the TUM fr1_desk sequence. Then, the parametrized uncertainty model was verified on several other sequences, that we considered as representative for various SLAM tasks and environment types. The parameters of the obtained model are: \(S_z=0.8\), \(S_y=1.25\). The \(S_u\) value is set to 5.0 for matchingbased and to 1.8 for the trackingbased version of PUT SLAM.
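The hierarchical procedure amounts to tuning one parameter at a time while the others stay fixed. The sketch below illustrates this under stated assumptions: `toy_error` is a hypothetical stand-in for the mean trajectory error of a batch of PUT SLAM runs, its minimum is placed at the values reported for the trackingbased variant only so that the example has a known answer, and the candidate grids and starting values are invented for illustration.

```python
def hierarchical_search(evaluate, grids, initial):
    """Tune parameters one at a time, in the order given by `grids`.

    evaluate(**params) returns the error to minimise; grids is a list of
    (name, candidate_values) pairs; initial holds the starting values.
    """
    params = dict(initial)
    for name, candidates in grids:
        # Keep every other parameter at its current value while scanning this one.
        params[name] = min(
            candidates,
            key=lambda value: evaluate(**{**params, name: value}),
        )
    return params


def toy_error(s_x, s_y, s_z, s_u):
    """Hypothetical stand-in for the mean error of a batch of SLAM runs."""
    return (s_z - 0.8) ** 2 + (s_y - 1.25) ** 2 + (s_u - 1.8) ** 2


best = hierarchical_search(
    toy_error,
    grids=[("s_z", [0.6, 0.8, 1.0]),
           ("s_y", [1.0, 1.25, 1.5]),
           ("s_u", [1.0, 1.8, 5.0])],
    initial={"s_x": 1.0, "s_y": 1.0, "s_z": 1.0, "s_u": 1.0},
)
```

Such a search needs only the sum of the grid sizes in evaluations instead of their product, at the price of possibly missing the joint optimum when the parameters interact strongly.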
In the experiments on the real RGBD data, we use the RPE RMSE metric [41]. In our implementation of the featurebased RGBD SLAM, the position of the camera is connected (through point features) only with a small set of the previous camera poses (keyframes). The probability that a distant camera pose is included in the graph used by the backend decreases with the distance from the current sensor pose. This comes from the fact that our system closes loops only when the sensor drift is small, as PUT SLAM does not contain an appearancebased loop closure implementation. Thus, very long trajectories can drift regardless of the accurate local measurements, and the influence of the feature uncertainty model on the trajectory accuracy is best represented by the RPE RMSE value.
Table 6 summarizes the RPE RMSE metric mean values and standard deviations yielded by the SLAM for five sequences from the TUM dataset. Results obtained using the \(\mathbf{C}_g\)model are compared to the accuracy achieved by using the default identity matrices in g\(^2\)o. It is worth noting that in the literature often only the best results are provided, without any statistics. However, as the SLAM frontend employs RANSAC, thus being not fully deterministic, we computed the statistics for 100 trials, in order to provide convincing numerical results.
The accuracy improvement due to employing the anisotropic uncertainty model varies from sequence to sequence. It is worth mentioning that the feature position error for a single Kinect measurement can be higher than 10 cm, and our implementation of SLAM reaches an accuracy of a few centimeters on trajectories of about 10 m in length. Hence, the trajectory error of PUT SLAM is almost ten times smaller than the errors of individual depth measurements. Additional improvement of the accuracy is difficult to obtain. However, by applying the uncertainty model in the backend optimization, the RPE RMSE error can be reduced by more than 50% (fr2_desk sequence in Table 6). For other trajectories, the improvement of accuracy varies from 4.83 to 28.8%. The smallest improvement is obtained for the fr3_long_office sequence with the trackingbased version of the SLAM (4.83%). The sparse optical flow tracking with the Kanade–Lucas–Tomasi (KLT) algorithm and ORB point detector is very fast, but sensitive to rapid camera motions due to motion blur in RGB images. When the tracked keypoints drift from their real positions, the 3D features in the map also accumulate drift. In PUT SLAM, this drift can be compensated only locally, because of the lack of appearancebased loop closures. Such loop closures are implemented in some of the stateoftheart SLAM systems, such as ORBSLAM2 [30], and allow these systems to achieve better trajectory estimation accuracy than the accuracy obtained in PUT SLAM on looped trajectories. However, these problems are more specific to the particular SLAM architecture [2] than to the role of the spatial uncertainty model in the optimization backend, and are beyond the scope of this paper.
To show the improvement of the trajectory estimation using uncertainty modeling, we enlarge a region of the trajectories obtained on the fr2_desk sequence (Fig. 13). Without uncertainty modeling, the noisy measurements of the feature positions propagate to the sensor position. As a result, the camera trajectory is noisy. When the proposed uncertainty model is applied, the optimizationbased backend can rely on measurements with low uncertainties and downweight uncertain measurements. The obtained trajectory is smooth for both the tracking and matchingbased PUT SLAM (Fig. 13a and b).
Whereas we focused on the local improvement of the trajectory, PUT SLAM with the proposed uncertainty model \(\mathbf{C}_g\) also provides globally consistent trajectories, which is confirmed by the small ATE RMSE values obtained using the TUM RGBD benchmark. We do not attempt to compare PUT SLAM trajectory estimation accuracy against other SLAM approaches, as the ATE values computed for the entire benchmark sequences depend on a number of factors not related to the uncertainty model of the features (or to a lack of such a model). Among these factors, the loop closure mechanism seems to be the most important. As we have recently shown elsewhere [32], the loop closure and relocalization function of a SLAM system can compensate for the trajectory drift that accumulates because of inaccurate matching of the point features. Hence, we only show on a few example sequences from the TUM RGBD Benchmark that PUT SLAM, despite its rather simple architecture, yields globally consistent and accurate trajectories. The best ATE RMSE error for the fr3_long_office sequence is 0.022, which is better than ORBSLAM2 [30] and SlamDunkSIFT [12]. The best ATE RMSE error for the fr1_desk2 sequence is 0.030 (0.031 SubMap BA [41], 0.048 ElasticFusion [45]).
Moreover, in order to demonstrate that the selflocalization accuracy achieved by PUT SLAM using the new uncertainty model makes accurate reconstruction of a dense environment map possible, we produced a surfelbased visualization using the FastFusion algorithm [40] (Fig. 14). The mapping algorithm is integrated with our software and can run in real time along with PUT SLAM, as demonstrated for the fr3_long_office sequence in the video material accompanying this paper.
Conclusions
This article contributed a novel methodology for the modeling of spatial uncertainty of point features in featurebased RGBD SLAM. Considering the featurebased approach employing a factor graph as the state of the art in visual and RGBD SLAM, our work contributed:

the reverse SLAM tool concept and implementation; this tool is useful to analyze the distribution of feature measurements in both the 3D and image space and to identify the most suitable uncertainty model;

new uncertainty models of point features, which incorporate not only the axial and lateral spatial uncertainty caused by the PrimeSense technology RGBD sensors, but also take into account image processing uncertainties in the whole SLAM frontend pipeline;

an in-depth analysis of the application of feature uncertainty models in the factor graphbased formulation of SLAM, which demonstrates accuracy gains on real RGBD data sequences due to using more elaborate spatial uncertainty models.
We demonstrated in simulation that the accuracy of featurebased RGBD SLAM can be improved by accurate identification of an anisotropic uncertainty model of measurements. Then, we proposed the application of the reverse SLAM tool to analyze the distribution of feature measurements in an actual RGBD SLAM working on synthetic data. The identified uncertainty models are used to compute information matrices of constraints in factor graph optimization of PUT SLAM, which serves as an example implementation for the family of factor graphbased SLAM systems that is targeted in our research. The use of the reverse SLAM tool revealed that the spatial uncertainty in point features is influenced not only by the sensor properties, but also by the RGB image processing and the iterative procedure for estimation of the SE(3) transformation (RANSAC). We consider this an important conclusion from the presented research, as it explains why the uncertainty model based on the sensor noise (\(\mathbf{C}_p\)model [36]) works well in simulation, but eventually fails in tests with the actual PUT SLAM. In the simulations, there is no actual image processing involved, and the \(\mathbf{C}_p\)model based upon Kinect sensor characteristics improves the results by at least one order of magnitude, compared to the same BAbased SLAM algorithm without the uncertainty model. However, this approach does not work with the actual PUT SLAM, which introduces additional uncertainty in the frontend data processing. The model of the uncertainty used in the backend graph optimization is specific to both the device and the frontend pipeline. However, we introduce the general reverse SLAM tool, which allows us to determine the spread of feature measurements taking into account not only the sensor model, but the whole feature processing pipeline in the SLAM frontend. The proposed procedure can be used for other sensors and frontends of the SLAM.
The visual and numerical analysis of measurements distribution using the reverse SLAM tool allowed us to propose two uncertainty models that catch the noise resulting from the whole data processing in the frontend. The normalbased uncertainty model assumes that the measured positions of features are scattered on the surface of the objects. The gradientbased uncertainty model explains the increased errors along edges.
Finally, we demonstrated experimentally on the TUM RGBD Benchmark that the proposed gradientbased model is appropriate for real RGBD data. The accuracy improvement is higher for the trackingVO variant of the PUT SLAM frontend. The accuracy improvement for this variant of PUT SLAM is important, as the trackingbased variant is much faster (20 to 40 frames per second) than the matchingbased variant, but was less accurate in most tests. As the two variants of our frontend differ in the type of point features detector and differ in the details of processing, we were able to demonstrate that the internal architecture of the SLAM frontend indeed influences the uncertainty of features. The opensource PUT SLAM developed in our earlier research served here only as our research vehicle and basis for the reverse SLAM tool, but the developed models and research methodology should be suitable for other factor graphbased RGBD SLAM systems, as long as they rely on depth data accuracy and use a similar backend architecture.
As the available RGBD sensors based on structured light are very similar with respect to performance and uncertainty characteristics [14], the presented uncertainty models should suit all of them: Kinect, Xtion, Carmine, and the recent RealSense. Although some characteristics differ for RGBD sensors using the timeofflight principle, which may influence the uncertainty of features (e.g., smaller depth noise along sharp edges [23]), the methodology of developing the uncertainty model with the use of the reverse SLAM tool is still valid for those sensors. Hence, we believe that the simulation/experimental framework based on the reverse SLAM tool will enable us to quickly develop uncertainty models suitable for the next generation of RGBD sensors.
Our further research directions include uncertainty models for other types of constraints, including pose-to-pose and pose-to-plane measurements in factor graph optimization. We are also interested in developing a procedure that determines a suitable uncertainty model for each individual feature according to the local characteristics of the RGBD data.
Notes
 1.
Source code is at https://github.com/LRMPUT/PUTSLAM/tree/release.
 2.
Video is available at https://www.youtube.com/watch?v=tP5l5IDBYLw.
References
 1.
Belter, D., Nowicki, M., Skrzypczyński, P.: On the performance of posebased RGBD visual navigation systems. In: Cremers, D., Reid, I., Saito, I., Yang, M.H. (eds.) Computer Vision—ACCV 2014, Volume 9004 of the Series Lecture Notes in Computer Science, pp. 407–423. Springer, Cham (2015)
 2.
Belter, D., Nowicki, M., Skrzypczyński, P.: Accurate mapbased RGBD SLAM for mobile robots. In: Reis, L.P., Moreira, A.P., Lima, P.U., Montano, L., MuñozMartinez, V. (eds.) Robot 2015: Second Iberian Robotics Conference, Advances in Robotics, vol. 2, pp. 533–545. Springer, Cham (2016)
 3.
Belter, D., Nowicki, M., Skrzypczyński, P.: Improving accuracy of featurebased RGBD SLAM by modeling spatial uncertainty of point features. In: Proceedings of IEEE International Conference on Robotics and Automation, pp. 1279–1284. Stockholm, Sweden (2016)
 4.
Belter, D., Skrzypczyński, P.: Precise selflocalization of a walking robot on rough terrain using parallel tracking and mapping. Ind. Robot Int. J. 40(3), 229–237 (2013)
 5.
Belter, D., Skrzypczyński, P.: The importance of measurement uncertainty modeling in the featurebased RGBD SLAM. In: Proceedings of the 10th International Workshop on Robot Motion and Control, pp. 308–313. Poznań, Poland (2015)
 6.
Davison, A.J., Reid, I., Molton, N., Stasse, O.: MonoSLAM: realtime single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)
 7.
Di Leo, G., Liguori, C., Paolillo, A.: Covariance propagation for the uncertainty estimation in stereo vision. IEEE Trans. Instrum. Meas. 60(5), 1664–1673 (2011)
 8.
Dryanovski, I., Valenti, R., Xiao, J.: Fast visual odometry and mapping from RGBD data. In: Proceedings of IEEE International Conference on Robotics & Automation, pp. 2305–2310. Karlsruhe, Germany (2013)
 9.
Endres, F., Hess, J., Sturm, J., Cremers, D., Burgard, W.: 3D mapping with an RGBD camera. IEEE Trans. Robot. 30(1), 177–187 (2014)
 10.
Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 611–625 (2018)
 11.
Engel, J., Schöps, T., Cremers, D.: LSDSLAM: largescale direct monocular SLAM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision—ECCV 2014, pp. 834–849. Springer, Cham (2014)
 12.
Fioraio, N., Di Stefano, L.: SlamDunk: affordable realtime RGBD SLAM. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) Computer Vision—ECCV 2014 Workshops, pp. 401–414. Springer, Cham (2015)
 13.
Gesto Diaz, M., Tombari, F., RodriguezGonzalez, P., GonzalezAguilera, D.: Analysis and evaluation between the first and the second generation of RGBD sensors. IEEE Sens. J. 15(11), 6507–6516 (2015)
 14.
GonzalezJorge, H., Riveiro, B., VazquezFernandez, E., MartínezSánchez, J., Arias, P.: Metrological evaluation of Microsoft Kinect and Asus Xtion sensors. Measurement 46(6), 1800–1806 (2013)
 15.
Handa, A., Whelan, T., McDonald, J.B., Davison, A.J.: A benchmark for RGBD visual odometry, 3D reconstruction and SLAM. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 1524–1531. Hong Kong (2014)
 16.
Haralick, R.: Propagating covariance in computer vision. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 493–498. Jerusalem, Israel (1994)
 17.
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
 18.
Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGBD mapping: using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31(5), 647–663 (2012)
 19.
Kerl, C., Sturm, J., Cremers, D.: Robust odometry estimation for RGBD cameras. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3748–3754. Karlsruhe, Germany (2013)
 20.
Khoshelham, K., Elberink, S.: Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors 12(2), 1437–1454 (2012)
 21.
Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proceedings of the International Symposium on Mixed and Augmented Reality, pp. 225–234. Nara, Japan (2007)
 22.
Konolige, K., Agrawal, M.: FrameSLAM: from bundle adjustment to realtime visual mapping. IEEE Trans. Robot. 24(5), 1066–1077 (2008)
 23.
Kraft, M., Nowicki, M., Schmidt, A., Fularz, M., Skrzypczyński, P.: Toward evaluation of visual navigation algorithms on RGBD data from the first- and second-generation Kinect. Mach. Vis. Appl. 28(1), 61–74 (2017)
 24.
Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: g\(^2\)o: a general framework for graph optimization. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3607–3613. Shanghai, China (2011)
 25.
Mallick, T., Das, P., Majumdar, A.: Characterizations of noise in Kinect depth images: a review. IEEE Sens. J. 14(6), 1731–1740 (2014)
 26.
Matthies, L., Shafer, S.A.: Error modeling in stereo navigation. IEEE J. Robot. Autom. 3(3), 239–248 (1987)
 27.
Miura, J., Shirai, Y.: An uncertainty model of stereo vision and its applications to visionmotion planning of robot. In: 13th International Joint Conference on Artificial Intelligence, pp. 1618–1623. Chambery (1993)
 28.
Mouragnon, E., Lhuillier, M., Dhome, M., Dekeyser, F., Sayd, P.: Generic and realtime structure from motion using local bundle adjustment. Image Vis. Comput. 27(8), 1178–1193 (2009)
 29.
MurArtal, R., Montiel, J.M.M., Tardós, J.D.: ORBSLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (2015)
 30.
MurArtal, R., Tardós, J.D.: ORBSLAM2: an opensource SLAM system for monocular, stereo and RGBD cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017)
 31.
Nguyen, C., Izadi, S., Lovell, D.: Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In: Proceedings of the Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 524–530. Zürich, Switzerland (2012)
 32.
Nowicki, M., Belter, D., Kostusiak, A., Čížek, P., Faigl, J., Skrzypczyński, P.: An experimental study on featurebased SLAM for multilegged robots with RGBD sensors. Ind. Robot Int. J. 44(4), 428–441 (2017)
 33.
Nowicki, M., Skrzypczyński, P.: Combining photometric and depth data for lightweight and robust visual odometry. In: Proceedings of the European Conference on Mobile Robots, pp. 125–130. Barcelona, Spain (2013)
 34.
Oskiper, T., Chiu, H.P., Zhu, Z., Samarasekera, S., Kumar, R.: Stable vision-aided navigation for large-area augmented reality. In: Proceedings of the IEEE Virtual Reality Conference, pp. 63–70. Singapore (2011)
 35.
Ozog, P., Eustice, R.: On the importance of modeling camera calibration uncertainty in visual SLAM. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 3777–3784. Karlsruhe, Germany (2013)
 36.
Park, J.H., Shin, Y.D., Bae, J.H., Baeg, M.H.: Spatial uncertainty model for visual features using a Kinect sensor. Sensors 12(7), 8640–8662 (2012)
 37.
Scaramuzza, D., Fraundorfer, F.: Visual odometry: Part I the first 30 years and fundamentals. IEEE Robot. Autom. Mag. 18(4), 80–92 (2011)
 38.
Schmidt, A., Kraft, M., Fularz, M., Domagala, Z.: Comparative assessment of point feature detectors and descriptors in the context of robot navigation. J. Autom. Mobile Robot. Intell. Syst. 7(1), 11–20 (2013)
 39.
Skrzypczyński, P.: Spatial uncertainty management for simultaneous localization and mapping. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 4050–4055. Rome, Italy (2007)
 40.
Steinbrücker, F., Sturm, J., Cremers, D.: Volumetric 3D mapping in realtime on a CPU. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 2021–2028. Hong Kong (2014)
 41.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of RGBD SLAM systems. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots & Systems, pp. 573–580. Vilamoura, Portugal (2012)
 42.
Triggs, B., McLauchlan, P., Hartley, R., Fitzgibbon, A.: Bundle adjustment—a modern synthesis. In: Triggs, B., Zisserman, S., Szeliski, R. (eds.) Vision Algorithms: Theory and Practice. Lecture Notes in Computer Science, vol. 1883, pp. 298–372. Springer, Cham (2000)
 43.
Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)
 44.
Whelan, T., Johannsson, H., Kaess, M., Leonard, J., McDonald, J.: Robust realtime visual odometry for dense RGBD mapping. In: Proceedings of the IEEE International Conference on Robotics & Automation, pp. 5704–5711. Karlsruhe, Germany (2013)
 45.
Whelan, T., SalasMoreno, R.F., Glocker, B., Davison, A.J., Leutenegger, S.: ElasticFusion: realtime dense SLAM and light source estimation. Int. J. Robot. Res. 35(14), 1697–1716 (2016)
Additional information
This work was supported by the Polish National Science Centre grant funded according to the decision DEC2013/09/B/ST7/01583. M. Nowicki received a doctoral scholarship from the Polish National Science Centre under the Grant 2016/20/T/ST7/00396.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Belter, D., Nowicki, M. & Skrzypczyński, P. Modeling spatial uncertainty of point features in feature-based RGBD SLAM. Machine Vision and Applications 29, 827–844 (2018). doi:10.1007/s00138-018-0936-9
Keywords
 SLAM
 Uncertainty model
 Bundle adjustment
 Factor graph optimization