1 Introduction

Due to the rapid growth of technologies, pedestrian navigation has become widely accessible in recent years. In developed countries, smartphones are no longer considered luxury items and are owned by the majority of the population. The sensors installed in modern mobile devices, such as a global positioning system (GPS) receiver, accelerometer, compass, gyroscope, and camera, support a broad range of methods that can be applied to mobile navigation.

Satellite-based GPS is widely used in various navigational devices and applications. Being available on most modern smartphones, GPS can assist navigation with the help of additional context information (e.g., a map-based graphical representation of a city area). However, in large cities, where tall buildings block or reflect satellite signals, the positioning error of GPS is measured in meters [1]. Such an error rules out using GPS alone to navigate blind or visually impaired people, since stepping off the sidewalk into the car lanes is dangerous. Even with differential correction (see Endnote 1), GPS alone is not sufficiently accurate to guide pedestrians in urban environments, because pedestrians do not follow distinct roads but walk on paths that range from narrow to broad.

Some other approaches use the number of steps detected by an accelerometer, reference points, and a mobile compass for navigation assistance. Fallah et al. [2] presented a successful example of this method. However, their system is designed for indoor environments, where maps are very accurate and clear landmarks (e.g., corners and doors) are available.

More recently, radio frequency identification (RFID) technology has found use in navigation research. One of the latest systems applied to the navigation of visually impaired people was proposed by Varpe and Wankhade [3]. On the user side, they employ a mobile RFID reader, a transceiver for transmitting the tag’s information, and an audio device to provide feedback to the user. To identify walking routes, a network of passive RFID tags is deployed along the path. Although such systems achieve a precision on the 1-m scale when a dense RFID tag configuration is used, they require additional objects (i.e., RFID tags), which makes this technology costly and not easily adoptable for new environments [4].

An alternative method of user navigation with the help of smartphones is currently being developed [5]. This method uses image processing for navigation, a technique that has been utilized mostly in robotics [6] but has also been adapted for indoor and outdoor pedestrian navigation [7–10]. First, given tracks are reconstructed as sparse 3D point clouds using structure from motion (SfM) [11] and stored in a database. Second, with an interactive app running on a smartphone, query images are acquired in order to retrieve the location and direction of the camera (i.e., the pedestrian), which is a required component for navigation.

Scale-invariant feature transform (SIFT) [12] features are extracted from the images taken with the client application. The features are reconstructed as points in the 3D cloud. Comparing thousands of points from the model with the current query photograph is computationally expensive. Together with storage limitations, this makes it necessary to remove outliers from the 3D data without affecting the positioning accuracy.

In this paper, we analyze outlier removal in generated 3D point clouds for pedestrian navigation. Our hypothesis states that it is possible to maintain positioning accuracy while reducing the number of outliers in a reconstructed 3D model. These developments are part of the smartphone-based system designed for navigation of visually impaired people [5].

2 State of the art

According to the definition of Grubbs [13], an outlying observation, or outlier, is “one that appears to deviate markedly from other members of the sample in which it occurs.” Outliers in a 3D point cloud may have different origins. First, they may result from errors occurring during the reconstruction process, such as inherent inaccuracies in feature detection, false matching, and errors in the estimation of fundamental and projection matrices. Second, non-static objects in the environment (e.g., cars, chairs and tables of street cafes, advertising and market stalls) add noise to the reconstruction.

In SfM 3D point clouds, outlier removal is possible in two stages. First, within the bundle adjustment, erroneous matches are usually discarded by the random sample consensus (RANSAC) [14] or its extensions, progressive sample consensus (PROSAC) [15], and preemptive RANSAC [16]. In order to do a robust estimation of parameters in terms of reconstruction, the following steps are repeated iteratively: (i) a seed group of matches is randomly selected; (ii) transformation from the seed group is computed; (iii) inliers to this transformation are found; (iv) if the number of inliers is sufficiently large, the least-squares estimate of the transformation on all of the inliers is recomputed. The transformation with the largest number of inliers is kept. With a sufficient number of inliers (more than 50%) and correctly chosen parameters, this method gives a good estimation of matches.
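
The iterative scheme (i) to (iv) can be illustrated on a deliberately simple stand-in problem. The following minimal sketch fits a 2D line instead of a camera transformation; the function names, the tolerance, and the toy data are assumptions made for illustration and are not taken from any of the cited implementations.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b) or None if degenerate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all x equal: a vertical line cannot be written as y = a*x + b
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def ransac_line(points, iterations=200, tol=0.1, min_inliers=5, seed=0):
    """Toy RANSAC: keep the model supported by the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(points, 2)   # (i) randomly select a seed group
        model = fit_line(sample)         # (ii) compute the model from the seed
        if model is None:
            continue
        a, b = model
        # (iii) find the inliers to this model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) >= min_inliers and len(inliers) > len(best_inliers):
            best_model = fit_line(inliers)  # (iv) least-squares refit on all inliers
            best_inliers = inliers
    return best_model, best_inliers

# Hypothetical data: ten points on y = 2x + 1 and two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -20.0)]
model, inliers = ransac_line(points)
```

In an SfM pipeline, the seed group would consist of feature matches and the model would be a geometric transformation, but the control flow stays the same.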

Nonetheless, the resulting model is sometimes not “clean” due to the inherent inaccuracies in feature detection, false matching, and errors in the estimation of fundamental and projection matrices. This makes an additional step of outlier detection in the reconstructed 3D point cloud necessary. In most vision-based city reconstruction approaches, outliers are removed only within the reconstruction process, and no “cleaning” techniques are applied to the generated point clouds [17–20]. This is explained by the fact that those reconstructions serve visualization purposes.

Taglioretti et al. [21] evaluated the performance of localization depending on the outlier removal method selected during the bundle adjustment. The forward search method [22] proved to be superior. However, additional outlier removal in SfM 3D point clouds has not previously been evaluated from the perspective of the localization task. To identify applicable techniques, we reviewed existing outlier detection approaches.

Based on Hodge and Austin [23], outlier detection approaches are categorized as

  • distribution-based,

  • depth-based,

  • clustering-based,

  • distance-based,

  • density-based, and

  • connectivity-based.

In distribution-based methods, the bulk of observations is estimated robustly by a suitable model distribution. Outliers are then defined as observations that are unlikely to be generated by the distribution [24].

In depth-based approaches, data objects are organized in layers in the data space, with the expectation that shallow layers are more likely to contain outlying data objects than the deep layers [25].

In clustering-based techniques, a cluster of small size can be considered as clustered outliers [26].

In the approach by Knorr and Ng [27], an object in a dataset is a distance-based outlier if at least a given fraction of the other objects in the dataset lies at a distance greater than some given threshold. This approach does not make any assumptions about the data distribution and has better computational efficiency than depth-based methods, especially in large datasets.
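
A minimal brute-force sketch of this definition is given below. The parameter names `fraction` and `threshold` and the toy points are illustrative assumptions; the quadratic-time loop is chosen for readability and is not the algorithm of Knorr and Ng [27].

```python
import math

def distance_based_outliers(points, fraction, threshold):
    """Return indexes of points for which at least `fraction` of the other
    points lie farther away than `threshold`."""
    outliers = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        far = sum(1 for q in others if math.dist(p, q) > threshold)
        if others and far / len(others) >= fraction:
            outliers.append(i)
    return outliers

# Hypothetical toy cloud: four clustered points and one isolated point.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]
flagged = distance_based_outliers(cloud, fraction=0.9, threshold=1.0)
```

Here only the isolated point is flagged, because every other point lies farther than the threshold from it, whereas each clustered point has only one distant neighbor out of four.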

In density-based methods, the relative density of a point compared to its neighbors is computed as an outlier score. Using this approach, one can effectively identify local outliers in datasets with diverse clusters [28]. Breunig et al. [29] proposed a density-based approach relying on the local outlier factor (LOF) of each object, which depends on the local density of its neighborhood. The neighborhood is defined by the distance to the M(p)th nearest neighbor. The value M(p) is predefined. It corresponds to the minimum number of points used in the calculation of density.

Approaches from classification [23] sometimes can be combined into more complex methods. Thus, a mixture of density- and clustering-based approaches, in this paper, is referred to as connectivity-based approach.

Outlier removal outside the bundle adjustment in completely built SfM point clouds has not been addressed explicitly before. However, there are some approaches designed for laser-scanned point clouds. Such clouds are usually more accurate and consist of a higher number of points. We believe, nevertheless, that the principles of outlier removal in laser-scanned point clouds also work for SfM point clouds and, therefore, review here some approaches designed for laser-scanned point clouds.

In 2006, Sotoodeh [30] presented a LOF-based algorithm for outlier detection in laser-scanned point clouds. The author justifies the selection of a density-based algorithm by the fact that it requires no prior knowledge of the scanned scene and is independent of the varying density of the points. The method was able to detect most of the expected outliers in the scene; however, it was not robust against clusters of outliers. For that reason, in 2007, the author proposed a modified version of his algorithm based on hierarchical clustering [31]. The modified algorithm runs in two phases: in the first phase, it removes relatively large-scale erroneous measurements based on Euclidean minimum spanning tree edges. In the second phase, it detects and removes the outliers that might not be as obvious as the first ones but are considered wrong measurements with respect to the scanned object surfaces. The algorithm was tested on terrestrial point clouds and returned a satisfying result: both single and clustered outliers were removed. However, in some cases, user interaction was still required to determine whether a cluster is an outlier or an object. An additional drawback is a run-time complexity of \(O(n^{3})\), which makes the method inefficient for datasets containing thousands of points.

Luo and Liao [32] proposed outlier detection in laser point clouds extending distance- and density-based approaches. Their algorithm changes 3D data to 2D by slicing and projection and employs a KD tree to index the projected points. The authors use the local distance-based outlier factor (LDBOF) defined by Zhang et al. [33] as the outlier judgment criterion. LDBOF uses the relative location of an object to its neighbors to determine the degree to which the object deviates from its neighborhood. The authors claimed higher efficiency compared to the algorithms of Sotoodeh [30, 31]. However, they also mention the necessity of finding more robust parameters [32].

Recently, Wang et al. [34] designed a connectivity-based pipeline for outlier filtering and noise smoothing in low-quality point clouds from outdoor scenes. They first detect sparse outliers applying a scheme based on the relative density deviation of the local neighborhood and the average local neighborhood, providing a scoring strategy that includes a normalization to become independent from the specific data distribution. In order to remove further small dense outliers, a clustering method is used. According to the authors, detection is capable of removing all types of outliers without any user interactions.

3 Outlier removal applied to 3D point clouds

City-scale 3D point clouds are large, arbitrary datasets, and, therefore, the methods claiming computational efficiency were preferred over others. Another important criterion for selection of an outlier removal method was the ability of a method to be performed without any additional user interaction. Thus, the first approach of Sotoodeh [30] and the pipeline of Wang et al. [34] were implemented and applied to our datasets with some parameter adjustments.

While the density-based method runs in linear time, the second part of the connectivity-based approach, performed by agglomerative hierarchical clustering, has a run-time complexity of \(O(n^{3})\). To assess the potential for computational speedup, an original distance-based method of outlier detection in 3D point clouds is proposed.

3.1 The novel distance-based approach

We adopt the notion of distance-based outliers proposed by Knorr and Ng [27] for data-mining applications: “An object in a dataset is an outlier if at least a fraction of the objects in this dataset lies in a larger distance from this object.” Our approach is based on the assumption that points belonging to building wall structures are normally distributed. Thus, we apply a double-threshold scheme: first, we reduce the impact of infrequent points in the model, whose relative distances to the other points in the model are comparatively large. After eliminating such points, we estimate the second filtering factor based on the global mean over the mean distances of each point’s neighborhood.

Given a point set \(P=\{p_{1},\ldots,p_{n}\}\) (n is the number of points), outlier elimination is performed as follows:

  1. At the beginning, for each point \(p_{i}\) (i=1,…,n), the k-nearest neighbors \(N(p_{i})=\{q_{1},\ldots,q_{k}\}\subseteq P\) are determined. The value of k=32 was selected for our approach by visual inspection as described in the following subsection.

    The k-nearest-neighbor search returns the set of indexes of a point’s k-nearest neighbors and their distances to the point.

  2. For each point \(p_{i}\), the so-called k-distance, denoted as \(D_{k}(p_{i})\), is defined as the distance \(d(p_{i},p_{j})\), where \(p_{j} \in N(p_{i})\) is the neighbor farthest away in \(p_{i}\)’s k-neighborhood; in other words, it is the longest of the distances from \(p_{i}\) to its k-nearest neighbors.

  3. Then, for each point \(p_{i}\), the average distance of its neighborhood \(\overline {d}(p_{i})\) is calculated as

    $$ \overline{d}(p_{i}) = \frac{\sum \limits_{q_{j} \in N(p_{i})} d(p_{i}, q_{j})}{k} $$

    Then, the standard deviation of the neighborhood distances is estimated as

    $$ \sigma = \sqrt{\frac{\sum \limits_{i=1}^{n} (\overline{d}(p_{i})-\overline{\mathit{D}})^{2}}{n}} $$

    where \(\overline {\mathit {D}}\) is the mean value of all \(\overline {d}(p_{i})\).

  4. Subsequently, the point cloud is filtered so that all points that meet the condition \(\overline {d}(p_{i}) \geq 10\sigma \) are eliminated. After this initial filtering, the average distance \(\overline {\mathit {D}}\) is recalculated with respect to the points left in the model. Then, the refined value \(\overline {\mathit {D}}\) together with \(D_{k}(p_{i})\) is used for the final filtering phase: the points for which the condition \(D_{k}(p_{i}) \geq 3 \overline {\mathit {D}}\) holds are removed from the model. The remaining points are considered inliers.

All parameters were empirically derived, considering a set of constraints described further.
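
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than our production implementation: the neighbor search is brute force, the neighborhoods are not recomputed after the first filter (only the mean is), and the function names, the guard for an empty intermediate result, and the toy cloud are introduced here for illustration. The toy example uses k=4 because the cloud is tiny.

```python
import math

def knn_distances(points, k):
    """Brute-force search: sorted distances to the k nearest neighbors of each point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def remove_outliers(points, k=32, sigma_factor=10.0, mean_factor=3.0):
    """Return the indexes of points kept as inliers by the double-threshold scheme."""
    neigh = knn_distances(points, k)                   # step 1
    k_dist = [d[-1] for d in neigh]                    # step 2: k-distance D_k
    mean_dist = [sum(d) / len(d) for d in neigh]       # step 3: mean neighborhood distance
    n = len(points)
    global_mean = sum(mean_dist) / n
    sigma = math.sqrt(sum((m - global_mean) ** 2 for m in mean_dist) / n)
    # Step 4a: drop points whose mean neighborhood distance is at least 10 sigma.
    keep = [i for i in range(n) if mean_dist[i] < sigma_factor * sigma]
    if not keep:
        return []  # assumption: nothing survives the first filter (e.g., sigma == 0)
    # Step 4b: recompute the global mean over the remaining points (assumption:
    # the per-point neighborhoods from step 1 are reused), then filter on D_k.
    refined_mean = sum(mean_dist[i] for i in keep) / len(keep)
    return [i for i in keep if k_dist[i] < mean_factor * refined_mean]

# Hypothetical toy cloud: a 6 x 6 planar grid ("wall") plus one distant point.
cloud = [(float(x), float(y), 0.0) for x in range(6) for y in range(6)]
cloud.append((100.0, 100.0, 100.0))
inlier_indexes = remove_outliers(cloud, k=4)
```

In this toy cloud the distant point passes the first filter, because a single extreme value inflates \(\sigma\) in such a small sample, and is removed by the k-distance condition in the final phase.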

3.2 Constraints for parameters used in the proposed method

The first qualitative characteristic of outlier removal is the level of noise preserved in the model afterwards. The noise level stands for the relative number of points or point clusters that remain in the model after outlier detection although they should have been removed. This characteristic is particularly important for aligning models and maps, which is a part of our image-based navigation system that merges separate model fragments in the same coordinate space. A high level of noise can affect the alignment, as the outlying points can drag a model towards the wrong walls.

While removing as many outliers as possible, the main constraint for parameter adjustment (e.g., number of nearest neighbors, filtering coefficients) was the retainability of the model’s structure, or, in other words, the presence of all significant walls in the model after outlier removal.

This constraint is important for navigation, because we are interested in covering a large area. At the same time, the correctness of the model’s alignment, again, highly depends on the footprint structure, so that, in some cases, even an additional small wall can resolve the ambiguity of scaling parameters and thus determine the right model placement. Therefore, it is important to have the majority of walls preserved after the outlier removal step.

For each outlier removal method, we achieved a trade-off between the level of noise and the model’s retainability by adjusting the parameters. The parameter adjustment was performed on point clouds of different density through iterative testing using different combinations of parameters: in each test case, we compared the number of point clusters outside the facade (due to the small model size, it was possible to count them manually) and evaluated the completeness of facades. For our models, the local optimum was achieved with the described set of parameters. However, further parameter adjustment might be required for models of different density.

4 Experimental setup

4.1 Dataset

Evaluation was performed on a dataset recorded in downtown Maastricht, the Netherlands. The dataset results from 7 walks with a recording device (iPhone 5 (Apple Inc., USA) running an acquisition application) attached with a chest mount to the body of the person acquiring images (Fig. 1). Within a walk, images were acquired sequentially every second. A total of 3291 images were recorded. All recordings differ in date, time, and weather conditions.

Fig. 1

Data acquisition and navigation. A smartphone is attached to the user with a chest mount (left). For positioning, the user holds an interactive cane, which is connected to the system via a Bluetooth interface and provides navigational cues in the form of haptic feedback. Data is transferred to a computer over a wireless network connection. Consent to use the photograph was obtained

The route passes by several landmarks in the center of Maastricht. The main characteristics of the location are a large number of pedestrians, high vehicle traffic, and narrow streets and houses located close to the road. Additionally, the route’s appearance changes most during spring and summer, as street cafes are active and numerous shops and stores are constantly changing decorations in and around showroom windows.

Processing with VisualSFM [11] resulted in a dataset of 17 models. Each model represents a reconstructed set of building walls as a sparse 3D point cloud. The models contain from 200 to 12,792 points.

4.2 Preparation of test models

Inspired by the approaches of Strecha et al. [35] and Untzelmann et al. [36], we aligned all models to the OpenStreetMap [37]. To evaluate our initial hypothesis that positioning accuracy is maintained while reducing outliers in a reconstructed model, we selected from our dataset a reference model that allows the best automatic alignment to the real world coordinates (Fig. 2).

Fig. 2

Alignment of a model to the OpenStreetMap outline. Green points belong to wall structures; red line is a camera path

The selected model contains 11,650 points and 374 cameras. This model was then reconstructed again by tenfold cross-validation: all images used in the reference model were randomly partitioned into 10 sub-samples of equal size. For each new reconstruction, a newly selected single sub-sample containing 10% of original images was used as test data; the remaining 90% of images were used to reconstruct a model.
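
The partitioning can be sketched as follows. This is a minimal illustration; the function name, the fixed random seed, and the use of integer image identifiers are assumptions. Since 374 images cannot be split into ten exactly equal parts, the sketch produces sub-samples of 37 or 38 images.

```python
import random

def tenfold_splits(image_ids, folds=10, seed=0):
    """Yield (reconstruction_ids, test_ids) pairs for k-fold cross-validation."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)                # random partitioning
    parts = [ids[i::folds] for i in range(folds)]   # near-equal sub-samples
    for test in parts:
        held_out = set(test)
        train = [i for i in ids if i not in held_out]
        yield train, test

# Hypothetical identifiers 0..373 for the 374 images of the reference model.
splits = list(tenfold_splits(range(374)))
```

Each image appears in exactly one test sub-sample, so every image is localized once against a model that was reconstructed without it.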

4.3 Testing process

To test the hypothesis, the following sequence of steps was applied to the eight test reconstructions containing the largest number of points:

  1. Align each model to the map to estimate its scaling factor relative to the real-world coordinate system.

  2. Align the test reconstruction to the reference reconstruction. For that, we apply the estimated scaling parameters to the test and the reference models.

  3. Estimate the translation between the models by calculating the difference between the models’ centroids.

  4. Refine translation and rotation by applying the iterative closest point (ICP) algorithm [38].

  5. Estimate the position of each image not used for the reconstruction and record the matching time. To estimate the location of an image, SIFT features are extracted from it. Correspondences between the features and points in the 3D point cloud are determined. Since some of the found correspondences are matching outliers, the pose estimation procedure is wrapped in a RANSAC loop. RANSAC picks a random subset of matches and uses them to generate a hypothesis about the pose. It then tests the hypothesis against the full set. If the number of matches consistent with the hypothesis is large enough, RANSAC terminates, returning the set of inliers and a pose estimated from them.

  6. Use the corresponding positions of the reconstructed images from the reference model to estimate the localization error of each image. The error is calculated as the distance between the estimated position and the reference position in 2D (as we localize the user in 2D, the z-component is omitted).

  7. Apply the three outlier removal methods to the aligned test reconstruction. Repeat steps 5 and 6 with the resulting models.
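
Steps 3 and 6 of the sequence above reduce to simple vector arithmetic, sketched below. The function names and sample coordinates are assumptions; the sign convention (reference centroid minus test centroid) is likewise an assumption, and the ICP refinement of step 4 is not shown.

```python
import math

def centroid(points):
    """Mean position of a 3D point set."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))

def initial_translation(test_points, reference_points):
    """Step 3: translation that moves the test centroid onto the reference centroid."""
    ct, cr = centroid(test_points), centroid(reference_points)
    return tuple(r - t for r, t in zip(cr, ct))

def localization_error_2d(estimated, reference):
    """Step 6: distance in the ground plane; the z-component is ignored."""
    return math.hypot(estimated[0] - reference[0], estimated[1] - reference[1])

# Hypothetical coordinates in meters.
shift = initial_translation([(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)],
                            [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)])
error = localization_error_2d((0.0, 0.0, 5.0), (3.0, 4.0, -1.0))
```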

To measure the matching time, we conducted the localization experiment 10 times on each of the test cases. All tests were computed on a single core of a PC equipped with the Intel Core i7 CPU running at 2.00 GHz.

4.4 Performance measures

First, we assess the performance of the outlier removal methods themselves by the percentage of points \(P_{r}\) removed by each method and by the time \(T_{o}\) each method requires to remove outliers.

Second, we evaluate the performance of the localization process. For that, we distinguish between efficiency and quality indicators. Our goal is to achieve a trade-off between those two groups.

Efficiency indicators refer to performance in terms of processing time and memory requirements: we estimate the matching time \(T_{m}\) (in seconds) and the model’s size \(S_{m}\) (in KB), respectively. To show the changes in performance caused by applying a certain outlier removal method, we introduce the relative changes in matching time \(\Delta T_{m0j}\) and space requirements \(\Delta S_{m0j}\), defined as

$$ \Delta T_{m0j} = \frac{T_{m0}-T_{mj}}{T_{m0}} \times 100\% $$
$$ \Delta S_{m0j} = \frac{S_{m0}-S_{mj}}{S_{m0}} \times 100\% $$

where j=1,…,4 corresponds to a model in a test case. A test case contains four models: one model before outlier removal and three models obtained after applying the different outlier removal methods.
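
Both efficiency indicators share the same form, a relative reduction with respect to the model before outlier removal. A minimal sketch with purely hypothetical numbers (not measurements from our experiments):

```python
def relative_reduction(baseline, value):
    """Relative reduction in percent: (baseline - value) / baseline * 100."""
    return (baseline - value) / baseline * 100.0

# Hypothetical values: matching time in seconds and model size in KB.
delta_t = relative_reduction(2.0, 1.5)      # matching time drops from 2.0 s to 1.5 s
delta_s = relative_reduction(800.0, 600.0)  # model size drops from 800 KB to 600 KB
```

A positive value indicates an improvement (less time or less storage), and a negative value indicates a deterioration.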

Quality indicators describe localization performance associated with a certain model.

Let n be the total number of test images associated with a certain tested model. Given a test image contained in the reference model, the image is considered matched if it is possible to reconstruct its position \(p\) in the tested model. Accordingly, \(n_{m}\) is the total number of matched images in the model. A match is considered correct if the positioning error, estimated as the distance between a reconstructed position \(p\) and its corresponding position \(p_{0}\) in the reference model, is less than a threshold \(\tau\):

$$ \left\Vert p_{0} - p \right\Vert < \tau $$

We set τ=1.6 m (2–3 human steps).

The number of correct matches \(n_{c}\) is estimated as

$$ n_{c}=\sum\limits_{i=1}^{n_{m}}\left[\left\Vert p_{0i} - p_{i} \right\Vert < \tau \right] $$

Then, the matching rate \(R\) is calculated as the ratio of the number of correct matches \(n_{c}\) to the total number of images n

$$ R = \frac{n_{c}}{n}\times100\% $$

The matching error \(E\) is the average value of all positioning errors of the correct matches:

$$ E = \frac{\sum\limits_{i=1}^{n_{m}}\left\Vert p_{0i} - p_{i} \right\Vert \left[\left\Vert p_{0i} - p_{i} \right\Vert < \tau \right]}{n_{c}} $$

Based on these two indicators, we estimate the weighted matching error \(E_{w}\), which is used as the ultimate indicator of localization quality

$$ E_{w} = wE $$

where w is a weighting coefficient of a certain model.

For each jth model in a test case, where j=1,…,4, the coefficient \(w_{j}\) is calculated as follows

$$ w_{j} = 1 - \frac{R_{j}-\text{min}\{R_{1}, \ldots, R_{4}\}}{100\%} $$

In fact, the ICP alignment of a test model to the reference model might contain an error of up to 1 m. Thus, the absolute values of localization measurements might not be precise. However, as we always use the same alignment, the positioning errors are estimated in the same coordinate system within a test case; hence, a correct estimate of the relative errors is possible. As we are interested in comparing the quality of localization, our final quality indicator is

$$ \Delta E_{w0j} = E_{w0}-E_{wj} $$

where \(E_{w0}\) is the weighted localization error associated with the reference model, and \(E_{wj}\) (j=1,…,3) are the corresponding weighted errors in localization using the models after the outlier removal methods have been applied.
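
The chain of quality indicators, from the matching rate to the final difference of weighted errors, can be sketched as follows. The function names and the sample errors are hypothetical; only the formulas follow the definitions above.

```python
def matching_stats(errors, n_total, tau=1.6):
    """Return (R in percent, E in meters) from the positioning errors of matched images."""
    correct = [e for e in errors if e < tau]  # matches with an error below tau
    rate = len(correct) / n_total * 100.0
    mean_error = sum(correct) / len(correct) if correct else float("nan")
    return rate, mean_error

def weighted_errors(rates, mean_errors):
    """E_w = w * E with w_j = 1 - (R_j - min R) / 100% for each model of a test case."""
    r_min = min(rates)
    return [(1.0 - (r - r_min) / 100.0) * e for r, e in zip(rates, mean_errors)]

# Hypothetical test case: 4 test images, 3 of them matched, 2 matched correctly.
rate, mean_error = matching_stats([0.4, 0.6, 2.0], n_total=4)

# Hypothetical comparison of an original model (index 0) with one filtered model.
e_w = weighted_errors([50.0, 75.0], [0.5, 0.4])
delta_e_w = e_w[0] - e_w[1]
```

Note that the weight decreases as the matching rate grows, so within a test case a model that localizes more images correctly receives a smaller weighted error for the same average error.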

We run one-way analysis of variance (ANOVA) on the entire sample of positioning errors to see whether the changes in positioning performance are significant or not, depending on the outlier method applied.
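
For reference, the F statistic of a one-way ANOVA can be computed directly from the grouped positioning errors, as in the minimal sketch below. The function name and the sample groups are assumptions; obtaining the P value additionally requires the F distribution, which is available, for example, through `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical groups of positioning errors (one group per model variant).
f_value, df_between, df_within = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
)
```

With four groups (the original model and the three filtered models), the between-group degrees of freedom equal 3.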

5 Results

5.1 Outlier removal

According to visual inspection, each of the approaches is able to reduce noise while preserving the model structure (Fig. 3). Compared with the original models containing sparse outliers, the outcomes of all outlier removal methods look clean. Some wall fragments containing relatively fewer feature points than other parts might be missing; however, the basic structures are always preserved.

Fig. 3

Comparison of the effect of outlier removal on a 3D model. a Original 3D model footprint and 3D footprints after applying the b density-based, c connectivity-based, and d distance-based outlier removal methods. For better visibility, the models are rotated upright according to the pre-computed model’s gravity vector

On average, the density-based method classified the biggest number of points (33.3% of the initial number) as outliers, while the smallest result was obtained by the distance-based method (10.2%) (Table 1).

Table 1 Evaluation results. \(\Delta T_{m0j}\), \(\Delta S_{m0j}\), and \(\Delta E_{w0j}\) are calculated with Eqs. 3, 4, and 11, respectively

Regarding the outlier removal time, our distance-based approach (Fig. 4, blue) outperforms the density-based approach (Fig. 4, red) by around 45% on average for all models, regardless of the number of points they contain. The computational time of the connectivity-based approach (Fig. 4, green) grows polynomially with the model’s size. Hence, for a model consisting of about 10,000 points, outlier removal with this approach takes approximately 7 s.

Fig. 4

Runtime performance \(T_{o}\) of outlier removal methods. Outlier removal was applied to 15 models with different sizes

5.2 Computation and storage requirements

The experiment has shown that in all cases, the removal of outliers leads to a noticeable improvement in matching time \(T_{m}\) (Fig. 5, top-left panel) and reduces the model’s size \(S_{m}\) (Fig. 5, top-right panel) compared with the model before outlier removal.

Fig. 5

Evaluation results associated with the models. (i) Original model before outlier removal and the models after applying the (ii) density-based, (iii) connectivity-based, and (iv) distance-based methods. Average matching time is the average of \(T_{mj}\) returned by each jth test case; average file size is the average of all \(S_{mj}\). Average error of localization is the average of all \(E_{j}\) defined by Eq. (8), and average weighted error is the average of all \(E_{wj}\) defined by Eq. (9)

The benefits in matching time \(\Delta T_{m0j}\) and storage requirements \(\Delta S_{m0j}\) are proportional to the share of points \(P_{r}\) removed from the model (Table 1).

5.3 Quality of localization

For the extreme case (the density-based approach, removing 33.3% of points from the model), the probability of locating an image with a precision of up to 1.6 m was 70%. Using this threshold, the absolute error values were below 0.56 m in all cases (Fig. 5, bottom-left panel).

The average localization error was lowest (0.51 m) for our outlier removal method (Fig. 5, blue bar in the bottom-left panel). At the same time, taking the matching rate into account, the relative weighted localization error tended to increase, compared with the reference model, for the methods classifying a greater number of points as outliers (Table 1). A one-way ANOVA test with 3 degrees of freedom applied to the entire set of positioning errors yielded an F value of 0.32 and a P value of 0.8, providing no evidence of a difference in the mean positioning errors between the outlier removal methods.

6 Discussion

The problem of outlier removal in photogrammetric point clouds in the context of image-guided localization has not been studied exhaustively before. This study encourages using outlier removal in applications where matching time and storage requirements are important constraints. Within our study, two approaches initially designed for point clouds generated with a laser scanner have been implemented and shown to be applicable to photogrammetric point clouds, too. Hence, we assume that our distance-based approach, designed and tested with photogrammetric point clouds, is also applicable to laser-scanned point clouds.

The average error of our localization is 0.56 m (Fig. 5, red bar in the bottom-left panel), including the loss in quality of 8 cm (Table 1) after outlier removal. Furthermore, this value also includes an error accumulated in the process of alignment to the reference model, which we are unable to separate from the final result. Compared with the usual performance of GPS, whose positioning error can be up to several meters, we consider the loss in quality of 8 cm acceptable. The ANOVA test confirms those losses as insignificant.

Together with the fact that the conducted experiment has shown obvious benefits of outlier removal in terms of matching time and space requirements, it makes us believe that our initial hypothesis holds.

Outlier removal can be applied to numerous tasks of image-based navigation, such as navigation of blind people, navigation in environments where GPS is unavailable (e.g., indoors) or unreliable (e.g., narrow streets with tall buildings), and recognition of landmarks and virtual tours. Not only user-oriented positioning tasks may benefit from outlier removal; it may also find use, for example, in video-based tracking tasks in medical applications (e.g., colonoscopy, bronchoscopy, panendoscopy). Furthermore, outlier removal is beneficial for applications requiring scene visualization.

In this work, we have evaluated three methods. However, it is not easy to select one method suitable for universal use; the superior method is application-dependent. Thus, if a navigational system is equipped with supporting sensors (e.g., accelerometer, gyroscope) and algorithms (e.g., landmarks-based positioning correction) allowing for the adjustment of positioning results, then the fastest method should be chosen (the density-based method). Otherwise, depending on the required precision, a more robust method would be preferable (our distance-based method). The connectivity-based approach also returns good results; however, due to its cubic algorithmic complexity, the approach is not suitable for applications requiring iterative outlier removal in point clouds containing hundreds of thousands of points.

The distance-based method leads to benefits in computational time and storage requirements of about 10%. In support of the feasibility of this method, a time improvement of 10% makes it possible to match 10% more images, which will lead to more robust positioning. However, for a better justification, a user study with a working prototype is required. It is necessary to investigate users’ reactions to the system’s performance in terms of their tolerance for waiting time and positioning error. This will be addressed in future work.

Another future task is incorporation of outlier removal into the bundle adjustment process. Iteratively applying outlier removal after each new nth image (e.g., n=100) might decrease the number of erroneously reconstructed models.

Furthermore, from the perspective of increasing the efficiency of mobile image-based navigation, we believe that a right choice of descriptors (e.g., SIFT, SURF [39], ORB [40], BRISK [41]) may also reduce computational time and models’ size. This is a subject of our additional study.

Another method for reducing the number of required matches, and thereby decreasing the time for localization, is pruning the search space. This will be achieved by reducing the points to an area within a certain range around the most likely position (e.g., based on prior position and trajectory). A careful evaluation will be needed to investigate the trade-off between positioning accuracy and matching time. An iterative approach with a growing region around the estimated position is also possible, as the most expensive calculation is the matching process. One can also use the direction from which a point is seen to further reduce the number of eligible points.

Image-based navigation has every chance of becoming available at the consumer level with the help of modern mobile devices. There are many ways of improving the technology and, with additional optimizations, image-guided navigation may become feasible in real time. Moreover, with further hardware development, all computation can be shifted to the mobile device, and the models can be stored in the device’s memory, which will eliminate the bottleneck of wireless communication between the device and the server and will enable use of the technology when the device is offline.

7 Conclusions

We confirmed our hypothesis that outlier removal in 3D point clouds is beneficial for image-guided mobile navigation. Reducing the number of points in the models yields a computational speedup and also makes it possible to store more models on a single device, while the changes in positioning accuracy remain insignificant.

8 Endnote

1 http://www.gdgps.net/