# An Epipolar Geometry-Based Approach for Vision-Based Indoor Localization

## Abstract

Indoor positioning has attracted increasing attention and research effort. We propose an epipolar geometry-based method for vision-based indoor localization from images. The method requires an image captured at the position to be localized. It uses SURF to extract feature points and filters them to keep the good ones and discard the bad ones. The remaining feature points are matched against the feature points in a database, which were extracted from images whose positions are already known. From the matched feature points we compute the essential matrix, which encodes the translation and rotation information, and then complete the localization from the geometric relationship between the query image and the database images. Furthermore, we store feature points instead of the images themselves when building the database, which reduces storage space and speeds up localization.

## Keywords

Indoor localization · Epipolar geometry · SURF · Essential matrix

## 1 Introduction

A satellite localization system (an aggregation of interconnected satellites and receiving devices that determine spatial coordinates) guarantees simultaneous observation of at least four satellites at any time and place, and derives the longitude and latitude of the observation point to achieve the purpose of navigation. This technology allows cars, ships, airplanes and people to reach their destinations safely and accurately along the measured route.

Nowadays, people can hardly live without positioning services. Most positioning systems target outdoor localization; indoor positioning, however, has become a hot research topic due to the large demand.

According to scientific investigations, 80% of the external spatial information humans receive is visual, and human beings have their own mechanisms for digesting and absorbing this huge amount of visual information. In fact, the cerebral cortex processes and analyzes the information as soon as it is collected by the eyes: visual information is converted into neural impulse signals by the photoreceptor cells, passed to the cortex through nerve fibers, and finally the useful information is extracted. With the development of image processing technology, computers are being endowed with the function of human eyes, processing visual information the way humans do; hence the frontier field of computer vision has emerged.

One popular application of computer vision is image-based localization. The proposed methods can be classified into two groups. In the first, researchers exploit landmarks (such as logos) present in the environment to estimate the camera matrix and recover the query location [3, 4]. These methods apply only to environments where highly detectable landmarks are present and their 3D coordinates can be measured. The second category comprises methods that use a stored image database annotated with camera position information, i.e. image fingerprinting-based methods such as [9]. Upon receiving a query image, feature extraction and matching (using features such as SURF [10], SIFT, corners, etc.) are performed between the query image and all database images.

Calculating the essential matrix is of vital importance in the system. First, the fundamental matrix (an unnormalized essential matrix) can be computed from the matched feature points. Among the various estimation algorithms, the linear ones are the simplest. The most famous linear algorithm is the 'eight-point algorithm' proposed by Longuet-Higgins in 1981. It is fast, but so sensitive to noise that it was difficult to apply in practice. Hence a non-linear algorithm was given by Faugeras in 1992, which is more stable and precise than the eight-point algorithm. Later, Hartley improved the eight-point algorithm with the normalized (or 'adjusted') eight-point algorithm, which applies the eight-point algorithm after standardizing the pre-matched points through normalization. The normalized variant became widely applied because it greatly reduces the sensitivity to noise while preserving robustness. Additionally, the 'seven-point algorithm' estimates the fundamental matrix from seven pairs of matching points; it reduces the amount of computation but remains sensitive to noise. After this comparison, we prefer the normalized eight-point algorithm.
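Hartley's normalization step can be sketched as follows (a minimal illustration; the helper name `normalize_points` is ours, not the paper's). Each point set is translated so its centroid sits at the origin and scaled so the mean distance from the origin is \( \sqrt{2} \); the returned similarity transform lets the estimated matrix be de-normalized afterwards.

```python
import math

def normalize_points(points):
    """Hartley normalization: translate the points so their centroid is at
    the origin, then scale so the mean distance from the origin is sqrt(2).
    Returns the normalized points and the 3x3 similarity transform T
    (as nested lists), with T applied to (x, y, 1) giving the new point."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    mean_dist = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / n
    s = math.sqrt(2) / mean_dist
    T = [[s, 0.0, -s * cx],
         [0.0, s, -s * cy],
         [0.0, 0.0, 1.0]]
    normalized = [((p[0] - cx) * s, (p[1] - cy) * s) for p in points]
    return normalized, T
```

Applying this to both images' points before the linear estimate is what turns the noise-sensitive eight-point algorithm into the normalized variant.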

## 2 Selecting and Matching Interest Points

There are several algorithms for selecting and matching feature points between the query images and the images in the database, such as SIFT and SURF.

SIFT (Scale-Invariant Feature Transform) is an algorithm for detecting local features in an image. It obtains image features by selecting interest points together with their scale and orientation descriptors, and it works well. The algorithm is not only scale-invariant: even under changes in rotation angle, image brightness or camera viewpoint, it still gives good detection results.

SURF (Speeded-Up Robust Features) was built on the basis of SIFT; it not only improves computing speed but is also more stable and robust. The quality of feature extraction depends both on the properties of the obtained image and on the feature point matching method. Commonly used features include corner features (e.g. the Harris operator), line features (image edge detection), local regions (blobs), and invariant features (such as scale-invariant features). Given factors such as complex background environments and illumination, this paper adopts the fast scale-invariant feature extraction algorithm SURF proposed by Bay: its robustness to illumination and image changes makes it better suited to extracting the feature points, its scale invariance is better than that of Harris, and its time complexity is lower than that of SIFT [8, 9, 10].

### 2.1 SURF Interest Points’ Selecting and Matching

### 2.2 SURF Feature Descriptors

In order to give the features better rotational invariance, each feature must be assigned a dominant orientation. The concrete method is: (1) the Haar wavelet responses are computed for every point in a circular region whose radius is six times the scale of the interest point; (2) all the Haar wavelet responses \( \mathrm{d}x \) and \( \mathrm{d}y \) are summed within a sliding sector of angle \( \frac{\pi}{3} \), yielding a vector \( \left( m_{w}, \theta_{w} \right) \), where

\( m_{w} = \sum\limits_{w} \mathrm{d}x + \sum\limits_{w} \mathrm{d}y \), \( \theta_{w} = \arctan \left( \sum\limits_{w} \mathrm{d}x \Big/ \sum\limits_{w} \mathrm{d}y \right) \); the longest such vector gives the dominant orientation. We then set up a coordinate system based on the interest point and its dominant orientation, take a \( 4 \times 4 \) grid of adjacent squares, and compute for each square the vector \( V = \left[ \sum \mathrm{d}x \quad \sum \left| \mathrm{d}x \right| \quad \sum \mathrm{d}y \quad \sum \left| \mathrm{d}y \right| \right] \). The 16 vectors concatenate into a 64-dimensional vector, called the feature descriptor, which is largely invariant to rotation.
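The descriptor assembly above can be sketched in Python (function names are ours; real SURF computes the Haar responses from integral images, which this sketch takes as given input):

```python
def subregion_vector(dx_responses, dy_responses):
    """Build the 4-component SURF sub-region vector
    V = [sum dx, sum |dx|, sum dy, sum |dy|] from the Haar wavelet
    responses sampled inside one square of the 4x4 grid."""
    return [sum(dx_responses),
            sum(abs(d) for d in dx_responses),
            sum(dy_responses),
            sum(abs(d) for d in dy_responses)]

def descriptor(subregions):
    """Concatenate the 16 sub-region vectors into the 64-dimensional
    SURF descriptor. `subregions` is a list of 16 (dx_list, dy_list)
    pairs, one pair per square of the grid."""
    vec = []
    for dx, dy in subregions:
        vec.extend(subregion_vector(dx, dy))
    return vec
```

Keeping both the signed sums and the sums of absolute values is what lets the descriptor distinguish, for example, a gradually varying patch from an alternating one with the same net gradient.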

### 2.3 Interest Points Matching

The similarity of two descriptors is measured by the Euclidean distance \( D = \sqrt{\sum\nolimits_{i = 1}^{m} \left( x_{i} - y_{i} \right)^{2} } \), where m is the dimension of the descriptor; the smaller D, the higher the similarity. Since only 8 pairs of interest points are needed to calculate the fundamental matrix, we use the ratio of the nearest distance to the second-nearest distance to select the best 8 pairs of interest points (the smaller the ratio, the better the match).
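The distance computation and ratio-based screening can be sketched as follows (helper names and the 0.8 threshold are our illustrative choices, not values from the paper):

```python
import math

def euclidean(a, b):
    # D in the text: Euclidean distance between two m-dimensional descriptors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_match(query_descs, db_descs, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    keeping the pair only if nearest / second-nearest distance < ratio
    (the smaller the ratio, the stricter the screening)."""
    matches = []
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(db_descs))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((qi, best[1], best[0] / second[0]))
    # sort by ratio so the best 8 pairs can be fed to the 8-point algorithm
    matches.sort(key=lambda m: m[2])
    return matches
```

Taking the 8 matches with the smallest ratios implements the screening described above.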

## 3 The Epipolar Geometry in Localization

After detecting, matching and screening the interest points, we have material of sufficient quality to compute the epipolar geometry between the query image and the images in the database. Because the locations of the database images are already known, we can finally obtain the coordinates of the query image.

### 3.1 A Brief Introduction of Epipolar Geometry

Epipolar geometry describes the geometric relationship between two images of the same scene. It is independent of the structure of the scene and depends only on the camera parameters; epipolar geometry is therefore an inherent projective property of the image pair. It is widely applied in domains including image matching and three-dimensional reconstruction, and during the image matching process the essential purpose of the algorithm is to recover the epipolar geometry.

Epipolar-geometry stereo vision shares the starting point and the objective of endowing computers and intelligent robots with the function of human eyes. Its positioning principle is similar to human binocular vision: a picture of a specific scene is taken and its feature points are matched against existing pictures of the scene taken from different angles in the gallery; finally, 3D geometric information is restored by computing the positional disparity between image pixels through the triangulation principle. Epipolar geometric measurement thus obtains 3D information from parallax via triangulation.

Assume a point X in three-dimensional space has projections in two cameras; the left projection is called the left view image and the right projection the right view image. Let C and C′ be the optical centers of the two cameras, and x and x′ the image points of X. The line connecting the optical centers C and C′ is called the baseline, and it crosses the two image planes at points e and e′, known as the epipoles. X, C and C′ are coplanar, and their plane \( \pi \) is called the epipolar plane. The intersections l and l′ of the epipolar plane with the two image planes are called the epipolar lines. Since x (x′) lies on the epipolar plane \( \pi \) as well as on its image plane, l (l′) must pass through x (x′). Therefore x (x′) can be found on the epipolar line l (l′) and it is not necessary to search the whole image plane: this important epipolar constraint reduces the search space for corresponding points from 2 dimensions to 1 dimension. As the point moves in three-dimensional space, all the generated epipolar lines pass through the epipoles e (e′), the intersection points of the baseline with the image planes.
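The constraint can be checked numerically: given F, the line \( l' = Fx \) is where the correspondence of x must be searched, and \( (x')^{T} F x \) vanishes for a perfect match. A small illustration (our code, using a rectified-stereo F where the residual reduces to the row difference \( v - v' \)):

```python
def epipolar_line(F, x):
    """l' = F x: the epipolar line in the right image on which the
    correspondence of x must lie (x homogeneous, (u, v, 1))."""
    return [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]

def epipolar_residual(F, x, xp):
    """x'^T F x -- exactly zero for a perfect correspondence."""
    l = epipolar_line(F, x)
    return sum(xp[i] * l[i] for i in range(3))
```

For the rectified-stereo fundamental matrix `F = [[0,0,0],[0,0,-1],[0,1,0]]`, the residual for `x = (u, v, 1)` and `x' = (u', v', 1)` is `v - v'`: matches on the same image row satisfy the constraint, exactly as the 1-dimensional search suggests.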

### 3.2 The 8-Point Algorithm

If \( (x')^{T} F x = 0 \) for corresponding points x and x′, then F is the fundamental matrix from the left image to the right image. As can be seen from the formula, the fundamental matrix has a direction: the fundamental matrix from the right image to the left is \( F^{T} \). F has the following special properties:

- (1) The rank of F is 2.
- (2) As a \( 3 \times 3 \) matrix, F has 7 degrees of freedom.

Normally a \( 3 \times 3 \) matrix has 9 degrees of freedom, but the overall scale factor and the zero determinant remove 2 of them. In detail, if F is a fundamental matrix then kF (for any \( k \ne 0 \)) is the same fundamental matrix; F is thus defined only up to scale, which removes one degree of freedom, and the constraint \( \det F = 0 \) removes the other.

According to the analysis in the introduction (Sect. 1), we choose the normalized ('adjusted') eight-point algorithm. The advantage of the eight-point algorithm is that it is linear, easy to implement and fast to compute.

Write \( x_{i}, x_{i}' \; (i = 1, 2, \ldots, 8) \) for the 8 pairs of interest points and \( f_{ij} \; (1 \le i \le 3, 1 \le j \le 3) \) for the entries of F. Each pair satisfies \( (x_{i}')^{T} F x_{i} = 0 \), and the points can be normalized as \( \left( u_{i}, v_{i}, 1 \right) \). Stacking the eight equations, we obtain the linear system \( Af = 0 \), where f is the 9-vector of the entries of F.

All valid f differ only by a constant factor, so we add the constraint \( \left\| f \right\| = 1 \). Then f is the eigenvector of \( A^{T}A \) corresponding to the minimum eigenvalue. Computing the SVD \( A = UDV^{T} \) with \( V = \left[ {\begin{array}{*{20}c} {v_{1} } & {v_{2} } & \cdots & {v_{9} } \\ \end{array} } \right] \), we take \( f = v_{9} \), the column associated with the smallest singular value. Finally F is obtained by reshaping f.
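The construction of A and the SVD step can be sketched with NumPy (a minimal illustration, not the authors' implementation; `eight_point` is our name, and for stability the inputs are assumed to have been Hartley-normalized first):

```python
import numpy as np

def eight_point(x1, x2):
    """Linear eight-point estimate of F from >= 8 correspondences.
    x1, x2: (N, 2) arrays of matched points in the two images, so that
    (u2, v2, 1) @ F @ (u1, v1, 1) ~ 0 for every pair."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    # Each correspondence gives one row of A in A f = 0, with
    # f = (f11, f12, f13, f21, f22, f23, f31, f32, f33)
    A = np.column_stack([u2 * u1, u2 * v1, u2,
                         v2 * u1, v2 * v1, v2,
                         u1, v1, np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)   # singular vector of the smallest singular value
    # Enforce the rank-2 property by zeroing F's smallest singular value
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2
```

The final rank-2 projection is needed because the linear solution of \( Af = 0 \) does not automatically satisfy \( \det F = 0 \).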

### 3.3 Subsequent Steps

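The derivation for this subsection did not survive extraction. As a hedged sketch of the standard subsequent step: the essential matrix \( E = K'^{T} F K \) (with K, K′ the camera intrinsic matrices) is decomposed by SVD into a rotation and a translation direction, which carry the relative pose between the query and database cameras. The function below is our illustration of that textbook decomposition, not the paper's code:

```python
import numpy as np

def decompose_essential(E):
    """Split an essential matrix E = [t]x R into a translation direction t
    (recoverable only up to scale and sign) and two candidate rotations.
    The physically valid (R, t) combination is the one that places the
    triangulated points in front of both cameras (cheirality check,
    not shown here)."""
    U, _, Vt = np.linalg.svd(E)
    # Ensure we produce proper rotations (determinant +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt       # first candidate rotation
    R2 = U @ W.T @ Vt     # second candidate rotation
    t = U[:, 2]           # translation direction: null vector of E^T
    return R1, R2, t
```

With the rotation and translation direction known relative to a database image of known pose, the query camera's position can then be pinned down by intersecting the resulting lines, as the experiments in Sect. 4 describe.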

## 4 Experimental Results

In our experiments, database images were taken inside a room at the locations depicted in Fig. 4 using a cellphone camera. There are 4 lines in total, meaning that we compute the epipolar geometry between the query image and four images in the database. At least two lines are needed to complete the localization, but if one of only two lines is wrong the result is badly affected; more lines make the result more stable at the cost of a larger amount of calculation.
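One way to combine more than two lines is a least-squares intersection, which damps the influence of a single bad line; the paper does not specify its fusion rule, so the sketch below is our illustrative assumption:

```python
def intersect_lines(lines):
    """Least-squares intersection of 2D lines, each given as
    ((px, py), (dx, dy)) with (dx, dy) a unit direction. With two lines
    this is the exact intersection; with more, it returns the point
    minimizing the summed squared perpendicular distances."""
    # Normal equations: (sum_i (I - d_i d_i^T)) x = sum_i (I - d_i d_i^T) p_i
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in lines:
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11; a12 += m12; a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12   # zero only if all lines are parallel
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With four lines instead of two, one outlier shifts the estimate rather than ruining it, matching the stability-versus-computation tradeoff noted above.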

Average error in different conditions (cm)

Scenario | Random matches | Screened matches
---|---|---
1 | 227 | 96
2 | 158 | 75
3 | 164 | 74
4 | 142 | 76

## 5 Conclusion

In this paper we proposed an epipolar geometry-based method for fine location estimation in vision-based localization applications with pose-annotated databases. We use the SURF algorithm to select and match the interest points that serve as the material for calculating the fundamental matrix, and then apply the 8-point algorithm and other methods based on epipolar geometry. After obtaining the positioning result, we analyzed the factors that affect the average error and made a tradeoff in choosing the number of lines. We also conclude that screening the matches from the SURF algorithm is of vital importance. Finally, we keep the average error within an acceptable range and obtain a good result.

## References

- 1.Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell.
**26**(6), 756–770 (2004)Google Scholar - 2.Yang, J., Chen, L., Liang, W.: Monocular vision based robot self-localization. In: 2010 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1189–1193. IEEE (2010)Google Scholar
- 3.Muramatsu, S., Chugo, D., Jia, S., Takase, K.: Localization for indoor service robot by using local-features of image. In: ICCAS-SICE, pp. 3251–3254. IEEE (2009)Google Scholar
- 4.Bay, H., Ess, A., Tuytelaars, T., et al.: Speeded-up robust features (SURF). Comput. Vis. Image Underst.
**110**(3), 346–359 (2008)Google Scholar - 5.Nicole, R.: Title of paper with only first word capitalized. J. Name Stand. Abbrev. (in press)Google Scholar
- 6.Yorozu, Y., Hirano, M., Oka, K., Tagawa, Y.: Electron spectroscopy studies on magneto-optical media and plastic substrate interface. IEEE Transl. J. Magn. Japan
**2**, 740–741 (1987). Digests 9th Annual Conference of Magnetics Japan, p. 301, 1982Google Scholar - 7.Young, M.: The Technical Writer’s Handbook. University Science, Mill Valley (1989)Google Scholar
- 8.Horaud, R., Conio, B., Leboulleux, O., Lacolle, B.: An analytic solution for the perspective 4-point problem. Comput. Vision Graph. Image Proces.
**47**(1), 33–44 (1989)Google Scholar - 9.Wang, J., Zha, H., Cipolla, R.: Coarse-to-fine vision-based localization by indexing scale-invariant features. IEEE Trans. Syst. Man Cybern. Part B Cybern.
**36**(2), 413–422 (2006)Google Scholar - 10.Liqin, H., Caigan, C., Henghua, S., et al.: Adaptive registration algorithm of color images based on SURF. Measurement
**66**, 118–124 (2015)Google Scholar - 11.Harris, J.M., Nefs, H.T., Grafton, C.E.: Binocular vision and motion-in-depth. Spat. Vis.
**21**(6), 896–899 (2014)Google Scholar - 12.Tourap, A.M.: Enhanced predictive zonal search for single and multiple frame motion estimation. In: Visual Communications and Image Processing (2012)Google Scholar
- 13.Olson, C.F., Abi-Rached, H., Ye, M., Hendrich, J.P.: Wide-baseline stereo vision for mars rovers. In: Proceedings Of the 2003 IEEE/RSJ International Conference On Intelligent Robots And Systems, vol. 2, pp. 1302–1307, October 2003Google Scholar