
1 Introduction

Sensors that provide a wide field of view (FoV) are preferred in robotic applications, as both navigation and localization benefit from the wide FoV [1,2,3]. This paper presents a novel camera system which concurrently provides 360\(^{\circ }\) FoV scenes and depth information. The system minimizes the amount of equipment and image data while still acquiring sufficient information about the scene. Robotic applications of wide-FoV camera systems include mapping [4, 5], object tracking [6], video surveillance [7,8,9], virtual reality [10] and structure from motion [11, 12].

1.1 Proposed System

The proposed vision system consists of two fisheye cameras (omnidirectional cameras) [13, 14], each with a 185\(^{\circ }\) FoV. The cameras are placed back to back so that together they cover the whole 360\(^{\circ }\) of the scene. A high-resolution stereo vision camera, the ZED [16], is placed in front of the rig so that its baseline is parallel to the baseline of the fisheye cameras. Figure 1 shows the proposed system and the predicted FoV of the camera rig.

Fig. 1. The front view of the proposed vision system. Illustrations (a) and (b) show the predicted FoV viewed from the top and the right side of the camera rig. The gray regions refer to the FoV of the fisheye cameras; the red and green regions refer to the FoV of the stereo vision camera. (Color figure online)

Fig. 2. The acquisition block contains the proposed camera system with the images from the fisheye and ZED cameras. The calibration block consists of the calibration method used to estimate the intrinsic and extrinsic parameters. The fusion block consists of the method used to fuse all images onto the unit sphere. The result block contains the final result, with the images from the fisheye and ZED cameras fused together.

The major contributions of this paper are:

  1. We propose an omni-vision system together with a stereo camera, which offers rich information over the full 360\(^{\circ }\) FoV of the environment as well as detailed depth information.

  2. A new camera calibration method that takes advantage of the Unified Camera Model representation has been proposed, which outperforms the state-of-the-art methods.

  3. An Interior Point Optimization (IPO) algorithm based on pure rotation matrix estimation has been proposed to fuse the two fisheye images and the ZED image, which offers seamless image stitching results.

  4. A projective distortion has been proposed to be added to the ZED image before projecting it onto the unit sphere, which enhances the quality of the overlapping image.

2 Methodology

2.1 Unified Spherical Camera Model

The unified spherical camera model was proposed by Geyer [14] and Barreto [17]. Image formation in a dioptric camera is affected by radial distortion; as a result, the relation between a scene point and its corresponding point in the dioptric image is nonlinear. The 3D points \(\chi \) are projected to the image point \(\mathbf x \) using the pin-hole camera model representation, combined with a linear or nonlinear transformation mapping function that depends on the type of camera. The model was extended by Mei [18], and an omnidirectional camera calibration toolbox [23] has been developed. This model is used here as a reference to map the image onto the unified spherical model. All points m are projected to the image plane using K, the generalized camera projection matrix. The values of f and \(\eta \) should also be generalized to the whole system (camera and lens).

$$\begin{aligned} \small p=Km= \begin{bmatrix} f_1\eta & f_1\eta \alpha & u_0 \\ 0 & f_2\eta & v_0 \\ 0 & 0 & 1 \end{bmatrix} m , \end{aligned}$$
(1)

where \([f_1,f_2]^T\) is the focal length, \((u_0,v_0)\) is the principal point and \(\alpha \) is the skew factor. Using this projection model, a point on the normalized camera plane can be lifted to the unit sphere by the following equation:

$$\begin{aligned} \small \hslash ^{-1}(m)= \begin{bmatrix} \frac{\xi +\sqrt{1+(1-\xi ^2)(x^2+y^2)}}{x^2+y^2+1}x\\ \frac{\xi +\sqrt{1+(1-\xi ^2)(x^2+y^2)}}{x^2+y^2+1}y\\ \frac{\xi +\sqrt{1+(1-\xi ^2)(x^2+y^2)}}{x^2+y^2+1}-\xi \end{bmatrix}, \end{aligned}$$
(2)

where the parameter \(\xi \) quantifies the amount of radial distortion of the dioptric camera.
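For concreteness, the sketch below (Python/NumPy; the function names are ours, not part of the toolbox) implements the lifting of Eq. (2) together with the normalization by the generalized projection matrix K of Eq. (1). Setting \(\xi = 0\) reduces it to the perspective case used later for the ZED image.

```python
import numpy as np

def normalized_plane_point(pixel, K):
    """Map a pixel (u, v) back to the normalized camera plane using the
    generalized projection matrix K of Eq. (1)."""
    m = np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    return m[0] / m[2], m[1] / m[2]

def lift_to_unit_sphere(x, y, xi):
    """Lift a point (x, y) on the normalized camera plane to the unit
    sphere using Eq. (2); xi is the radial-distortion parameter."""
    r2 = x * x + y * y
    gamma = (xi + np.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return np.array([gamma * x, gamma * y, gamma - xi])
```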

2.2 Camera Calibration Using Zero-Degree Overlapping Constraint

A new multi-camera setup is proposed, where two 185\(^{\circ }\) fisheye cameras are rigidly attached back to back, facing opposite directions. Since each fisheye camera has more than 180\(^{\circ }\) of FoV, the proposed setup produces an overlapping area along the periphery of the two fisheye images. Taking advantage of this overlapping FoV, we propose a new fisheye camera calibration method that uses a zero-degree overlapping constraint with the Unified Camera Model, under the following assumptions:

  • If \(\xi \) is estimated correctly, the 180\(^{\circ }\) line of the fisheye camera should ideally lie on the zero-degree plane of the unit sphere. See Figs. 3 and 4, and the sketch after this list.

  • A correct calibration (registration) of the multi-fisheye camera setup contributes to a correct overlapping area.
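A minimal sketch of the \(\xi \) refinement implied by the first assumption is given below. It assumes that pixels sampled on the 180\(^{\circ }\) boundary of the fisheye image are available and that a bounded one-dimensional search over \(\xi \) suffices; the search range and helper name are our assumptions, not part of the original method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_xi(boundary_pixels, K):
    """Refine xi so that pixels sampled on the 180-degree boundary of the
    fisheye image lift onto the zero plane (z = 0) of the unit sphere."""
    K_inv = np.linalg.inv(K)

    def cost(xi):
        z_residuals = []
        for u, v in boundary_pixels:
            m = K_inv @ np.array([u, v, 1.0])
            x, y = m[0] / m[2], m[1] / m[2]
            r2 = x * x + y * y
            # Guard the square root for candidate xi values far from the truth.
            gamma = (xi + np.sqrt(max(1.0 + (1.0 - xi * xi) * r2, 0.0))) / (r2 + 1.0)
            z_residuals.append(gamma - xi)      # z-component on the unit sphere
        return float(np.sum(np.square(z_residuals)))

    # The search interval is an assumption; typical fisheye values lie near 1-2.
    return minimize_scalar(cost, bounds=(0.5, 3.0), method='bounded').x
```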

Fig. 3. Experimental setup to calibrate the value of \(\xi \). The baseline of the camera rig, denoted b (from the left fisheye lens to the right fisheye lens), is measured, and two parallel lines separated by the same distance, together with a centre line, are drawn on a pattern. The rig is placed facing the checkerboard pattern and aligned so that the centre line touches the edges of both fisheye camera images.

Fig. 4. The left image was projected with the initial estimate of \(\xi \). The 180\(^{\circ }\) lines should ideally lie on the zero plane. After the iterative estimation of \(\xi \), the 180\(^{\circ }\) lines lie on the zero plane.

Pure Rotation Registration. One of the major objectives of our setup is to produce a high quality 360\(^{\circ }\) FoV unit sphere suitable for visualization. A common way to achieve this is to calibrate the camera setup so that the relative poses between the cameras are known. Let the features from the left and right fisheye cameras (projected onto the unit sphere) be denoted as \(\mathbf{x }^{Lf}\) and \(\mathbf{x }^{Rf}\), respectively. The transformation between the two fisheye cameras is denoted as \({T}\in R^{4\times 4}\), such that:

$$\begin{aligned} \mathbf{x }^{Lf} = {R} \mathbf{x }^{Rf} , \end{aligned}$$
(3)

The estimation of the transformation matrix T, as discussed in [19, 21, 22], involves both the rotation R and the translation t.

In our method, we solve this problem with a pure rotation matrix by enforcing zero translation in the transformation matrix [20], which is formulated as:

$$\begin{aligned} \underset{R}{min} \sum ^n_{i=1} \varPsi ({\Vert \mathbf x ^{Lf} - {R} \mathbf x ^{Rf} \Vert }), \quad \mathrm {s.t.} \quad RR^\mathsf {T} = 1, \mathrm {det}(R) = 1 , \end{aligned}$$
(4)

where R is the desired pure rotation matrix and \(\varPsi (\cdot )\) is the Huber loss function used for robust estimation. Solving the above equation yields a pure rotation matrix that minimizes the registration error between the two fisheye cameras. Here, we adopt the Interior Point Optimization (IPO) algorithm to solve the system.
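The sketch below illustrates Eq. (4). The paper solves the constrained problem with an Interior Point Optimization algorithm; as an approximation we parameterize the rotation by Euler angles (so \(RR^{\mathsf {T}}=1\) and \(\mathrm {det}(R)=1\) hold by construction) and minimize with SciPy's trust-constr solver, an interior-point/trust-region method. The Huber threshold and the function names are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def huber(r, delta):
    """Huber loss applied to non-negative residual norms."""
    quad = np.minimum(r, delta)
    return 0.5 * quad ** 2 + delta * (r - quad)

def estimate_pure_rotation(x_left, x_right, delta=0.01):
    """Estimate the pure rotation R aligning right-fisheye sphere points
    x_right (n x 3) to left-fisheye sphere points x_left (n x 3), Eq. (4)."""
    def cost(theta):
        R = Rotation.from_euler('xyz', theta).as_matrix()
        residuals = np.linalg.norm(x_left - x_right @ R.T, axis=1)
        return np.sum(huber(residuals, delta))

    res = minimize(cost, x0=np.zeros(3), method='trust-constr')
    return Rotation.from_euler('xyz', res.x).as_matrix()
```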

Fusion of Perspective Camera onto Unit Sphere. A new method is proposed to fuse a perspective image onto the unit sphere. A major difference between perspective and spherical images is the presence of distortion that deforms objects in the scene. Directly matching points from the perspective image to features on the unit sphere is unreliable, because the spherical image exhibits varying levels of distortion. We therefore propose adding projective distortion parameters to the perspective image plane before projecting it onto the unit sphere.

$$\begin{aligned} \small {\begin{matrix} P=K^{-1}\cdot H \cdot I, \begin{bmatrix} x\\ y\\ 1 \end{bmatrix}=K^{-1}\cdot H \cdot I, \end{matrix}} \end{aligned}$$
(5)

where H is the projective distortion matrix, K is the camera matrix and I is the image frame.

$$\begin{aligned} \small \begin{bmatrix} x\\ y\\ 1 \end{bmatrix}= \begin{bmatrix} f_{x} & \delta & u_{0}\\ 0 & f_{y} & v_{0}\\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} h_{11} & h_{12} & h_{13}\\ h_{21} & h_{22} & h_{23}\\ h_{31} & h_{32} & h_{33} \end{bmatrix} \cdot \begin{bmatrix} u\\ v\\ 1 \end{bmatrix}, \end{aligned}$$
(6)

The values x, y and \(\xi \) (with \(\xi =0\) for a perspective camera) are then substituted into the mapping function of Eq. (2) to project the point onto the unit sphere.
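A minimal sketch of this mapping, following Eq. (5) (normalization by \(K^{-1}\) after applying H) and Eq. (2) with \(\xi = 0\); the function name is ours.

```python
import numpy as np

def lift_perspective_point(pixel, K, H):
    """Apply the projective distortion H to a ZED pixel (u, v), normalize
    with K, and lift the result onto the unit sphere with xi = 0."""
    p = np.linalg.inv(K) @ (H @ np.array([pixel[0], pixel[1], 1.0]))
    x, y = p[0] / p[2], p[1] / p[2]
    r2 = x * x + y * y
    gamma = np.sqrt(1.0 + r2) / (r2 + 1.0)   # Eq. (2) with xi = 0
    return np.array([gamma * x, gamma * y, gamma])
```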

Fusion of Multi-camera Images. In our multi-camera setup, the fusion of the fisheye cameras alongside the ZED camera based on a unified model representation can be achieved in a similar manner.

Let \(\mathbf x ^z\) and \(\mathbf x ^{Lf}\) be the feature correspondences (mapped from \(\chi ^z\) and \(\chi ^{Lf}\)) on a Unified Sphere. The fusion of the ZED camera and the fisheye camera can be framed as a minimization problem of the feature correspondences on a unified sphere, which is defined as:

$$\begin{aligned} \underset{R}{\text {argmin}} \sum ^n_{i=1}\varPsi \left( \left\| \mathbf x ^{Lf}-\mathbf x ^{Z}({R}) \right\| _{2} \right) , \end{aligned}$$
(7)

where \(\varPsi (\cdot )\) is the loss function used for robust estimation, while

$$\begin{aligned} \small \chi (\theta _{x,y,z})= R({\theta }_{x,y,z}) \begin{bmatrix} x_{s} & \cdots & x_{s}^{n} \\ y_{s} & \cdots & y_{s}^{n} \\ z_{s} & \cdots & z_{s}^{n} \end{bmatrix}, \end{aligned}$$
(8)

stands for the registration of the ZED camera sphere points to the left fisheye camera (the reference), where \(R({\theta }_{x,y,z}) \) is the desired pure rotation matrix with estimated rotation angles \(\theta _{x,y,z}\). This can be solved in a similar manner to Eq. (4), by applying the IPO algorithm.
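In practice this registration can reuse the rotation solver sketched for Eq. (4): the estimated Euler angles give \(R(\theta _{x,y,z})\), which is then applied to the whole set of ZED sphere points. A small illustrative helper (names are ours):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def register_zed_to_left_fisheye(zed_sphere_points, theta_xyz):
    """Rotate ZED unit-sphere points (3 x n) into the left-fisheye frame,
    Eq. (8), using estimated Euler angles theta_xyz."""
    R = Rotation.from_euler('xyz', theta_xyz).as_matrix()
    return R @ zed_sphere_points
```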

2.3 Epipolar Geometry of Omnidirectional Camera

The epipolar geometry of an omnidirectional camera has been studied in [17], where it was originally used as a model for catadioptric cameras. The study was later extended to dioptric or fisheye camera systems. Figure 5 shows the epipolar geometry of a fisheye camera. Consider two positions of a fisheye camera observing a point P in space. The points \(P_{1}\) and \(P_{2}\) are the projections of P onto the unit sphere at the two camera positions. The points P, \(P_{1}\), \(P_{2}\), \(O_{1}\) and \(O_{2}\) are coplanar, such that:

$$\begin{aligned} P_{2}^{T}\left( O_{1}^{2}\times P_{1}^{2}\right) =0 , \end{aligned}$$
(9)

where \(O_{1}^{2}\) and \(P_{1}^{2}\) are the coordinates of \(O_{1}\) and \(P_{1}\) in the coordinate system of \(O_{2}\). The transformation between the systems \(X_{1}, Y_{1}, Z_{1}\) and \(X_{2}, Y_{2}, Z_{2}\) can be described by a rotation R and a translation t. The transformation equations are:

$$\begin{aligned} {\begin{matrix} O_{1}^{2}=R\cdot O_{1}+t=t, \quad P_{1}^{2}=R\cdot P_{1}+t, \end{matrix}} \end{aligned}$$
(10)

so \(O_{1}^{2}\) is a pure translation. By substituting (10) into (9) we get:

$$\begin{aligned} P_{2}^{T}EP_{1}=0 , \end{aligned}$$
(11)

where \(E=[t]_{\times }R\) is the essential matrix, which encodes the rotation and translation. In order to estimate the essential matrix, the point correspondence pairs on the fisheye images are stacked into a linear system, so the overall epipolar constraint becomes:

$$\begin{aligned} Uf=0 \,,\, where\,\, U=[u_{1}, u_{2},\ldots , u_{n}]^{T}, \end{aligned}$$
(12)

and \(u_{i}\) and f are vectors constructed by stacking the columns of the matrices \(P_{i}\) and E, respectively.

$$\begin{aligned} P_{i}=P_{i}P_{i}^{'T} , \end{aligned}$$
(13)
$$\begin{aligned} E=\begin{bmatrix} f_1&f_4&f_7 \\ f_2&f_5&f_8 \\ f_3&f_6&f_9 \end{bmatrix}. \end{aligned}$$
(14)

The essential matrix can be estimated by linear least squares from Eqs. (12) and (13), where \(P'_{i}\) is the projected point corresponding to \(P_{2}\) in Fig. 5, U is an \(n\times 9\) matrix and f is a \(9\times 1\) vector containing the 9 elements of E. The initial estimate of the essential matrix is then used for its robust estimation. An iteratively reweighted least squares method [15] is proposed to re-estimate the essential matrix of the omnivision camera; it assigns minimal weight to outliers and noisy correspondences. The weight assignment is based on the residual \(r_{i}\) of each point.

$$\begin{aligned} r_{i}=f_{1}x'_{i}x_{i}+f_{4}x'_{i}y_{i} +f_{7}x'_{i}z_{i}+f_{2}x_{i}y'_{i}+f_{5}y_{i}y'_{i}+f_{8}y'_{i}z_{i}+f_{3}x_{i}z'_{i}+f_{6}y_{i}z'_{i}+f_{9}z_{i}z'_{i}, \end{aligned}$$
(15)
$$\begin{aligned} err \rightarrow \min _{f}\sum _{i=1}^{n}\left( w_{Si}f^{T}u_{i} \right) ^{2} , \end{aligned}$$
(16)
$$\begin{aligned} w_{Si}=\frac{1}{\nabla r_{i}} ,\, where \,\, r_{i}=(r_{xi}^{2}+r_{yi}^{2}+r_{zi}^{2}+r_{xi'}^{2}+r_{yi'}^{2}+ r_{zi'}^{2})^\frac{1}{2}, \end{aligned}$$
(17)

where \(w_{Si}\) is the weight (known as Sampson's weighting) assigned to each pair of corresponding points and \(\nabla r_{i}\) is the gradient; \(r_{xi}\) and the other terms are the partial derivatives of Eq. (15), e.g. \(r_{xi}= f_{1}x'_{i}+f_{2}y'_{i}+f_{3}z'_{i}\).

Fig. 5.
figure 5

The diagram of epipolar geometry of fisheye camera for 3D reconstruction.

Once all the weights are computed, the U matrix is updated as \(U=WU\), where W is a diagonal matrix of the weights computed using Eq. (17). The essential matrix is estimated at each step and forced to be of rank 2 in each iteration; a Procrustean approach based on singular value decomposition is used for this purpose.
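The sketch below puts Eqs. (12)-(17) together: a linear least squares estimate of E via SVD, followed by iteratively reweighted re-estimation with Sampson weights and a rank-2 (Procrustean) projection at every iteration. The internal vectorization of E is row-major, which differs from the column stacking of Eq. (14) but is equivalent; the iteration count is an assumption.

```python
import numpy as np

def estimate_essential_irls(P1, P2, iters=10, eps=1e-12):
    """Estimate the essential matrix between corresponding unit-sphere
    points P1, P2 (each n x 3) with iteratively reweighted least squares."""
    # Each row stacks products of corresponding coordinates so that the dot
    # product with the (row-major) vectorized E gives the residual of Eq. (15).
    U = np.column_stack([P2[:, j] * P1[:, k] for j in range(3) for k in range(3)])
    w = np.ones(P1.shape[0])
    E = None
    for _ in range(iters):
        # Least-squares solution of (W U) f = 0: right singular vector
        # associated with the smallest singular value.
        _, _, Vt = np.linalg.svd(w[:, None] * U)
        E = Vt[-1].reshape(3, 3)
        # Procrustean rank-2 enforcement: zero the smallest singular value.
        Ue, Se, Vte = np.linalg.svd(E)
        E = Ue @ np.diag([Se[0], Se[1], 0.0]) @ Vte
        # Sampson weights (Eq. (17)): inverse gradient magnitude of the residual.
        dP2 = P1 @ E.T                      # partial derivatives w.r.t. P2
        dP1 = P2 @ E                        # partial derivatives w.r.t. P1
        grad = np.sqrt(np.sum(dP1 ** 2, axis=1) + np.sum(dP2 ** 2, axis=1))
        w = 1.0 / np.maximum(grad, eps)
    return E
```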

3 Experimental Results

3.1 Estimation of Intrinsic Parameters

The unknown parameters \(f_{1}\), \(f_{2}\), \(u_{0}\), \(v_{0}\) and \(\xi \) of the fisheye cameras are estimated using the camera calibration toolbox provided by Mei [23]. The fisheye images are projected onto the unit sphere using the inverse mapping function defined in Mei's projection model. Figure 6(a) shows a two-dimensional (2D) fisheye image projected onto the unit sphere.

Fig. 6. (a) The 2D fisheye image is projected onto the unit sphere. (b) The selected points (green and red points) are not aligned. (c) After applying the IPO algorithm, the green and red points are aligned with their respective counterparts. (Color figure online)

3.2 Estimation of Extrinsic Parameters

Rigid Transformation Between the Two Fisheye Images. The overlapping features are selected along the periphery of the left and right fisheye images and projected onto the unit sphere. The rigid 3D transformation matrix is estimated from these overlapping features. The IPO algorithm is used to estimate the rotation between the sets of projected points. Figure 6(b) shows that the sets of projected points (green: left fisheye; red: right fisheye) are not aligned. After applying the IPO algorithm, the selected points are aligned, as shown in Fig. 6(c).

The rotation matrix is parameterized in terms of Euler angles, and a cost function is developed that minimizes the Euclidean distance between the reference (the projected points of the left camera image) and the three-dimensional points from the right camera image. The transformation estimated using Singular Value Decomposition (SVD), although very close to a pure rotation, assumes a translation as an additional parameter when aligning the point sets. Figure 7 shows the fusion result. The points on the hemispheres that lie beyond the zero plane are first eliminated; then the transformation is applied to the hemisphere of the right fisheye camera, and the point matrices are concatenated to obtain a full unit sphere.
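A small sketch of this hemisphere fusion, assuming the optical axis of each fisheye sphere is the z-axis so that "beyond the zero plane" corresponds to z < 0 (this sign convention is our assumption):

```python
import numpy as np

def fuse_hemispheres(left_sphere, right_sphere, R):
    """Fuse two fisheye hemispheres (3 x n arrays of unit-sphere points)
    into a full sphere: drop points beyond the zero plane, rotate the
    right hemisphere with the estimated pure rotation R, concatenate."""
    left = left_sphere[:, left_sphere[2] >= 0.0]
    right = right_sphere[:, right_sphere[2] >= 0.0]
    return np.hstack([left, R @ right])
```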

Fig. 7. (a) The images from the fisheye and ZED cameras are fused together. (b) The fusion result after applying the projective distortion to the ZED image. Focusing on the border between the ZED image and the unit sphere, the ZED image is perfectly overlapped on the unit sphere.

Rigid Transformation Between a Fisheye and the ZED Camera. The same procedure is used to estimate the transformation matrix between the image from the ZED camera and the two hemispheres.

As shown in Fig. 7(a), the RGB image from the ZED camera is overlapped onto the unit sphere. The scale between the ZED and fisheye cameras is also recovered.

The fusion is further enhanced by adding the projective distortion to the ZED image. Figure 7(b) shows that the result is much better after the projective distortion is applied to the ZED image.

3.3 Estimation of the Three-Dimensional Registration Error

The registration error during mapping onto the unit sphere is computed to validate the registration method. The Root Mean Square Error (RMSE) is used to quantify the error. The rigid 3D transformation matrix and the parameter \(\xi \) obtained from calibration are used to determine the residual error of the point-pair registration on the unit sphere. Three methods have been compared:

  1. IPO: Our method; the pure rotation is estimated using feature matches and the IPO algorithm.

  2. SVD: The transformation matrix is estimated using feature matches with SVD [21].

  3. CNOC: Calibration of Non-Overlapping Cameras, Lébraly [19].

The image sequences were taken in several different environments, and the feature points were selected in the overlapping area. The same data sets are used for all three methods. Figure 8 shows that the proposed method has the lowest registration error.
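For reference, a minimal sketch of the RMSE used in this comparison, computed on unit-sphere point pairs under the estimated pure rotation (names are ours):

```python
import numpy as np

def registration_rmse(x_left, x_right, R):
    """Root mean square registration error between left-fisheye sphere
    points and right-fisheye sphere points mapped by rotation R (n x 3)."""
    residuals = x_left - x_right @ R.T
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))
```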

Fig. 8. The registration error estimated using the three different methods. The image sequences were taken inside and outside of the building. The average registration error of the proposed method is 0.1612 (inside the building) and 0.1812 (outside the building).

3.4 3D Reconstruction Using the Camera Rig

The goal of triangulation is to minimize the distance between the two rays toward the point P in 3D space. This can be expressed as a least squares problem:

$$\begin{aligned} \min _{a,b}\left\| aP_{1}-bRP_{2}-t\right\| , \end{aligned}$$
(18)
$$\begin{aligned} \begin{bmatrix} a^{*}\\ b^{*} \end{bmatrix}=\left( A^{T}A \right) ^{-1}A^{T}t, \quad A=\left[ P_{1} \;\; -RP_{2} \right] , \end{aligned}$$
(19)

Referring to Fig. 5 and viewing from the first pose, the line passing through \(O_{1}\) and \(P_{1}\) can be written as \(aP_{1}\), and the line passing through \(O_{2}\) and \(P_{2}\) can be written as \(bRP_{2}+t\), where \(a,b \in \mathbb {R}\) and P is a point in world coordinates. \(O_{1}\) and \(O_{2}\) are the camera centres for poses 1 and 2, \(P_{1}\) and \(P_{2}\) are the projections of P onto the unit sphere at poses 1 and 2, and R and t are the rotation and translation between the two poses.

The 3D point P is reconstructed by finding the midpoint of the minimal-distance segment between the two lines. It can be computed as:

$$\begin{aligned} P_{k}=\frac{a^{*}P_{1}+b^{*}RP_{2}+t}{2} \,\,,\, where \,,\, k = 1, 2, 3, 4 \end{aligned}$$
(20)
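A minimal sketch of this mid-point triangulation (Eqs. (18)-(20)), with A taken as the matrix whose columns are \(P_{1}\) and \(-RP_{2}\):

```python
import numpy as np

def triangulate_midpoint(P1, P2, R, t):
    """Mid-point triangulation: find scalars a, b minimizing
    || a*P1 - b*R*P2 - t || and return the midpoint of the closest points."""
    d1 = P1                          # ray direction from O1
    d2 = R @ P2                      # ray direction from O2, rotated by R as in Eq. (18)
    A = np.column_stack([d1, -d2])   # Eq. (19)
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return 0.5 * (a * d1 + b * d2 + t)   # Eq. (20)
```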

Figure 9 shows the matched feature points. All points were selected manually; in future work, an automated feature matching algorithm will be developed using existing feature descriptors. Figure 10 shows the results of the feature matching and scene reconstruction algorithm developed following the spherical camera model.

Fig. 9. The matched feature points between two different poses of the fisheye camera.

Fig. 10. The front view (left) and top view (right) of the three-dimensional reconstructed scene.

4 Conclusions

This paper proposed a new camera system that provides a 360\(^{\circ }\) FoV together with detailed depth information at the front. Two fisheye cameras, each with a 185\(^{\circ }\) FoV, are placed back to back to obtain the 360\(^{\circ }\) FoV, and a stereo vision camera is placed perpendicular to them to obtain depth information at the front. A novel camera calibration method taking advantage of the Unified Spherical Model has been introduced to calibrate the multi-camera system. A pure rotation matrix based on the IPO algorithm has been used to fuse the images from the multi-camera setup by exploiting the overlapping area. The results show reduced registration error and enhanced image fusion quality. 3D reconstruction based on the spherical representation has also been demonstrated with the proposed system.