1 Introduction

The epipolar constraint is one of the fundamental geometric constraints in computer vision. It relates the rigid transformation between two cameras with different extrinsic parameters [13, 21] to the correspondences between points in the two images. It is one of the most common tools for scene reconstruction using passive techniques, i.e., two cameras looking at the same scene from different points of view. The epipolar constraint has also been used in many other applications, such as visual odometry [34].

For many years, authors focused on perspective cameras to build this stereo pair [13]; see Fig. 1a. However, these cameras have, among other disadvantages, a limited field of view. To overcome this, some authors have developed new camera systems. Special emphasis has been given to omnidirectional cameras, which achieve a larger field of view by combining perspective cameras with mirrors and/or fisheye lenses [23, 31, 32], or to multi-perspective camera systems [15, 17]. Most of these devices are non-central (see [42]). Other types of imaging sensors have been proposed as well. Nevertheless, the perspective camera model cannot describe most of them due to their physical constraints. Examples include pushbroom cameras [11] and refractive imaging devices [43]. These kinds of cameras are also called non-central. To handle both central and non-central systems, we consider a pair of camera systems parameterized by the general camera model [10, 24, 41]. In this model, camera pixels are mapped into generic 3D straight lines in the world. If all of these 3D lines intersect at a single point, the system is central (Fig. 1a); otherwise, it is non-central (Fig. 1b).

For a central camera stereo problem, one can define a \(3\times 3\) matrix that encodes the epipolar constraint, i.e., the incidence relation between the projection lines of both cameras [13, 21]. This matrix is called the essential matrix. For general camera systems, such a matrix cannot be used because the central constraints are not satisfied. Instead, it can be proved that there is a \(6\times 6\) matrix that expresses the incidence relation between the projection lines of two general camera systems [25, 36, 40]. Such a matrix is called the generalized essential matrix and has a particular block structure, involving rotation and skew-symmetric matrices of order 3. Computer vision problems in which this matrix has to be determined are often affected by noise, which may result in a matrix of order 6 that fails to fit the structure characterizing generalized essential matrices. In these cases, one needs to find the closest generalized essential matrix to a generic \(6\times 6\) matrix. From a mathematical point of view, this is a nonlinear constrained matrix optimization problem. The fact that it is non-convex makes it difficult to find a global minimum.

Examples of applications requiring the estimation of a generalized essential matrix include: (i) the computation of the minimal relative pose for general camera models [39]; (ii) the estimation of camera motion using multi-perspective camera systems [15, 17]; (iii) general structure-from-motion algorithms [28, 38]; and (iv) the estimation of the camera absolute pose using known coordinates of 3D straight lines [26]. Since the particular structure of generalized essential matrices involves many nonlinear constraints (a rotation matrix and its product with a skew-symmetric matrix), finding the correct parameters of these matrices may significantly slow down the algorithms. An important advantage of having a method to approximate generalized essential matrices from generic \(6\times 6\) matrices is that it allows us to drop some of those constraints (or, at least, to relax their tolerance in the optimization process), making the algorithms faster.

Fig. 1 Central versus general camera systems (a, b), respectively. This paper addresses the general case, i.e., the camera system does not have a single viewpoint

In this paper, we show that the problem under investigation can be reformulated as an optimization problem with only orthogonal constraints. Then, we present an efficient algorithm to find a solution. We also give theoretical arguments explaining why the problem has a global minimum (some open questions will be pointed out). We recall that methods for problems with orthogonal constraints are available in the literature [5, 8, 14, 22, 44], but their difficulty depends significantly on the objective function. In our problem, the objective function is not easy to handle, raising many challenging issues. To address them, we resort to techniques from matrix theory and optimization on manifolds.

Suppose we have estimated a \(6\times 6\) matrix \({{\mathbf {A}}}\) representing the epipolar geometry for general cameras, linearly obtained from the incidence relation between a set of inverse projection rays \({{\mathbf {l}}}_i^{({{\mathcal {L}}})}\) and \({{\mathbf {l}}}_i^{({{\mathcal {R}}})}\), i.e., built from the geometric constraints shown in Fig. 1b without imposing the structure of \(\varvec{{\mathcal {E}}}\). The method proposed in this paper aims at approximating a generalized essential matrix \(\varvec{{\mathcal {E}}}\) from a general \(6\times 6\) matrix \({{\mathbf {A}}}\), under the assumption that the latter was estimated by ignoring some of the generalized essential constraints. Our motivation for developing such a method is twofold. When, for some reason, the estimation of \({{\mathbf {A}}}\) does not consider some of the generalized essential matrix constraints or ignores them altogether (as in DLT techniques), methods such as ours are helpful to obtain a true generalized essential matrix. When a large tolerance for the generalized essential constraints is used to speed up the computation of \({{\mathbf {A}}}\), methods like ours can be used to correct the result. Experiments illustrating the latter situation are presented in Sect. 6.

For the central case, we recall that similar approximation techniques have been used for pose estimation. In [12], the author estimates a true essential matrix with respect to the Frobenius norm by first using DLT techniques to compute a rough estimate of the \(3\times 3\) essential matrix. The author proved that this method performs almost as well as the best iterative algorithms, while being faster in many cases. More recently, other methods have been developed using similar approximation techniques for the central perspective camera; see, for instance, [45, 46]. We believe that the method proposed in this paper will contribute to the development of similar techniques for general camera models. We recall that the goal of this work is not to propose a technique for estimating a generalized essential matrix from a set of projection lines (as done in [18] for general camera systems and in [16] for multi-camera systems) but, instead, to estimate \(\varvec{{\mathcal {E}}}\) from an \({{\mathbf {A}}}\) previously computed by other techniques. The contributions of this paper are:

  1. It is the first work that investigates the fitting of generalized essential matrices from general \(6\times 6\) matrices;

  2. We define the problem at hand and present a natural formulation of it as a nonlinear constrained optimization problem;

  3. Based on the problem definition, we derive an unconstrained optimization version that can be used to solve the problem at hand;

  4. We change the initial formulation and write the problem as a function depending only on the rotation unknowns;

  5. We use the formulation derived in the previous item to obtain an unconstrained optimization problem; and

  6. We propose an efficient method that iterates in the space of orthogonal matrices to obtain a solution to the problem, which we show to be the most efficient of the approaches considered.

1.1 Outline of the Paper

This paper is organized as follows. We start by revisiting the similar, but simpler, problem of finding the closest essential matrix, arising in central camera systems. Then, we formulate mathematically the problem under investigation in this work and explain how it could be solved straightforwardly. In Sect. 4, we reformulate the original problem and derive two solutions for it. In Sect. 5, our method is compared with direct solutions using synthetic data. Results motivating the use of our technique are shown in Sect. 6. Experimental results are discussed in the respective experimental sections. Finally, in Sect. 7, some conclusions are drawn.

2 Notation

Small regular letters denote scalars, e.g., a; small bold letters denote vectors, e.g., \({{\mathbf {a}}}\in {\mathbb {R}}^{n}\); and capital bold letters denote matrices, e.g., \({{\mathbf {A}}}\in {\mathbb {R}}^{n\times m}\). For a matrix \({{\mathbf {A}}}\), \({{\mathbf {A}}}_{(i:j,k:l)}\in {{\mathbb {R}}}^{(j-i+1)\times (l-k+1)}\) denotes the submatrix composed of rows i to j and columns k to l. We represent general projection lines using Plücker coordinates [37]:

$$\begin{aligned} {{\mathbf {l}}}\in {{\mathbb {R}}}^6 \sim \begin{bmatrix} {{\mathbf {d}}}^T&{{\mathbf {m}}}^T \end{bmatrix}^T, {\text {such that}}\;\; {{\mathbf {d}}}^T {{\mathbf {m}}} = 0 \end{aligned}$$
(1)

where \({\mathbf {d}}\in {{\mathbb {R}}}^3\) and \({\mathbf {m}}\in {{\mathbb {R}}}^3\) are the line direction and moment, respectively. The operator \(\sim \) denotes a vector that can be represented up to a scale factor.
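As a small illustration (ours, not part of the paper), the Plücker coordinates of the line through two 3D points can be built in Python/NumPy; by construction, the direction and moment satisfy the constraint \({{\mathbf {d}}}^T{{\mathbf {m}}}=0\):

```python
import numpy as np

def plucker_from_points(p, q):
    """Plücker coordinates l = [d; m] of the line through points p and q.

    d is the line direction and m = p x d its moment; d^T m = 0 by construction.
    """
    d = q - p
    m = np.cross(p, d)
    return np.concatenate([d, m])

p = np.array([1.0, 0.0, 2.0])
q = np.array([3.0, 1.0, 2.0])
l = plucker_from_points(p, q)
d, m = l[:3], l[3:]
print(np.dot(d, m))  # 0.0 up to round-off
```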

The hat operator represents the skew-symmetric matrix that linearizes the cross product, i.e., \({{\mathbf {a}}} \times {{\mathbf {b}}} = \widehat{{{\mathbf {a}}}}{{\mathbf {b}}}\). \(\left| \left| {{\mathbf {X}}} \right| \right| \) denotes the Frobenius norm (also known as \({{\mathcal {L}}}_2\)-norm) of the matrix \({{{\mathbf {X}}}}\), which can be defined as a function of the trace:

$$\begin{aligned} \left| \left| {{{\mathbf {X}}}} \right| \right| ^2={\text {trace}}\left( {{{\mathbf {X}}}}^T{\mathbf {X}}\right) . \end{aligned}$$
(2)

\(\mathcal {SO}(3)\) stands for the group of rotation matrices of order 3, i.e., the group of orthogonal matrices with determinant equal to 1. To conclude, \(\text {diag}(a_1,a_2,\dots ,a_n)\) denotes an \(n\times n\) diagonal matrix with diagonal entries equal to \(a_1,a_2,\dots ,a_n\), \(\text {expm}({{\mathbf {A}}})\) represents the matrix exponential of \({{\mathbf {A}}}\), and the asterisk symbol in \({{\mathbf {A}}}_*\) denotes a minimizer of an optimization problem.

3 Problem Definition and Direct Solutions

In this section, we start by defining the problem of fitting generalized essential matrices from general \(6\times 6\) matrices and present two straightforward methods to solve it.

3.1 Problem Definition

To better understand the generalized essential matrix parameterization, we first revisit the simpler case of the regular essential matrix, corresponding to Fig. 1a. An essential matrix represents the incidence relation between the projection lines of two cameras looking at the same 3D points in the world, taking into account the rigid transformation between both coordinate systems.

Without loss of generality, we assume that both cameras are represented at the origin of their coordinate systems, ensuring that all the 3D projection lines of each camera pass through the origin of the respective camera coordinate system. Under this assumption, one can represent 3D projection lines by their directions, here denoted as \({\mathbf {d}}^{({{\mathcal {L}}})}_i\) and \({\mathbf {d}}^{({\mathcal {R}})}_i\) for the \(i\)th pair of 3D projection lines that must intersect in the world, where \(({{\mathcal {L}}})\) and \(({\mathcal {R}})\) denote the left and right rays, respectively. Moreover, we assume that the transformation between both cameras is given by a rotation matrix \({\mathbf {R}}\in \mathcal {SO}(3)\) and a translation vector \({\mathbf {t}}\in {{\mathbb {R}}}^{3}\). Using this formulation, an essential matrix \({\mathbf {E}}\in {{\mathbb {R}}}^{3\times 3}\) is defined by:

$$\begin{aligned} \left. {{\mathbf {d}}}_i^{({{\mathcal {L}}})}\right. ^T {\mathbf {E}}{\mathbf {d}}_i^{({\mathcal {R}})} = 0, \; \text {such that} \;\; {\mathbf {E}} \doteq \widehat{{\mathbf {t}}}{{\mathbf {R}}}. \end{aligned}$$
(3)

Hence, the problem of finding the closest essential matrix \({{{\mathbf {X}}}}_*\) to a general \({{\mathbf {A}}}\in {{\mathbb {R}}}^{3\times 3}\) may be formulated as:

$$\begin{aligned} \mathop {\hbox {argmin}}\limits _{{{{\mathbf {X}}}}}\, \Vert {{\mathbf {A}}} - {{{\mathbf {X}}}}\Vert \ , \text{ s.t. }\ {{{\mathbf {X}}}}\in {{\mathbb {R}}}^{3\times 3}\ {\text {is an essential matrix}}. \end{aligned}$$
(4)

It has an explicit solution (see, for instance, Theorem 5.9 in [21]):

$$\begin{aligned} {{{\mathbf {X}}}}_*= {{\mathbf {U}}}\ {\text {diag}}(\sigma ,\sigma ,0)\ {{\mathbf {V}}}^T, {\text { with }} \sigma \doteq \frac{\lambda _1 + \lambda _2}{2} \end{aligned}$$
(5)

where the \(3\times 3\) orthogonal matrices \({{\mathbf {U}}}\) and \({{\mathbf {V}}}\) and scalars \(\lambda _1,\lambda _2\) are given by the singular value decomposition of \({{\mathbf {A}}}\):

$$\begin{aligned} {{\mathbf {A}}} = {{\mathbf {U}}}\ \text {diag}(\lambda _1,\lambda _2,\lambda _3)\ \mathbf {V}^T. \end{aligned}$$
(6)
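The projection in (5)–(6) is easy to implement; the following sketch (ours, using NumPy, with an illustrative perturbed essential matrix as input) applies the SVD-based formula and checks that the result has the two-equal-singular-values structure:

```python
import numpy as np

def closest_essential(A):
    """Project a generic 3x3 matrix onto the set of essential matrices
    (two equal singular values and one zero), following Eqs. (5)-(6)."""
    U, sv, Vt = np.linalg.svd(A)         # A = U diag(l1, l2, l3) V^T
    sigma = (sv[0] + sv[1]) / 2.0        # average of the two largest singular values
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# example: perturb a true essential matrix E = [t]_x R and re-project it
rng = np.random.default_rng(0)
t = rng.standard_normal(3)
t_hat = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:                 # make sure R is a rotation
    R[:, 0] = -R[:, 0]
E = t_hat @ R
X = closest_essential(E + 1e-3 * rng.standard_normal((3, 3)))
s = np.linalg.svd(X, compute_uv=False)
print(s)  # two (nearly) equal values and one zero
```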

It turns out that, for the general case (the one addressed in this paper), since the perspective constraints are not verified, we cannot represent the 3D projection lines only by their directions [10]. One has to parameterize these 3D projection lines using a general (unconstrained) 3D straight-line representation; see Fig. 1b. Let us denote the lines of both cameras by \({{\mathbf {l}}}^{({{\mathcal {L}}})}_i\) and \({{\mathbf {l}}}^{({\mathcal {R}})}_i\), represented in each coordinate system and parameterized by Plücker coordinates (Sect. 2). The incidence relation between both sets of 3D projection lines is given by:

$$\begin{aligned} \left. {{\mathbf {l}}}_i^{({{\mathcal {L}}})}\right. ^T \varvec{{\mathcal {E}}}\ {{\mathbf {l}}}_i^{({\mathcal {R}})} = 0, \; \text {such that} \;\; \varvec{{\mathcal {E}}}\in {\mathcal {X}}, \end{aligned}$$
(7)

where \(\varvec{{\mathcal {E}}}\in {{\mathbb {R}}}^{6 \times 6}\) is a generalized essential matrix [25, 36, 40] and \({\mathcal {X}}\) denotes the space of generalized essential matrices:

$$\begin{aligned} {\mathcal {X}}=\left\{ \begin{bmatrix} \widehat{{\mathbf {t}}}\,{{\mathbf {R}}} &{}\quad {\mathbf {R}} \\ {{\mathbf {R}}} &{}\quad {\mathbf {0}} \end{bmatrix}{:}\ {{\mathbf {R}}} \in \mathcal {SO}(3), \widehat{{\mathbf {t}}}\ {\text {is skew-symmetric}}\right\} . \end{aligned}$$
(8)

As observed in (8), a generalized essential matrix has a particular form. It is built up of block matrices that depend on rotation and translation parameters. As with the estimation of essential matrices, in many situations \(\varvec{{\mathcal {E}}}\) is estimated without enforcing the respective constraints. This happens, in particular, when using DLT techniques [13] or iterative schemes that do not take all the constraints into account, in order to speed up the respective process.

One of the goals of this paper is to estimate the true generalized essential matrix [i.e., a matrix satisfying the constraints associated with (8)] that is closest to a general \({{\mathbf {A}}}\in {{\mathbb {R}}}^{6\times 6}\) with respect to the Frobenius norm. Formally, this problem is formulated as:

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmin}}\limits _{{{{\mathbf {X}}}}}&\left| \left| {{{\mathbf {X}}}}-{{\mathbf {A}}} \right| \right| ^2 \\&\text {subject to}&{{{\mathbf {X}}}} \in {\mathcal {X}}. \end{aligned} \end{aligned}$$
(9)

Next, we present some naive approaches to solve (9).

3.2 A Direct Solution to the Problem in (9)

The crucial part of solving (9) is ensuring that \({{{\mathbf {X}}}}\) belongs to the space of solutions \({\mathcal {X}}\). There is, however, a direct way of ensuring this, which can be derived from the constraints in (8). These constraints reflect the fact that \({{{\mathbf {X}}}}\) is built up by stacking both \({{\mathbf {R}}}\) and \(\widehat{{\mathbf {t}}}{{\mathbf {R}}}\), allowing us to rewrite the problem as:

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmin}}\limits _{{{{\mathbf {X}}}}}&\left| \left| {{{\mathbf {X}}}}-{{\mathbf {A}}} \right| \right| \\&\text {subject to}&{{{\mathbf {X}}}}_{(1{:}3,4{:}6)}^T {{{\mathbf {X}}}}_{(1{:}3,4{:}6)} = {{\mathbf {I}}} \\&&{{{\mathbf {X}}}}_{(4{:}6,1{:}3)} - {{{\mathbf {X}}}}_{(1{:}3,4{:}6)} = {\mathbf {0}} \\&&{{{\mathbf {X}}}}_{(4{:}6,4{:}6)} = {\mathbf {0}} \\&&{{{\mathbf {X}}}}_{(1{:}3,1{:}3)} {{{\mathbf {X}}}}_{(1{:}3,4{:}6)}^T + {{{\mathbf {X}}}}_{(1{:}3,4{:}6)} {{{\mathbf {X}}}}_{(1{:}3,1{:}3)}^T = {\mathbf {0}}. \end{aligned} \end{aligned}$$
(10)

The main issue with the problem in (10) is the large number of constraints: 33 in total, many of them quadratic. As we shall see in the experimental results, this significantly increases the computational time required to fit the generalized essential matrix. To eliminate this large number of constraints, in the next subsection we reformulate the problem in (9) as an unconstrained one.
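To make the constraint blocks of (10) concrete, the following sketch (ours, in NumPy) stacks them as equality residuals, which could then be handed to a generic equality-constrained solver (e.g., SLSQP, as one possible choice); here we only verify that the residuals vanish on a genuine generalized essential matrix and not on a perturbed one:

```python
import numpy as np

def ge_constraints(X):
    """Equality residuals of problem (10) for a candidate 6x6 matrix X."""
    R = X[0:3, 3:6]
    return np.concatenate([
        (R.T @ R - np.eye(3)).ravel(),                    # R is orthogonal
        (X[3:6, 0:3] - R).ravel(),                        # both off-diagonal blocks equal R
        X[3:6, 3:6].ravel(),                              # zero lower-right block
        (X[0:3, 0:3] @ R.T + R @ X[0:3, 0:3].T).ravel(),  # upper-left block is t_hat R
    ])

# the residuals vanish on a genuine generalized essential matrix ...
t = np.array([0.3, -1.2, 0.7])
t_hat = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
th = 0.4
R = np.array([[np.cos(th), -np.sin(th), 0], [np.sin(th), np.cos(th), 0], [0, 0, 1]])
E = np.block([[t_hat @ R, R], [R, np.zeros((3, 3))]])
print(np.max(np.abs(ge_constraints(E))))        # ~0

# ... but not on a generic perturbation of it
print(np.max(np.abs(ge_constraints(E + 0.1))))  # clearly nonzero
```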

3.3 Unconstrained Formulation of (9)

In this subsection, we derive an unconstrained formulation of the problem in (9). Let \({\mathcal {SE}}(3)\) stand for the special Euclidean Lie group of motions in \({{\mathbb {R}}}^3\), that is, the group of matrices of the form

$$\begin{aligned} {{\mathbf {T}}} = \begin{bmatrix} {{\mathbf {R}}} &{}\quad {{\mathbf {t}}} \\ {{\mathbf {0}}} &{}\quad 1 \end{bmatrix}. \end{aligned}$$
(11)

The corresponding Lie algebra (i.e., the tangent space at the identity) is denoted by \({\mathfrak {se}}(3)\), which consists of matrices of the form

$$\begin{aligned} \varvec{\xi }(\varvec{w},\varvec{v}) = \begin{bmatrix} \widehat{\varvec{w}} &{} \varvec{v} \\ {\mathbf {0}} &{} 0 \end{bmatrix}, \end{aligned}$$
(12)

where \(\varvec{w},\varvec{v}\in {{\mathbb {R}}}^3\). It is well known that the exponential map transforms a matrix in \({\mathfrak {se}}(3)\) to a matrix in \({{\mathcal {SE}}}(3)\):

$$\begin{aligned} {\text {expm}}{:}\,{\mathfrak {se}}(3)\mapsto {\mathcal {SE}}(3), \end{aligned}$$
(13)

such that \({\mathbf {T}} = \text {expm}(\varvec{\xi }(\varvec{w},\varvec{v})).\) Now, extracting \({{\mathbf {R}}}\) and \({\mathbf {t}}\) from \({\mathbf {T}}\) and denoting by \({{{\mathbf {X}}}}(\varvec{w},\varvec{v})\) the matrix in (8) built from them, we can rewrite the problem in (9) as:

$$\begin{aligned} \mathop {\hbox {argmin}}\limits _{(\varvec{w},\varvec{v})}\ \left| \left| {{{\mathbf {X}}}}(\varvec{w},\varvec{v})-{{\mathbf {A}}} \right| \right| , \end{aligned}$$
(14)

thus resulting in a nonlinear unconstrained problem with six unknowns. Although at first sight this problem might look easy to solve, we stress that the exponential map of a \(4\times 4\) matrix may be computationally expensive; an efficient way of evaluating the map \(\text {expm}{:}\,{\mathfrak {se}}(3)\mapsto {\mathcal {SE}}(3)\) is given by computing \({{\mathbf {R}}}\) as shown in (35), where \(\mu = \Vert \varvec{w} \Vert \) and \({{\mathbf {Z}}} = \widehat{\varvec{w}}/\mu \), and \({\mathbf {t}}\) as:

$$\begin{aligned} {\mathbf {t}} = \left( {{\mathbf {I}}} + \tfrac{1-\cos {(\mu )}}{\mu }{{\mathbf {Z}}}^{T} + \tfrac{\mu -\sin {(\mu )}}{\mu }\left( {{\mathbf {Z}}}^{T}\right) ^2\right) \varvec{v}. \end{aligned}$$
(15)

In [7], other closed formulae for \(\text {expm}(\varvec{\xi }(\varvec{w},\varvec{v}))\) may be found. In addition, the fact that we are not optimizing directly in \({\mathbf {T}}\) may lead to a high number of iterations and a less accurate solution.
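The unconstrained formulation above can be sketched with off-the-shelf tools; the following illustration (ours, using SciPy's general-purpose `expm` and a BFGS solver as assumed choices, precisely the expensive route the text warns about) builds the \(6\times 6\) matrix of (8) from the exponential of an \({\mathfrak {se}}(3)\) element and minimizes the Frobenius misfit over the six unknowns:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def se3_exp(w, v):
    """expm of the se(3) element [[hat(w), v], [0, 0]], giving a 4x4 pose in SE(3)."""
    xi = np.zeros((4, 4))
    xi[:3, :3] = hat(w)
    xi[:3, 3] = v
    return expm(xi)

def ge_from_pose(w, v):
    """Build the 6x6 generalized essential matrix of Eq. (8) from the pose."""
    T = se3_exp(w, v)
    R, t = T[:3, :3], T[:3, 3]
    return np.block([[hat(t) @ R, R], [R, np.zeros((3, 3))]])

def objective(p, A):
    """Squared Frobenius objective of the unconstrained formulation."""
    return np.sum((ge_from_pose(p[:3], p[3:]) - A) ** 2)

# synthetic target built from a known pose
p_true = np.array([0.1, -0.2, 0.3, 1.0, 0.5, -0.4])
A = ge_from_pose(p_true[:3], p_true[3:])
res = minimize(objective, np.zeros(6), args=(A,), method="BFGS")
print(res.fun)  # close to zero when the optimizer recovers the pose
```

Note that every objective evaluation calls a generic matrix exponential, which is why the text advocates the closed-form evaluation instead.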

In the next section, we derive an efficient algorithm for solving (9) that exploits the particular features of the objective function and constraints.

4 An Efficient Solution

Now, we describe a method for efficiently approximating the generalized essential matrix from a given \(6\times 6\) matrix, with respect to the Frobenius norm. Our approach first represents the problem independently of the translation parameters, which means that only orthogonal constraints will be involved (Sect. 4.1). Then, in Sect. 4.2, we use this new representation to derive a new unconstrained optimization problem. Section 4.3 proposes an efficient algorithm to solve the reformulated optimization problem, which is our main contribution.

4.1 Reformulation of the Problem

Since our goal is to represent the problem independently of the translation parameters, we aim at eliminating the skew-symmetric constraints in (9). To ease the notation, we define the following submatrices: \({{\mathbf {A}}}_{11}\doteq {{\mathbf {A}}}_{(1{{:}\,}3,1{{:}\,}3)}\); \({{\mathbf {A}}}_{12}\doteq {{\mathbf {A}}}_{(1{{:}\,}3,4{{:}\,}6)}\); \({{\mathbf {A}}}_{21}\doteq {{\mathbf {A}}}_{(4{{:}\,}6,1{{:}\,}3)}\); and \({{\mathbf {A}}}_{22}\doteq {{\mathbf {A}}}_{(4{{:}\,}6,4{{:}\,}6)}\).

We start by finding a workable expression for the objective function in terms of \({{\mathbf {R}}}\) and \(\widehat{{\mathbf {t}}}\):

$$\begin{aligned} f({{\mathbf {R}}},{\mathbf {t}})\doteq & {} \quad \left\| {{{\mathbf {X}}}}-{{\mathbf {A}}}\right\| ^2 \nonumber \\= & {} \quad \left\| \begin{bmatrix} {{\mathbf {A}}}_{11} &{}\quad {\mathbf {A}}_{12} \\ {{\mathbf {A}}}_{21} &{}\quad {\mathbf {A}}_{22} \end{bmatrix} - \begin{bmatrix} \widehat{{\mathbf {t}}}{{\mathbf {R}}} &{}\quad {\mathbf {R}} \\ {{\mathbf {R}}} &{}\quad {\mathbf {0}} \end{bmatrix} \right\| ^2 \nonumber \\= & {} \Vert {{\mathbf {A}}}_{11}-\widehat{{\mathbf {t}}}{{\mathbf {R}}}\Vert ^2+\Vert {{\mathbf {A}}}_{12}-{{\mathbf {R}}}\Vert ^2 \nonumber \\&\quad +\,\Vert {{\mathbf {A}}}_{21}-{{\mathbf {R}}}\Vert ^2+\Vert {{\mathbf {A}}}_{22}\Vert ^2 \nonumber \\= & {} \Vert {{\mathbf {A}}}_{11}-\widehat{{\mathbf {t}}}{{\mathbf {R}}}\Vert ^2\nonumber \\&\quad +\,\text {trace}\left( ({{\mathbf {A}}}^T_{12}-{{\mathbf {R}}}^T)({{\mathbf {A}}}_{12}-{{\mathbf {R}}}) \right) \nonumber \\&\quad +\,\text {trace}\left( ({{\mathbf {A}}}^T_{21}-{{\mathbf {R}}}^T)({{\mathbf {A}}}_{21}-{{\mathbf {R}}}) \right) +\Vert {{\mathbf {A}}}_{22}\Vert ^2.\nonumber \\ \end{aligned}$$
(16)

Using the linearity of the \(\text {trace}\) function and the fact that \(\Vert {{\mathbf {R}}}^T{\mathbf {R}}\Vert ^2=\Vert {{\mathbf {I}}}\Vert ^2=3\), the following expression is obtained for the objective function:

$$\begin{aligned}&f({{\mathbf {R}}},{\mathbf {t}})=\Vert {{\mathbf {A}}}_{11}-\widehat{{\mathbf {t}}}{{\mathbf {R}}}\Vert ^2\nonumber \\&\quad -2\ \text {trace}\left( ({{\mathbf {A}}}_{12}+{\mathbf {A}}_{21})^T{{\mathbf {R}}}\right) +\alpha , \end{aligned}$$
(17)

where:

$$\begin{aligned} \alpha \doteq 6+\left\| \begin{bmatrix} {\mathbf {0}} &{}\quad {{\mathbf {A}}}_{12} \\ {\mathbf {A}}_{21} &{}\quad {{\mathbf {A}}}_{22}\end{bmatrix}\right\| ^2 \end{aligned}$$
(18)

is a constant. Let us denote \({\mathbf {M}}\doteq {{\mathbf {A}}}_{11}\) and \({\mathbf {N}}\doteq ({{\mathbf {A}}}_{12}+{\mathbf {A}}_{21})^T\). With respect to the Frobenius norm, it is well known and easy to show that the nearest skew-symmetric matrix \(\widehat{{\mathbf {t}}}_*\) to a given generic matrix \({{\mathbf {B}}}\in {{\mathbb {R}}}^{3\times 3}\) is the so-called skew-symmetric part of \({{\mathbf {B}}}\) (see Theorem 5.3 in [6] with \({\mathbf {P}}={{\mathbf {I}}}\)):

$$\begin{aligned} \widehat{{\mathbf {t}}}_*=\frac{{{\mathbf {B}}}-{\mathbf {B}}^T}{2}. \end{aligned}$$
(19)

Hence, taking \({{\mathbf {B}}}={\mathbf {M}}{{\mathbf {R}}}^T\), we can replace \(\widehat{{\mathbf {t}}} \doteq \nicefrac {\left( {\mathbf {M}}{{\mathbf {R}}}^T-({\mathbf {M}}{{\mathbf {R}}}^T)^T\right) }{2}\) in (17), yielding:

$$\begin{aligned} g({{\mathbf {R}}})\doteq \frac{1}{4}\ \left\| {\mathbf {M}}+{{\mathbf {R}}}{\mathbf {M}}^T{{\mathbf {R}}}\right\| ^2-2\text {trace}({\mathbf {N}}{{\mathbf {R}}})+\beta , \end{aligned}$$
(20)

where:

$$\begin{aligned} \beta \doteq 6+\left\| \begin{bmatrix}{\mathbf {0}} &{}\quad {{\mathbf {A}}}_{12} \\ {\mathbf {A}}_{21} &{}\quad {{\mathbf {A}}}_{22} \end{bmatrix} \right\| ^2+\frac{1}{2}\Vert {{\mathbf {A}}}_{11}\Vert ^2. \end{aligned}$$
(21)

Writing again the Frobenius norm in terms of the trace of a matrix gives:

$$\begin{aligned} g({{\mathbf {R}}})=\frac{1}{2}\ \text {trace}\left( ({\mathbf {M}}^T{{\mathbf {R}}})^2\right) - 2\ \text {trace}\left( {\mathbf {N}}{{\mathbf {R}}}\right) +\beta , \end{aligned}$$
(22)

with \(\beta \) being the constant given in (21). This allows a new reformulation of the problem (9) as:

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmin}}\limits _{{{\mathbf {R}}}}&g({{\mathbf {R}}}) \\&\text {subject to}&{{\mathbf {R}}} \in \mathcal {SO}(3), \end{aligned} \end{aligned}$$
(23)

which has only orthogonal constraints.
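As a quick numerical sanity check (ours, not part of the paper), one can verify that the reduced objective (22) coincides with the original objective once the optimal skew-symmetric part (19) is substituted; a sketch in NumPy, for a random \({{\mathbf {A}}}\) and a random rotation:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]
M, N = A11, (A12 + A21).T

R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:                 # make sure R is a rotation
    R[:, 0] = -R[:, 0]

# original objective f(R, t) with the optimal t_hat from Eq. (19)
t_hat = (M @ R.T - (M @ R.T).T) / 2
X = np.block([[t_hat @ R, R], [R, np.zeros((3, 3))]])
f = np.linalg.norm(X - A) ** 2

# reduced objective g(R) from Eq. (22)
beta = 6 + np.linalg.norm(np.block([[np.zeros((3, 3)), A12], [A21, A22]])) ** 2 \
         + 0.5 * np.linalg.norm(A11) ** 2
g = 0.5 * np.trace((M.T @ R) @ (M.T @ R)) - 2 * np.trace(N @ R) + beta
print(abs(f - g))  # ~0: the two objectives agree
```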

Before we present our efficient solution to the problem at hand, we propose an unconstrained version of the problem in (23).

4.2 An Unconstrained Formulation of (23)

We proceed similarly to Sect. 3.3. Let \(\varvec{w}\in {{\mathbb {R}}}^3\) and consider its corresponding skew-symmetric matrix \(\widehat{\varvec{w}}\in {\mathfrak {so}}(3)\). Using the exponential map, one can write

$$\begin{aligned} \text {expm}{:}\,{{\mathfrak {so}}}(3)\mapsto \mathcal {SO}(3), \ \text {such that} \ \ {{\mathbf {R}}} = \text {expm}(\widehat{\varvec{w}}). \end{aligned}$$
(24)

Now, by using (24), the problem in (23) can be rewritten as:

$$\begin{aligned} \begin{aligned}&\mathop {\hbox {argmin}}\limits _{\varvec{w}}&g(\text {expm}(\widehat{\varvec{w}})), \end{aligned} \end{aligned}$$
(25)

which is unconstrained. Efficient techniques to compute the exponential map in (24) are available [see (35)]. However, the fact that we are not iterating directly in the space of rotation matrices brings inconvenient issues, such as the requirement of a large number of iterations.
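The closed-form evaluation of the map in (24) alluded to above is Rodrigues' formula; a sketch of it (ours, checked against SciPy's generic `expm` for an arbitrary example vector):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def so3_exp(w):
    """Rodrigues' closed form for expm(hat(w)), avoiding a generic matrix exponential."""
    mu = np.linalg.norm(w)
    if mu < 1e-12:
        return np.eye(3)
    Z = hat(w) / mu
    return np.eye(3) + np.sin(mu) * Z + (1 - np.cos(mu)) * (Z @ Z)

w = np.array([0.3, -0.5, 0.8])
R = so3_exp(w)
print(np.linalg.norm(R - expm(hat(w))))     # ~0: agrees with the generic expm
print(np.linalg.norm(R.T @ R - np.eye(3)))  # ~0: R is orthogonal
```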

In the next subsection, we aim at tackling these issues by proposing an efficient solution to the problem (23).

4.3 An Efficient Solution to (23)

Many optimization problems with orthogonal constraints have been investigated in the last two decades; see, for instance, [1, 5, 8, 22, 44]. The right framework to deal with this kind of problem is to regard it as an optimization problem on a matrix manifold. Tools from Riemannian geometry, calculus on matrix manifolds, and numerical linear algebra are required. Similar techniques can be used in our particular problem (23), because the set of rotation matrices \(\mathcal {SO}(3)\) is a manifold. It also has the structure of a Lie group (see [2]) and is a compact set. We recall that this latter property guarantees the existence of at least one global minimum for (23); see Part III of [19]. However, the complicated expression of the objective function \(g({{\mathbf {R}}})\) makes it hard to find an explicit expression for those global minima. Besides, the lack of convexity of our problem (neither the objective function nor the constraints are convex) means that, in general, we can only guarantee the approximation of local minima.

It turns out, however, that some numerical experiments (not shown in this paper) suggest that the approximation produced by Algorithm 1 is indeed the global one. We observed this because different initial guesses \({{{\mathbf {X}}}}_0\) led to convergence to the same matrix. Unfortunately, a theoretical confirmation that Algorithm 1 always converges to a global minimum is still lacking. Nevertheless, it can be guaranteed that when \({{\mathbf {A}}}\) is close to being a generalized essential matrix (which happens in many practical situations, as shown later in Sect. 5), a local minimizer of (23) will be very close to the global one. We now provide more insight into this claim. Let us consider the original formulation (9), a local minimizer \(\widetilde{{{{\mathbf {X}}}}}\), and a global minimizer \({{{\mathbf {X}}}}_*\). Assume that:

$$\begin{aligned} \Vert {{\mathbf {A}}}-\widetilde{{{{\mathbf {X}}}}}\Vert =\epsilon , \end{aligned}$$
(26)

for a certain positive value \(\epsilon \). Since \(\Vert {{\mathbf {A}}}-{{{\mathbf {X}}}}_*\Vert \le \epsilon \), we have:

$$\begin{aligned} \Vert \widetilde{{{{\mathbf {X}}}}}-{\mathbf {X}}_*\Vert= & {} \Vert \tilde{{{{\mathbf {X}}}}}-{{\mathbf {A}}}+{\mathbf {A}}-{{{\mathbf {X}}}}_*\Vert \nonumber \\\le & {} \Vert \widetilde{{{{\mathbf {X}}}}}-{{\mathbf {A}}}\Vert +\Vert {\mathbf {A}}-{{{\mathbf {X}}}}_*\Vert \nonumber \\\le & {} 2\epsilon , \end{aligned}$$
(27)

which shows that:

$$\begin{aligned} 0\le \Vert \widetilde{{{{\mathbf {X}}}}}-{\mathbf {X}}_*\Vert \le 2\epsilon . \end{aligned}$$
(28)

This means that if \(\epsilon \approx 0\) then \(\widetilde{{{{\mathbf {X}}}}}\approx {\mathbf {X}}_*\). For instance, if \(\epsilon =10^{-1}\), then \(\widetilde{{{{\mathbf {X}}}}}\approx {\mathbf {X}}_*\) with an error not greater than \(2\times 10^{-1}\). Note that both \(\widetilde{{{{\mathbf {X}}}}}\) and \({\mathbf {X}}_*\) are generalized essential matrices.

Once a local minimizer \({{\mathbf {R}}}_*\) of (23) has been computed, the corresponding skew-symmetric matrix \(\widehat{{\mathbf {t}}}_*\) needed in (9) will be:

$$\begin{aligned} \widehat{{\mathbf {t}}}_*=\frac{{\mathbf {M}}{{\mathbf {R}}}_*^T-\left( {\mathbf {M}}{{\mathbf {R}}}_*^T\right) ^T}{2}. \end{aligned}$$
(29)

Hence, the required nearest generalized essential matrix to \({{\mathbf {A}}}\) will be given by:

$$\begin{aligned} {{{\mathbf {X}}}}_*= \begin{bmatrix}\widehat{{\mathbf {t}}}_*{{\mathbf {R}}}_*&{}\quad {\mathbf {R}}_*\\ {{\mathbf {R}}}_*&{}\quad {\mathbf {0}} \end{bmatrix}. \end{aligned}$$
(30)

The algorithm we propose is of the steepest descent type. Loosely speaking, these methods are based on the property that the negative of the gradient of the objective function points in the direction of fastest decrease. For more details on general steepest descent methods, see [19, Sec. 8.6] or [35, Ch. 3]. In our situation, one has to account for the constraint that \({{\mathbf {R}}}\) must be a rotation matrix, and so our algorithm will evolve on the manifold \(\mathcal {SO}(3)\). Hence, we resort to steepest descent methods on matrix manifolds (see [2, Ch. 3]).

For particular manifolds, tailored methods exploiting their unique features have been proposed by many authors. For instance, for the complex Stiefel manifold, Manton [22, Alg. 15] proposed a modified steepest descent method based on Euclidean projections, where Armijo's step-size rule calculates the length of the descent direction. This method has been adapted and improved by Abrudan et al. [1, Table II] for the manifold of unitary matrices, where geodesics replace the Euclidean projections. When dealing with manifolds that are Lie groups, geodesics are defined through the matrix exponential, a much-studied matrix function [27].

The strategy adopted for the steepest descent algorithm described below is inspired by [1, 22], and we refer the reader to those papers for more technical details. The particular nature of the objective function (22) is exploited in order to improve the efficiency of the method. In particular, we propose a specific technique to choose an initial guess (thus reducing the number of iterations) and avoid expensive methods based on Schur decompositions and Padé approximations for the computation of matrix exponentials.

Before presenting the steps of our algorithm, we need to find the Euclidean and Riemannian gradients of the objective function. After some calculations (see [20] for formulae on derivatives of the \(\text {trace}\) function), the Euclidean gradient of g in (22) is:

$$\begin{aligned} \nabla g({{{\mathbf {X}}}})\doteq {\mathbf {M}}{{{\mathbf {X}}}}^T{\mathbf {M}}-2{\mathbf {N}}^T, \end{aligned}$$
(31)

and the Riemannian gradient is:

$$\begin{aligned} \text {grad}\, g({{{\mathbf {X}}}})\doteq \nabla g({{{\mathbf {X}}}})-{\mathbf {X}}\nabla g({{{\mathbf {X}}}})^T{\mathbf {X}}. \end{aligned}$$
(32)

Note that \(\text {grad}\, g({{{\mathbf {X}}}}){\mathbf {X}}^T\) is a “tangent vector” that is actually a skew-symmetric matrix. Geodesics on \(\mathcal {SO}(3)\) (i.e., curves giving the shortest path between two points in the manifold) can be defined through the matrix exponential as:

$$\begin{aligned} G(t)=G(0)\,\ \text {expm}(t {\mathbf {S}}), \end{aligned}$$
(33)

where \({\mathbf {S}}\in {{\mathbb {R}}}^{3\times 3}\) is a skew-symmetric matrix specifying the tangent direction at \(G(0)\) and t is a real scalar. Algorithm 1 summarizes the main steps of our method.

Algorithm. In a few words, Algorithm 1 starts with an initial approximation \({{{\mathbf {X}}}}_0\in \mathcal {SO}(3)\), finds the skew-symmetric matrix \(\left( \text {grad}\, g({{{\mathbf {X}}}})\right) {\mathbf {X}}^T\) (the gradient direction on the manifold), and performs several steps along geodesics until convergence. The positive scalar \(\mu _k\) controls the length of the “tangent vector” and, in turn, the overall convergence of the algorithm. To find an almost optimal \(\mu _k\), the algorithm uses Armijo’s step-size rule, as in [22].
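To make the geodesic steepest descent concrete, the following is a minimal sketch, in Python with NumPy, of the kind of iteration Algorithm 1 performs. It is not the paper's implementation: since the constants of the objective (22) are not reproduced here, the sketch is demonstrated on the simple trace objective \(g({{\mathbf {R}}}) = -\text {trace}({\mathbf {N}}{{\mathbf {R}}})\), whose Euclidean gradient is \(-{\mathbf {N}}^T\); the function names and the Armijo constants are illustrative choices.

```python
import numpy as np

def expm_skew(S):
    # Rodrigues' formula for the exponential of a 3x3 skew-symmetric matrix.
    w = np.array([S[2, 1], S[0, 2], S[1, 0]])   # axis vector of S
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = S / theta                               # unit skew-symmetric matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def steepest_descent_SO3(g, egrad, X0, tol=1e-9, max_iter=100):
    # Geodesic steepest descent on SO(3) with an Armijo-style step rule,
    # in the spirit of [1, 22]; mu is warm-started across iterations.
    X, mu = X0.copy(), 1.0
    for _ in range(max_iter):
        G = egrad(X)
        Z = G @ X.T - X @ G.T          # skew "tangent vector" grad g(X) X^T
        nZ2 = np.sum(Z * Z)
        if np.sqrt(nZ2) < tol:
            break
        # Armijo rule: the derivative of t -> g(expm(-tZ) X) at t = 0 is -nZ2/2.
        while g(X) - g(expm_skew(-2 * mu * Z) @ X) >= mu * nZ2 / 2:
            mu *= 2                    # step can be safely enlarged
        while mu > 1e-20 and g(X) - g(expm_skew(-mu * Z) @ X) < mu * nZ2 / 4:
            mu /= 2                    # shrink until sufficient decrease
        X = expm_skew(-mu * Z) @ X     # move along the geodesic
    return X

# Toy objective (NOT the paper's g): minimize -trace(N X) over SO(3).
rng = np.random.default_rng(0)
N = rng.standard_normal((3, 3))
g = lambda X: -np.trace(N @ X)
egrad = lambda X: -N.T                 # Euclidean gradient of the toy g
X0 = np.eye(3)
X = steepest_descent_SO3(g, egrad, X0)
```

Because each update multiplies the current iterate by an exact rotation, every iterate stays on \(\mathcal {SO}(3)\) by construction, mirroring the remark on (36).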

Fig. 2

Comparison between the methods presented in this paper and general optimization techniques applied to (10), as a function of the Noise Level. We evaluate the methods for both the computational speed (a, c) and the number of iterations (b, d), for Tolerance values of \(10^{-9}\) (a, b) and \(10^{-6}\) (c, d), respectively (Color figure online)

Computational Remarks. We now turn our attention to some essential computational remarks about the algorithm.

  • As the algorithm runs, the function g, which involves the computation of traces of products of matrices, is called several times. Note that the efficient computation of \(\ \text {trace}({{\mathbf {A}}}{{\mathbf {B}}})\) does not require matrix products. Instead, it can be carried out through the formula:

    $$\begin{aligned} \text {trace}({{\mathbf {A}}}{{\mathbf {B}}})= \sum _{i,j}{({{\mathbf {A}}}\circ {{\mathbf {B}}}^T)}_{(i,j)}, \end{aligned}$$
    (34)

    where the operator \(\circ \) denotes the Hadamard product, i.e., entry-wise product. If \({{\mathbf {A}}}\) and \({{\mathbf {B}}}\) are matrices of order n, the direct computation of the matrix product \({{\mathbf {A}}}{{\mathbf {B}}}\) needs \(O(n^3)\) operations, while the trace at (34) just requires \(O(n^2)\);

  • The exponential of the \(3\times 3\) skew-symmetric matrix \(-\mu _k {{\mathbf {Z}}}_k\) in lines 12 and 19 of the algorithm can be computed by means of the well-known Rodrigues’ formula [29]:

    $$\begin{aligned}&\text {expm}(-\mu _k{{\mathbf {Z}}}_k) \nonumber \\&\quad = {{\mathbf {I}}} + \sin {(\mu _k)}{{\mathbf {Z}}}_k^{T} + (1-\cos {(\mu _k)})\left( {{\mathbf {Z}}}_k^{T}\right) ^2, \end{aligned}$$
    (35)

    which involves (at leading cost) the computation of just one matrix product. The direct use of conventional \(\texttt {expm}(\cdot )\) functions based on the scaling and squaring method combined with Padé approximation would be much more expensive.

  • Note that the trace of any skew-symmetric matrix \({\mathbf {S}}\) is always zero and so:

    $$\begin{aligned} \det (\text {expm}({\mathbf {S}}))=e^{\text {trace}({\mathbf {S}})}=1. \end{aligned}$$
    (36)

    This guarantees that matrices \({{{\mathbf {X}}}}_k\) do not leave the rotation manifold \(\mathcal {SO}(3)\); and

  • To conclude, we remark that the choice of the initial guess \({{{\mathbf {X}}}}_0\) influences the running time of the algorithm. An obvious choice for \({{{\mathbf {X}}}}_0\) would be the identity matrix \({{\mathbf {I}}}\) of order 3. It turns out that other choices of \({{{\mathbf {X}}}}_0\) may reduce the number of iterations in the algorithm significantly. In our experiments, we have chosen as the initial guess the rotation matrix that maximizes the term \(2\,\text {trace}\left( {\mathbf {N}}{{\mathbf {R}}}\right) \) in the sum defining \(g({{\mathbf {R}}})\) in (22). We recall that this problem has an explicit solution based on the singular value decomposition of \({\mathbf {N}}\) (see [9, Sec. 12.4]):

    $$\begin{aligned} \widetilde{{{\mathbf {R}}}}={{\mathbf {U}}}\,\text {diag}\left( 1,1,\text {det}(\mathbf {UV}^T)\right) \,{{\mathbf {V}}}^T, \end{aligned}$$
    (37)

    with \({{\mathbf {U}}}\) and \({{\mathbf {V}}}\) being the orthogonal matrices arising in the singular value decomposition of \({\mathbf {N}}^T\), that is, \({\mathbf {N}}^T=\mathbf {UDV}^T\). Since, in general, \(g(\widetilde{{{\mathbf {R}}}})\le g({{\mathbf {I}}})\), it is expected that \({{{\mathbf {X}}}}_0=\widetilde{{{\mathbf {R}}}}\) will be closer to the minimizer than \({{{\mathbf {X}}}}_0={{\mathbf {I}}}\).
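The three computational devices above, the trace identity (34), Rodrigues' formula (35), and the SVD-based initial guess (37), are easy to check numerically. The following is a minimal sketch assuming Python with NumPy; the function names are illustrative, not from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# (34): trace(AB) via the Hadamard product -- O(n^2) instead of O(n^3).
A, B = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
assert np.isclose(np.sum(A * B.T), np.trace(A @ B))

# (35): expm(-mu Z) for a unit skew-symmetric Z via Rodrigues' formula.
def expm_rodrigues(mu, Z):
    Zt = Z.T
    return np.eye(3) + np.sin(mu) * Zt + (1.0 - np.cos(mu)) * (Zt @ Zt)

# Unit skew-symmetric generator of rotations about the z-axis.
Z = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 0.]])
R = expm_rodrigues(0.3, Z)
c, s = np.cos(0.3), np.sin(0.3)
# expm(-0.3 Z) is the rotation by -0.3 rad about the z-axis.
assert np.allclose(R, [[c, s, 0.], [-s, c, 0.], [0., 0., 1.]])

# (37): initial guess maximizing trace(N R), from the SVD of N^T.
def initial_guess(N):
    U, _, Vt = np.linalg.svd(N.T)
    return U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt

N = rng.standard_normal((3, 3))
X0 = initial_guess(N)
assert np.allclose(X0.T @ X0, np.eye(3))
assert np.isclose(np.linalg.det(X0), 1.0)
assert np.trace(N @ X0) >= np.trace(N) - 1e-12   # at least as good as I
```

The last assertion reflects the remark above: since the identity is itself a rotation, the maximizer (37) can never score worse than \({{{\mathbf {X}}}}_0={{\mathbf {I}}}\) on the trace term.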

5 Implementation of Our Method

In this section, we compare Algorithm 1 with general optimization techniques applied to the direct formulation of the problem (10) and the methods presented in Sects. 3.3 and 4.2, using synthetic data. We ran all these experiments on a single thread of an Intel i7-5820K processor at 3.30 GHz with 16 GB of RAM.

Fig. 3

Comparison between Algorithm 1 and the other methods, as a function of the Tolerance considered in the algorithms. The colors of the curves and associated algorithms are identified in Fig. 2. We have considered two distinct values for the Noise Level, \(10^{-1}\) and \(5\times 10^{-1}\), which cause the worst results for OUR method in terms of the number of iterations but not in computational speed (Fig. 2). We evaluate the methods for both the number of iterations, (b) & (d), and the computational speed, (a) & (c) (Color figure online)

5.1 Experimental Results

The method described in Algorithm 1 (here denoted as OUR) is tested against the following general optimization techniques:

  • so(3): Method based on solving the unconstrained problem (25) with the Levenberg–Marquardt algorithm;

  • se(3): Method based on solving (14) with the Levenberg–Marquardt algorithm;

  • interior-point: Solution of (10) by the interior-point method [33];

  • sqp: Solution of (10) by the sequential quadratic programming method [4]; and

  • active-set: Solution of (10) by the active-set method [30].

All the algorithms, including OUR, were optimized (some of them have been implemented in C/C++) and can be accessed from MATLAB. All the results shown below were obtained in this framework.

For the dataset, we first generate random rotation matrices \({{\mathbf {R}}}\in \mathcal {SO}(3)\) and random translation vectors \({\mathbf {t}}\in {{\mathbb {R}}}^3\). With these rotation and translation elements, we build a generalized essential matrix \(\varvec{ \mathcal E}\in {\mathcal {X}}\), as defined in (7).

To carry out the experiments, we vary the deviation of a generic matrix in \({{\mathbb {R}}}^{6\times 6}\) from a true generalized essential matrix. The procedure is as follows: we first generate an error matrix \(\varvec{\varOmega }\in {{\mathbb {R}}}^{6\times 6}\), whose elements are randomly drawn from a normal distribution with standard deviation equal to the variable Noise Level, and then compute the “noisy” matrix as \({{\mathbf {A}}} = \varvec{{\mathcal {E}}} + \varvec{\varOmega }\). All the methods mentioned above are then applied.
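As an illustration of this data-generation step, the sketch below builds a random rotation and translation, assembles a generalized essential matrix, and perturbs it. Since Eq. (7) is not reproduced in this excerpt, the block layout used here, \([\mathbf {t}]_\times {{\mathbf {R}}}\) in the top-left block, \({{\mathbf {R}}}\) in both off-diagonal blocks, and \({\mathbf {0}}\) in the bottom-right, is an assumption inferred from the constraints in (38); Python with NumPy is assumed.

```python
import numpy as np

def skew(t):
    # [t]_x: the skew-symmetric matrix such that [t]_x v = t x v.
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

def random_rotation(rng):
    # QR of a Gaussian matrix gives a random orthogonal Q; fix det = +1.
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1
    return Q

rng = np.random.default_rng(2)
R = random_rotation(rng)
t = rng.standard_normal(3)

# Assumed block layout of the generalized essential matrix (7):
# E = [[ [t]_x R, R ], [ R, 0 ]]; it satisfies the constraints of (38).
E = np.block([[skew(t) @ R, R], [R, np.zeros((3, 3))]])

noise_level = 1e-1
A = E + noise_level * rng.standard_normal((6, 6))   # "noisy" input matrix
# The skew-symmetry constraint of (38) holds exactly for the noise-free E.
assert np.allclose(E[:3, :3] @ E[:3, 3:].T + E[:3, 3:] @ E[:3, :3].T, 0)
```

The matrix `A` plays the role of the generic matrix to be projected back onto the set of generalized essential matrices by the methods under comparison.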

Two tolerance values for the algorithms were selected, \(10^{-9}\) and \(10^{-6}\), and we vary the Noise Level from \(10^{-3}\) to \(10^1\). Results for both the computational speed and the number of iterations are displayed in Fig. 2a, c and b, d, respectively. For each value of the Noise Level, \(10^3\) random trials were generated.

In addition to the evaluation of the deviation from the generalized essential matrix constraints, we have also tested the proposed method against the others as a function of the tolerance of the algorithms (here denoted as Tolerance value). For that purpose, we fixed a Noise Level equal to \(10^{-1}\) and to \(5\times 10^{-1}\), and selected a Tolerance value ranging from \(10^{-15}\) to 1. The results for both the computational speed and the number of iterations are shown in Fig. 3a–d, respectively. As in the previous case, for each level of tolerance, \(10^3\) random trials were generated.

Next, we discuss these experimental results.

5.2 Discussion

As observed in Fig. 2a, c, our method is significantly faster than the others. Its computational speed barely varies with the deviation from the generalized essential matrix constraints. In fact, as shown in Fig. 3a, c, which represent the worst scenario for OUR method, it performs \(10^3\) times faster than any other method, with the exception of so(3), which uses our reformulation of the problem. While OUR requires around \(10^{-4}\) s, the so(3) method takes more than \(10^{-3}\) s. The experiments using the so(3) method show a behavior similar to OUR (an almost horizontal line), being around \(10^1\) times faster than the remaining methods. This shows that the reformulation of the problem proposed in Sect. 4.1 produces better results. This is a remarkable advantage of OUR method for applications requiring real-time computations, such as camera relative pose estimation, which will be addressed later in Sect. 6. From these figures, we can also conclude that the use of Lie algebra techniques in se(3) does not improve the results when compared to standard constrained optimization techniques.

In Fig. 2b, d one can observe that, in contrast to the general optimization techniques interior-point, sqp, and active-set, the relationship between the number of iterations and the Noise Level is nearly linear for OUR, so(3), and se(3). With the exception of the sqp method, for low levels of noise, OUR method generally requires a similar or smaller number of iterations than any other technique. However, as described in the previous paragraph, the computational time of OUR is significantly lower for any Noise Level, independently of the number of iterations. As pointed out in the previous sections, the use of Lie algebra techniques to represent rotations and transformations (methods so(3) and se(3), respectively) involves a significantly larger number of iterations to reach convergence.

In addition to the evaluation in terms of deviation from the true generalized essential matrices, we have also compared all the methods with respect to the tolerance. We have considered Noise Levels of \(10^{-1}\) and \(5\times 10^{-1}\), which cause the worst results for OUR method in terms of the number of iterations but not in computational speed. These results are shown in Fig. 3. In Fig. 3a, c, we can observe that OUR is the fastest method and that so(3) also performs well. In contrast, the method se(3) gives the worst results.

Still regarding the evaluation in terms of Tolerance, Fig. 3b, d shows that the number of iterations varies in a similar fashion for all the methods. However, while the OUR, interior-point, sqp, and active-set techniques perform similarly in terms of the number of iterations, so(3) and se(3) require many more iterations. This also shows that the use of Lie algebra techniques to obtain unconstrained optimization problems may simplify the problem itself, but in general requires more iterations and more running time.

6 Results in Real Applications

This section includes practical examples illustrating how Algorithm 1 can be combined with other techniques to improve the results.

In Sect. 6.1, further advantages of using generalized essential matrix approximations are demonstrated, in a relative pose problem for general catadioptric cameras. In Sect. 6.2, another application of Algorithm 1, using real data, is shown. To conclude this section, in Sect. 6.3, we discuss the experimental results.

6.1 A Relative Pose Problem with Non-central Catadioptric Cameras

Let us consider a relative position estimation problem, using a non-central catadioptric camera composed of a perspective camera and a spherical mirror [3, 42].

Fig. 4

Representation of the simulated environment created for the evaluation. In (a) we show the simulated 3D scene, including the set of 3D points, the camera system, and the subset of the path that the camera must follow. (b) shows an example of the projection of the 3D points in one image frame, and its correspondence with the projected points in the previous frame

We synthetically generate a set of 3D points in the world (Fig. 4a) and then define a path for the camera. While the camera follows the path, we compute the projection of the 3D points onto the image of the catadioptric camera system [3] (Fig. 4b). Then, knowing the matching between pixels in consecutive image frames, we aim at computing the rotation and translation parameters that ensure the intersection, in the world, of the inverse projection lines associated with the images of the 3D points in consecutive frames.

A general technique to handle this problem can be mathematically formulated as (\({{{\mathbf {X}}}}\) is a matrix in \({{\mathbb {R}}}^{6\times 6}\)):

$$\begin{aligned} \begin{array}{llll} &{} \mathop {\hbox {argmin}}\limits _{{{{\mathbf {X}}}}} &{} &{} h\left( {{{\mathbf {X}}}}\right) \\ &{} \text {subject to} &{} &{} {{{\mathbf {X}}}}_{(1{:}\,3,4{:}\,6)}^T {{{\mathbf {X}}}}_{(1{:}\,3,4{:}\,6)} = {{\mathbf {I}}} \\ &{} &{} &{} {{{\mathbf {X}}}}_{(4{:}\,6,1{:}\,3)} - {{{\mathbf {X}}}}_{(1{:}\,3,4{:}\,6)} = {\mathbf {0}} \\ &{} &{} &{} {{{\mathbf {X}}}}_{(4{:}\,6,4{:}\,6)} = {\mathbf {0}} \\ &{} &{} &{} {{{\mathbf {X}}}}_{(1{:}\,3,1{:}\,3)} {{{\mathbf {X}}}}_{(1{:}\,3,4{:}\,6)}^T + {{{\mathbf {X}}}}_{(1{:}\,3,4{:}\,6)} {{{\mathbf {X}}}}_{(1{:}\,3,1{:}\,3)}^T = {\mathbf {0}} \\ \end{array}, \nonumber \\ \end{aligned}$$
(38)

where

$$\begin{aligned} h\left( {{{\mathbf {X}}}}\right) = \sum _{i=0}^{N}{\left. {{\mathbf {l}}}_i^{\left( {{\mathcal {L}}}\right) }\right. ^T {{{\mathbf {X}}}}\ {{\mathbf {l}}}_i^{\left( {\mathcal {R}}\right) }}, \end{aligned}$$
(39)

\({{\mathbf {l}}}_i^{\left( {{\mathcal {L}}}\right) }\) and \({{\mathbf {l}}}_i^{\left( {\mathcal {R}}\right) }\) represent the matching between the image projection lines on left and right cameras, respectively, and N is the number of matching points.
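As a sketch of how a candidate \({{{\mathbf {X}}}}\) can be scored under this formulation, the code below evaluates the objective (39) and the residuals of the four constraints in (38), assuming each projection line is represented by a 6-vector (e.g., Plücker coordinates); Python with NumPy and the helper names are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def h(X, lines_L, lines_R):
    # Objective (39): sum of bilinear incidence residuals l_L^T X l_R over
    # all matched line pairs, each line a 6-vector (assumed Pluecker form).
    return sum(lL @ X @ lR for lL, lR in zip(lines_L, lines_R))

def constraint_residuals(X):
    # Residual norms of the four constraints of (38), in order.
    R1, R2 = X[:3, :3], X[:3, 3:]
    return (np.linalg.norm(R2.T @ R2 - np.eye(3)),   # rotation block orthogonal
            np.linalg.norm(X[3:, :3] - R2),          # equal off-diagonal blocks
            np.linalg.norm(X[3:, 3:]),               # zero lower-right block
            np.linalg.norm(R1 @ R2.T + R2 @ R1.T))   # skew-symmetry condition

# Sanity check on a noise-free generalized essential matrix with R = I and
# t = (1, 2, 3): all four residuals should vanish.
S = np.array([[0., -3., 2.],
              [3., 0., -1.],
              [-2., 1., 0.]])                        # [t]_x for t = (1, 2, 3)
E = np.block([[S, np.eye(3)], [np.eye(3), np.zeros((3, 3))]])
assert max(constraint_residuals(E)) < 1e-12
```

In a solver, `constraint_residuals` is what a tolerance on the constraints of (38) would be measured against, while `h` is the quantity being minimized.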

We consider six distinct methods for the computation of the relative pose, based on the estimation of the generalized essential matrix:

  • Full: Denotes the relative pose estimation that aligns 3D straight lines in the world to ensure that they intersect, by the optimization problem (38). A tolerance of \(10^{-9}\) was considered for the constraints;

  • Without Constraints: Denotes a method similar to Full, i.e., the problem of (38). However, in this case, a different value of \(10^{-1}\) was considered for the tolerance of constraints;

  • OUR + WC: Consists of first estimating an initial solution using the Without Constraints method and then applying OUR method to estimate a true generalized essential matrix (Algorithm 1), with tolerance \(10^{-9}\) for the constraints;

  • interior-point + WC: Same as OUR + WC but now the approximation is given by solving (10) with the interior-point;

  • SQP + WC: Same as interior-point + WC, but with the approximation of (10) obtained with the sqp algorithm;

  • active-set + WC: Same as interior-point + WC but now (10) is solved by active-set.

The results for the distribution of the computational time required to compute each image frame are shown in Fig. 5 (box plot graph). These results are commented later in Sect. 6.3.

Fig. 5

Results for the distribution of the computational time required for computing the camera relative pose, for each of the methods described in Sect. 6.1

Fig. 6

Results obtained with real images. On the left, we show an example of an image acquired by the non-central catadioptric camera. On the right, we show the results for the trajectory (top view) computed with the FULL and OUR + WC methods, described in Sect. 6.1

Note that we are now dealing with an optimization problem different from (10), although the constraints coincide.

6.2 Experiments with Real Data

To conclude the experimental results, we apply Algorithm 1 to an absolute pose estimation problem in the framework of general camera models, with known coordinates of 3D straight lines in the world coordinate system [26]. We consider a non-central catadioptric camera (with a spherical mirror) and move the camera along a path in the lab.

3D lines in the world are associated with a set of pixels in the image (Fig. 6). The goal is to find the generalized essential matrix \(\varvec{{\mathcal {E}}}\) that aligns the 3D inverse projection lines from these pixels with the known 3D straight lines in the world, in order to guarantee their intersection.

This problem can be solved by the same strategy proposed in the previous subsection, i.e., by using the optimization problem of (38), but in this case with the following objective function:

$$\begin{aligned} h\left( {{{\mathbf {X}}}}\right) = \sum _{i=0}^{M}{\sum _{j=0}^{N_i}{\left. {{\mathbf {l}}}_{i}^{\left( {\mathcal {W}}\right) }\right. ^T {{{\mathbf {X}}}}\ {{\mathbf {l}}}_{i,j}^{\left( {\mathcal {C}}\right) }}}, \end{aligned}$$
(40)

where: \({{\mathbf {l}}}_i^{\left( {\mathcal {W}}\right) }\) represent the known 3D straight lines in the world; \({{\mathbf {l}}}_{i,j}^{\left( {\mathcal {C}}\right) }\) denote the inverse projection lines corresponding to the pixels that are images of the \(i{\text {th}}\) line; \(N_i\) is the number of image pixels that are images of line i; and M is the number of known 3D straight lines in the world.

We consider the six methods proposed in Sect. 6.1. A comparison between the trajectories of the OUR + WC method and FULL is shown in Fig. 6. The distribution of the required computational time for each frame is shown in Fig. 7.

6.3 Discussion of the Results

In this subsection, we discuss the experimental results shown in the previous subsections. We start by analyzing the results of the approximation of general \(6\times 6\) matrices, in which we compare the performance of the proposed method against the direct solution (these are the main results of this paper) in a non-central catadioptric relative camera pose problem. Next, we discuss the experiments with real data, in which we compare the performance of our approximation technique against the direct solution in an absolute pose problem.

Fig. 7

Distribution of the computational time obtained for the absolute pose problem using a real non-central catadioptric camera. We consider all the algorithms described in Sect. 6.1

In all these tests, we have imposed \(m = 100\) as the maximum number of iterations for Algorithm 1. It is worth noting that the algorithm has never reached such a number of iterations, which means that it always converged.

Notice that one of the main contributions of this paper (Sect. 4) is not to estimate the relative pose of an imaging device, but, instead, to find generalized essential matrices from general \(6\times 6\) matrices [that do not verify (10)].

Evaluation in a Non-central Catadioptric Relative Pose Problem. When considering the experiments carried out in Sect. 6.1, we first conclude that increasing the tolerance of constraints on the FULL algorithm [i.e., not fully considering the underlying constraints of the generalized essential matrix (38)] and then recovering a true generalized essential matrix by Algorithm 1 leads to significant savings in computational time. See the comparison between FULL and all the other methods in Figs. 5 and 7. It can be seen that the differences between OUR + WC and Without Constraints (the optimization without fully considering the underlying constraints of the generalized essential matrix) can be neglected, while this does not happen for the direct solution. We recall that the Without Constraints method does not produce a true generalized essential matrix, while the other ones do.

Besides, from Fig. 4, one can conclude that this procedure (computing \({{\mathbf {A}}}\) and then finding the closest \({{{\mathbf {X}}}}\)) does not degrade the results significantly.

To conclude, one can see that estimating \({{\mathbf {A}}}\) and then finding the \({{{\mathbf {X}}}}\) that approximates \({{\mathbf {A}}}\) [see (9)] results in a much faster algorithm than looking directly for \({{{\mathbf {X}}}}\).

Validation using Real Data. To conclude the experimental results, we validate the proposed fitting technique (Algorithm 1) against the direct solution [with the above mentioned general optimization techniques on the problem defined in (10)] in a real application of an absolute camera pose estimation, when using non-central catadioptric cameras, see Fig. 6.

From Fig. 7, we see that, while the direct solution with each of the three general optimization techniques has a noticeable impact on the computation time (see the results for interior-point+WC, SQP+WC, and active-set against Without Constraints), the difference between OUR+WC and Without Constraints can be neglected, both being much faster than the FULL technique.

Still from Fig. 6, one can conclude that approximating \({{{\mathbf {X}}}}\) (a true generalized essential matrix) from a general matrix \({{\mathbf {A}}}\) does not significantly degrade the results, while being much faster than estimating \({{{\mathbf {X}}}}\) directly (as shown by comparing FULL and OUR+WC in Fig. 7).

Although these tests have different goals (one is related with the relative camera pose and the other with the absolute pose), these results are very similar to the ones presented in Sect. 6.1, validating the results using synthetic data.

7 Conclusions

To the best of our knowledge, we are the first to investigate the problem of fitting generalized essential matrices from general \(6\times 6\) matrices, and its implications. We have defined the problem and proposed some general constrained optimization techniques that can be used to determine a solution. One of the issues with the general optimization techniques is the large number of constraints involved (33 constraints). To get rid of those constraints, two unconstrained formulations of the problem have been proposed. However, they have led to unsatisfactory results. We have then proposed an efficient technique based on optimization on matrix manifolds. More specifically, we define a constrained optimization problem on the rotation group. A suitable and efficient iterative method, of steepest descent type, has then been developed to solve the problem, which ensures that each iterate lies on the manifold of rotation matrices. A large set of experiments has shown that this method is the fastest.

We have also presented some results showing the advantages of using approximation techniques such as the one presented in this paper in real applications. We evaluated our method against the direct solution in relative and absolute pose problems (the latter with real data), in which we show that: (i) estimating \({{\mathbf {A}}}\) (a general \(6\times 6\) matrix) and then fitting \(\varvec{{\mathcal {E}}}\) (a true generalized essential matrix) significantly reduces the required computational time; and (ii) there is no significant deterioration of the obtained results. We also concluded that, contrarily to the direct solution, when using our method, the required additional computational time (i.e., the computational time required after the estimation of \({{\mathbf {A}}}\)) can be neglected.