1 Introduction

A projective space can be reconstructed robustly from 2D correspondences across multiple uncalibrated images by one of several projective reconstruction methods [11–13, 28, 29]. A projective space, however, does not contain sufficient information for human perception of the 3D scene. Upgrading from projective to Euclidean space is therefore necessary for visualization or virtual navigation.

This upgrade process is referred to as ‘Euclidean reconstruction’ or self-calibration. The algorithm proposed in this paper is a self-calibration method in which the camera intrinsic and extrinsic parameters are computed from 2D correspondences without any prior knowledge of the scene or the cameras.

In this paper, we propose a new self-calibration algorithm for upgrading the reconstructed projective space to Euclidean space.

Three usual conditions that can be used for Euclidean reconstruction are:

  1.

    the zero-skew constraint;

  2.

    the unit aspect-ratio constraint and

  3.

    the (partial) constant principal-point constraint.

These three constraints are formulated in the same framework such that the algorithm treats every view and constraint equally. Our proposed algorithm is flexible since the above three constraints can be customized for any specific situation.

The whole process is non-iterative with the runtime of this algorithm being proportional to the number of applied constraints.

The paper is organized as follows. We first provide a literature review on self-calibration in Sect. 2. The relationship between the dual image of the absolute conic and the absolute dual quadric for camera calibration is briefly described in Sect. 3. The problem of Euclidean reconstruction is formulated in Sect. 4. The theory of the proposed algorithm is derived in Sects. 5 and 6.

The complete algorithm is provided in Sect. 7. Experimental results are given in Sect. 8. The proof of rank-4 properties is given in Sect. 9. Some additional constraints are provided in Sect. 10. Section 11 contains some concluding remarks.

2 Literature Review

In classical calibration methods, the camera intrinsic (i.e. \(K_i\)) and extrinsic (i.e. rotation \(R_i\) and translation \(\mathbf{t}_i\)) parameters are computed from images of a calibration board with known grid patterns, and they can be estimated accurately. Self-calibration, a more recent approach to camera calibration, obtains the camera intrinsic and extrinsic parameters from point correspondences of unknown objects instead of known calibration objects. Maybank and Faugeras [14] proved that self-calibration is possible when the intrinsic parameters are fixed over a sequence of images, by solving the Kruppa equations [4, 14, 32], which are a set of non-linear constraints on the intrinsic parameters. However, the result is very sensitive to noise. Sturm [27] pointed out that the Kruppa equations fail for some motion sequences that are non-degenerate configurations for other self-calibration methods. Gurdjos et al. [7] and Sturm [26] also studied ‘artificial critical motion sequences’ for linear self-calibration algorithms [1, 7, 8, 18, 31].

Hartley [9] proposed a series of non-linear algorithms to reconstruct the Euclidean model and camera parameters from 2D correspondences by assuming constant intrinsic parameters. Pollefeys and Van Gool [17] proposed the modulus constraint to recover the affine space from the projective space and then solve for the dual image of the absolute conic to upgrade the affine space to a Euclidean space. This method also relies on the assumption of constant intrinsic parameters. There are at most 64 solutions for the location of the plane at infinity, but not all of them are sensible, and it is costly to compute all 64 possible solutions by the usual continuation method. A new derivation of the modulus constraint directly from three views was proposed by Schaffalitzky [22]. The number of feasible solutions is reduced to 21, and a numerical algorithm is provided for computing them efficiently. However, there remains the problem of deciding which one is the correct solution.

Fundamental matrices between any two views can be computed easily from a projective reconstruction. A simple non-linear method [15] decomposes the fundamental matrices into essential matrices by iteratively enforcing the singular-value property of an essential matrix (i.e. λ = {e, e, 0}). The problem of Euclidean reconstruction is formulated as a minimization problem parameterized directly in terms of all the intrinsic parameters (with no constraint imposed on them). More detailed experimental results can be found in [5]. There is, however, no proof of convergence, and the result relies on a good initial guess.

Instead of parameterizing the intrinsic parameters directly, there are implicit methods [1, 2, 8, 12, 16, 18, 20, 23–25, 31] that recover the projective distortion matrix to upgrade the projective space to a Euclidean space. Triggs [31] proposed to estimate the absolute quadric of the projective space based on the assumption of constant intrinsic parameters. This non-linear method requires at least four views.

If the skew factors are zero and both the principal points and the aspect ratios are known, the projective distortion matrix H∈ℝ4×4 can be solved linearly [2, 8, 16, 18, 19].

The method proposed by Han and Kanade [8] needs at least 8 views to solve for the absolute quadric [31] linearly. Sainz et al. [21] further developed the method of [8] by enforcing the rank-3 property of the absolute quadric, under the assumptions that all the cameras have zero skew, unit aspect ratio and known principal points. The rank-3 property is obtained by solving a 4th-order polynomial, namely the determinant of a one-parameter linear combination of two possible solutions. In practice, when noise is present, there will not be exactly two possible solutions. Moreover, the important positive semi-definite property of the absolute dual quadric is not enforced in their algorithm. Pollefeys et al. [19] formulated the linear approach of Han and Kanade [8] differently from [21], under the same assumptions of zero skew, unit aspect ratio and known principal points, for at least 3 views. A direct parameterization is proposed to solve for the simplified absolute quadric with 5 unknowns, based on special choices of the first camera projection matrix in the projective and the upgraded Euclidean spaces. When there are only 2 views, the solution is determined only up to a one-parameter family. Similarly to [21], the rank-3 constraint on the simplified absolute quadric can be imposed when solving for this one parameter, but there will be 4 possible solutions.

Heyden and Åström [12] introduced the notion of a camera with a Euclidean image plane, i.e. a camera satisfying the two conditions of zero skew and unit aspect ratio. They also proved that it is possible to upgrade a projective space to a Euclidean space when the cameras have Euclidean image planes. Seo and Heyden [23] proposed an iterative linear algorithm to solve for the absolute dual quadric. This method needs many iterations to converge and its numerical stability is not considered. Seo and Hong [25] proposed a linear approach to estimate the absolute dual quadric by complex eigen-decomposition. As the method is developed only for the zero-skew constraint, it cannot make use of other available constraints (such as known aspect ratio and principal point). Seo and Heyden [24] further proposed to alternate between estimating the absolute dual quadric with a linear method and re-estimating the principal points. The linear method applied is that of Pollefeys et al. [18], which solves for the absolute dual quadric linearly by assuming zero skew and known aspect ratio and principal point.

With the same assumption of Euclidean image planes on an image sequence, Bougnoux [1] proposed a closed-form solution for calculating the focal lengths and the plane at infinity to upgrade to a Euclidean reconstruction with a ‘visually perfect’ result. It is proved in [1] that this method has an anisotropic-homothety ambiguity, so the upgraded 3D scene is acceptable only under visual inspection and the estimated intrinsic parameters are not accurate. Iterative methods in general suffer from the usual problem that they must be initialized with a sufficiently accurate guess.

In this paper, we propose a new self-calibration algorithm to estimate the projective distortion matrix. The method of recovering the projective distortion matrix is formulated in a subspace framework. The proposed method is also based on solving the absolute dual quadric [10]. We unify most of the common constraints on intrinsic parameters (such as zero-skew constraint, unit aspect-ratio constraint, constant principal points etc.) within the same subspace framework. The proposed algorithm is simple and flexible for combining different assumptions in a single minimization problem. The derivations of different constraints for different assumptions will be provided. Some features of the proposed algorithm are:

  1.

    a non-iterative algorithm;

  2.

    views are treated equally and all constraints are treated equally;

  3.

    options for combining different constraints.

3 Background

3.1 Dual image of the absolute conic and absolute dual quadric

In the Euclidean space, the absolute dual quadric is defined as

$${{\varOmega}}^{\ast }\sim \left[ \begin{array}{c@{\quad }c} {I}_3 & \mathbf{0}_{3 \times 1} \\ \mathbf{0}_{3 \times 1}^T & 0\end{array}\right] . $$

Let us denote a rigid transformation T as

$${T}=\left[ \begin{array}{c@{\quad }c} {R} & \mathbf{t} \\ \mathbf{0}_{3 \times 1}^T & 1\end{array}\right] $$

where R is a 3×3 rotation matrix and t is a 3×1 translation vector. The absolute dual quadric transformed by T can be expressed as

$${T}{{\varOmega}}^{\ast }{T}^T=\left[ \begin{array}{c@{\quad }c} {R}{R}^T & \mathbf{0}_{3 \times 1} \\ \mathbf{0}_{3 \times 1}^T & 0\end{array}\right] \sim {{\varOmega}}^{\ast }, $$

which shows that the absolute dual quadric is invariant to any rigid transformation. The absolute dual quadric can be projected to any camera and its image on the image plane is called the dual image of the absolute conic (DIAC), \({\omega}_{i}^{\ast }\) [10]. If the projection matrix of ith view is P i =K i [R i |t i ], its dual image of the absolute conic is given by

$$ {\omega}_i^{\ast }\sim {P}_i {{\varOmega}}^{\ast } {P}_i^T = {K}_i [ {R}_i|\mathbf{t}_i ] {{\varOmega}}^{\ast } [ {R}_i|\mathbf{t}_i ]^T {K}_i^T = {K}_i {K}_i^T. $$
(1)

Hence, the dual image of the absolute conic \({\omega}_{i}^{\ast }\) is only related to the intrinsic parameters of the ith view.
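As a quick illustration of (1), the sketch below builds a synthetic camera (the values of K, R and t are purely illustrative and not from the paper) and verifies numerically that the projected absolute dual quadric depends only on the intrinsic parameters.

```python
# Minimal numerical check of (1); K, R and t are illustrative values only.
import numpy as np

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthonormal 3x3
if np.linalg.det(R) < 0:                       # make it a proper rotation
    R[:, 0] *= -1
t = rng.normal(size=(3, 1))

K = np.array([[2000.0, 0.0, 500.0],
              [0.0, 2000.0, 500.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([R, t])                      # P_i = K_i [R_i | t_i]
Omega_star = np.diag([1.0, 1.0, 1.0, 0.0])     # absolute dual quadric

omega_star = P @ Omega_star @ P.T              # dual image of the absolute conic
print(np.allclose(omega_star, K @ K.T))        # True, independent of R and t
```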

4 Problem Formulation

A projective frame can be reconstructed from 2D correspondences across multiple views by projective reconstruction methods [1113, 28, 29]. We choose the method of Hung and Tang [13] to minimize 2D reprojection error. To upgrade the reconstructed projective frame (\(\hat{{P}}_{i}\) and \(\hat{\mathbf{X}}_{j} \)) to a Euclidean frame (\(\tilde{{K}}_{i}, \tilde{{R}}_{i}, \tilde{\mathbf{t}}_{i}\) and \(\tilde{\mathbf{X}}_{j}\)), metric constraints are applied to the reconstructed projective projection matrices \(\hat{{P}}_{i}\) to recover the projective distortion matrix, H∈ℝ4×4 so that all the upgraded projection matrices \(\tilde{{P}}_{i}=\hat{{P}}_{i}{H}\) can be decomposed as \(\tilde{{P}}_{i}=\tilde{{K}}_{i} [ \tilde{{R}}_{i}^{T}|{-}\tilde{{R}}_{i}^{T} \tilde{\mathbf{t}}_{i} ]\) and the Euclidean shape is then given by \(\tilde{\mathbf{X}}_{j}\sim {H}^{-1}\hat{\mathbf{X}}_{j}\). A camera matrix K i can be parameterized as

$$ {K}_i=\left[ \begin{array}{c@{\quad }c@{\quad }c} f_i & s_i & u_i \\ 0 & \alpha_if_i & v_i \\ 0 & 0 & 1\end{array}\right] $$
(2)

where s i is the skew ratio, \([ u_{i}\;v_{i}\;1 ]^{T}\) is the principal point, f i is the scaling factor (focal length to pixel size ratio) and α i is the aspect ratio for the ith view. Substituting K i from (2) into (1) gives

$$ {\omega}_i^{\ast}={K}_i {K}_i^T= \left[ \begin{array}{c@{\quad }c@{\quad }c} f_i^2+s_i^2+u_i^2 & s_i\alpha_if_i+u_i v_i & u_i \\ s_i \alpha_i f_i+u_i v_i & \alpha_i^2 f_i^2+v_i^2 & v_i \\ u_i & v_i & 1\end{array}\right] . $$
(3)

To relate the projective distortion matrix H to the dual image of the absolute conic \({\omega}_{i}^{\ast }\), first denote H as

$$ {H}= [ {H}_1|\mathbf{H}_2 ] \in \mathbb{R}^{4\times 4} $$
(4)

where H 1∈ℝ4×3 is the first 3 columns of H and H 2∈ℝ4×1 is the last column of H. The upgraded projection matrix can be expressed as

$$ \tilde{{P}}_i=\hat{{P}}_i{H}= [ \hat{{P}}_i {H}_1|\hat{{P}}_i\mathbf{H}_2 ] =[ {M}_i|\hat{{P}}_i\mathbf{H}_2 ] $$
(5)

where M i is defined as

$$ {M}_i=\hat{{P}}_i{H}_1=\left[ \begin{array}{c} \mathbf{m}_{1i}^T \\ \noalign{\vspace*{3pt}} \mathbf{m}_{2i}^T \\ \noalign{\vspace*{3pt}} \mathbf{m}_{3i}^T\end{array}\right] \in \mathbb{R}^{3\times 3} $$
(6)

and m 1i , m 2i and m 3i are 3-vectors. The absolute dual quadric Ω ∗ can be projected onto the image planes as the dual images of the absolute conic. Similarly to (1), the projection of the absolute dual quadric by the upgraded projection matrix \(\tilde{{P}}_{i}\) in (5) can also be expressed as

$$ \tilde{{P}}_i {{\varOmega}}^{\ast } \tilde{{P}}_i^T = [ {M}_i|\hat{{P}}_i\mathbf{H}_2 ] {{\varOmega}}^{\ast } [ {M}_i|\hat{{P}}_i\mathbf{H}_2 ]^T = {M}_i {M}_i^T. $$
(7)

The two projections can be combined as

$$ {\omega}_i^{\ast}={K}_i {K}_i^T\sim {M}_i{M}_i^T= \hat{{P}}_i \bigl( {H}_1{H}_1^T \bigr) \hat{{P}}_i^T=\hat{{P}}_i {Q} \hat{{P}}_i^T $$
(8)

where \({Q}={H}_{1}{H}_{1}^{T} \in \mathbb{R}^{4\times 4}\) is the absolute dual quadric and \({K}_{i}{K}_{i}^{T}\) is its image on the ith view, namely the dual image of the absolute conic.

In Sect. 5, we will show how to determine H 2 linearly by choosing the world origin at the centroid of the upgraded 3D points. In Sect. 6, a flexible approach to solve H 1 from user selected constraints (such as zero-skew constraint, unit aspect-ratio constraint and/or partial constant principal-point constraints) is proposed.

5 Estimating H 2

To estimate H 2, we choose the centroid of the scaled upgraded 3D points \(\upsilon_{j} \tilde{\mathbf{X}}_{j} = {H}^{-1}\hat{\mathbf{X}}_{j}\) at the origin so that

$$\sum_{j=1}^n \upsilon_j \tilde{\mathbf{X}}_j= \varUpsilon \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] $$

where \(\varUpsilon=\sum_{j=1}^{n} \upsilon_{j}\) and \(\tilde{\mathbf{X}}_{j}\ (j=1, \ldots , n)\) are the upgraded 3D points in homogeneous coordinates in a Euclidean frame. The projection equation for \(\tilde{\mathbf{X}}_{j}\) can be expressed as

$$ \lambda_{ij}\mathbf{w}_{ij} = \hat{{P}}_i\hat{\mathbf{X}}_j = \tilde{{P}}_i ( \upsilon_j \tilde{\mathbf{X}}_j ). $$
(9)

The scale factor λ ij in (9) can be obtained from w ij , \(\hat{{P}}_{i}\) and \(\hat{{X}}_{j}\) in the reconstructed projective frame. Summing all scaled 2D points for the ith view, we get

$$ \sum_{j=1}^n \lambda_{ij}\mathbf{w}_{ij} = \tilde{{P}}_i \sum_{j=1}^n \upsilon_j \tilde{\mathbf{X}}_j = \tilde{{P}}_i\, \varUpsilon \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] = \hat{{P}}_i ( \varUpsilon \mathbf{H}_2 ). $$
(10)

Equation (10) can be formulated as a least-squares problem of estimating H 2 from all views, such that

$$ \left[ \begin{array}{c} \sum_{j=1}^n\lambda_{1j}\mathbf{w}_{1j} \\ \noalign{\vspace*{3pt}} \sum_{j=1}^n\lambda_{2j} \mathbf{w}_{2j} \\ \vdots \\ \sum_{j=1}^n\lambda_{mj} \mathbf{w}_{mj}\end{array}\right] = \left[ \begin{array}{c} \hat{{P}}_1 \\ \hat{{P}}_2 \\ \vdots \\ \hat{{P}}_{m}\end{array}\right] ( \varUpsilon \mathbf{H}_2 ). $$
(11)

H 2 can then be estimated by solving (11) as a linear least-squares problem, ignoring ϒ since H 2 can only be determined up to scale. By a counting argument we require 3m≥4, so the minimum number of views m for solving H 2 is 2. This choice of origin for the upgraded Euclidean coordinates is adapted from the factorization method for orthographic projection [30].
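A minimal sketch of this least-squares step is given below; the array names `P_hat` and `X_hat` are assumed inputs from the projective reconstruction, and the scaled 2D points are formed directly as \(\hat{P}_i\hat{\mathbf{X}}_j\) following (9).

```python
import numpy as np

def estimate_H2(P_hat, X_hat):
    """Solve the stacked linear least-squares problem (11) for H2 (up to scale).

    P_hat : (m, 3, 4) projective projection matrices
    X_hat : (n, 4) homogeneous projective 3D points
    """
    m = P_hat.shape[0]
    # Left-hand side of (11): sum_j lambda_ij w_ij = P_hat_i sum_j X_hat_j, by (9)
    b = np.concatenate([P_hat[i] @ X_hat.sum(axis=0) for i in range(m)])   # (3m,)
    A = P_hat.reshape(3 * m, 4)                                            # stacked P_hat_i
    H2, *_ = np.linalg.lstsq(A, b, rcond=None)                             # Upsilon absorbed
    return H2
```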

6 Estimating H 1 (Absolute Dual Quadric)

From (8), the absolute dual quadric Q is equal to \({H}_{1}{H}_{1}^{T}\) and is rank-3 as H 1 is rank-3. Denote Q as

$$ {Q}={H}_1{H}_1^T=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} q_1 & q_2 & q_3 & q_{4} \\ q_2 & q_{5} & q_{6} & q_{7} \\ q_3 & q_{6} & q_{8} & q_{9} \\ q_{4} & q_{7} & q_{9} & q_{10}\end{array}\right]. $$
(12)

Collect the 10 variables of Q into a vector as

$$ \mathbf{q}= \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} q_1 &q_2 &q_3 &q_{4} &q_{5} &q_{6} &q_{7}& q_{8} &q_{9}&q_{10} \end{array}\right]^T. $$
(13)

From (6) and (8), each \({K}_{i}{K}_{i}^{T}\) can be written as

$$ {K}_i{K}_i^T\sim {M}_i{M}_i^T=\left[ \begin{array}{c@{\quad }c@{\quad }c} \mathbf{m}_{1i}\cdot \mathbf{m}_{1i} & \mathbf{m}_{1i}\cdot \mathbf{m}_{2i} & \mathbf{m}_{1i}\cdot \mathbf{m}_{3i} \\ \mathbf{m}_{2i}\cdot \mathbf{m}_{1i} & \mathbf{m}_{2i}\cdot \mathbf{m}_{2i} & \mathbf{m}_{2i}\cdot \mathbf{m}_{3i} \\ \mathbf{m}_{3i}\cdot \mathbf{m}_{1i} & \mathbf{m}_{3i}\cdot \mathbf{m}_{2i} & \mathbf{m}_{3i}\cdot \mathbf{m}_{3i}\end{array}\right] . $$
(14)

By (8), each element of this matrix is linear in the 10 elements (i.e. q k , ∀k) of Q. The absolute dual quadric Q can be obtained by applying different constraints on the dual image of the absolute conics \({M}_{i}{M}_{i}^{T}\). Three different constraints on the dual image of the absolute conic will be considered here.

  1.

    Zero-skew constraint

    Recall that \(\tilde{{P}}_{i}=\tilde{{K}}_{i} [\tilde{{R}}_{i}^{T}|-\tilde{{R}}_{i}^{T}\tilde{\mathbf{t}}_{i} ]\); using (5), we have \({M}_{i}=\tilde{{K}}_{i}\tilde{{R}}_{i}^{T}\). Writing the orthogonal matrix as \(\tilde{{R}}_{i}=[\mathbf{r}_{1i} \; \mathbf{r}_{2i} \; \mathbf{r}_{3i}] \in \mathbb{R}^{3 \times 3}\), the three columns of \({M}_{i}^{T}\) can be expanded as

    $$ \mathbf{m}_{1i}=f_i \mathbf{r}_{1i}+s_i \mathbf{r}_{2i}+u_i \mathbf{r}_{3i},\qquad \mathbf{m}_{2i}=\alpha_i f_i \mathbf{r}_{2i}+v_i \mathbf{r}_{3i},\qquad \mathbf{m}_{3i}=\mathbf{r}_{3i}. $$
    (15)

    Let us define the zero-skew constraint following Faugeras [3] as

    $$ \phi_z({M}_i)= ( \mathbf{m}_{1i}\times \mathbf{m}_{3i} ) \cdot ( \mathbf{m}_{2i}\times \mathbf{m}_{3i} ). $$
    (16)

    Expanding (16) by the expressions from (15), we have

    $$ \phi_z({M}_i)=\alpha_i f_i s_i. $$
    (17)

    Hence, the zero-skew constraint can be written as

    $$ \phi_z({M}_i)=0. $$
    (18)

    Equation (18) can be treated as a 4D ruled quadric in a 10D space as

    $$ \mathbf{q}^T{\varPhi}_i^z\mathbf{q}=0 $$
    (19)

    where \({\varPhi}_{i}^{z}\) is a 10×10 symmetric matrix and it is of rank-4. The derivation of an expression for \({\varPhi}_{i}^{z}\) and the proof of the rank-4 property are given in Sect. 9.

  2.

    Unit aspect-ratio constraint

    Define the unit aspect-ratio constraint ϕ u (M i ) following Faugeras [3] as

    $$ \phi_u( {M}_i)=\vert \mathbf{m}_{1i}\times \mathbf{m}_{3i} \vert ^2-\vert \mathbf{m}_{2i}\times \mathbf{m}_{3i} \vert ^2. $$
    (20)

    Expanding (20) by (15), we have

    $$ \phi_u({M}_i)=|s_i \mathbf{r}_{1i}-f_i \mathbf{r}_{2i}|^2-\alpha_i^2f_i^2=s_i^2+ \bigl(1-\alpha_i^2\bigr)f_i^2. $$
    (21)

    For general cameras, s i is almost zero and the magnitude of f i is usually several thousand times that of s i . As f i ≫s i , the skew factor s i is negligible compared with f i . When the zero-skew constraint is enforced, (21) is equal to zero only if \(\alpha_{i}^{2}=1\). The unit aspect-ratio constraint can therefore be imposed as

    $$ \phi_u({M}_i)=0. $$
    (22)

    Equation (22) can also be treated as a 4D ruled quadric in a 10D space as

    $$ \mathbf{q}^T{\varPhi}_i^u \mathbf{q}=0 $$
    (23)

    where the symmetric matrix \({\varPhi}_{i}^{u}\in \mathbb{R}^{10\times 10}\) is of rank-4. An expression for \({\varPhi}_{i}^{u}\) is given in Sect. 9.

  3.

    Constant principal-point constraints

    For the ith and jth views, the constant principal-point constraints consist of two equations, u i =u j and v i =v j . By comparing the entries (1,3) and (2,3) of \({K}_{i}{K}_{i}^{T}\) and \({M}_{i}{M}_{i}^{T}\), the two constraints can be expressed as

    $$ \frac{\mathbf{m}_{1i}\cdot \mathbf{m}_{3i}}{\mathbf{m}_{3i}\cdot \mathbf{m}_{3i}}=\frac{\mathbf{m}_{1j}\cdot \mathbf{m}_{3j}}{\mathbf{m}_{3j}\cdot \mathbf{m}_{3j}} \quad \text{and}\quad \frac{\mathbf{m}_{2i}\cdot \mathbf{m}_{3i}}{\mathbf{m}_{3i}\cdot \mathbf{m}_{3i}}=\frac{\mathbf{m}_{2j}\cdot \mathbf{m}_{3j}}{\mathbf{m}_{3j}\cdot \mathbf{m}_{3j}}. $$

    Both conditions can be transformed into quadratic equations in q, namely

    $$ \mathbf{q}^T{\varPhi}_{ij}^x \mathbf{q}=0 $$
    (24)

    and

    $$ \mathbf{q}^T{\varPhi}_{ij}^y \mathbf{q}=0 $$
    (25)

    where \({\varPhi}_{ij}^{x}\) and \({\varPhi}_{ij}^{y}\) are 10×10 symmetric matrices and they are of rank-4.

Equations (19), (23), (24) and (25) are quadratic equations in q. The derivations of expressions for \({\varPhi}_{i}^{z}\), \({\varPhi}_{i}^{u}\), \({\varPhi}_{ij}^{x}\) and \({\varPhi}_{ij}^{y}\) and the proof of the rank-4 properties are given in Sect. 9. It can be shown that each of \({\varPhi}_{i}^{z}\), \({\varPhi}_{i}^{u}\), \({\varPhi}_{ij}^{x}\) and \({\varPhi}_{ij}^{y}\) has 4 non-zero eigenvalues of which two are positive and two are negative. This kind of quadric is called a ‘ruled quadric’ [10].
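To make the first two constraints concrete, the following sketch (ours, not part of the original algorithm) evaluates φ z and φ u for a synthetic camera with illustrative intrinsic values and checks the results against (17) and (21).

```python
import numpy as np

def phi_z(M):
    """Zero-skew residual (16): (m1 x m3) . (m2 x m3)."""
    m1, m2, m3 = M                       # rows of M
    return np.dot(np.cross(m1, m3), np.cross(m2, m3))

def phi_u(M):
    """Unit aspect-ratio residual (20): |m1 x m3|^2 - |m2 x m3|^2."""
    m1, m2, m3 = M
    a, b = np.cross(m1, m3), np.cross(m2, m3)
    return np.dot(a, a) - np.dot(b, b)

# Sanity check against (17) and (21) with illustrative intrinsic values.
f, s, alpha, u, v = 1500.0, 0.3, 1.02, 320.0, 240.0
K = np.array([[f, s, u], [0.0, alpha * f, v], [0.0, 0.0, 1.0]])
R, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
M = K @ R.T                              # M_i = K_i R_i^T
print(np.isclose(phi_z(M), alpha * f * s))                    # (17)
print(np.isclose(phi_u(M), s**2 + (1 - alpha**2) * f**2))     # (21)
```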

6.1 Cost Function for Estimating the Absolute Dual Quadric Q

From the above formulations of the constraints, the determination of H 1 can be posed as a non-linear minimization problem with cost function

$$ \varepsilon_{Q}= \min_{\|\mathbf{q}\|=1} \sum_k^M \bigl\vert \mathbf{q}^T {\varPhi}_k \mathbf{q} \bigr\vert $$
(26)

where Φ k can be any ruled quadric representing constraints, k is the index for the summation over all the included constraints and M is the total number of selected constraints. For example, when there are m cameras and both zero-skew and unit aspect-ratio constraints are applied on all views, the number of constraints will be M=2m. In the form of (17) and (21), the constraints are weighted by different scaling factors such as α i f i and \(f_{i}^{2}\). To equalize this weighting effect on each Φ k , we divide each Φ k by its own eigenvalue with the largest magnitude.

6.2 Solving Its Upper Bound

Let us denote the eigenvalue decomposition of the symmetric matrix Φ k as \({V}_{k} {\varLambda}_{k} {V}_{k}^{T}\) where Λ k is a diagonal matrix containing all eigenvalues of Φ k and V k is an orthogonal matrix containing the corresponding eigenvectors. The eigenvalues in Λ k are sorted in descending order. Denote the diagonal sub-blocks of Λ k containing the positive, zero and negative eigenvalues as \({e}_{k}^{+} \in \mathbb{R}^{2 \times 2}\), 06×6, \({e}_{k}^{-} \in \mathbb{R}^{2 \times 2}\), respectively.

Each constraint can be expressed as

$$ \mathbf{q}^T{\varPhi}_k\mathbf{q} = \mathbf{q}^T {V}_k \left[\begin{array}{c@{\quad }c@{\quad }c} {e}_k^+ & {0} & {0} \\ {0} & {0}_{6 \times 6} & {0}\\ {0} & {0} & {e}_k^-\\ \end{array}\right] {V}_k^T \mathbf{q}. $$

We then define

$$ {\varPhi}_k^* = {V}_k \left[\begin{array}{c@{\quad }c@{\quad }c} {e}_k^+ & {0} & {0} \\ {0} & {0}_{6 \times 6} & {0}\\ {0} & {0} & -{e}_k^-\\ \end{array}\right] {V}_k^T. $$
(27)

Note that \({\varPhi}_{k}^{*}\) is a positive semi-definite matrix and that \(\vert \mathbf{q}^{T}{\varPhi}_{k}\mathbf{q}\vert \leq \mathbf{q}^{T}{\varPhi}_{k}^{*}\mathbf{q}\). We can therefore obtain an upper bound of (26) by the minimization problem

$$ {\varepsilon}_{Q}^{*}= \min_{\|\mathbf{q}\|=1} \sum_{k=1}^M \mathbf{q}^T {\varPhi}_k^{*} \mathbf{q} = \min_{\|\mathbf{q}\|=1} \mathbf{q}^T {\varPhi}^{*} \mathbf{q} $$
(28)

where \({\varPhi}^{*}= (\sum_{k=1}^{M} {\varPhi}_{k}^{*} )\). As each \({\varPhi}_{k}^{*}\) is previously normalized so its largest eigenvalue is 1, the relationship between ε Q and \({\varepsilon}_{Q}^{*}\) can be written as

$$ 0 \leq {\varepsilon}_{Q} \leq {\varepsilon}_{Q}^* \leq M. $$
(29)

The minimum of (28) is equal to the smallest eigenvalue of Φ ∗ and the corresponding eigenvector is chosen as q. As Φ ∗ ∈ℝ10×10, to obtain a unique solution to the minimization problem (28), Φ ∗ should be close to rank-9. In our experiments on real and synthetic data, the minimization problem (28) always returns a relatively small value, of the order of 10^{−5} per constraint.
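A sketch of this upper-bound minimization is given below, assuming `Phi_list` holds the 10×10 constraint matrices Φ k of (19), (23), (24) and (25); the function name is ours.

```python
import numpy as np

def solve_q(Phi_list):
    """Minimize the upper bound (28): flip the negative eigenvalues of each
    Phi_k as in (27), normalize to unit largest eigenvalue, sum, and return
    the eigenvector of the smallest eigenvalue of Phi^* as q (a sketch)."""
    Phi_star = np.zeros((10, 10))
    for Phi in Phi_list:
        w, V = np.linalg.eigh(0.5 * (Phi + Phi.T))   # symmetrize for safety
        Phi_k_star = V @ np.diag(np.abs(w)) @ V.T    # equivalent to (27)
        Phi_star += Phi_k_star / np.abs(w).max()     # largest eigenvalue -> 1
    w, V = np.linalg.eigh(Phi_star)                  # eigenvalues in ascending order
    return V[:, 0]                                   # q: eigenvector of smallest eigenvalue
```

The returned vector is the estimate of q, from which Q is rebuilt by (12) and (13).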

6.3 Decomposing H 1 from Q

To compute H 1 from the estimated absolute dual quadric Q, Q must be a rank-3 positive semi-definite matrix. Empirically, when the number of views is large enough, Q formed from q by (12) and (13) will be close to rank-3 satisfying the positive semi-definite condition.

By means of the singular value decomposition, the computed absolute dual quadric Q can be factorized as \({Q}={U}{S}{U}^{T}\). Take

$$ {H}_1 = {U}_3({S}_3)^{\frac{1}{2}} \in \mathbb{R}^{4 \times 3} $$
(30)

where U 3 is the first three columns of U and S 3 is the left upper 3×3 matrix of S.
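A sketch of rebuilding Q from q (following (12) and (13)) and of the factorization (30) is given below; symmetrizing Q before the SVD simply guards against numerical asymmetry.

```python
import numpy as np

def Q_from_q(q):
    """Rebuild the symmetric 4x4 absolute dual quadric from the 10-vector q, per (12)-(13)."""
    iu = np.triu_indices(4)          # row-wise upper triangle matches q1..q10 in (12)
    Q = np.zeros((4, 4))
    Q[iu] = q
    return Q + np.triu(Q, 1).T

def H1_from_Q(Q):
    """Factor the (approximately rank-3, PSD) Q as H1 = U3 * S3^(1/2), as in (30)."""
    U, S, _ = np.linalg.svd(0.5 * (Q + Q.T))
    return U[:, :3] * np.sqrt(S[:3])          # 4x3 matrix
```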

The proposed algorithm is shown in Algorithm 1.

Algorithm 1

Determining Absolute Dual Quadric Q and projective distortion H

7 Self-calibration (Decomposition of K, R, t)

After the projective distortion matrix H has been recovered, all the projection matrices \(\hat{{P}}_{i}\) in the projective frame can be upgraded to a Euclidean frame as

$$ \tilde{{P}}_i=\hat{{P}}_i{H} $$
(31)

and the projective shape \(\hat{\mathbf{X}}\) can be upgraded to the Euclidean frame as

$$ \tilde{\mathbf{X}}_j\sim {H}^{-1}\hat{\mathbf{X}}_j. $$
(32)

To extract the intrinsic and extrinsic parameters from the projection matrices in the Euclidean frame, we can apply the QR factorization [10] to decompose the left-most 3×3 matrix of \(\tilde{{P}}_{i}\) into \(\alpha_{i}\tilde{{K}}_{i} \tilde{{R}}_{i}^{T}\). The decomposition should satisfy the conditions that all the diagonal values of \(\tilde{{K}}_{i}\) are positive, \(\tilde{{K}}_{i} ( 3,3 ) =1\), and the determinant of the rotation matrix \(\tilde{{R}}_{i}\) equals 1, i.e. \(\vert \tilde{{R}}_{i}\vert =1\). All these requirements can be enforced during the decomposition. The translation of each camera can then be obtained by applying \({-}\frac{1}{\alpha_{i}}\tilde{{R}}_{i}\tilde{{K}}_{i}^{-1}\) to the last column of \(\tilde{{P}}_{i}\). The upgraded Euclidean projection matrices are of the form

$$\tilde{{P}}_i=\alpha_i\tilde{{K}}_i \bigl[ \tilde{{R}}_i^T|{-}\tilde{{R}}_i^T \tilde{\mathbf{t}}_i \bigr]. $$
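A minimal sketch of this decomposition is given below; it uses an RQ factorization (scipy's `rq`) of the left 3×3 block, with sign fixes to enforce the positive diagonal of \(\tilde{K}_i\), \(\tilde{K}_i(3,3)=1\) and \(\det\tilde{R}_i=1\) as stated above. The function name and the exact sign handling are ours, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import rq

def decompose_P(P_tilde):
    """Split an upgraded camera P ~ alpha*K*[R^T | -R^T t] into K, R, t (a sketch)."""
    A, p4 = P_tilde[:, :3], P_tilde[:, 3]
    K, RT = rq(A)                            # A = K @ RT, K upper triangular
    D = np.diag(np.sign(np.diag(K)))         # flip signs so diag(K) > 0
    K, RT = K @ D, D @ RT
    alpha = K[2, 2]                          # overall scale of the camera
    K = K / alpha                            # enforce K(3,3) = 1
    R = RT.T
    if np.linalg.det(R) < 0:                 # enforce det(R) = +1
        R, alpha = -R, -alpha
    t = -(1.0 / alpha) * R @ np.linalg.inv(K) @ p4
    return K, R, t
```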

The Self-Calibration Algorithm is summarized in Algorithm 2.

Algorithm 2

A Flexible Self-calibration Algorithm

8 Experimental Results

In this section, the proposed method is evaluated using synthetic data and real data. The reconstructed projective spaces are computed first by the methods proposed by Tang and Hung [13, 28] for minimizing the 2D reprojection error.

8.1 Synthetic Data

A synthetic scene is constructed with 3 virtual grid planes (the three planes of the blue dotted box) as shown in Fig. 1. On each plane there are 25 lattice points of a 4×4 grid with dimensions 0.4 m×0.4 m, so there are a total of 75 3D points in the scene. Each plane is perpendicular to the other two. The intrinsic parameters of all cameras are fixed as

$$\left[ \begin{array}{c@{\quad }c@{\quad }c} 2000 & 0 & 500 \\ 0 & 2000 & 500 \\ 0 & 0 & 1 \\ \end{array} \right]. $$

There are 10 cameras randomly located within the red dashed box (of size 3 m×3 m×2 m), with fixed intrinsic parameters and pointing towards the centroid of the 3D points, such that the images of the 3D points almost fully occupy the 1000×800 image for all views. The yellow pyramids in Fig. 1 are the cameras. Only the zero-skew and unit aspect-ratio constraints are applied in these experiments. There are two sets of evaluation results. The first set evaluates the reconstructed scene: the angles between any two planes are computed to assess the orthogonality of the upgraded scene. The second set compares the estimated intrinsic parameters of the cameras with the ground truth at different levels of Gaussian noise. Gaussian noise with standard deviation from 0 to 4 pixels, in increments of 0.5 pixel, is added to the images. Each test is repeated 50 times and the mean values are computed.

Fig. 1

Self-calibration: the configuration of the synthetic scene

8.1.1 Orthogonality of Planes in the Upgraded Scene

There are three orthogonal planes in this synthetic scene. In each trial, each plane is fitted by minimizing the geometric error of the normal distances from the 3D points to the plane. The RMS error of the deviations of the angles from 90° is calculated, as shown in Fig. 2(a). The maximum deviation of the angles is less than 0.14° even when the 2D points are contaminated by Gaussian noise with σ=4 pixels. The error of the angles increases gradually with the level of added noise.

Fig. 2

Performance on synthetic data

8.1.2 Performance on Intrinsic Parameters

Figure 2(b) shows how the estimated intrinsic parameters vary with noise. There are 50 trials for each noise level. Figure 2(b) presents the intrinsic parameters in an arrangement similar to the matrix form of K. The unit aspect-ratio constraint works well since the two diagrams at positions (1, 1) and (2, 2) in Fig. 2(b) are almost the same and only slightly larger than the expected value 2000 (by no more than 0.5 %). The zero-skew constraint forces the skew factors to be around 0. The principal points are also around (500, 500).

Histograms showing the distributions of the estimated intrinsic parameters over the 50 trials, for 2D points contaminated by low and high noise levels, are given in Figs. 2(c) and 2(d) respectively. When Gaussian noise with a standard deviation of 1 pixel is added, the parameters are highly concentrated around the ground truth and the maximum deviation is less than 3 % for the principal point and the focal length. When Gaussian noise with a standard deviation of 4 pixels is added, the distributions of the parameters are spread wider and the corresponding maximum deviation is around 12 %.

8.2 Real Image Sequences

There are 12 real image sequences selected for testing the proposed method. The first 7 standard image sequences are obtained from the Visual Geometry Group (VGG) at the University of Oxford. The remaining 5 image sequences are taken with the same DSLR camera and lens. The image sequences ‘Mickey Mouse’ and ‘Dr. Sun Statue’ are taken with random camera motion and with the auto-focus function enabled, so that the objects are seen clearly and located near the center of each image. The other 3 image sequences (i.e., ‘Tigger’, ‘Spiderman’ and ‘Terra-cotta Warrior’) are taken with the objects undergoing circular motion on a turntable and the camera fixed on a tripod. Each shot is taken after the object is rotated by 10 degrees. For the real image sequences, we use only the zero-skew and unit aspect-ratio constraints, since these two constraints can be assumed to be well satisfied by high quality cameras.

Figure 3 shows the results for two sets of VGG real data. Figure 3(a) shows the 1st image of the Model House image sequence. Most of the 2D corresponding points lie on the 3 main planes in the scene, i.e., the floor, the front wall (which is perpendicular to the floor) and the front side of the roof. After applying our method to the projective reconstruction result of Hung and Tang [13], one of the images is projected onto the upgraded 3D points as a textured 3D model; a side view of the textured model together with the cameras is shown in Fig. 3(b). Similarly, another set of results, for the Merton College II sequence, is shown in Fig. 3(d).

Fig. 3

Some reconstructed scenes in Euclidean space

Two more sets of results, reconstructed from our own image sequences, are shown in Fig. 4. For the Dr. Sun Statue image sequence, Fig. 4(a) shows the 6th image. The camera was moved around the statue by a photographer on a roughly eye-level horizontal plane. The reconstructed scene with the cameras is shown in Fig. 4(b). From the side view of the textured 3D model in Fig. 4(c), the shape of the Dr. Sun Statue can be seen clearly. Our method is also applied to upgrade a projective reconstruction from a circular-motion image sequence, as in Fig. 4(d). There are 36 images taken around the Terra-cotta Warrior on a turntable, with a rotation of 10 degrees between consecutive images. To illustrate the reconstructed shape of the sculpture, three views of the 3D point cloud captured from different angles are shown in Figs. 4(e) to 4(g). The results from the other image sequences are not shown here as they are similar to those shown.

Fig. 4

Some reconstructed scenes in Euclidean space

Table 1 shows the performance of the proposed method by means of two sets of numerical data. The first set is for estimating the absolute dual quadric Q and the second set is for solving H 1. For the first set, the number of constraints and the 1st, 9th and 10th singular values of Φ ∗ are listed as M, s 1, s 9 and \({\varepsilon}_{{Q}}^{*}=s_{10}\), respectively. The results show that a distinct null space of Φ ∗ can be identified for each of the data sets, despite the relaxation of the original minimization problem (26) to (28). The next column, ε Q (q), gives the value of the original cost function (26) evaluated at the singular vector corresponding to the smallest singular value s 10. It shows that all results satisfy the inequality (29) and that all \({\varepsilon}_{{Q}}^{*}=s_{10}\) are very small compared with their corresponding s 1 and s 9. The rank-9 condition for estimating Q is therefore fulfilled. The second set, namely the ratios s 3/s 1 and s 4/s 1 of the singular values of Q, shows that Q is approximately of rank-3.

Table 1 Performance on real image Sequences

8.3 Comparison

A linear method proposed by Pollefeys et al. [19] is implemented for comparison on both simulated and real data. This linear method assumes that the varying focal lengths across multiple views are the only unknowns; the zero-skew, unit aspect-ratio and known principal-point constraints are directly enforced in the formulation. The Oxford Model House image sequence from VGG is selected. The set of 2D corresponding points \(\hat{\mathbf{w}}_{ij}\) across multiple images, the reconstructed 3D points M j , and the reconstructed camera intrinsic \(\hat{{K}}_{i}\) and extrinsic parameters \(\hat{{R}}_{i}\) and \(\hat{\mathbf{t}}_{i}\) (i.e., the projection matrices \(\hat{{P}}_{i} = \hat{{K}}_{i} \hat{{R}}_{i}^{T} [{I}_{3} \ |{-}\hat{\mathbf{t}}_{i}]\)) are also provided and are taken as ground truth in this paper. The image size is 768×576 pixels. For the reconstructed cameras in Euclidean space, the focal lengths vary between 594 and 672 pixels and the maximum variation of the principal points over the 10 images is 93.5 pixels.

8.3.1 Synthetic Data

To compare both methods on the reconstructed data in simulation, we generate a random projective distortion matrix H∈ℝ4×4 to downgrade the Euclidean space to a projective space. To satisfy the constraints of the method of [19], the 2D points and projection matrices are first transformed by the known intrinsic parameters. Variations of the principal points are denoted as \((\Delta c_{i}^{x},\Delta c_{i}^{y})\) on the x- and y-axes respectively. Hence, the projection matrices and 3D points in the projective space and the corresponding 2D points for the ith view are as follows

(33)

and

(34)

where \(\bar{f}\) is the mean value of the ground-truth focal lengths, which is 638.52 pixels. Both methods are applied to upgrade the projective space back to a Euclidean space, and their estimated focal lengths along the x- and y-axes for the ith view and the kth trial are denoted as \(f^{x}_{ik}\) and \(f^{y}_{ik}\) respectively. Our method is used with the zero-skew and unit aspect-ratio constraints only. The root mean square error (RMSE) of the focal lengths, ϵ F , across multiple views and trials for a given level of variation of the principal points is expressed as

$$ \epsilon_F = \sqrt{ {\frac{1}{2mL} \sum _{k,i} \bigl\{ \bigl(f^x_{ik}-\bar{f}\, \bigr)^2 + \bigl(f^y_{ik}-\bar{f}\, \bigr)^2 \bigr\}}} $$
(35)

where L is the number of trials and m is the number of views. After 200 trials (i.e., L=200), the results are summarized in Fig. 5. Except when the added noise level is below 10 pixels, there are cases in which the method of [19] fails to return a reasonable solution. In the failed cases the least-squares problem is usually rank-deficient, and the reconstructed space either remains as distorted as a projective space or collapses to a planar object. The ratio of failed cases versus the added noise on the principal points is shown in Fig. 5(a). The percentage of failed cases can exceed 50 % at some noise levels, and once the shift of the principal points exceeds 30 pixels, the percentage of failed cases is at least 30 %. In contrast, our proposed method always returns a reasonable solution in this test.

Fig. 5

Comparison on simulated distortion

Figure 5(b) shows the RMSE of the focal lengths ϵ F versus the added noise on the principal points. The failed cases of the method of [19] are removed before plotting Fig. 5(b). The method of [19] returns exact solutions when the noise level is zero, and its RMSE of the focal lengths is almost proportional to the added noise level on the principal points once the failed cases are removed. Our method cannot return the exact solution even when no noise is added, because the original minimization problem (26) is replaced by its upper bound (28) and nothing forces the minima of the two problems to coincide. On the other hand, our method is almost invariant to the added noise, and the error in the focal lengths remains at a roughly constant level. To illustrate what the upgraded 3D spaces look like, we selected the results for the case where the principal points are shifted by 200 pixels and the method of [19] returns a reasonable solution. The results of the two methods are shown in Figs. 5(c) and 5(d). Our proposed method returns a better result, in which the wall is much closer to being perpendicular to the floor plane.

8.3.2 Real Data

Based on the same Oxford Model House data set as in the previous section, we use the projective bundle adjustment method proposed by Hung and Tang [13] to reconstruct 3D points from 2D correspondences across multiple views. The 2D points in (34) are first shifted by \((\Delta c_{i}^{x},\Delta c_{i}^{y})\) on the principal points (the original 2D points also contain 2D errors) and passed to the projective reconstruction method [13]. The method of [19] and our method are then applied to the resulting projective spaces for upgrading. There are again 200 trials for each noise level. The results are shown in Fig. 6. The RMSE 2D reprojection errors from the projective bundle adjustment [13] are around 0.5 pixels. From Fig. 6(a), our method is more stable than the method of [19], and no failed case is reported for our proposed method. Figure 6(b) shows that our method remains invariant to variations of the principal points when computing the absolute dual quadric. Similarly, side views of the reconstructed 3D points from our method and from the method of [19] are shown in Figs. 6(c) and 6(d) respectively.

Fig. 6

Comparison on modified real data

9 Proof for Rank-4 Properties of Subspace Constraints

To prove the rank-4 properties, we first derive the dual image of the absolute conic in terms of \({H}_{1}{H}_{1}^{T}\) and \(\hat{{P}}\). For simplicity, the sub-index i is dropped. The projective projection matrix \(\hat{{P}}\) can be denoted as \(\hat{{P}}= [ \mathbf{p}_{1}\;\mathbf{p}_{2}\;\mathbf{p}_{3} ]^{T}\), where p k is the 4-vector forming the kth row of \(\hat{{P}}\). The dual image of the absolute conic ω ∗ can be expressed as

$$ {\omega}^{\ast }\sim \hat{{P}} \bigl( {H}_1{H}_1^T \bigr) \hat{{P}}^T = \left[ \begin{array}{c@{\quad }c@{\quad }c} \mathbf{p}_1^T{H}_1{H}_1^T\mathbf{p}_1 & \mathbf{p}_1^T{H}_1{H}_1^T\mathbf{p}_2 & \mathbf{p}_1^T{H}_1{H}_1^T\mathbf{p}_3 \\ \mathbf{p}_2^T{H}_1{H}_1^T\mathbf{p}_1 & \mathbf{p}_2^T{H}_1{H}_1^T\mathbf{p}_2 & \mathbf{p}_2^T{H}_1{H}_1^T\mathbf{p}_3 \\ \mathbf{p}_3^T{H}_1{H}_1^T\mathbf{p}_1 & \mathbf{p}_3^T{H}_1{H}_1^T\mathbf{p}_2 & \mathbf{p}_3^T{H}_1{H}_1^T\mathbf{p}_3 \end{array}\right] . $$
(36)

The 3×3 matrix in (14) is exactly the same as (36) elementwise. The derivations for each constraint are shown with the notation in (36).

9.1 Zero-Skew Constraint

The zero-skew constraint from (18) is expressed in notation of (6) as

$$( \mathbf{m}_1\times \mathbf{m}_3 ) \cdot ( \mathbf{m}_2 \times \mathbf{m}_3 ) =0. $$

To relate this constraint to the quantities in (14), the cross product operators should be transformed to dot product operators. First, by applying a 3D cross product property, A⋅(B×C)≡−C⋅(B×A), the zero-skew constraint becomes

$$-\mathbf{m}_3\cdot \bigl[ \mathbf{m}_2\times ( \mathbf{m}_1\times \mathbf{m}_3 ) \bigr] =0. $$

To expand the rest of cross product, another property A×(B×C)≡B(AC)−C(AB) is applied, then,

$$ ( \mathbf{m}_1\cdot \mathbf{m}_3 ) ( \mathbf{m}_2\cdot \mathbf{m}_3 ) - ( \mathbf{m}_1\cdot \mathbf{m}_2 ) ( \mathbf{m}_3\cdot \mathbf{m}_3 ) =0. $$
(37)

Replacing the entries from (36), (37) becomes

$$ \bigl( \mathbf{p}_1^T{H}_1{H}_1^T\mathbf{p}_3 \bigr) \bigl( \mathbf{p}_2^T{H}_1{H}_1^T\mathbf{p}_3 \bigr) - \bigl( \mathbf{p}_1^T{H}_1{H}_1^T\mathbf{p}_2 \bigr) \bigl( \mathbf{p}_3^T{H}_1{H}_1^T\mathbf{p}_3 \bigr) =0. $$
(38)

Applying Kronecker product notation to (38), it becomes

$$ f_z( \mathbf{h})= \mathbf{h}^T \bigl\{ ( \mathbf{p}_3\otimes \mathbf{p}_1 ) ( \mathbf{p}_3\otimes \mathbf{p}_2 )^T - ( \mathbf{p}_2\otimes \mathbf{p}_1 ) ( \mathbf{p}_3\otimes \mathbf{p}_3 )^T \bigr\} \mathbf{h}=0 $$
(39)

where h is defined as the vector obtained by applying the stack operator to \({H}_{1}{H}_{1}^{T}\) (i.e. stacking all the columns from left to right into a single vector),

$$ \mathbf{h}=\operatorname{vec} \bigl( {H}_1{H}_1^T \bigr) \in \mathbb{R}^{16\times 1}. $$
(40)

Let us define v 1=p 3⊗p 1, v 2=p 3⊗p 2, v 3=p 2⊗p 1 and v 4=p 3⊗p 3. Then (39) can be simplified as

$$f_z( \mathbf{h})= \mathbf{h}^T \bigl( \mathbf{v}_1 \mathbf{v}_2^T - \mathbf{v}_3 \mathbf{v}_4^T \bigr) \mathbf{h}=0. $$

The quadratic forms, \(\mathbf{v}_{1} \mathbf{v}_{2}^{T}\) and \(\mathbf{v}_{3} \mathbf{v}_{4}^{T}\) can always be written as a sum of squares of linear functions of h as

$$f_z( \mathbf{h})=\frac{1}{4} \mathbf{h}^T\left\{ \begin{array}{l} ( \mathbf{v}_1 + \mathbf{v}_2)( \mathbf{v}_1 + \mathbf{v}_2)^T\\ \quad {} - (\mathbf{v}_1 - \mathbf{v}_2)(\mathbf{v}_1 - \mathbf{v}_2)^T \\ \quad {}-( \mathbf{v}_3 + \mathbf{v}_4)( \mathbf{v}_3 + \mathbf{v}_4)^T\\ \quad {} + ( \mathbf{v}_3 - \mathbf{v}_4)(\mathbf{v}_3 - \mathbf{v}_4)^T \end{array} \right\} \mathbf{h}. $$

f z (h) can be expressed as

$$ f_z( \mathbf{h}) = \frac{1}{4} \mathbf{h}^T {T}_z^T \operatorname{diag}(1,-1,-1,1)\ {T}_z \mathbf{h} $$
(41)

where

$${T}_z=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \mathbf{v}_1 + \mathbf{v}_2 & \mathbf{v}_1 - \mathbf{v}_2 & \mathbf{v}_3 + \mathbf{v}_4 & \mathbf{v}_3 - \mathbf{v}_4 \end{array} \right]^T \in \mathbb{R}^{4 \times 16}. $$

Since \({H}_{1}{H}_{1}^{T}\) is a symmetric matrix, among the 16 elements of h, only 10 are independent variables. Hence, h can be expressed in terms of q∈ℝ10×1 by means of a binary matrix, Z∈ℝ16×10 as

$$ \mathbf{h}={Z} \mathbf{q}. $$
(42)

Substituting (42) into (41), we have

$$ f_z( \mathbf{q}) = \frac{1}{4} \mathbf{q}^T{Z}^T {T}_z^T \operatorname{diag}(1,-1,-1,1)\ {T}_z {Z} \mathbf{q} = \mathbf{q}^T{\varPhi}^z \mathbf{q} = 0 $$
(43)

where

$$ {\varPhi}^z= \frac{1}{4} {Z}^T {T}_z^T\operatorname{diag}(1,-1,-1,1)\ {T}_z {Z} \in \mathbb{R}^{10 \times 10}. $$
(44)

In general, the projection matrix \(\hat{{P}} = [ \mathbf{p}_{1}\ \mathbf{p}_{2}\ \mathbf{p}_{3}]^{T}\) is of full rank, so its 3 row vectors are linearly independent. T z is then of rank-4 since its 4 row vectors are also linearly independent. By (44), the quadratic form f z (q) consists of two positive squares and two negative squares. By the law of inertia for quadratic forms [6], the numbers of positive and negative squares are invariant to the choice of basis. Hence, Φ z is a rank-4 10×10 symmetric matrix with two positive and two negative eigenvalues.
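The construction of Φ z in (44) can be checked numerically with the sketch below; the binary matrix Z is built assuming the column-wise stacking of (40) and the ordering of q in (12), and the final print confirms the two-positive/two-negative eigenvalue signature.

```python
import numpy as np

def Z_matrix():
    """Binary 16x10 matrix with h = Z q, where h stacks the columns of the
    symmetric Q and q is ordered as in (12)-(13)."""
    Q_index = np.array([[0, 1, 2, 3],
                        [1, 4, 5, 6],
                        [2, 5, 7, 8],
                        [3, 6, 8, 9]])
    Z = np.zeros((16, 10))
    Z[np.arange(16), Q_index.T.ravel()] = 1.0     # column-major stacking
    return Z

def Phi_z(P_hat):
    """Rank-4 ruled quadric (44) of the zero-skew constraint for one view."""
    p1, p2, p3 = P_hat                            # rows of the 3x4 projective camera
    v1, v2 = np.kron(p3, p1), np.kron(p3, p2)
    v3, v4 = np.kron(p2, p1), np.kron(p3, p3)
    Tz = np.stack([v1 + v2, v1 - v2, v3 + v4, v3 - v4])
    Z = Z_matrix()
    return 0.25 * Z.T @ Tz.T @ np.diag([1.0, -1.0, -1.0, 1.0]) @ Tz @ Z

# For a generic camera the non-zero eigenvalues split into 2 positive and 2 negative.
P = np.random.default_rng(2).normal(size=(3, 4))
w = np.linalg.eigvalsh(Phi_z(P))
print((w > 1e-9).sum(), (w < -1e-9).sum())        # expected: 2 2
```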

9.2 Unit Aspect-Ratio Constraint

This constraint can be expressed in the form of cross product as

$$ \vert \mathbf{m}_1\times \mathbf{m}_3\vert ^2= \vert \mathbf{m}_2\times \mathbf{m}_3\vert ^2 . $$
(45)

By applying the above two cross-product properties, it can be shown that (45) is equivalent to

$$ (\mathbf{m}_1\cdot \mathbf{m}_3)^2-(\mathbf{m}_2\cdot \mathbf{m}_3)^2+ \bigl[ (\mathbf{m}_2\cdot \mathbf{m}_2)-(\mathbf{m}_1\cdot \mathbf{m}_1) \bigr] (\mathbf{m}_3\cdot \mathbf{m}_3)=0. $$

Similarly to the development of the zero-skew constraint, we can apply the Kronecker product to express (45) in terms of q as

$$ f_u( \mathbf{q})= \mathbf{q}^T{Z}^T \bigl\{ ( \mathbf{p}_1\otimes \mathbf{p}_3 ) ( \mathbf{p}_1\otimes \mathbf{p}_3 )^T - ( \mathbf{p}_2\otimes \mathbf{p}_3 ) ( \mathbf{p}_2\otimes \mathbf{p}_3 )^T + ( \mathbf{p}_2\otimes \mathbf{p}_2 - \mathbf{p}_1\otimes \mathbf{p}_1 ) ( \mathbf{p}_3\otimes \mathbf{p}_3 )^T \bigr\} {Z} \mathbf{q}=0 $$
(46)

Let v 5=p 1⊗p 3, v 6=p 2⊗p 3 and v 7=p 2⊗p 2−p 1⊗p 1. Then (46) can be simplified as

$$f_u( \mathbf{q})= \mathbf{q}^T{Z}^T \bigl\{ \mathbf{v}_5 \mathbf{v}_5^T - \mathbf{v}_6 \mathbf{v}_6^T + \mathbf{v}_7 \mathbf{v}_4^T \bigr\} {Z} \mathbf{q} =0. $$

Rewriting the quadratic form with a symmetric matrix, we have

$$ f_u( \mathbf{q}) = \frac{1}{4} \mathbf{q}^T{Z}^T {T}_u^T \operatorname{diag}(2,-2,1,-1)\ {T}_u {Z} \mathbf{q} = \mathbf{q}^T{\varPhi}^u \mathbf{q} = 0 $$

where

$${T}_u = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \mathbf{v}_5 & \mathbf{v}_6 & \mathbf{v}_7 + \mathbf{v}_4 & \mathbf{v}_7 - \mathbf{v}_4 \end{array} \right]^T \in \mathbb{R}^{4 \times 16} $$

and the symmetric matrix Φ u between the q T and q can be expressed as

$$ {\varPhi}^u = \frac{1}{4} {Z}^T {T}_u^T\operatorname{diag}(2,-2,1,-1) \ {T}_u {Z} \in \mathbb{R}^{10 \times 10}. $$
(47)

By the law of inertia for quadratic forms [6], Φ u is also rank-4 with two positive and two negative eigenvalues.

9.3 Partial Constant Principal-Point Constraints

For a camera with fixed intrinsic parameters, the principal point is the same across all the captured images. In auto-focusing or zooming operations, the principal point may vary; however, calibration experiments suggest that the variation of the principal point is small. Under this assumption, any image pair from the same camera provides two additional constraints (one for each of the two components of the principal point in the 2D image plane). As these constraints act across a pair of images, let us denote the principal points as \([ u_{i}\;v_{i}\;1 ]^{T}\) for the ith view and \([ u_{j}\;v_{j}\;1 ]^{T}\) for the jth view.

If u i =u j , we have

$$\frac{\mathbf{m}_{1i}\cdot \mathbf{m}_{3i}}{ \mathbf{m}_{3i}\cdot \mathbf{m}_{3i}}=\frac{ \mathbf{m}_{1j}\cdot \mathbf{m}_{3j}}{ \mathbf{m}_{3j}\cdot \mathbf{m}_{3j}} $$

where the subscripts i and j refer to the ith and jth views respectively. Expressed in terms of \(\hat{{P}}\) and H 1 from (36), we have

$$ \bigl( \mathbf{p}_{1i}^T{H}_1{H}_1^T\mathbf{p}_{3i} \bigr) \bigl( \mathbf{p}_{3j}^T{H}_1{H}_1^T\mathbf{p}_{3j} \bigr) = \bigl( \mathbf{p}_{1j}^T{H}_1{H}_1^T\mathbf{p}_{3j} \bigr) \bigl( \mathbf{p}_{3i}^T{H}_1{H}_1^T\mathbf{p}_{3i} \bigr). $$

Using the Kronecker product, this becomes

$$ \mathbf{h}^T \bigl\{ ( \mathbf{p}_{3i}\otimes \mathbf{p}_{1i} ) ( \mathbf{p}_{3j}\otimes \mathbf{p}_{3j} )^T - ( \mathbf{p}_{3j}\otimes \mathbf{p}_{1j} ) ( \mathbf{p}_{3i}\otimes \mathbf{p}_{3i} )^T \bigr\} \mathbf{h} = 0. $$

Let v 8=p 3i ⊗p 1i , v 9=p 3j ⊗p 3j , v 10=p 3j ⊗p 1j and v 11=p 3i ⊗p 3i , and reform it as a quadratic form. We have

$$ f_x( \mathbf{q}) = \frac{1}{4} \mathbf{q}^T{Z}^T {T}_x^T \operatorname{diag}(1,-1,-1,1)\ {T}_x {Z} \mathbf{q} = \mathbf{q}^T{\varPhi}_{ij}^x \mathbf{q} = 0 $$

where

$$\mbox{\small\selectfont $ {T}_{x} = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \mathbf{v}_{8} + \mathbf{v}_{9} & \mathbf{v}_{8} - \mathbf{v}_{9} & \mathbf{v}_{10} + \mathbf{v}_{11} & \mathbf{v}_{10} - \mathbf{v}_{11} \end{array} \right]^{T} \in \mathbb{R}^{4 \times 16}$} $$

and

$$ {\varPhi}_{ij}^x=\frac{1}{4} {Z}^T {T}_x^T \operatorname{diag}(1,-1,-1,1)\ {T}_x {Z} \in \mathbb{R}^{10 \times 10}. $$
(48)

Clearly, \({\varPhi}_{ij}^{x}\) is at most of rank-4 and has two positive and two negative eigenvalues. From the second constraint v i =v j , we have

$$\frac{\mathbf{m}_{2i}\cdot \mathbf{m}_{3i}}{\mathbf{m}_{3i}\cdot \mathbf{m}_{3i}}=\frac{ \mathbf{m}_{2j}\cdot \mathbf{m}_{3j}}{\mathbf{m}_{3j}\cdot \mathbf{m}_{3j}}. $$

Expressed in terms of \(\hat{{P}}\) and H 1 from (36) and applying the Kronecker product, we have

$$ \mathbf{h}^T \bigl\{ ( \mathbf{p}_{3i}\otimes \mathbf{p}_{2i} ) ( \mathbf{p}_{3j}\otimes \mathbf{p}_{3j} )^T - ( \mathbf{p}_{3j}\otimes \mathbf{p}_{2j} ) ( \mathbf{p}_{3i}\otimes \mathbf{p}_{3i} )^T \bigr\} \mathbf{h} = 0. $$

Let v 12=p 3i ⊗p 2i , v 13=p 3j ⊗p 2j and reform it as a quadratic form. We have

$$f_y( \mathbf{q})=\frac{1}{4} \mathbf{q}^T{Z}^T {T}_y^T\operatorname{diag}(1,-1,-1,1)\ {T}_y {Z} \mathbf{q} = \mathbf{q}^T{\varPhi}_{ij}^y \mathbf{q} = 0 $$

where

$$\mbox{\small\selectfont $ {T}_{y} = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} \mathbf{v}_{12} + \mathbf{v}_{9} & \mathbf{v}_{12} - \mathbf{v}_{9} & \mathbf{v}_{13} + \mathbf{v}_{11} & \mathbf{v}_{13} - \mathbf{v}_{11} \end{array} \right]^{T} \in \mathbb{R}^{4 \times 16}$} $$

and

$$ {\varPhi}_{ij}^y=\frac{1}{4} {Z}^T {T}_y^T \operatorname{diag}(1,-1,-1,1)\ {T}_y {Z} \in \mathbb{R}^{10 \times 10}. $$
(49)

Clearly, \({\varPhi}_{ij}^{y}\) is at most of rank-4 and has two positive and two negative eigenvalues.

The above two constraints on the x, y-coordinates of the principal point can be applied independently and the constraints can be applied to any image pair captured by the same camera. There is, however, no restriction that the same camera is used to capture the whole image sequence.

10 Additional Constraints

In this section, we develop some additional constraints based on a priori information about the cameras. The technique used above for expressing constraints can also be applied to these new constraints. They can be related within the same framework, so that different constraints can be applied to different cameras, even though the new constraints are linear in the absolute dual quadric Q rather than quadratic.

10.1 Known Principal Points

When the principal points are known for some views (or all views), it is possible to apply a 2D translation to those views such that the translated principal points become (0, 0). Assuming that the principal point of the ith view is known, after the translation of the principal point to the origin, the dual image of the absolute conic of the ith view becomes

$$ {K}_i{K}_i^T= \left[ \begin{array}{c@{\quad }c@{\quad }c} f_i^2+s_i^2 & s_i\alpha_if_i & 0 \\ s_i\alpha_if_i & \alpha_i^2f_i^2 & 0 \\ 0 & 0 & 1 \end{array} \right] . $$
(50)

Comparing (50) with (14), we can deduce two constraints on q as

$$ \mathbf{m}_{1i}\cdot \mathbf{m}_{3i}=0\quad \text{and}\quad \mathbf{m}_{2i}\cdot \mathbf{m}_{3i}=0. $$
(51)

Both conditions are linear in q and can be expressed as

$$ {\varPhi}_i^{x0} \mathbf{q}=0\quad \text{and}\quad {\varPhi}_i^{y0} \mathbf{q}=0 $$
(52)

where \({\varPhi}_{i}^{x0}\) and \({\varPhi}_{i}^{y0}\in \mathbb{R}^{1\times 10}\). Expressing these conditions by Kronecker product, we get

$$ {\varPhi}_i^{x0}= \bigl( \mathbf{p}_{3i}^T \otimes \mathbf{p}_{1i}^T \bigr) {Z}\quad \text{and}\quad {\varPhi}_i^{y0}= \bigl( \mathbf{p}_{3i}^T \otimes \mathbf{p}_{2i}^T \bigr) {Z}. $$
(53)

Both \({\varPhi}_{i}^{x0}\) and \({\varPhi}_{i}^{y0}\) are first scaled to unit-norm vectors so that \(\vert {\varPhi}_{i}^{x0}\vert =\vert {\varPhi}_{i}^{y0}\vert =1\). To integrate with the previous constraints in (26), the above linear constraints (52) should be transformed as \(\mathbf{q}^{T} \{ ({\varPhi}_{i}^{x0} )^{T} {\varPhi}_{i}^{x0} \} \mathbf{q}=0\), where \(({\varPhi}_{i}^{x0} )^{T} {\varPhi}_{i}^{x0} \in \mathbb{R}^{10 \times 10}\) is a rank-1 matrix. These constraints can also be used on their own to determine Q, provided there are at least 5 (>9/2) cameras with known principal points.
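A sketch of converting these linear constraints into rank-1 quadrics that can be appended to the constraint set of (26) is given below (the function name is ours; Z is the 16×10 binary matrix of Sect. 9).

```python
import numpy as np

def known_pp_quadrics(P_hat, Z):
    """Rank-1 quadrics built from (52)-(53) for a view whose (translated)
    principal point is at the origin."""
    p1, p2, p3 = P_hat                            # rows of the projective camera
    phi_x0 = np.kron(p3, p1) @ Z                  # (53), a 1x10 row vector
    phi_y0 = np.kron(p3, p2) @ Z
    phi_x0 /= np.linalg.norm(phi_x0)              # scale to unit norm
    phi_y0 /= np.linalg.norm(phi_y0)
    return np.outer(phi_x0, phi_x0), np.outer(phi_y0, phi_y0)
```

The two returned matrices can simply be appended to the list of ruled quadrics minimized in (26).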

10.2 Known Principal Points and Euclidean Image Planes

A Euclidean image plane satisfies the zero-skew and unit aspect-ratio constraints. Applying these two additional constraints to (50), we have

$${K}_i{K}_i^T= \left[ \begin{array}{c@{\quad }c@{\quad }c} f_i^2 & 0 & 0 \\ 0 & f_i^2 & 0 \\ 0 & 0 & 1\end{array} \right]. $$

It follows from (14) and (50) that the zero-skew constraint can be simply expressed as

$$\mathbf{m}_{1i}\cdot \mathbf{m}_{2i}=0 $$

and the unit aspect-ratio constraint can be written

$$\mathbf{m}_{1i}\cdot \mathbf{m}_{1i}= \mathbf{m}_{2i}\cdot \mathbf{m}_{2i}. $$

Both conditions can be transformed as

$$ {\varPhi}_i^{z0} \mathbf{q}=0\quad \text{and}\quad {\varPhi}_i^{u0} \mathbf{q}=0 $$
(54)

where

$${\varPhi}_i^{z0}= \bigl( \mathbf{p}_{2i}^T \otimes \mathbf{p}_{1i}^T \bigr) {Z}\in \mathbb{R}^{1\times 10} $$

and

$$ {\varPhi}_i^{u0}= \bigl( \mathbf{p}_{1i}^T \otimes \mathbf{p}_{1i}^T- \mathbf{p}_{2i}^T \otimes \mathbf{p}_{2i}^T \bigr) {Z}\in \mathbb{R}^{1\times 10}. $$
(55)

After scaling both \({\varPhi}_{i}^{z0}\) and \({\varPhi}_{i}^{u0}\) to unit norm vectors as \(\vert {\varPhi}_{i}^{z0}\vert = \vert {\varPhi}_{i}^{u0}\vert =1\), they can also be rewritten in quadratic form as rank-1 matrices and integrated with the previous constraints in (26). Combining these two constraints with (52), Q can be solved independently by at least 3 views (>9/4) having known principal points and Euclidean image planes.

11 Conclusion

A flexible self-calibration algorithm has been proposed to recover the projective distortion matrix that upgrades a projective frame to a Euclidean frame. The common metric constraints for self-calibration are unified in a common framework in which they are represented as 4D ruled quadrics in a 10D space. This framework is very flexible for customizing different metric constraints to different camera configurations. The projective distortion matrix is obtained by minimizing a single cost function, namely (26). In practice, we propose to minimize an upper bound of the cost function, and experiments show that the results are very satisfactory, both for the synthetic data in Sect. 8.1 and for the real data in Sect. 8.2. The results can be further refined using iterative non-linear algorithms or Euclidean bundle adjustment; the proposed method provides a flexible and reliable starting point for such refinement.