1 Introduction

Geometric computation has long been one of the major issues in computer vision. In particular, two-view geometry computation is a central building block for three-dimensional (3D) modeling and camera motion estimation; for example, self-driving builds on simultaneous localization and mapping (SLAM) and structure from motion (SfM), both of which rely on two-view geometry. Among the core algorithms, the eight-point algorithm [1] computes the fundamental matrix from a set of eight or more point correspondences between two views and has the advantage of simplicity of implementation. However, it is extremely susceptible to image noise and hence was of very limited practical use until Hartley devised the normalized eight-point algorithm in his seminal work [2], which shows that by preceding the algorithm with a data normalization (translation and scaling) of the coordinates of the correspondences, the results obtained are comparable to those of the best iterative algorithms. As a consequence, with its simple strategy of translation and scaling, this isotropic normalization, now termed Hartley's normalization, has gradually become an indispensable component of many geometric computations, not only for fundamental matrix estimation [3] but also for homography estimation [4], ellipse fitting [5], bundle adjustment [6], etc.

One particular benefit of Hartley's normalization for the direct linear transformation (DLT) formulation of the fundamental matrix computation is that it gives the DLT system a better condition number. Therefore, when the solution matrix is enforced to have rank 2, a much more stable estimate of the fundamental matrix is obtained; this matters because the fundamental matrix is the starting point of all subsequent structure and motion computations, such as guided correspondence search, camera and structure optimization, and 3D reconstruction from more than two views. Consequently, satisfying the rank-2 constraint as closely as possible already at the DLT stage is an interesting topic of study. For example, Mühlich and Mester [7] performed a statistical analysis to obtain an optimal data normalization for DLT fundamental matrix computation and showed that Hartley's normalization can be expected to work well, although it is not identical to the optimal transformation. Mair et al. [8] performed a further error analysis to obtain better performance than Hartley's eight-point algorithm. The work of da Silveira and Jung [9] presented a perturbation analysis of the eight-point algorithm for wide field-of-view cameras. In contrast to these works based on statistical analysis, this paper seeks to determine the mechanism of data normalization through deep learning, without specific statistical modeling. Considering that fundamental matrix estimation is strongly affected by the error distribution of the feature matching algorithm, we argue that a data normalization scheme can be learned from the data themselves to achieve DLT solutions with an improved rank-2 condition; this view coincides with those of Refs. [7, 9, 10]. In particular, as displayed in Fig. 1, we propose to learn a data-driven normalization scheme under the standard configuration of eight correspondences.

Figure 1

Distributions of normalized image coordinates by using Hartley's normalization algorithm (upper right) and our learning-based normalization approach (bottom right), respectively. Eight pairs of point correspondences are obtained from the two street images on the left. Note that in the right figure, the coordinate axes represent the normalized image coordinates in the horizontal and vertical directions, and the “error” refers to the symmetric epipolar distance, which better characterizes the estimation accuracy of the two-view geometry. Our approach learns a robust normalization scheme adapted to the input data, obtains a better distribution spread of the normalized point coordinates, and eventually leads to improved performance in the computation of the fundamental matrix

Recently, the success of deep learning in high-level vision tasks has gradually been extended to multi-view geometry problems such as homography [11], fundamental matrix [12], bundle adjustment [13], plane sweeping [14, 15], and rolling-shutter modeling [16, 17]. However, this success has not yet reached the normalized eight-point algorithm: no deep learning pipeline has so far produced a different or better normalization scheme. This is mainly due to the following obstacles: (1) gradient descent cannot be trivially applied, as mentioned in Ref. [12]; (2) the network must be invariant to the permutation of the correspondences, i.e., different orderings of the input data should produce the same normalization; and (3) supervised learning would require a large amount of labeled data (in this case, the input is eight point correspondences and the output is the optimal data normalization). In this paper, we overcome these problems by back-propagating through a singular value decomposition (SVD) layer and by using a self-supervised learning mechanism within a permutation-invariant network architecture; this also removes the need for large amounts of labeled training data. Our approach not only produces an interpretable pipeline for fundamental matrix estimation but can also be easily embedded in other robust frameworks such as differentiable random sample consensus (RANSAC) [18]. In our experiments, our learning-based normalization demonstrates superior performance to Hartley's normalization and good generalization across different datasets. Our main contributions can be summarized as follows.

(1) We propose a self-supervised learning-based deep solution for normalizing DLT fundamental matrix estimation under the standard configuration of eight point correspondences.

(2) We make a theoretical contribution by demonstrating the existence of different and better normalization algorithms beyond Hartley's normalization.

(3) Extensive experiments on both synthetic and real images demonstrate the effectiveness and good generalizability of our proposed approach.

2 Related work

In this section, we briefly review related work in traditional two-view geometry computation and deep learning-based multi-view geometry learning.

2.1 Two-view geometry estimation

The normalized eight-point algorithm [2] significantly improves the numerical accuracy of the fundamental matrix and extends its scope of application, owing to the improved condition number produced by the hand-designed normalization scheme. Since this seminal work, there have been various follow-up studies on the uncertainty in fundamental matrix estimation and on the relationships between the epipolar constraint and correspondence errors. Csurka et al. [19] proposed a method to simultaneously estimate the fundamental matrix and its uncertainty. Mühlich and Mester [10] concluded that the normalization strategy ensures that the non-iterative two-view motion estimation algorithm remains unbiased and consistent. They further introduced a normalization transformation scheme based on a bound on the epipolar constraint errors obtained by assuming a known feature matching covariance, which was also used to extend the first-order error propagation analysis of the eight-point algorithm in Ref. [8]. However, this approach was still not optimal because the error distribution of the input data was not considered [7]. A closed-form computation of the uncertainty of the fundamental matrix was presented in Ref. [20] to recover correspondences via the uncertain equilibrium of motion estimation. Chojnacki and Brooks [21] revisited the normalized eight-point algorithm and presented a statistical model of the data distribution by merging in the statistical approach of Ref. [10]; this was further extended in Ref. [22] by introducing a structured model of the data distribution. In addition, da Silveira and Jung [9] performed a perturbation analysis of fundamental matrix estimation without assuming any particular matching error distribution.

2.2 Deep learning-based geometry estimation

Recently, the success of deep learning in high-level vision tasks has been gradually extended to various multi-view geometry estimation problems. DeTone et al. [11] employed a deep convolutional neural network (CNN) to regress a homography from a pair of input images in an end-to-end manner. A follow-up study [23] developed the unsupervised variant by replacing direct supervision with image-based loss. This pipeline has been extended to fundamental matrix estimation, where a fundamental matrix is directly regressed from a pair of stereo images without correspondences [24]. Ranftl and Koltun [12] treated the fundamental matrix estimation problem as a weighted homogeneous least-squares problem, where the matching weights and fundamental matrix are simultaneously estimated by using supervised deep networks. With the availability of camera intrinsics, Yi et al. [25] recovered the essential matrix from putative correspondences with little training data and limited supervision, thus finding good correspondences for wide-baseline stereo. Furthermore, Probst et al. [26] proposed an unsupervised learning framework for consensus maximization in the context of solving 3D vision problems such as 3D-3D matching [27, 28] and image-to-image matching (homography and fundamental matrix). DSAC [18] is a differentiable counterpart of RANSAC and can also be leveraged as a robust optimization component for other deep learning pipelines.

Different from existing work in deep learning-based multi-view geometry computation, our self-supervised learning strategy removes the need for supervisory signals and thus generalizes well across different datasets. Furthermore, our learning-based normalization module can be integrated with both traditional and deep learning frameworks.

3 A revisit of the normalized eight-point algorithm

We use capital letters, A, B, etc., to denote matrices. The operation of reshaping a matrix into a vector is denoted by \(\mathrm{vec}(\cdot )\), defined as \(\mathrm{vec}(\boldsymbol{A}) = [\boldsymbol{a}_{1}^{T},\ldots, \boldsymbol{a}_{N}^{T}]^{T}\), where \(\boldsymbol{a}_{i}\) is the i-th column vector of A and N is the number of columns. Its inverse operation is denoted as \(\mathrm{mat}(\cdot )\).

Given a pair of correspondences \(\boldsymbol{u}'_{i}\) and \(\boldsymbol{u}_{i}\) between two views, the epipolar constraint is expressed as

$$\begin{aligned} {\boldsymbol{u}'_{i}{^{T}}} \boldsymbol{F} \boldsymbol{u}_{i} = 0, \end{aligned}$$
(1)

where \(\boldsymbol{F}= [f_{ij}]\) is a \(3 \times 3\) matrix of rank 2, termed the fundamental matrix. Collecting \(N=8\) point correspondences \(\{ (\boldsymbol{u}'_{i}, \boldsymbol{u}_{i}) | i=1,\ldots,8\}\), i.e., the standard configuration, we may rewrite Eq. (1) as a homogeneous linear system in \(\boldsymbol{f}\):

$$\begin{aligned} \boldsymbol{A} \boldsymbol{f} = \mathbf{0}, \end{aligned}$$
(2)

where \(\boldsymbol{f}= \mathrm{vec}(\boldsymbol{F}^{T})\) is a nine-dimensional vector composed of stacked columns of \(\boldsymbol{F}^{T}\), and \(\boldsymbol{A}= [\boldsymbol{a}_{1}, \ldots, \boldsymbol{a}_{8}]^{T}\) is the \(8\times 9\) coefficient matrix with \(\boldsymbol{a}_{i} = \mathrm{vec}(\boldsymbol{u}_{i} { \boldsymbol{u}'}_{i}^{T})\) for \(i=1,\ldots,8\). This approach provides the DLT formulation for computing F, and a solution may be obtained through SVD of A.
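For concreteness, the DLT step can be written in a few lines of NumPy. The following is a minimal sketch under the \(\mathrm{vec}(\cdot )\) convention above (the helper name is ours); the normalization discussed next is deliberately omitted:

```python
import numpy as np

def dlt_fundamental(u, u_prime):
    """Unnormalized DLT of the eight-point algorithm (a minimal sketch).

    u, u_prime: (N, 3) arrays of homogeneous points with N >= 8, satisfying
    u_prime[i]^T F u[i] = 0 for the sought fundamental matrix F.
    """
    # Row i is a_i: raveling u'_i u_i^T row-major realizes the paper's
    # a_i = vec(u_i u'_i^T) under its column-stacking convention, so that
    # A @ F.ravel() collects the epipolar constraints u'_i^T F u_i.
    A = np.stack([np.outer(u_prime[i], u[i]).ravel() for i in range(len(u))])
    # The solution is the right singular vector of A associated with the
    # smallest singular value (the exact null vector when N = 8).
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```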

Despite its simplicity, the computation of the DLT for the eight-point algorithm [1] is extremely susceptible to noise in the image coordinate measurements. In the seminal work [2], Hartley showed that the precision of the eight-point algorithm can be greatly improved by proper normalization of the image coordinates; this approach is the classic normalized eight-point algorithm. Hartley’s normalization is designed to compute image translation and scaling such that the average distance of the transformed coordinates from the origin is \(\sqrt {2}\):

$$\begin{aligned} \boldsymbol{T}_{\mathrm{H}} = \begin{bmatrix} s & 0 & -so_{1} \\ 0 & s & -so_{2} \\ 0 & 0 & 1 \end{bmatrix}, \end{aligned}$$
(3)

with s, \(o_{1}\) and \(o_{2}\) given by

$$\begin{aligned} o_{j} = \frac{1}{N}\sum_{i = 1}^{N} \boldsymbol{u}_{i}^{(j)} \quad\text{and}\quad s = \frac{\sqrt{2}}{\frac{1}{N}\sum_{i = 1}^{N} \Vert \boldsymbol{u}_{i} - \boldsymbol{o} \Vert _{2}}, \end{aligned}$$
(4)

where the superscript j denotes the j-th entry of vector \(\boldsymbol{u}_{i} \). Given two normalization matrices \(\boldsymbol{T}'\) and T, Eq. (2) is transformed to

$$\begin{aligned} \hat{{\boldsymbol{A}}} \hat{\boldsymbol{f}} = \mathbf{0}, \end{aligned}$$
(5)

where \(\hat{{\boldsymbol{A}}} = [\hat{{\boldsymbol{a}}}_{1},\ldots, \hat{{\boldsymbol{a}}}_{8} ]^{T} \) is the transformed coefficient matrix with \(\hat{{\boldsymbol{a}}}_{i} = {\mathrm{{vec}}}( \hat{\boldsymbol{u}}_{i}\hat{\boldsymbol{u}}'_{i}{^{T}}) = { \mathrm{{vec}}}( \boldsymbol{T}\boldsymbol{u}_{i} \boldsymbol{u}'_{i}{^{T}} \boldsymbol{T}^{\prime T} )\). In summary, the normalized eight-point algorithm mainly includes the following three steps.

(1) Normalization: Transform the input image coordinates according to \(\hat{\boldsymbol{u}}'_{i} = \boldsymbol{T}' \boldsymbol{u}'_{i}\) and \(\hat{\boldsymbol{u}}_{i} = \boldsymbol{T} \boldsymbol{u}_{i}\).

(2) Compute the fundamental matrix \(\hat{{\boldsymbol{F}}}'\) corresponding to the normalized data by:

(a) Direct linear transform: Determine \(\hat{{\boldsymbol{F}}} = {\mathrm{mat}}(\hat{\boldsymbol{f}})\) from the right singular vector \(\hat{\boldsymbol{f}}\) corresponding to the smallest singular value of \(\hat{\boldsymbol{A}}\) defined in Eq. (5).

(b) Singularity constraint enforcement: Replace \(\hat{{\boldsymbol{F}}}\) by \(\hat{{\boldsymbol{F}}}' = \hat{{\boldsymbol{U}}}{\mathrm{diag}}(r_{1},r_{2},0)\hat{{\boldsymbol{V}}}^{T}\), where \(\hat{{\boldsymbol{F}}} = \hat{{\boldsymbol{U}}}\hat{{\boldsymbol{D}}}\hat{{\boldsymbol{V}}}^{T}\) with \(\hat{\boldsymbol{D}} = {\mathrm{diag}}(r_{1},r_{2},r_{3})\) and \(r_{1} \ge r_{2} \ge r_{3}\).

(3) Denormalization: Set \({{\boldsymbol{F}}} = {{\boldsymbol{T}}}^{\prime T} \hat{{\boldsymbol{F}}}' {{\boldsymbol{T}}}\).
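A compact sketch of the three steps, reusing dlt_fundamental from above; the helper names are ours, but each line maps to one step of the algorithm just listed:

```python
def hartley_normalization(u):
    """Hartley's similarity T_H of Eqs. (3) and (4): move the centroid to
    the origin and scale the mean distance from the origin to sqrt(2)."""
    pts = u[:, :2] / u[:, 2:]                      # inhomogeneous coordinates
    o = pts.mean(axis=0)                           # centroid (o_1, o_2)
    s = np.sqrt(2) / np.linalg.norm(pts - o, axis=1).mean()
    return np.array([[s, 0.0, -s * o[0]],
                     [0.0, s, -s * o[1]],
                     [0.0, 0.0, 1.0]])

def normalized_eight_point(u, u_prime):
    """Steps (1)-(3) of the normalized eight-point algorithm."""
    T, Tp = hartley_normalization(u), hartley_normalization(u_prime)
    u_hat, up_hat = u @ T.T, u_prime @ Tp.T        # (1) normalization
    F_hat = dlt_fundamental(u_hat, up_hat)         # (2a) DLT
    U, r, Vt = np.linalg.svd(F_hat)                # (2b) rank-2 enforcement
    F_hat = U @ np.diag([r[0], r[1], 0.0]) @ Vt
    return Tp.T @ F_hat @ T                        # (3) denormalization
```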

The condition number of A is defined as \({\kappa }(\boldsymbol{A}) = \Vert \boldsymbol{A} \Vert _{2} \Vert \boldsymbol{A}^{+} \Vert _{2}\), where \(\boldsymbol{A}^{+}\) is the pseudo-inverse of A. Equivalently, it may be defined as the ratio of the greatest to the second-smallest singular value, \({\kappa }(\boldsymbol{A}) = \sqrt {d_{1}/d_{8}}\), where \(\boldsymbol{A}^{T}\boldsymbol{A} = \boldsymbol{U} \mathrm{diag}(d_{1},d_{2},\ldots,d_{8},d_{9})\boldsymbol{U}^{T}\). It has been reported in the literature [2, 9, 21, 22] that the unsatisfactory performance of the eight-point algorithm is mainly due to the poor numerical conditioning of the coefficient matrix A. In fact, the condition number \(\kappa (\boldsymbol{A})\) is extremely large, so the two smallest eigenvalues are relatively close to one another and their corresponding eigenvectors become mixed up and indistinguishable. As a result, a negligible perturbation of the matrix entries tends to cause a significant change in the smallest eigenvector, since it may fall anywhere in the proximity of the eigensubspace spanned by the eigenvectors associated with these nearly degenerate eigenvalues [21]. It has been found that a proper normalization of the input image coordinates results in better numerical conditioning of the linear DLT computation, and that the improved conditioning makes the smallest eigenvector of \(\hat{\boldsymbol{A}}\) far less susceptible to perturbation [2, 22]. From this point, a natural question arises: Can we achieve the ultimate optimal condition number \(\kappa (\hat{\boldsymbol{A}})=1\)? Below we show that the condition number of the transformed coefficient matrix cannot reach the optimum of 1. A follow-up question is then: Can we find a better normalization transformation? This paper provides a positive answer in the next section: we develop a self-supervised CNN-based technique that learns the network weights from a geometric loss function; it requires no ground truth labeling and shows highly improved performance in various experiments.
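For reference, this conditioning measure follows directly from the singular values of the coefficient matrix; a small sketch consistent with the definition above:

```python
def condition_number(A):
    """kappa(A) = sqrt(d_1 / d_8): the ratio of the largest to the
    second-smallest singular value of the 8 x 9 coefficient matrix
    (the nine eigenvalues of A^T A are the eight squared singular
    values plus one exact zero)."""
    sv = np.linalg.svd(A, compute_uv=False)        # descending order
    return sv[0] / sv[7]
```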

Proposition 1

There is no pair of normalization matrices \(\boldsymbol{T}'\) and T that results in \(k(\hat{\boldsymbol{A}}) = 1\).

Proof

(Proof by contradiction) For the matrix A of full row rank, there must exist an invertible matrix \(\boldsymbol{P} = [p_{ij}]\) such that \(\boldsymbol{A}=\boldsymbol{P}^{-1}\boldsymbol{Q}\) holds, where the matrix \(\boldsymbol{Q} \in \mathbb{R}^{8\times 9}\) also has full row rank [29]. Moreover, one can assume that the rows of Q are orthonormal vectors in the 9-dimensional space, which is easily achieved by matrix decomposition [29], such as Gram–Schmidt orthogonalization, QR decomposition, or SVD.

The condition number \({\kappa (\hat{\boldsymbol{A}}) =1}\) if and only if \(\hat{\boldsymbol{A}}\hat{\boldsymbol{A}}^{T} = c \boldsymbol{I}\), where c is a positive constant [29, 30]; this implies that the rows of \(\hat{\boldsymbol{A}}\) form eight orthogonal vectors in the 9-dimensional space, all of the same length \(\sqrt{c}\). Therefore, to achieve \(\kappa (\hat{\boldsymbol{A}}) = 1\), the two invertible transformations \(\boldsymbol{T}'\) and T would have to make \(\hat{\boldsymbol{A}}= \boldsymbol{Q} = \boldsymbol{P} \boldsymbol{A}\) hold, i.e.,

$$\begin{aligned} \mathrm{mat}\bigl(\hat{\boldsymbol{a}}_{i}^{T}\bigr) = \sum_{j = 1}^{8} p_{ij}\,\mathrm{mat}\bigl(\boldsymbol{a}_{j}^{T}\bigr),\quad i=1,\ldots,8. \end{aligned}$$
(6)

Note that \({\mathrm{{rank}}}({\mathrm{{mat}}}(\hat{\boldsymbol{a}}_{i}^{T})) = {\mathrm{{rank}}}({ \mathrm{{mat}}}(\boldsymbol{a}_{i}^{T})) = {\mathrm{{rank}}}(\hat{\boldsymbol{u}}_{i} \hat{\boldsymbol{u}}'_{i}{^{T}}) = {\mathrm{{rank}}}(\boldsymbol{u}_{i} \boldsymbol{u}'_{i}{^{T}}) = 1 \) for any \(i\in [1,8] \). Except for the trivial configuration \({\boldsymbol{P}} = \boldsymbol{I}\), the sum of rank-1 matrices on the right-hand side must have rank 3 for any \(\boldsymbol{T}'\) and T (e.g., of the form given in Eq. (3)), whereas the left-hand side has rank 1; hence Eq. (6) cannot hold. That is, there are no normalization matrices \(\boldsymbol{T}'\) and T that make \({\kappa (\hat{\boldsymbol{A}}) =1}\) tenable. □

4 Learning-based normalization with self-supervised CNNs

This section develops a machine learning model that produces T and \(\boldsymbol{T}'\), the two data normalization matrices, resulting in a better estimate of F than Hartley's normalization for eight input correspondences. As discussed in Sect. 3, the estimation of the fundamental matrix has two main steps. First, the input image coordinates are normalized by T and \(\boldsymbol{T}'\) to construct the data matrix \(\hat{\boldsymbol{A}}\), and the solution \(\hat{\boldsymbol{f}}\) is obtained. Second, \(\hat{\boldsymbol{F}}\) is reconstructed by enforcing the singularity constraint. Two observations can be made regarding this estimation process:

(1) The goal of Hartley's normalization is to achieve a better computation of \(\hat{\boldsymbol{f}}\). However, this does not guarantee the singularity condition \(\mathrm{det}(\mathrm{mat}(\hat{\boldsymbol{f}})) = 0\), which is why the singularity constraint enforcement (SCE) is necessary.

(2) There are cases where enforcing the singularity (\(\hat{\boldsymbol{F}} = \hat{\boldsymbol{U}}\mathrm{diag}(r_{1},r_{2},0) \hat{\boldsymbol{V}}^{T}\)) brings about large nonlinear projection errors and leads to an unsatisfactory estimate of \(\hat{\boldsymbol{F}}\). This happens especially when \(\rho =r_{2}/r_{3}\) is not large enough.

It is evident that the singularity constraint should be considered alongside the numerical conditioning when the normalization matrices T and \(\boldsymbol{T}'\) are designed, which implies the existence of better normalization schemes.
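The following small sketch (our own illustration, not part of the algorithm) makes observation (2) concrete by measuring ρ and the relative perturbation introduced by the SCE step:

```python
def sce_gap(F_hat):
    """Quantify observation (2): when rho = r_2 / r_3 is small, zeroing r_3
    in the SCE step perturbs F_hat strongly (the Frobenius gap equals r_3)."""
    U, r, Vt = np.linalg.svd(F_hat)
    rho = r[1] / r[2]
    F_rank2 = U @ np.diag([r[0], r[1], 0.0]) @ Vt
    rel_gap = np.linalg.norm(F_hat - F_rank2) / np.linalg.norm(F_hat)
    return rho, rel_gap
```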

Our approach adopts a CNN-based model trained with a self-supervised learning algorithm. Given eight correspondences as input, the model outputs the parameters of the normalization matrices. Following the conjecture of the affine structure of the normalization matrix proposed in Ref. [10], the normalization matrix is designed here to have two more parameters than Hartley's normalization:

$$\begin{aligned} \boldsymbol{T}_{L} = \begin{bmatrix} \alpha _{1} & 0 & 0 \\ 0 & \alpha _{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -o_{1} \\ 0 & 1 & -o_{2} \\ 0 & 0 & 1 \end{bmatrix}, \end{aligned}$$
(7)

which can characterize the data distribution better and enables more general normalization schemes to be implemented by CNNs. Nevertheless, robustly determining the normalization parameters (especially \(\alpha _{1}\), \(\alpha _{2}\), and θ) has always been a difficult problem; notably, after Hartley's seminal solution [2], there has been no substantial progress in designing hand-crafted normalization strategies. In contrast, we extend Hartley's normalization and develop a deep solution for normalization. The performance of the CNN model for this parametrization is evaluated and visualized through various experiments in Sect. 5. The overall computation pipeline of our framework is illustrated in Fig. 2.
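To make the parametrization concrete, Eq. (7) can be assembled as follows. This is a minimal sketch in which \(\alpha _{1}\), \(\alpha _{2}\), and θ are the network outputs, while \((o_{1}, o_{2})\) is assumed to be the centroid of the points, consistent with the centralization mentioned in Sect. 5:

```python
def learned_normalization_matrix(alpha1, alpha2, theta, o1, o2):
    """Assemble T_L of Eq. (7): anisotropic scaling x rotation x translation."""
    S = np.diag([alpha1, alpha2, 1.0])
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    C = np.array([[1.0, 0.0, -o1],
                  [0.0, 1.0, -o2],
                  [0.0, 0.0, 1.0]])
    return S @ R @ C
```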

Figure 2

Overall framework comparison of Hartley's eight-point algorithm and our learning-based eight-point algorithm, both of which take eight points as input. Our approach provides an interpretable pipeline that predicts the parameters of each normalization matrix (\(\alpha _{1}\), \(\alpha _{2}\), and θ in particular), which is also beneficial for a more accurate estimation of the intrinsic epipolar geometry. DLT refers to the direct linear transformation and SCE refers to the singularity constraint enforcement

4.1 Self-supervised learning for normalization

Network architecture. The overall network architecture is illustrated in Fig. 3. We adopt a stack of 12 consecutive ResNet blocks as the first stage of the network, consistent with classic two-view geometry estimation networks [12, 25]. The eight input points \(\boldsymbol{u}'\) or u are first processed by multi-layer perceptrons of 128 neurons whose weights are shared [25] between correspondences. The 128-dimensional features of each correspondence are then passed through the 12 ResNet blocks [25, 31]. Global information is integrated by the weight-sharing operations between correspondences, followed by instance normalization [32] after each layer. Max-pooling and instance normalization are applied to the input of the first ResNet block and to the output of each of the 12 ResNet blocks, extracting 13 global features of dimension \(1\times 128\); this keeps the CNN permutation invariant and fixes the size of the global feature maps. The 13 feature maps are then concatenated and fed to a two-dimensional (2D) convolutional layer with eight channels, \(3\times 3\) square kernels, and unequal strides (four along columns and one along rows). The output of the 2D convolution is passed through two fully-connected layers, each of dimension 256 and followed by ReLU. Finally, the three parameters corresponding to \(\boldsymbol{u}'\) or u are regressed. Note that our network supports more than eight input correspondences; this flexibility is mainly due to the max-pooling and instance normalization design and is valuable in practice.
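The sketch below captures the permutation-invariant skeleton of this architecture in PyTorch. It is a condensed approximation rather than our exact implementation: the layer widths, the 12 blocks, and the 13 max-pooled global features follow the description above, while the 2D convolutional stage is folded into the first fully-connected layer for brevity:

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """Point-wise residual block with instance normalization; per-point
    weights are shared, so the block is permutation equivariant."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv1 = nn.Conv1d(dim, dim, kernel_size=1)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size=1)
        self.norm1 = nn.InstanceNorm1d(dim)
        self.norm2 = nn.InstanceNorm1d(dim)

    def forward(self, x):                          # x: (B, 128, N)
        y = torch.relu(self.norm1(self.conv1(x)))
        y = self.norm2(self.conv2(y))
        return torch.relu(x + y)

class NormalizationNet(nn.Module):
    """Regress (alpha1, alpha2, theta) of Eq. (7) from N >= 8 points."""
    def __init__(self, depth=12):
        super().__init__()
        self.embed = nn.Conv1d(2, 128, kernel_size=1)   # shared per-point MLP
        self.blocks = nn.ModuleList([ResNetBlock() for _ in range(depth)])
        self.head = nn.Sequential(nn.Linear(13 * 128, 256), nn.ReLU(),
                                  nn.Linear(256, 256), nn.ReLU(),
                                  nn.Linear(256, 3))

    def forward(self, pts):                        # pts: (B, N, 2)
        x = self.embed(pts.transpose(1, 2))
        feats = [x.max(dim=2).values]              # max-pool: order invariant
        for blk in self.blocks:
            x = blk(x)
            feats.append(x.max(dim=2).values)      # 13 global 128-d features
        return self.head(torch.cat(feats, dim=1))  # (B, 3)
```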

Figure 3

Overview of our network architecture, corresponding to the CNN layer in Fig. 2. Our approach estimates the parameters of the normalization matrix (\(\alpha _{1}\), \(\alpha _{2}\), and θ in particular). 2D convolutional layer refers to two-dimensional convolutional layer

Our network is inspired by 3DRegNet [33] but has significant differences in architecture design: we utilize weight sharing across point correspondences, an instance normalization module for better performance, and fewer parameters in the 2D convolution. Specifically, compared to the representative two-view geometry estimation methods [12, 25], our network is invariant to the permutation of the correspondences.

Self-supervised learning. To train our model through self-supervised learning, the outputs of the CNN model are used to construct the normalization matrices T and \(\boldsymbol{T}'\), which are fed into the next module performing (1) the data normalization, (2) the DLT to compute \(\hat{\boldsymbol{f}}\), and (3) the SVD to compute the singularity-constrained \(\hat{\boldsymbol{F}}\). Finally, the output F is evaluated with a loss function chosen to be the symmetric epipolar distance [34]:

$$\begin{aligned} \mathcal{L}\bigl(\boldsymbol{F}; \boldsymbol{u}_{i}, \boldsymbol{u}'_{i}\bigr) = \bigl\vert \boldsymbol{u}'_{i}{}^{T} \boldsymbol{F} \boldsymbol{u}_{i} \bigr\vert \biggl( \frac{1}{ \Vert (\boldsymbol{F}^{T} \boldsymbol{u}'_{i})^{(1:2)} \Vert _{2}} + \frac{1}{ \Vert (\boldsymbol{F} \boldsymbol{u}_{i})^{(1:2)} \Vert _{2}} \biggr). \end{aligned}$$
(8)

We tested several variants of distance functions, including the Sampson distance and the algebraic distance, and chose the symmetric epipolar distance because it showed superior results in our experiments. Interestingly, these findings contrast with those of Ref. [34].
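The forward pass of the self-supervised pipeline can be sketched as follows. Every step, including the two spectral decompositions, is differentiable in PyTorch, which is what enables back-propagation through the SVD layer mentioned in Sect. 1. Here, build_T is a hypothetical helper assembling Eq. (7) from the network output, and the DLT is solved via an eigendecomposition of \(\hat{\boldsymbol{A}}^{T}\hat{\boldsymbol{A}}\), an implementation choice of this sketch rather than a statement about our exact code:

```python
def symmetric_epipolar_distance(F, u, u_prime):
    """Loss of Eq. (8); u, u_prime: (N, 3) homogeneous correspondences."""
    Fu = u @ F.T                                   # rows are (F u_i)^T
    Ftup = u_prime @ F                             # rows are (F^T u'_i)^T
    algebraic = (u_prime * Fu).sum(dim=1).abs()    # |u'_i^T F u_i|
    return (algebraic * (1.0 / Ftup[:, :2].norm(dim=1)
                         + 1.0 / Fu[:, :2].norm(dim=1))).mean()

def forward_pass(net, u, u_prime):
    """Predict T and T', normalize, run DLT and SCE, denormalize, score."""
    # 'build_T' is a hypothetical torch analogue of Eq. (7) assembly.
    T = build_T(net(u[:, :2].unsqueeze(0)).squeeze(0), u)
    Tp = build_T(net(u_prime[:, :2].unsqueeze(0)).squeeze(0), u_prime)
    u_hat, up_hat = u @ T.T, u_prime @ Tp.T
    A = torch.stack([torch.outer(up_hat[i], u_hat[i]).ravel()
                     for i in range(len(u_hat))])
    # DLT: the null direction of A is the eigenvector of A^T A with the
    # smallest eigenvalue (differentiable, unlike a full-matrices SVD).
    _, evecs = torch.linalg.eigh(A.T @ A)          # eigenvalues ascending
    F_hat = evecs[:, 0].reshape(3, 3)
    U, s, Vh = torch.linalg.svd(F_hat)             # SCE: zero the last s
    F_hat = U @ torch.diag(torch.stack([s[0], s[1], s.new_zeros(())])) @ Vh
    F = Tp.T @ F_hat @ T                           # denormalization
    return symmetric_epipolar_distance(F, u, u_prime)
```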

By minimizing this loss, we can train the network without any ground truth data at all, contrary to Refs. [33] and [12]; the network is self-supervised in the geometric sense. This also enables us to exploit a very large number of frames from video sequence datasets under various kinds of camera motion.

Addressing the ordering invariance. Our network model is designed to be invariant to the order of the input image points, similar to Refs. [35] and [33], thereby making the subsequent fundamental matrix computation order invariant as well.

Proposition 2

As long as the computation of the normalization matrices \(\boldsymbol{T}'\) and T is permutation invariant, so is the computation of the fundamental matrix.

Proof

Because \(\boldsymbol{T}'\) and T remain invariant for any ordering of the input data \(\boldsymbol{u}'\) and u, the normalized points \(\hat{\boldsymbol{u}}'\) and \(\hat{\boldsymbol{u}}\) keep the same order as \(\boldsymbol{u}'\) and u; changing the order of \(\boldsymbol{u}'\) and u is therefore equivalent to applying a row permutation to the transformed coefficient matrix \(\hat{{\boldsymbol{A}}}\) in Eq. (5). However, a row permutation of \(\hat{{\boldsymbol{A}}}\) does not change the right singular vector corresponding to its smallest singular value [29], i.e., the estimate of \(\hat{{\boldsymbol{F}}}\) is not affected. Consequently, the final fundamental matrix F also has permutation invariance. □

Training procedure. The network is implemented in PyTorch. We adopt the Adamax optimizer [36] with an initial learning rate of \(10^{-3}\), decayed by a factor of 0.8 every 10 epochs. The batch size is 16 and the network is trained for 150 epochs. Each input set is pre-filtered by the residual of the original eight-point algorithm with a threshold (60 pixels) sufficiently large to enhance the stability of the training process.
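In code, the procedure reduces to a standard PyTorch recipe; here loader is an assumed data loader yielding batches of pre-filtered eight-point samples, and NormalizationNet and forward_pass refer to the sketches above:

```python
model = NormalizationNet()
optimizer = torch.optim.Adamax(model.parameters(), lr=1e-3)
# Decay the learning rate by a factor of 0.8 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)

for epoch in range(150):
    for batch in loader:                           # batches of 16 samples
        # Samples whose original eight-point residual exceeds 60 pixels
        # are assumed to have been filtered out beforehand.
        loss = torch.stack([forward_pass(model, u, up)
                            for u, up in batch]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```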

5 Experimental results

To prove that our approach can learn normalization matrices adapted to the input data and obtain more accurate fundamental matrix estimations, we benchmark the performance of our approach on three typical datasets with varying regularity. Furthermore, we perform cross-dataset validation to prove the generalizability of our approach.

5.1 Datasets

KITTI dataset. The KITTI odometry dataset [37] consists of 22 distinct sequences from a car driving around a residential area. This dataset exhibits dominant forward motion with high regularity but shows difficult data associations. We choose the first 11 sequences with ground truth from GPS and a Velodyne LiDAR. Specifically, we employ sequences “00” to “05” for training and sequences “06” to “10” for testing in our experiment, which enables a fair comparison with recent state-of-the-art methods [12].

TUM dataset. We use the indoor sequences from the TUM RGB-D dataset [38], which contains several hand-held sequences with ground truth obtained by an external motion capture system. This dataset exhibits rich camera motion and scene geometry, and represents the most general case for fundamental matrix estimation. We use cross-validation on the sequence “fr3_long_office” during training. To better test the generalizability of the proposed method, we resize the images of the TUM RGB-D dataset to match the image size of the KITTI dataset.

Cambridge dataset. The Cambridge dataset [39] is a large-scale outdoor urban localization dataset containing six challenging scenes with changes in perspective and illumination; this setting is quite different from the TUM and KITTI datasets. We adopt the “St Mary's Church” scene to evaluate the generalizability of our proposed approach, and report only qualitative results in the following section.

We generate two different correspondence datasets for each of the KITTI and TUM datasets, stored in a manner similar to Ref. [40]. For the first, 1000 correspondences based on SIFT [41] are pre-filtered by a ratio test with a threshold of 0.8. The second does not use the ratio test to pre-filter the correspondences, yielding a challenging dataset with high noise. The ratio test is a frequently used strategy for improving the robustness and accuracy of feature matching; therefore, unless otherwise stated, we use the pre-filtered datasets in our experiments. Moreover, each input sample is generated by shuffling all the correspondences between two views in the dataset.
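A sketch of how the pre-filtered variant can be generated with OpenCV; the function name and matcher setup are assumptions, while the SIFT features and the 0.8 ratio threshold follow the text:

```python
import cv2

def generate_correspondences(img1, img2, ratio=0.8, max_pairs=1000):
    """SIFT matching with Lowe's ratio test (pre-filtered dataset variant)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for m, n in matcher.knnMatch(des1, des2, k=2):
        if m.distance < ratio * n.distance:        # drop ambiguous matches
            pairs.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return pairs[:max_pairs]
```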

5.2 Evaluation protocols

To evaluate the performance of our approach, we report the average better rate per input sample, i.e., the average percentage of input samples on which our learning-based normalization outperforms Hartley's normalization in terms of the symmetric epipolar distance (see Eq. (8)). In addition, in the experiments within the RANSAC framework, we evaluate the average percentage of inliers (correspondences with errors below 1 pixel or 0.1 pixels), as well as the F1 score (the average percentage of correspondences within 1 pixel error of the ground truth epipolar line).

5.3 Experimental evaluations

In the first experiment, we evaluate the performance of our approach per input sample. We first optimize F based on Eq. (8) under the singularity constraint, initialized by Hartley's normalization and by our learning-based normalization, on the KITTI test set; the results are summarized in Fig. 4(a). Our direct results are nearly equivalent to the Hartley-based results after optimization, which indicates that our approach can provide better initial values for more sophisticated nonlinear optimization methods. Unlike the constant \(\sqrt{2}\) distance from the origin in Hartley's normalization, Fig. 5 shows that our learning-based normalization predicts a distance tailored to each input, exploiting the inherent regularity of the data.

Figure 4

(a) Average pixel error per sample, with and without optimization, for the first 20 frames of sequence “06”. Our direct results are almost the same as those based on Hartley's normalization with optimization. (b) Average pixel error per sample for the different eight-point methods. Input samples with an original eight-point error greater than 60 pixels are discarded for better visualization

Figure 5

Learning-based normalized distances from the origin in the left and right camera views, respectively. Hartley's normalization fixes them at \(\sqrt{2}\), while our approach learns a robust normalization scheme adapted to the input data

Then, we quantitatively evaluate the average rate of improvement for each input sample, which is our primary concern. Since Hartley's normalization is the most widely used normalization method [34], we compare only with it here. As presented in Table 1, our learning-based normalization outperforms Hartley's normalization for each input sample. Interestingly, the model trained on the KITTI dataset generalizes well to the TUM dataset, and vice versa, which demonstrates the generalizability of our approach. To further analyze the impact of the training sets, we also evaluate the average improvement rate per input sample when the KITTI and TUM datasets are used jointly for training. The performance improves further for each input sample, which shows that our approach can learn a better and more general normalization scheme from more training data containing diverse regularities. Finally, in Fig. 4(b), we report the distribution of the symmetric epipolar distance for the original eight-point algorithm, for Hartley's normalization, and for our learning-based normalization. While both normalizations achieve great improvements over the unnormalized version, our learning-based normalization consistently outperforms Hartley's normalization, achieving lower errors for eight input correspondences.

Table 1 Results of the average improvement rate per input sample for diverse training sets. Our approach not only takes into account the inherent regularity of the input data but also learns a better and more general normalization scheme

Given the superior per-sample performance of our learning-based normalization, we further verify that it can be effectively integrated into the traditional RANSAC framework [45]. In the experimental comparison, we follow the most related and classic work [12]. We compare our approach with the least median of squares (LMEDS) [43], MLESAC [42], USAC [44], Ranftl's method [12], and RANSAC [45], where RANSAC is based on Hartley's normalization while our approach uses the learning-based normalization. Note that USAC is a state-of-the-art robust estimation framework, and “RANSAC + normalized eight-point algorithm” represents the gold standard [34] for geometric tasks such as visual odometry and SLAM. Ranftl's method [12] uses matching scores as additional information to guide the estimation, which results in an obvious improvement in average accuracy. By contrast, we use only the original RANSAC for performance evaluation. It is also worth noting that, as a supervised framework, Ranftl's method requires ground truth correspondences for training, while our approach is fully self-supervised. Additionally, designing an ensemble network to improve overall performance, such as DSAC [18], is outside the scope of this paper, as our focus is a better normalization for each sample.
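For illustration, the sketch below shows how a minimal solver based on either normalization plugs into plain RANSAC; symmetric_epipolar_residuals is a hypothetical per-correspondence variant of Eq. (8), and the loop is textbook RANSAC rather than our exact implementation:

```python
import numpy as np

def ransac_eight_point(u, u_prime, solver, iters=1000, thresh=1.0, rng=None):
    """RANSAC with a pluggable minimal solver: 'solver' may be
    normalized_eight_point above or its learned counterpart."""
    rng = rng or np.random.default_rng()
    best_F, best_inliers = None, np.zeros(len(u), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(u), size=8, replace=False)
        F = solver(u[idx], u_prime[idx])
        # Hypothetical per-correspondence variant of Eq. (8).
        errors = symmetric_epipolar_residuals(F, u, u_prime)
        inliers = errors < thresh
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```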

Table 2 summarizes the results on the KITTI dataset. Within the RANSAC framework, our learning-based normalization performs on par with Hartley's normalization on the KITTI benchmark. Furthermore, we evaluate the performance on the challenging testing set without the ratio test, and the results are presented in Table 3. Note that our approach achieves a higher inlier percentage on the TUM dataset. We remark that recent analyses in Refs. [46, 47], as well as related experiments in Ref. [48], indicate that RANSAC paradigms with supporting heuristics can only increase the chance of finding a good final solution and are not completely governed by the internal minimal solver; this is one possible reason for the modest improvement of our method when it is embedded into RANSAC. Overall, the effectiveness of our learning-based normalization combined with RANSAC is demonstrated.

Table 2 Results on the KITTI test set using the ratio test at different inlier thresholds
Table 3 Performance of the proposed method in combination with RANSAC in the test set without the ratio test

Finally, we directly employ the network model trained on the KITTI dataset, which differs greatly from the Cambridge dataset. The qualitative generalization results on the Cambridge dataset are reported in Fig. 6. One can see that our approach achieves an accurate two-view fundamental matrix estimation, which reflects its good generalizability. Moreover, since we always centralize the correspondences first, varying image sizes and feature distributions do not have a significant impact on the final results. Currently, our forward propagation time is approximately 5 times that of Hartley's normalization due to the 12-layer ResNet architecture. This efficiency sacrifice buys an improved normalization and hence a more accurate epipolar geometry for each sample.

Figure 6

Image pairs from the KITTI and Cambridge datasets. Odd row: First image with inliers (blue) and outliers (red). Even row: The estimated epipolar lines of a random subset of inliers in the second image. The images are scaled for visualization

Influence of the number of correspondences. We perform additional experiments to analyze the influence of the number of correspondences in the input. We take the median of 1000 trials based on a random testing image. The results are shown in Fig. 7, which indicate that better fundamental matrices can be obtained with an increasing number of correspondences.

Figure 7

Influence of the number of correspondences on the median error of 1000 trials in a random testing frame

Condition numbers. We conduct another experiment to compare the condition numbers arising in the fundamental matrix computation, and the results are reported in Fig. 8. We observe that our learning-based normalization yields better numerical conditioning of the transformed coefficient matrix, which is one of the keys to our improved performance.

Figure 8

Effect of diverse normalization schemes on the average condition number

Nonlinear projection. The singularity of \(\hat{\boldsymbol{F}}\) is evaluated by calculating \(\rho =r_{2}/r_{3}\) for every 100 consecutive frames of the KITTI test set. The results are displayed in Fig. 9, which shows that our learning-based approach achieves smaller nonlinear projection errors. These findings also support our argument that the conditioning of the transformed coefficient matrix obtained with a better normalization is more conducive to imposing the singularity constraint on the resulting fundamental matrix. Note that these experimental results all highlight the superiority of our learning-based normalization approach.

Figure 9

Average ρ of every 100 consecutive samples in the KITTI test set

6 Conclusion

In this paper, we revisit the classic two-view geometry computation from eight point correspondences and employ CNNs to provide a novel perspective on normalization. First, we show that the ideal condition number cannot be attained by any normalization, and that the normalization should instead be made consistent with the subsequent singularity constraint enforcement step. Second, we propose a self-supervised deep neural network that learns a robust normalization scheme for more accurate fundamental matrix estimation. Our approach enables a data-driven, interpretable, and generalizable fundamental matrix estimation pipeline. Our learning-based normalization is superior to Hartley's normalization for each input sample and comparable to it when integrated with RANSAC. Its potential advantages are to provide better initial values for nonlinear optimization and better interpretability for an ensemble network. In the future, we plan to design a lightweight network to balance time and quality, to exploit ground truth correspondences or matching scores for supervised two-view geometry estimation, and to extend our deep solution to other multi-view geometry problems such as triangulation and trifocal tensor estimation.