1 Introduction

Multi-view 3D (three-dimensional) reconstruction first takes a series of pictures from different perspectives around an object or scene, estimates the camera poses from correct feature matching results between image pairs, and then reconstructs the 3D model after 3D point cloud reconstruction and texture mapping [1]. As one of the keys to the whole process, image feature matching has a significant impact on the integrity and accuracy of the reconstructed model. It extracts feature points from image pairs, calculates the similarity between feature points, and establishes reliable correspondences between images. The technique is widely used in applications such as image registration [2], 3D reconstruction [3], target recognition [4], and visual simultaneous localization and mapping (VSLAM) [5].

To obtain complete and accurate scene estimation results, the common approach is to estimate camera pose and scene structure via triangulation, which requires correct feature matching results as its basis. Reliable correspondences between two images can be obtained by removing outlier matches from the putative match sets created by some initial algorithm. During the past few decades, many approaches have been proposed to solve this outlier elimination problem.

However, images taken from different scenes, or from different surfaces of the same object, may look similar while the global structures of their detected and matched feature points differ; for example, the basic pattern elements in the images are the same, but the elements are laid out differently or symmetrically. Such image pairs should be considered as not belonging to the same scene or the same object, yet mismatches with locally similar features are easily mistaken as correct, so the accuracy of the matching result is significantly reduced. When the feature points are matched correctly, we can recover the camera motion between two images from the correspondences of these 2D image points. This is done by solving for the essential or fundamental matrix with epipolar geometry, based on the pixel positions of the paired points, from which we can recover the relative rotation matrix R and the relative translation T between the cameras. If the paired points are matched incorrectly, the rotation and translation parameters of the inter-camera motion will be computed incorrectly, and the pose estimation of the cameras will be wrong. Incorrect estimation of the extrinsic parameters will in turn introduce errors into the 3D coordinates of the spatial points computed by triangulation, and the task of recovering the scene structure will fail.
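As a concrete illustration of this pipeline, the sketch below recovers the relative pose from matched pixel positions with OpenCV (the library used in our experiments). The point lists and the intrinsic matrix K are placeholder assumptions for illustration, not values from the paper.

```cpp
// Minimal sketch: relative camera motion from 2D-2D correspondences via
// epipolar geometry. The matched points and intrinsics K are assumed
// placeholders; a real run would take them from the filtered match set.
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

int main() {
    std::vector<cv::Point2f> pts1 = {{100,120},{230,80},{310,200},{150,300},{400,250}};
    std::vector<cv::Point2f> pts2 = {{105,118},{236,84},{318,196},{152,305},{410,248}};
    cv::Mat K = (cv::Mat_<double>(3,3) << 700,0,320, 0,700,240, 0,0,1);

    // Essential matrix from epipolar geometry; RANSAC guards residual outliers.
    cv::Mat mask;
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0, mask);

    // Decompose E into the relative rotation R and translation direction t.
    cv::Mat R, t;
    cv::recoverPose(E, pts1, pts2, K, R, t, mask);
    return 0;
}
```

With wrong correspondences in the input lists, the recovered R and t drift from the true motion, which is exactly the failure mode described above.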

To address the above challenges, this paper proposes an efficient similar-pattern-oriented feature matching filtering algorithm (SMF). The SMF algorithm runs after an initial feature matching algorithm, e.g., GMS (Grid-based Motion Statistics) [6] or LPM (Locality Preserving Matching) [7], and further judges differences or symmetries of pattern structures between image pairs. An observable fact is that if we take images around the same scene, not only do the feature point neighborhoods have local topological consistency, but the global feature point structures should also have global topological consistency. Based on this observation, we design a model that constrains the unknown inlier correspondences to have similar global topology. The experiments on mismatch removal show that our SMF can effectively obtain more accurate inlier matches by identifying the mismatches caused by considering only local matching consistency, and that it can be used to obtain more accurate camera poses. The experiments also show that the 3D models reconstructed after using SMF to remove mismatches are more complete and more accurate. The main contributions of this paper are as follows:

  • It adopts grid matching based on feature distribution, which reduces the time complexity of subsequent feature point vector matching and can be used in real-time tasks.

  • It proposes a matching confidence calculation method based on global topology consistency, which can well determine whether there are mismatches between the image pairs caused by the different layouts of the pattern elements, and can increase the accuracy of feature matching results.

  • The experiments show that the SMF algorithm proposed in this paper can effectively remove wrong matches between similar but different images and helps to obtain more accurate and more complete camera pose and 3D reconstruction results.

Section 2 describes the related work. The proposed algorithm SMF is presented in Sect. 3, including the consensus on global topology, the grid strategy, and the confidence calculator. The performance of our method in comparison with other approaches is illustrated in Sect. 4, and the concluding remarks are in Sect. 5.

2 Related work

The existing image matching methods can be classified into three categories: area-based, feature-based, and learning-based [8]. Area-based methods [9,10,11] usually do not detect features and typically refer to dense matching. Learning-based methods [12,13,14] can achieve better performance in some cases, but they still need subsequent mismatch removal steps because of the high percentages of the outliers in the putative sets. Feature-based methods first extract features and local descriptors of images and use direct or indirect ways to find out the correspondences.

Direct feature matching methods use spatial geometrical relations to get the correspondences [15, 16]. Indirect feature matching methods establish putative matches and remove false matches to establish reliable correspondences. The putative match set obtained by judging the similarity of descriptors still includes a large percentage of mismatches. The removal methods use extra local and/or geometrical constraints. They can be roughly divided into three categories: resampling-based methods, non-parametric model-based methods, and relaxed methods.

The resampling methods start with the classic RANSAC proposed by Fischler et al. [17]. It iteratively samples a minimal subset, estimates a parametric model such as the fundamental matrix, and obtains the consistent inliers by verifying the size of the consensus set. Many approaches have been developed to improve RANSAC [18,19,20,21]. The shortcomings of these resampling methods are that the runtime increases exponentially when the outlier percentage in the putative set is high, and that the estimated parametric model is less effective under more complex, non-rigid transformations.

The non-parametric model-based methods were developed as a departure from the resampling methods, and include ICF [22], BD [23], VFC [24], and MR-RPM [25, 26]. They distinguish mismatches by applying prior conditions such as motion coherence, under which corresponding features undergo slow-and-smooth motion. These methods can be applied to non-rigid transformations but are not suitable for real-time tasks because of their cubic time complexity.

The relaxed methods use coherence constraints and local neighborhood consistency. Bian et al. proposed GMS [6], which is based on the assumption that the number of correct matching points near a correctly matched feature should be greater than the number near a falsely matched one. It transforms motion smoothness constraints into statistical measures to reject false matches, proposes a grid-based score estimator, and performs well in tasks requiring real-time feature matching. LPM [7] and ANTC [27] observed that, due to physical constraints, local topological structures are usually preserved even though the absolute distances between matches change considerably under complex deformations. GLOF [28] detects outliers in a small neighborhood by using local density reachability. However, consider image pairs taken from different scenes whose small pattern elements are the same but arranged differently, or even nearly symmetrically, so that the feature structure has changed. The relaxed methods only perform match detection around local extreme points and cannot recognize the differences in the global topological structures of the patterns, so mismatches between differently laid-out image elements will be accepted as true when they should be recognized as false.

3 Method

The proposed SMF algorithm can be used to establish accurate correspondences between similar but different images. We first construct a putative match set that has local consistency by using an existing matching algorithm and then use a global constraint to remove the false matches.

3.1 Problem formulation

Image pairs \(\{ I_{a}, I_{b} \}\) have \(\{ A, B \}\) features, respectively. Suppose we have obtained a putative match set \(S = \{ (x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{i}, y_{i}), \ldots, (x_{T}, y_{T}) \}\) between \(I_{a}\) and \(I_{b}\), where \(x_{i}\) and \(y_{i}\) are the spatial positions of corresponding feature points extracted by a well-known feature detector (SIFT [29], SURF [30], or ORB [31]), and the matches in \(S\) satisfy the local constraints. \(S\) has cardinality \(|S| = T\). The goal of this work is to remove the mismatches in \(S\) caused by similar images with different pattern layouts, so as to produce more accurate matching results.

3.1.1 Formulation for locality preserving matching

LPM [7] proposed that an accurate match set can be obtained by filtering out false matches that do not meet the requirements of the spatial neighborhood structure, with the final solution found by minimizing the following cost function:

$$ C(p;S,\lambda ,\tau ) = \sum\limits_{i = 1}^{N} \frac{p_{i}}{K}\left( \sum\limits_{j|x_{j} \in N_{x_{i}}} {\text{d}}\left( y_{i} ,y_{j} \right) + \sum\limits_{j|x_{j} \in N_{x_{i}} ,y_{j} \in N_{y_{i}}} {\text{d}}\left( v_{i} ,v_{j} \right) \right) + \lambda \left( N - \sum\limits_{i = 1}^{N} p_{i} \right) $$
(1)

where \(p_{i} \in \{ 0,1\}\) represents the correctness of the match \((x_{i}, y_{i})\): \(p_{i} = 1\) indicates that \(S_{i} = (x_{i}, y_{i})\) is an inlier match, and an outlier otherwise. \(N_{x_{i}}\) is the neighborhood set of point \(x_{i}\) under the Euclidean distance, and \(K\) is the size of \(N_{x_{i}}\). \(\lambda > 0\) balances the first and the second terms. \({\text{d}}(y_{i}, y_{j})\) denotes the Euclidean distance between \(y_{i}\) and \(y_{j}\). \({\text{d}}(v_{i}, v_{j})\) represents the consistency of the local topology according to \(s(v_{i}, v_{j})\), which is calculated from the difference between \(v_{i}\) and \(v_{j}\) by the following formula:

$$ s(v_{i} ,v_{j} ) = \frac{{\min \{ |v_{i} |,|v_{j} |\} }}{{\max \{ |v_{i} |,|v_{j} |\} }} \cdot \frac{{(v_{i} ,v_{j} )}}{{|v_{i} | \cdot |v_{j} |}} $$
(2)

where \(v_{i}\) is the vector from \(x_{i}\) to \(y_{i}\), \(v_{j}\) is the vector from \(x_{j}\) to \(y_{j}\), and \((\cdot,\cdot)\) denotes the inner product. If \(s(v_{i}, v_{j})\) is larger than a predefined threshold \(\tau\), the neighborhood topology is consistent, and the distance \({\text{d}}(v_{i}, v_{j})\) can be written as follows:

$$ {\text{d}}(v_{i} ,v_{j} ) = \left\{ {\begin{array}{*{20}l} {0,} &\quad {s(v_{i} ,v_{j} ) \ge \tau } \\ {1,} &\quad {s(v_{i} ,v_{j} ) < \tau } \\ \end{array} } \right. $$
(3)
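A small sketch may help make this local consistency test concrete. The Vec2 struct and function names below are our illustrative assumptions, not part of LPM's released code; the logic is a direct transcription of Eqs. (2) and (3).

```cpp
// Sketch of Eqs. (2)-(3): length-ratio-weighted cosine similarity between
// the displacement vectors v_i, v_j, thresholded into the binary distance d.
#include <algorithm>
#include <cmath>

struct Vec2 { double x, y; };

double s(const Vec2& vi, const Vec2& vj) {
    double ni = std::hypot(vi.x, vi.y);
    double nj = std::hypot(vj.x, vj.y);
    double dot = vi.x * vj.x + vi.y * vj.y;
    // Length-ratio term times the cosine of the angle between the vectors.
    return (std::min(ni, nj) / std::max(ni, nj)) * (dot / (ni * nj));  // Eq. (2)
}

int d(const Vec2& vi, const Vec2& vj, double tau) {
    return s(vi, vj) >= tau ? 0 : 1;                                   // Eq. (3)
}
```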

3.1.2 Formulation for global topology consistency

Equation (1) considers both the distance consistency and the topology consistency of the local neighborhood. It works well when similar image pairs have the same pattern structures. However, if the layout of the pattern elements differs between the images, the same elements with different relative positions in \(I_{a}\) and \(I_{b}\) will still be matched, since they have local matching consistency, resulting in erroneous matching. We show an example in Fig. 1. Regions \(x_{i}\) and \(x_{j}\) in Fig. 1a and b are matched to regions \(y_{i}\) and \(y_{j}\), respectively, because they satisfy local distance and topology consistency. But intuitively, they should be considered correct in Fig. 1a because they represent the same scene, and wrong in Fig. 1b because they are on different surfaces of the object. From Fig. 1c, we observe that the cosine of the angle between the vectors constructed from one feature area to another in \(I_{a}\) and \(I_{b}\) is close to \(1\) because of the similar global topological structure. The cosine value in Fig. 1d is less than \(0\) because of the different global topological structures. More specifically, the global topological structures have a significant influence on the similarity of \(v_{i,j}^{x}\) and \(v_{i,j}^{y}\).

Fig. 1

Global topological structure. The corresponding regions of \(x_{i}\) and \(x_{j}\) in the other image are \(y_{i}\) and \(y_{j}\). a The pattern elements of the image pair have the same relative positions. b The pattern elements of the image pair have different relative positions. c Global topological structure corresponding to (a). d Global topological structure corresponding to (b). Region \(x_{i}\) and region \(x_{j}\) are local neighborhoods that conform to local distance consistency and local topology consistency, which can match correctly from a local perspective. If the pattern elements of the image pair have the same relative positions, the vector \(v_{i,j}^{x}\) consisting of \(x_{i}\) and \(x_{j}\) is close to the vector \(v_{i,j}^{y}\) consisting of \(y_{i}\) and \(y_{j}\). However, if the pattern elements of the image pair have different relative positions, \(v_{i,j}^{x}\) differs greatly from \(v_{i,j}^{y}\)

Motivated by this idea, we define the consensus of global topology between \(v_{i,j}^{x}\) and \(v_{i,j}^{y}\) as follows:

$$ {\text{d}}(v_{i,j}^{x} ,v_{i,j}^{y} ) = \frac{{(v_{i,j}^{x} ,v_{i,j}^{y} )}}{{|v_{i,j}^{x} | \cdot |v_{i,j}^{y} |}} $$
(4)

Obviously, the consensus of global topology \({\text{d}}(v_{i,j}^{x}, v_{i,j}^{y}) \in [-1, 1]\); the larger the value, the stronger the consistency. On the basis of Eq. 1, which constructs a locally consistent match set \(I\), we further propose a cost function for obtaining a globally topologically consistent match set \(U^{*}\):

$$ U^{*} = \arg \mathop {\min }\limits_{U} C(U;I), $$
(5)

with the cost function \(C\) defined as:

$$ C(U;I) = \sum\limits_{i \in U} \sum\limits_{j \in U, j \notin N_{x_{i}}} \left( 1 - {\text{d}}(v_{i,j}^{x} ,v_{i,j}^{y} ) \right) $$
(6)
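The sketch below evaluates the cosine consensus of Eq. (4) and accumulates the cost of Eq. (6) over a candidate set U. The container layout (vx[i][j], vy[i][j] holding \(v_{i,j}^{x}\), \(v_{i,j}^{y}\)) is a hypothetical choice of ours, and the \(j \notin N_{x_{i}}\) exclusion is omitted for brevity.

```cpp
// Sketch of Eqs. (4) and (6) under assumed container layouts.
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

double cosineConsensus(const Vec2& vx, const Vec2& vy) {        // Eq. (4)
    double dot = vx.x * vy.x + vx.y * vy.y;
    return dot / (std::hypot(vx.x, vx.y) * std::hypot(vy.x, vy.y));
}

double globalCost(const std::vector<int>& U,
                  const std::vector<std::vector<Vec2>>& vx,
                  const std::vector<std::vector<Vec2>>& vy) {   // Eq. (6)
    double c = 0.0;
    for (int i : U)
        for (int j : U)
            if (i != j)  // the j not-in-N_{x_i} test is omitted for brevity
                c += 1.0 - cosineConsensus(vx[i][j], vy[i][j]);
    return c;
}
```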

3.2 Gridding the problem

Since the neighborhood of each point is guaranteed local consistency by Eq. 1, if point \(x_{i}\) does not satisfy global topological consistency with point \(x_{j}\) because of their different positions in the other image, the other points in the neighborhoods of \(x_{i}\) and \(x_{j}\) will also fail to satisfy the global constraints. Therefore, we can address the problem with a grid approximation. This section translates the preceding analysis into an efficient grid matching algorithm.

3.2.1 Construct grid matching characteristics

We construct a grid matcher to convert the distribution characteristics of feature points into grid matching characteristics.

The image pairs \(I_{a}\) and \(I_{b}\) are each divided into \(N = 400\) non-overlapping grids, and we denote the divided results as \(G_{x}\) and \(G_{y}\), respectively. The small grids in \(G_{x}\) are marked as \(G_{x_{1}}, G_{x_{2}}, \ldots, G_{x_{i}}, \ldots, G_{x_{N}}\), and the small grids in \(G_{y}\) are marked as \(G_{y_{1}}, G_{y_{2}}, \ldots, G_{y_{j}}, \ldots, G_{y_{N}}\). If the coordinates of two points fall in the same grid area, their grid number is the same. Define the matching score of the grid pair \((G_{x_{i}}, G_{y_{j}})\) as follows:

$$ S(G_{x_{i}} ,G_{y_{j}} ) = \left| \{ S_{k} \mid S_{k} (G_{x_{i}} ,G_{y_{j}} ) \in S \} \right| $$
(7)

where \(S_{k} (G_{x_{i}}, G_{y_{j}})\) indicates that the \(k\)-th pair of feature points falls in the grids numbered \(G_{x_{i}}\) and \(G_{y_{j}}\), respectively, \(\{ S_{k} \}\) represents the set of matching pairs that satisfy this property, and \(S(G_{x_{i}}, G_{y_{j}})\) represents the total number of feature matching pairs that fall in \(G_{x_{i}}\) and \(G_{y_{j}}\).

The matching score of each grid pair is calculated by constructing a cumulative matrix of size \(N \times N\), with each matrix element initialized to zero. If a pair of feature points falls in the grid numbered \(G_{x_{i}}\) in \(G_{x}\) and the grid numbered \(G_{y_{j}}\) in \(G_{y}\), respectively, the element in row \(G_{x_{i}}\) and column \(G_{y_{j}}\) of the matrix is incremented by one. As shown in Fig. 2a, \((x_{i}, y_{i})\) is the i-th pair of feature matches; if the coordinate of \(x_{i}\) belongs to the grid numbered 3 in \(G_{x}\) and the coordinate of \(y_{i}\) belongs to the grid numbered 4 in \(G_{y}\), the (3,4) element of the cumulative matrix is incremented by one. Traversing all matching pairs yields the matching statistics of each grid in \(G_{x}\) against each grid in \(G_{y}\), as shown in Fig. 2b.

Fig. 2

Grid matching example. a Single grid pair matching. b Grid matching matrix. The coordinates of the matched feature points are in the grid numbered 3 in \(G_{x}\) and the grid numbered 4 in \(G_{y}\), respectively, as in (a); we therefore add one to the (3,4) element of the matrix, as in (b)

When multiple grids in \(G_{y}\) have a matching relationship with \(G_{x_{i}}\), non-maximum suppression is used: only the case with the highest matching score is kept, and the grid \(G_{y_{k}}\) with the highest score is found as follows:

$$ S(G_{x_{i}} ,G_{y_{k}} ) = \max (S(G_{x_{i}} ,G_{y_{1}} ),S(G_{x_{i}} ,G_{y_{2}} ), \ldots ,S(G_{x_{i}} ,G_{y_{l}} )) $$
(8)

where \(l\) represents the number of grids in \(G_{y}\) that have a matching relationship with \(G_{{x_{i} }}\).
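A sketch of this grid matcher, under assumed types, could look as follows: map each feature to one of the \(N = 400\) cells (a 20 × 20 layout is our assumption), accumulate Eq. (7) in an \(N \times N\) matrix, and apply the row-wise non-maximum suppression of Eq. (8).

```cpp
// Sketch of Sect. 3.2.1: per-cell match counting (Eq. 7) and row-wise
// non-maximum suppression (Eq. 8). Types and names are illustrative.
#include <algorithm>
#include <utility>
#include <vector>

struct Pt { double x, y; };
constexpr int kSide = 20, kN = kSide * kSide;       // N = 400 cells

int gridId(const Pt& p, double w, double h) {       // image size w x h
    int cx = std::min(kSide - 1, static_cast<int>(p.x / (w / kSide)));
    int cy = std::min(kSide - 1, static_cast<int>(p.y / (h / kSide)));
    return cy * kSide + cx;
}

std::vector<int> bestMatchPerCell(const std::vector<std::pair<Pt,Pt>>& S,
                                  double w, double h) {
    std::vector<std::vector<int>> acc(kN, std::vector<int>(kN, 0));
    for (const auto& m : S)                          // cumulative matrix, Eq. (7)
        ++acc[gridId(m.first, w, h)][gridId(m.second, w, h)];

    std::vector<int> best(kN, -1);                   // Eq. (8): keep row maximum
    for (int i = 0; i < kN; ++i) {
        auto it = std::max_element(acc[i].begin(), acc[i].end());
        if (*it > 0) best[i] = static_cast<int>(it - acc[i].begin());
    }
    return best;                                     // best[i] = matched cell in G_y
}
```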

3.2.2 Build grid vector matching sets

We design a grid vector strategy to transform grid matching into a motion smoothness constraint problem on grid vectors. We construct a grid vector set \(V_{a}\) from any two different grids \((G_{x_{i}}, G_{x_{j}})\) in \(G_{x}\) that contain matched features. If there are \(M\) grids in \(G_{x}\) that contain non-zero matches, the cardinality of \(V_{a}\) is \(\frac{M(M - 1)}{2}\). Similarly, we construct the feature grid vector set \(V_{b}\) for \(G_{y}\). From the above analysis, it can be seen that the cardinality of \(V_{b}\) is the same as that of \(V_{a}\).

Suppose that after non-maximum suppression, the corresponding grid of \(G_{x_{i}}\) is \(G_{y_{i}}\) and the corresponding grid of \(G_{x_{j}}\) is \(G_{y_{j}}\); then, the grid vector \(v_{i,j}^{x}\) in \(G_{x}\) and the grid vector \(v_{i,j}^{y}\) in \(G_{y}\) form a grid vector pair. As shown in Fig. 3, the grid vector \(V_{10,8}^{x}\) is formed by \(G_{x_{10}}\) and \(G_{x_{8}}\), the grid vector \(V_{11,8}^{y}\) is formed by \(G_{y_{11}}\) and \(G_{y_{8}}\), and they construct a grid vector pair (\(V_{10,8}^{x}\), \(V_{11,8}^{y}\)). Thus, we can obtain \(\frac{M(M - 1)}{2}\) grid vector pairs between \(V_{a}\) and \(V_{b}\).

Fig. 3

Grid vector pair. After non-maximum suppression, \(G_{x_{10}}\) is matched to \(G_{y_{11}}\), and \(G_{x_{8}}\) is matched to \(G_{y_{8}}\). The vector \(V_{10,8}^{x}\) is from \(G_{x_{10}}\) to \(G_{x_{8}}\). The vector \(V_{11,8}^{y}\) is from \(G_{y_{11}}\) to \(G_{y_{8}}\). \(V_{10,8}^{x}\) and \(V_{11,8}^{y}\) construct a grid vector pair (\(V_{10,8}^{x}\), \(V_{11,8}^{y}\))

If multiple grids in \(G_{x}\) are matched to the same grid \(G_{y_{i}}\), we divide \(G_{y_{i}}\) into \(3 \times 3\) smaller grids and use a similar procedure to construct the vector \(v_{i,j}^{y}\).

3.2.3 Global topology consistency of grids

From the analysis in Sect. 3.1, we can see that if the matching result between two images is correct, the corresponding grid vectors should have global topological consistency and should be close to each other. Note that the problem we aim to solve is the mismatch caused by different pattern layouts. Since the neighborhood of each pattern element between image pairs satisfies local consistency, we need to find the sets that do not satisfy global topological consistency and remove them from the putative matches. In Sect. 3.1 we discussed the global topology between points; with the grid approximation, Eq. 6 becomes:

$$ C({\rm P};I) = \sum\limits_{i|G_{i} \in {\rm P}} \sum\limits_{j|G_{j} \in {\rm P},G_{j} \ne G_{i}} \left( 1 - {\text{d}}\left( V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} \right) \right) $$
(9)

where P is the correct grid set that has global topological consistency; all the matches contained in P will be preserved. \(V_{G_{i}, G_{j}}^{x}\) is the vector from grid \(G_{x_{i}}\) to grid \(G_{x_{j}}\), and \(V_{G_{i}, G_{j}}^{y}\) is the vector from the corresponding grid of \(G_{x_{i}}\) to the corresponding grid of \(G_{x_{j}}\) in the other image. The set \(I\) contains the putative matches obtained by a given image matching method.

3.2.4 Compute grid vector consistency

Complex transformations will cause differences in the absolute distance between the corresponding grid vectors above. Nevertheless, the relative position between the two endpoints of each vector will be preserved: for example, if \(G_{x_{i}}\) is to the left of \(G_{x_{j}}\), then \(G_{y_{i}}\) will also be to the left of \(G_{y_{j}}\). Thus, we convert the point vector distance in Eq. 4 to a grid vector distance and quantize it into two levels:

$$ \hat{\text{d}}\left( V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} \right) = \left\{ {\begin{array}{*{20}l} {1,} &\quad {{\text{d}}\left( V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} \right) > 0} \\ {0,} &\quad {{\text{d}}\left( V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} \right) < 0} \\ \end{array} } \right. $$
(10)

We note that even if \(\hat{d} > 0\), there may be false matches. Figure 4 shows an example. From Fig. 4a and b, we can see that \(V_{G_{i}, G_{j}}^{x}\) and \(V_{G_{i}, G_{j}}^{y}\) do not have global consistency. But since the cosine of the angle between \(V_{G_{i}, G_{j}}^{x}\) and \(V_{G_{i}, G_{j}}^{y}\) is also greater than zero, it is easy to mistake them for a correct match if we judge directly according to Eq. 10. By observation, we found that the displacement between \(G_{x_{i}}\) and \(G_{y_{i}}\) is small, as is that between \(G_{x_{f}}\) and \(G_{y_{f}}\), but the displacement between \(G_{x_{j}}\) and \(G_{y_{j}}\) is large. That is to say, both the directions and the displacements of a mismatched grid vector pair differ from each other due to the different global topology.

Fig. 4

Wrong match with \(\hat{d} > 0\). a The grid \(G_{x_{j}}\) is at the top right of the grid \(G_{x_{i}}\). b The grid \(G_{y_{j}}\) is at the bottom right of the grid \(G_{y_{i}}\). c The angle \(\theta\) and the displacement formed by \(V_{G_{i}, G_{j}}^{x}\) and \(V_{G_{i}, G_{j}}^{y}\). The grids in (a) are partially matched with the grids in (b). It can be seen that although \(\hat{d}\) is greater than zero, \(G_{x_{j}}\) and \(G_{y_{j}}\) should not be considered a correctly matched grid pair

Based on the above displacement considerations, we define a distance metric that considers both displacements and angles:

$$ s(G_{i} ,G_{j} ) = \left\{ {\begin{array}{*{20}l} {0,} &\quad {\kappa \bar{p} - p_{i,j} < 0} \\ {1,} &\quad {\kappa \bar{p} - p_{i,j} > 0} \\ \end{array} } \right. $$
(11)

\(p_{i,j}\) is computed by Eq. 12 and measures the difference in displacement between the grid pair \((G_{x_{i}}, G_{y_{i}})\) and the grid pair \((G_{x_{j}}, G_{y_{j}})\); \(\bar{p}\) is computed by Eq. 13 and represents the average displacement of all grid pairs:

$$ p_{i,j} = {\text{abs}}(||G_{{x_{i} }} - G_{{y_{i} }} || - ||G_{{x_{j} }} - G_{{y_{j} }} ||) $$
(12)
$$ \bar{p} = \sum\limits_{i = 1}^{M} P_{i} ||G_{x_{i}} - G_{y_{i}} || $$
(13)

\(|| \cdot ||\) denotes the Euclidean distance, and \(P_{i}\) is the contribution of each grid. We assume that all contributions are equal, so \(P_{i} = 1/M\). The value κ = 0.3 is determined experimentally.

Combining Eqs. 10 and 11, we obtain a new consensus of global topology considering both displacement and angle between \(V_{{G_{i} ,G_{j} }}^{x}\) and \(V_{{G_{i} ,G_{j} }}^{y}\):

$$ \hat{\text{d}}(V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} ) = \left\{ {\begin{array}{*{20}l} {s(G_{i} ,G_{j} ),} &\quad {{\text{d}}(V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} ) > 0} \\ {0,} &\quad {{\text{d}}(V_{G_{i} ,G_{j}}^{x} ,V_{G_{i} ,G_{j}}^{y} ) < 0} \\ \end{array} } \right. $$
(14)

where the value of \(s(G_{i} ,G_{j} )\) is in \(\{ 0,1\}\).
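Putting Eqs. (10)-(14) together, a sketch of the combined direction-and-displacement test might look like this. We assume cx and cy hold the centers of the matched grid pairs \((G_{x_{i}}, G_{y_{i}})\) in a shared pixel coordinate frame; the struct and function names are ours.

```cpp
// Sketch of Eqs. (10)-(14): combined angle and displacement consistency
// for the grid vector pair formed by grids i and j. kappa = 0.3 as in the text.
#include <cmath>
#include <vector>

struct Pt { double x, y; };

static double dist(const Pt& a, const Pt& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

int gridVectorConsistency(const std::vector<Pt>& cx, const std::vector<Pt>& cy,
                          int i, int j, double kappa = 0.3) {
    // Direction test (Eq. 10): cosine between V^x_{Gi,Gj} and V^y_{Gi,Gj}.
    Pt vx{cx[j].x - cx[i].x, cx[j].y - cx[i].y};
    Pt vy{cy[j].x - cy[i].x, cy[j].y - cy[i].y};
    double cosv = (vx.x * vy.x + vx.y * vy.y) /
                  (std::hypot(vx.x, vx.y) * std::hypot(vy.x, vy.y));
    if (cosv <= 0.0) return 0;                       // opposite directions

    // Displacement test (Eqs. 11-13): p_{i,j} against kappa * mean displacement.
    double pij = std::fabs(dist(cx[i], cy[i]) - dist(cx[j], cy[j]));  // Eq. (12)
    double pbar = 0.0;
    const int M = static_cast<int>(cx.size());
    for (int k = 0; k < M; ++k)
        pbar += dist(cx[k], cy[k]) / M;              // Eq. (13) with P_k = 1/M
    return (kappa * pbar - pij) > 0.0 ? 1 : 0;       // s(G_i, G_j), Eqs. (11), (14)
}
```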

3.2.5 Grid matching metric

From Eq. 14, we obtain the consistency of each grid vector pair, and all vectors are thus divided into two cases: with or without consistency. Clearly, a vector pair consisting of two correctly matched grids should be considered correct, while a vector pair involving either incorrectly matched grid should be considered false.

Assume that there are \(M\) grids in total, of which \(t\) are correctly matched grids that should be retained; then the number of incorrectly matched grids that should be removed is \(M - t\). If a grid is matched correctly, the number of globally consistent vectors it generates should be \(t - 1\), and the number of globally inconsistent vectors it generates should be \(M - t\). If a grid is matched incorrectly, all the vectors connected to it will lack global consistency. That is to say, the total number of grid vectors with global consistency should be \(\frac{t(t - 1)}{2}\), marked as \(T\), and the number of vectors without global consistency should be \(t(M - t) + \frac{(M - t)(M - t - 1)}{2}\), marked as \(F\). Then, we define a global consistency score calculator for each grid as follows:

$$ S_{G_{i}} = \frac{G_{T_{i}} - G_{F_{i}}}{T - F} $$
(15)

where \(G_{T_{i}}\) is the number of globally consistent vectors in the vector set connected to \(G_{i}\), and \(G_{F_{i}}\) is the number of globally inconsistent vectors connected to it. We calculate the score of correctly matched grids, the score of mismatched grids, and the value of \((T - F)\) as the total number of grids \(M\) and the number of correctly matched grids \(t\) take different values; the cases \(M = 20\) and \(M = 40\) are summarized in Fig. 5. As we can see, when \((T - F)\) is greater than zero, the score of correctly matched grids is greater than \(2/M\), while the score of incorrectly matched grids is less than zero. When \((T - F)\) is less than zero, the score of correctly matched grids is less than \(2/M\), while the score of mismatched grids is greater than \(2/M\).

Fig. 5

The relationship between \((T - F)\) and the match score of the grids. a The results when \(M = 20\). b The results when \(M = 40\)

According to the above analysis, we can use the following formula to calculate the match confidence of \(G_{i}\):

$$ F_{{G_{i} }} = \text{sgn} \left( {\left( {T - F} \right)*\left( {S_{{G_{i} }} - \frac{2}{M}} \right)} \right) $$
(16)

In order to simplify the calculation, \(F_{{G_{i} }}\) can be represented as follows:

$$ F_{{G_{i} }} = \text{sgn} \left( {T - F} \right)*\text{sgn} \left( {S_{{G_{i} }} - \frac{2}{M}} \right) $$
(17)

\(F_{{G_{i} }}\) can be divided into the following two cases:

$$ F_{i} = \left\{ {\begin{array}{*{20}l} {1,} &\quad {F_{G_{i}} > 0} \\ {0,} &\quad {F_{G_{i}} < 0} \\ \end{array} } \right. $$
(18)

where \(F_{i} = 1\) indicates that the grid \(G_{i}\) is correctly matched between the image pair, and \(F_{i} = 0\) indicates that it is not.

According to Eqs. 9, 15, and 17, we obtain the minimization problem in Eq. 19:

$$ C({\rm P};I) = \sum\limits_{{i|G_{i} \in {\rm P}}}^{{}} {\left( {1 - \text{sgn} \left( {T - F} \right)*\text{sgn} \left( {\frac{{G_{{T_{i} }} - G_{{F_{i} }} }}{{T - F}} - \frac{2}{M}} \right)} \right)} $$
(19)
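A sketch of this grid confidence computation follows. Here we take T and F as the observed totals of consistent and inconsistent grid vector pairs (their theoretical values in terms of t are derived above, but t is not needed as an input); consistent[i][j] is assumed to hold the \(\hat{d}\) value of Eq. (14), and the function name is ours.

```cpp
// Sketch of Eqs. (15)-(18): per-grid global consistency score and match
// confidence; returns the indices of the grids judged correctly matched.
// Assumes T != F (the degenerate T == F case is not handled).
#include <vector>

std::vector<int> selectGrids(const std::vector<std::vector<int>>& consistent) {
    const int M = static_cast<int>(consistent.size());
    double T = 0, F = 0;                       // observed totals over all pairs
    for (int i = 0; i < M; ++i)
        for (int j = i + 1; j < M; ++j)
            (consistent[i][j] ? T : F) += 1;

    std::vector<int> keep;                     // grids with F_{G_i} = 1
    for (int i = 0; i < M; ++i) {
        int gt = 0, gf = 0;                    // G_{T_i}, G_{F_i}
        for (int j = 0; j < M; ++j)
            if (j != i) (consistent[i][j] ? gt : gf)++;
        double score = (gt - gf) / (T - F);    // Eq. (15)
        int sgn1 = (T - F) > 0 ? 1 : -1;       // sgn(T - F)
        int sgn2 = (score - 2.0 / M) > 0 ? 1 : -1;
        if (sgn1 * sgn2 > 0) keep.push_back(i);// Eqs. (17)-(18)
    }
    return keep;
}
```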

3.3 Implementation details

The proposed SMF method is summarized in Algorithm 1. First, a specific matching algorithm is used for pre-matching; in the experimental part, we use five algorithms separately, namely RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28]. Second, grid matching is performed based on the feature match distribution. Third, the grid vector matching set is constructed, and the global consistency is calculated using Eq. 14. To prevent false matching caused by image rotation, we rotate \(G_{y}\) around its center every 45 degrees and take the matching result with the smallest cost value. Finally, we obtain a grid set \(P\) that minimizes Eq. 19. After that, we keep all feature matching pairs located within the grids in \(P\), delete the rest, and obtain the final matching result \(I^{*}\). There is only one parameter, κ, in our method, which is used to decide from the displacement whether a grid vector pair is globally consistent. The overall flow is outlined below.
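The outline uses the hypothetical helper routines sketched in the previous subsections, not the authors' released code:

```cpp
// Outline of Algorithm 1 (SMF); prematch() stands for the chosen
// pre-matching algorithm (RANSAC/VFC/GMS/LPM/GLOF).
//
//   1. S    = prematch(Ia, Ib);                 // locally consistent matches
//   2. best = bestMatchPerCell(S, w, h);        // Eqs. (7)-(8), Sect. 3.2.1
//   3. for each 45-degree rotation of G_y:
//        consistent[i][j] = gridVectorConsistency(...);  // Eq. (14)
//        P = selectGrids(consistent);           // Eqs. (15)-(18)
//      keep the rotation whose P gives the smallest cost in Eq. (19)
//   4. I* = matches of S that fall in the grids of P; discard the rest.
```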

3.4 Computational complexity

The two main steps of the SMF method are calculating the number of matches between grids and calculating the global consistency score for all grid vector pairs. The time complexity of the first step is \(O(T)\), where \(T\) is the size of the putative match set. Suppose the number of grids that contain matches is \(M\), which is much smaller than \(T\); then the time and space complexity of the second step is \(O(M^{2})\). So, given the pre-matching results, our SMF has linear time complexity.

4 Experimental results

We conduct experiments on image feature matching and object reconstruction to evaluate the performance of our SMF. The features of each image are detected using ORB [31], which is efficient, robust, yields a large number of feature points, and is suitable for real-time tasks. The number of features is set to 5,000. We implemented all the algorithms in Visual Studio 2019 with OpenCV 4.5.1 and C++, without any optimization. All experiments are performed on a notebook with a 2.8 GHz Intel Core i7-1165G and 16 GB RAM.

4.1 Results on feature matching

Image pairs with four different types of similarity are used to verify the effectiveness of our SMF on mismatch removal: identical image pairs with large baselines, similar image pairs with partially identical patterns, similar image pairs with completely different pattern layouts, and similar image pairs with almost symmetric patterns. We show some examples in Fig. 6. The traditional ratio test (threshold 0.6) match result is used as the putative match set.

Fig. 6

Some examples of the matching datasets. a DTU [32] scan4. b Book has the same patterns at different relative positions. c Pencil box has almost symmetrical patterns on different surfaces. d Toothpaste has partially identical patterns with different typography

The DTU [32] dataset is used to verify the effectiveness of the SMF algorithm on the same object. We collect the other three datasets with different types of similar images. We manually checked the correctness of each feature match for each image pair, labeled it as true or false, and used it as the ground truth to ensure objectivity. The details of the datasets are described as follows:

  • DTU [32]: This dataset is mainly used for MVS (Multi-View Stereo) reconstruction. It contains many types of 3D scenes, each with a series of images taken from different perspectives. The ground truth is calculated from the camera parameters supplied by the dataset. Two scenes are selected, and we choose 50 image pairs with large baselines. The average number of putative matches is 1203.5, and the average inlier rate is 56.73%.

  • Similar image dataset No.1: This dataset contains images taken from three different scenes. The image pairs have partly identical patterns and partly differently positioned patterns. We collect 30 image pairs in total. The average number of putative matches is 1406.3, and the average inlier rate is 39.63%.

  • Similar image dataset No.2: This dataset contains images taken from 4 different objects. The different surfaces of the objects have similar pattern elements but different typography. We match every two images of each object and create 43 image pairs. The average number of putative matches is 455. Due to the completely different typography, the number of true matches should be zero.

  • Similar image dataset No.3: This dataset contains images taken from 3 different objects. The different surfaces of the objects have almost symmetrical patterns. We match every two images of each object and create 27 image pairs in total. As with dataset No.2, the number of true matches should be zero.

We test our SMF on the datasets described above and compare it with five state-of-the-art methods: RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28]. RANSAC [17] is a classic sampling-based approach; VFC [24] is a non-parametric-interpolation-based method; LPM [7] is a locality-neighborhood-based method; GMS [6] is based on grid motion statistics; GLOF [28] is a mismatch rejection method based on local density reachability. We implement these algorithms from publicly available code. The SMF method proposed in this paper is then tested by using these algorithms as pre-matching algorithms to evaluate the filtering effectiveness. We also test the results of using other matching filters on top of LPM [7] and VFC [24].

Figure 7a–e illustrates some representative matching results on similar image pairs included in dataset No.1, using SMF on top of RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28]. These image pairs contain the same patterns at different locations. We first use the five methods for pre-matching, and all lines in Fig. 7a–e are pre-matching results. Our goal is to identify the false matches caused by different typography. The matches of patterns with different positions should be identified as false and are marked with blue lines; the remaining true matches are marked with green lines. For example, the small pattern of the “Book” and “Dictionary” pairs appears at different positions, so the matches between the small patterns should be recognized as mismatches. The pattern layouts of the two surfaces of “Toothpaste” are completely different, so the matches between these two surfaces should also be false. Figure 7f shows the match results of using GMS [6] as a matching filter on LPM [7]. Figure 7g shows the match results of using GLOF [28] as a matching filter on VFC [24]. Figure 7h shows the correct matching results. From Fig. 7, we can see that SMF works with different algorithms to effectively identify the mismatches caused by the same patterns at different locations.

Fig. 7

Feature matching results of further applying our SMF, compared with the other seven methods. From left to right: Dictionary, Book, Toothpaste. From top to bottom: RANSAC [17], VFC [24], GMS [6], LPM [7], GLOF [28], LPM [7] + GMS [6], VFC [24] + GLOF [28]. The green lines indicate the matching results of each matching algorithm. In a–e, all lines are the matching results of RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28] used as pre-matching methods separately. Among them, the blue lines indicate the false matches identified by SMF, and the green lines indicate the matches that should be retained. f The match result of using GMS [6] as a matching filter on LPM [7]. g The match result of using GLOF [28] as a matching filter on VFC [24]. h The correct matching results

Three metrics are used to evaluate the results: precision, recall, and F-score. Given the numbers of true positive matches (TP), true negative matches (TN), false positive matches (FP), and false negative matches (FN), the precision is calculated by:

$$ P = \frac{\text{TP}}{\text{TP} + \text{FP}} $$
(20)

The recall is obtained by:

$$ R = \frac{\text{TP}}{\text{TP} + \text{FN}} $$
(21)

The F-score is given as follows:

$$ F = \frac{2 \times P \times R}{{P + R}} $$
(22)
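As a minimal sketch, these three metrics translate directly into code; the helpers below are ours, with tp, fp, and fn counted against the manually labeled ground truth described above.

```cpp
// Minimal helpers for Eqs. (20)-(22).
double precision(int tp, int fp) { return static_cast<double>(tp) / (tp + fp); }
double recall(int tp, int fn)    { return static_cast<double>(tp) / (tp + fn); }
double fscore(double p, double r){ return 2.0 * p * r / (p + r); }
```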

The quantitative comparisons of the average precision, recall, and F-score for each object are summarized in Table 1. From the results, we see that when the original algorithms are used alone, RANSAC [17] obtains relatively better metrics because of its matrix transformation principle, which can deal with the differing matrices caused by a few identical patterns at different locations. The other original algorithms have low precision because they identify correct matches only from local feature matching or local topological consistency. After applying our SMF algorithm, the mismatches caused by different global topologies are effectively identified. The metrics of using GMS [6] as the filter on LPM [7] and of using GLOF [28] as the filter on VFC [24] indicate that using SMF as the filter yields more correct matching results. A more intuitive evaluation is shown in Fig. 8 for a more straightforward comparison.

Table 1 Precision, recall, and F-score metrics of the image pairs included in dataset No.1 based on different algorithms
Fig. 8

Average quantitative performance of different algorithms on the three datasets in Fig. 7. The blue columns show the metrics by using RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28]. The orange columns show the metrics after using SMF. From left to right: precision, F-score

Figure 9a illustrates some examples of the matching results for similar image pairs with totally different pattern layouts (dataset No.2) and similar image pairs with almost symmetric patterns (dataset No.3), obtained by applying SMF to the results of each individually executed algorithm mentioned above. Figure 9b illustrates the results of the two combined algorithms. These image pairs have completely different global typography or are almost symmetrical; therefore, they should ideally have no matches at all, so that sufficiently correct camera poses can be obtained in subsequent work such as 3D modeling. From the results, it can be seen that where the other algorithms identify the matches between similar images with different typography as correct, our SMF identifies them as belonging to different scenes and considers them wrong, which benefits subsequent camera pose estimation. This verifies the effectiveness of the proposed global topology strategy. Thus, SMF can be used as a follow-up step to any matching algorithm for removing mismatches between differently typeset images.

Fig. 9

Feature matching results of further applying our SMF, compared with the other methods. All lines in (a) are the matching results of RANSAC [17], VFC [24], GMS [6], LPM [7], and GLOF [28] used as pre-matching methods separately. Among them, the blue lines indicate the false matches identified by SMF, and the green lines indicate the matches that should be retained. b The match results of using GMS [6] as the matching filter on LPM [7] and of using GLOF [28] as the matching filter on VFC [24]

Figure 10 shows the average run time of the aforementioned approaches on the four datasets. The maximum image resolution does not exceed 1500 × 1200. GMS [6] has the fastest running time due to its grid strategy. Compared with the run time of the original algorithms, additionally running SMF takes only approximately 37.4% more time to identify the mismatches caused by different global topologies.

Fig. 10

Average run time (seconds) on the four datasets based on different algorithms

4.2 Results on object reconstruction

Open-source software such as VisualSFM [33, 34], COLMAP [35, 36], and OpenMVG + OpenMVS [37,38,39] is widely used to reconstruct scenes from unordered images. To verify the role of SMF in 3D reconstruction, we perform reconstruction tasks on three objects whose different surfaces are similar and compare the results with those of the three open-source packages.

We take 17 images from different angles around Toothpaste; the schematic is shown in Fig. 11a. Figure 11b shows the results of camera pose estimation by OpenMVG [37, 38], COLMAP [35, 36], and OpenMVG [37, 38] + SMF. It can be seen that there are obvious pose estimation errors when using COLMAP [35, 36] and OpenMVG [37, 38]. By additionally using SMF, we obtain correct camera pose estimation results, which is important for the subsequent 3D reconstruction step [40].

Fig. 11

a Schematic of the camera positions. b Results of camera pose estimation by OpenMVG [37, 38], COLMAP [35, 36], and OpenMVG [37, 38] + SMF

In addition, we continue the dense point cloud reconstruction process based on the different camera poses calculated by the three pipelines, estimating the depth of each scene taken from different positions. We calculate the depth estimation errors for OpenMVG [37, 38], COLMAP [35, 36], and OpenMVG [37, 38] + SMF. We use an Intel RealSense D435i and align the RGB images with the depth maps. Although the obtained depth maps usually contain many invalid values, such as those caused by occlusion due to the different angles of the left and right cameras, the valid depth values are relatively reliable. Figure 12 shows one example of the RGB image, the left IR view, the right IR view, and the corresponding depth map, in which black marks invalid data. We sample 20 frames for each angle and take the average depth of these 20 frames as the true depth value of each scene taken from different positions. To eliminate the effect of the different units of the depth values produced by the different algorithms, we normalize the data. We use three evaluation metrics to assess the depth estimation performance, AbsRel, MAE, and RMSE, shown in Eqs. (23), (24), and (25):

$$ {\text{AbsRel}} = \frac{\sum\nolimits_{i = 1}^{N} {\frac{|D_{i} - D_{i}^{*} |}{D_{i}^{*} }} }{N} $$
(23)
$$ {\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{N} {|D_{i} - D_{i}^{*} |} }}{N} $$
(24)
$$ {\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} ( D_{i} - D_{i}^{*} )^{2} }}{N}} $$
(25)

where \(D_{i}\) is the estimated depth of the \(i\)-th point, \(D_{i}^{*}\) is the true depth value of the \(i\)-th point, and \(N\) is the number of points with valid depth values in the scene. Table 2 shows the metrics against the true depth values for OpenMVG [37, 38], COLMAP [35, 36], and OpenMVG [37, 38] + SMF. The depth estimation process failed for the 1st to the 9th image, so we compare the error results for the 10th to the 17th image. It can be seen that the estimated depth results are effectively improved after further filtering the mismatches with the method proposed in this paper. Figure 13 shows the dense point clouds reconstructed from the three different estimated camera pose results mentioned in Fig. 11b. It can be seen that the dense point clouds based on the camera poses calculated by OpenMVG [37, 38] and COLMAP [35, 36] have contour errors and point cloud loss, while the point cloud based on the camera poses calculated with the additional use of SMF is relatively complete and correct.
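A small sketch of these error metrics follows, assuming d and g hold the normalized estimated and ground-truth depths of the valid points; the struct and function names are illustrative.

```cpp
// Sketch of Eqs. (23)-(25) over the valid depth samples.
#include <cmath>
#include <vector>

struct DepthErrors { double absRel, mae, rmse; };

DepthErrors depthErrors(const std::vector<double>& d, const std::vector<double>& g) {
    double ar = 0, mae = 0, se = 0;
    const size_t N = d.size();
    for (size_t i = 0; i < N; ++i) {
        double e = d[i] - g[i];
        ar  += std::fabs(e) / g[i];   // AbsRel term, Eq. (23)
        mae += std::fabs(e);          // MAE term, Eq. (24)
        se  += e * e;                 // squared-error term, Eq. (25)
    }
    return { ar / N, mae / N, std::sqrt(se / N) };
}
```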

Fig. 12 

One example of the RGB image, the left IR view, and the right IR view taken by Intel RealSense D435i

Table 2 The error between the true depth and the depth obtained based on the three different estimated camera pose results by using OpenMVG [37, 38], COLMAP [35, 36], and OpenMVG [37, 38] + SMF for the 9th to the 17th image
Fig. 13

The two different perspectives of the reconstructed dense point clouds based on the estimated camera pose calculated by different algorithms. From left to right: OpenMVG [37, 38], COLMAP [35, 36], OpenMVG [37, 38] + SMF. a Shows that the contour is reconstructed more correctly by adding SMF. b Shows that the point cloud is more complete by adding SMF

Figure 14 shows the reconstruction results of the different algorithms. For the Toothpaste and Garbage can datasets, the results of the other three pipelines exhibit varying degrees of damage and distortion. For Pencil box, VisualSFM [33, 34] and COLMAP [35, 36] show missing or distorted surfaces. Although the results of OpenMVG + OpenMVS [37,38,39] are more complete, they still contain distortions. It can be seen that using our SMF to further filter the outliers caused by different global topologies helps to compute more accurate camera poses, and the accuracy and completeness of the resulting 3D reconstructed models are higher.

Fig. 14

3D reconstruction models obtained by different algorithms. The first row shows the dense point clouds obtained by VisualSFM [33, 34]. The second row shows the dense point clouds obtained by COLMAP [35, 36]. The third row shows the models obtained by OpenMVG + OpenMVS [37,38,39]. The last row shows the results obtained by OpenMVG [37, 38] + SMF + OpenMVS [39]

5 Conclusion

In this paper, we present SMF, a global-typography-based method to remove mismatches from similar image pairs. It is based on the consensus that the global topological structures of the matched points of image pairs taken from the same scene or the same object should be similar. SMF can run after any matching algorithm whose output has local consistency and can then remove false matches that lack global consistency. We formulated the global topological structure of the matches into a mathematical model that robustly recovers the inliers by judging the false matches caused by different pattern layouts. The experimental results on image matching demonstrate that our method outperforms the state-of-the-art methods. Moreover, it can be used in reconstruction pipelines to obtain more accurate results.

So far, the SMF algorithm proposed in our work assumes that the cardinality of the images is equal. In future work, we will consider refining the global topology model of SMF and applying our method to more computer vision applications, such as eliminating matches between different instances or distinguishing similar environments in VSLAM. We will also compare the effectiveness of the proposed algorithm with deep-learning methods.