Introduction

Three-dimensional (3D) reconstruction [1,2,3,4] is an important research topic in computer vision and has been widely applied in medical systems [5], autonomous navigation [6], aerial and remote sensing measurement [7], and industrial automation [8]. Because the number, precision, and distribution of feature matches directly affect the quality of 3D reconstruction, feature matching is a key step in the pipeline. This manifests in three ways: (1) when the number of feature point matches is limited, the reconstructed 3D model is prone to point cloud sparsity and reconstruction interruption; (2) when the error rate of feature point matching is too high, noisy point clouds and model distortion are likely to occur; (3) when the matched feature points are concentrated in a small region, the 3D reconstruction model easily becomes unstable. Therefore, highly robust feature matching methods play an important role in improving the quality of 3D reconstruction.

To improve the performance of feature point matching, previous work mainly focused on improving the discriminative ability of descriptors, with significant success, e.g., SIFT [9, 10], ORB [11], SURF [12], and A-SIFT [13]. Although these methods are relatively mature, they still have the following disadvantages: (1) nearest-neighbor matching yields many feature point matches but with a high error rate; (2) matching with the ratio of the nearest-neighbor to second-nearest-neighbor distances [9] improves matching quality but yields fewer matching pairs [14]; (3) these methods do not consider the spatial distribution of matched feature points [15]; (4) they improve matching only from the perspective of the descriptor [12, 16], making it difficult to distinguish true matches from false ones. These matching methods therefore still fall short in the number, precision, and distribution of feature point matches, and building a highly robust feature matching model remains a challenging research topic.
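As a concrete illustration of disadvantage (2), the nearest-neighbor/second-nearest-neighbor ratio test can be sketched as follows (a minimal sketch; the function name, data layout, and the 0.8 ratio are illustrative, not taken from the cited implementations):

```python
def ratio_test(distances_per_query, ratio=0.8):
    """Lowe-style ratio test: keep a query only if its nearest descriptor
    distance is sufficiently smaller than the second-nearest one.
    `distances_per_query` maps query index -> sorted candidate distances."""
    kept = []
    for query_idx, dists in distances_per_query.items():
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            kept.append(query_idx)
    return kept

# The ambiguous match (0.70 vs 0.72) is rejected; the distinctive one is kept.
matches = {0: [0.70, 0.72], 1: [0.30, 0.90]}
print(ratio_test(matches))  # [1]
```

The test discards ambiguous correspondences, which is exactly why it raises precision while shrinking the number of surviving matches.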

Motion consistency [17, 18] generally follows the principle that “similar features share consistent motions”. By incorporating smoothness constraints into feature point matching, it can effectively distinguish true matches from false ones and increase the number of correct matches. Many researchers have studied feature point matching algorithms based on motion consistency [19,20,21,22,23,24,25,26,27,28,29]. Among them, Grid-based Motion Statistics (GMS) [23, 24] is an efficient approach that converts the motion smoothness constraint into a statistical measure for eliminating false matches, improving match quality while increasing the number of feature matches. However, GMS partitions the neighborhood with an evenly divided grid, which tends to miss the correct neighborhood and thus reduces the number of feature point matches. Considering the rotation invariance of circles, we propose a circle-based neighborhood partitioning method: we take each matching point as the neighborhood center and use the circular region within a certain range as its neighborhood, thereby increasing the number of feature matches. In addition, GMS [23, 24] uses only a single threshold \(\alpha \) to judge true and false matches, which makes the matching result sensitive to the value of \(\alpha \) and imposes strict requirements on it; moreover, the optimal \(\alpha \) differs across matching scenarios and is thus difficult to choose. To solve this problem, we present the idea of Enhancing Motion Consistency (EMC): on top of motion consistency, a second threshold \(\beta \) (\(\alpha <\beta \)) is added to strengthen the condition for distinguishing true matches from false ones, thereby removing false matches caused by highly similar repetitive textures and reducing the proportion of false matches.
Moreover, to further improve the precision of feature point matching, we use Random Sample Consensus (RANSAC) [30, 31] to eliminate outliers, which greatly alleviates the problems of sparse point clouds, reconstruction interruption, noisy point clouds, and model distortion in the reconstructed 3D model.

Furthermore, to avoid matching points concentrating in a local area of the image and to improve the stability of the 3D model, we propose a Guided Diffusion (GD) idea. It consists of two steps: guided matching and a motion consistency constraint. In the guided matching step, the distribution range of feature point matches is expanded by performing epipolar-geometry-guided matching on the high-precision matching set. In the motion consistency step, a small-range motion consistency constraint is applied to the diffused matching set, eliminating the false matches caused by the weak point-to-line epipolar constraint. We verify the effectiveness of the proposed method through experiments on multiple datasets and comparisons with existing methods. Specifically, our EMC+GD_C achieves an average improvement in feature matching precision of 9.18% over GMS, 1.94% over EMC+GD_G, and 24.07% over the SIFT-based ratio test.

In summary, to improve the quality of 3D reconstruction, we propose a circle-based enhanced motion consistency and guided diffusion feature matching algorithm named EMC+GD_C, addressing three aspects of feature point matching: number, precision, and distribution. More concretely, the contributions of this study can be summarized as follows:

(1) We propose a circle-based neighborhood division method. Unlike existing neighborhood division methods, we exploit the rotation invariance of the circle and take the circular region within a certain range of each matching point as its neighborhood. This effectively avoids the missing neighborhoods caused by image rotation and increases the number of feature point matches.

(2) We propose an EMC strategy to determine whether the correspondences in circular neighborhood pairs satisfy motion consistency, which effectively improves the precision of the initial matching set.

(3) We propose a novel GD method that fuses guided matching with a motion consistency constraint, which effectively expands the distribution range of feature point matches.

(4) We apply our EMC+GD_C algorithm to 3D reconstruction. It not only effectively alleviates point cloud sparsity, reconstruction interruption, point cloud noise, and model distortion during 3D reconstruction, but also improves the stability of the 3D model.

The remainder of the paper is organized as follows. Some related work is discussed in Sect. Related work. Section EMC+GD_C introduces the algorithm in detail. In Sect. Experiments, we analyze and discuss the experimental results. The conclusion is drawn in Sect. Conclusions and limitations.

Related work

As a key step in recovering 3D models from images, feature matching [32, 33] has been studied by a growing number of researchers from the perspective of 3D reconstruction [7, 34]. For example, Hu et al. [35] used the SIFT matching algorithm to find matching pixels in two corresponding digital images and estimated 3D points using the midpoint of the common perpendicular of two non-coplanar lines. SIFT-based point matching is robust to substantial translation, rotation, and scaling, but the number of correctly matched points usually drops sharply as the angle between viewpoints increases. To solve this problem, Stumpf et al. [36] used the affine-invariant extension of SIFT (A-SIFT) to provide more reliable matching and obtain a more accurate 3D model than SIFT. Liu et al. [37] used the ORB feature detection algorithm to extract image features and realized a fast 3D reconstruction method for indoor, simple, small-scale static environments with good accuracy, robustness, real-time performance, and flexibility. However, these algorithms, which improve feature point matching from the perspective of descriptors, have difficulty distinguishing true matches from false ones, and even when they filter out some outliers, they reduce the number of correct matches. As a result, point cloud sparsity and reconstruction interruption are prone to occur during 3D reconstruction.

In recent years, many researchers have studied feature point matching algorithms based on motion consistency. For example, Maier et al. [20] proposed a guided matching method based on statistical optical flow, which constrains the search space by using spatial statistics to match and filter small subsets of correspondences. Although this method performs well in terms of processing time, it may filter out very small dynamic objects during the statistical optical flow process, reducing the robustness of the algorithm. Wang et al. [21] used Density Maximization (DM) and defined a good locally smooth neighborhood to avoid noise, which significantly improved matching precision and can handle outliers and many-to-many object matching simultaneously. However, it is limited to sparse feature matching, which makes the computational cost high and the implementation complicated. Lin et al. [14] used Bilateral Functions (BF) to reduce the false filtering of correct matches; they computed a global matching consistency function and allowed motion discontinuity at object boundaries. But it runs slowly and is easily affected by repetitive structures, which reduces matching precision. To solve this problem, Lin et al. [22] improved on BF by combining it with RANSAC to form a wide-baseline matcher that achieves high precision and recall in challenging scenarios. These techniques all use the consistency of the matching distribution to separate true and false matches. Although they can alleviate reconstruction interruption, their complicated formulas lead to smoothness constraints that are hard to understand and apply precisely. Yang et al. [26] introduced a dynamic-scale grid structure into the mismatch removal stage to reduce the time complexity of neighborhood construction, and proposed a Gaussian-based weighted scoring strategy that combines descriptor matching stability with geometric consistency. GMS [23, 24] uses a statistical matching constraint that is simpler and easier to understand: it judges true and false matches with an evenly partitioned grid and encapsulates motion consistency within the grid, improving match quality while increasing the number of feature matches. However, when highly similar repeated structures are present, it produces a large number of persistent mismatches, which manifest as sparse and noisy point clouds in 3D reconstruction.

Fig. 1
figure 1

Algorithm framework

Fig. 2
figure 2

Neighborhood comparison chart. The red regions in the figure are the neighborhoods actually divided; the green dashed box marks the correct neighborhood

EMC+GD_C

Our algorithm framework is shown in Fig. 1, which is mainly divided into two steps: High-precision matching and GD-based uniform matching.

Step 1. High-precision matching. Based on circle-based neighborhood division and the EMC idea, an Initial Matching Set (IMS) with a sufficient number of matches is obtained; RANSAC is then applied to the IMS to further improve feature matching precision, yielding a Dependable Matching Set (DMS).

Step 2. GD-based uniform matching. The ideas of guided matching and motion consistency are applied in GD to obtain a uniformly distributed Robust Matching Set (RMS).

High-precision matching

Circle-based neighborhood division

For the adjacent image pairs of the 3D scene to be matched, the fast and computationally efficient ORB [11] algorithm is used to detect feature points and compute descriptors; after brute-force matching, a Brute Force Matching Set (BFMS) is obtained, but it contains many mismatches. The idea of motion consistency can be used to distinguish true matches from false ones. Among such methods, GMS [23, 24] is the most successful matching algorithm applying motion consistency. It divides the neighborhood with an even grid, but under image rotation GMS [23, 24] considers only several discrete rotation values; at other rotation angles, some feature points are easily lost, as shown in Fig. 2a.

Considering the rotation invariance of the circle, we divide neighborhoods by drawing circles, constructing a neighborhood only around each matching point, which avoids the loss of feature points and local neighborhoods, as shown in Fig. 2b. Specifically, we traverse each matching feature point in the BFMS and, taking it as the center, find all feature points within a certain range (neighborhood radius r); the matches whose feature points fall within this range are the matching pairs of the corresponding neighborhood. In this paper, we set r to the radius of the circumcircle of the nine grid cells in GMS [23, 24] and normalize it to \(r=0.1\).
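The circle-based neighborhood collection above can be sketched as follows (an illustrative sketch assuming match coordinates normalized to [0, 1]; the data layout and function name are our own, not the paper's implementation):

```python
import math

def circle_neighborhood(matches, center, r=0.1):
    """Collect the matches whose keypoint in the first image lies inside
    the circle of radius r around the center match."""
    cx, cy = center[0]
    return [m for m in matches
            if math.hypot(m[0][0] - cx, m[0][1] - cy) <= r]

# Each match is ((x1, y1), (x2, y2)) in normalized coordinates.
ms = [((0.50, 0.50), (0.52, 0.51)),
      ((0.55, 0.52), (0.57, 0.53)),
      ((0.90, 0.10), (0.91, 0.12))]
print(len(circle_neighborhood(ms, ms[0])))  # 2
```

Because distance to the center is rotation invariant, the same neighborhood is recovered regardless of the image's rotation angle, which is the advantage over grid cells.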

EMC-based initial matching

After dividing the neighborhoods, we score the matching pairs in each neighborhood j, as shown in Fig. 3. Assuming that the total number of matching pairs in neighborhood j is \(P_j\), the neighborhood score \(S_{mn}\) of each matching pair {\(m_j,n_j\)} in the neighborhood can be expressed as:

$$\begin{aligned} S_{mn} = \vert P_j \vert - 1 \end{aligned}$$
(1)

where \(-1\) denotes removing the original matching pair itself and j represents the index of each neighborhood.

Fig. 3
figure 3

Schematic diagram of neighborhood score

Since the number of features differs across neighborhoods, we also take into account the feature reference value \({\tilde{N}}\) of the neighborhood; according to the neighborhood score \(S_{mn}\) of each matching pair {\(m_j,n_j\)}, the distribution of matching pairs \(D_{mn}\) is divided into high support (H) and low support (L):

$$\begin{aligned} D_{m n}=\left\{ \begin{array}{l} H, \quad \text { if } \quad S_{m n}>\alpha \times {\tilde{N}} \\ L, \quad \text { otherwise } \end{array}\right. \nonumber \\ \end{aligned}$$
(2)

where \(S_{mn}\) is the neighborhood score of the matching pair {\(m_j,n_j\)}, and \({\tilde{N}}\) is the feature reference value of the neighborhood. Following the GMS [23, 24] algorithm, we set \({\tilde{N}}\) = \(|N|^{\frac{1}{2}}\), where N is the total number of features in the neighborhood, and \(\alpha \) is a threshold parameter.

In the GMS [23, 24] algorithm, only the threshold \(\alpha \) is used to judge true and false matches: a matching pair belonging to H is considered a correct match (C), and a pair belonging to L a wrong match (E). This makes the matching result sensitive to the value of \(\alpha \) and imposes strict requirements on it. Especially in some special scenes, such as highly similar repetitive texture regions, \(\alpha \) alone is not enough to separate correct and incorrect matches: repeated textures have motions and neighborhoods similar to those of correct matching pairs, so if \(\alpha \) is set too small, they are mistaken as correct.

As shown in Fig. 3, the neighborhood scores of the similar matching points \(n_2\) and \(n_3\) are \(S_{mn_2}=3\) and \(S_{mn_3}=1\); with a small threshold \(\alpha \), they are likely to be classified as correct matches. Therefore, to avoid the sensitivity of the single threshold \(\alpha \) and its strict value requirements, we add a threshold \(\beta \) (\(\alpha <\beta \)) and propose the idea of EMC to remove false matches caused by highly similar repetitive textures and improve the precision of feature matching. The specific definition is as follows:

Definition 1

Enhanced Motion Consistency named EMC: Based on dividing the matching distribution \(D_{mn}\) into H and L in formula (2), the true/false matching conditions are further subdivided: H is split into correct matches (C) and similar or repeated matches (R), and L is regarded as error matches (E), which raises the standard for a match to be deemed correct. The calculation formula is:

$$\begin{aligned} D_{m n}=\left\{ \begin{array}{l} H=\left\{ \begin{array}{l} C, \quad \text{ if } \quad S_{m n}>\beta \times {\tilde{N}} \\ R, \quad \text{ if } \quad \alpha \times {\tilde{N}}<S_{m n} \le \beta \times {\tilde{N}}, \alpha <\beta \end{array}\right. \\ L=E, \quad \text{ if } \quad S_{m n} \le \alpha \times {\tilde{N}} \end{array}\right. \nonumber \\ \end{aligned}$$
(3)

where C denotes correct matches, R similar or repeated matches, and E error matches; \(S_{mn}\) is the neighborhood score of the matching pair {\(m_j,n_j\)}, \({\tilde{N}} = |N|^{\frac{1}{2}}\) is the feature reference value of the neighborhood, N is the total number of features in the neighborhood, and \(\beta \) is a threshold parameter with \(\alpha <\beta \). The set of correct matches C is then obtained and denoted as IMS:

$$\begin{aligned} IMS= set\{C\} \end{aligned}$$
(4)
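The scoring and EMC classification of formulas (1)-(4) can be sketched as follows (an illustrative sketch; \(\beta =11\) matches the experimental setting used later, while the \(\alpha =4\) shown here is a hypothetical value for demonstration):

```python
import math

def emc_label(score, n_total, alpha=4, beta=11):
    """Classify one matching pair by its neighborhood score, following
    Eqs. (2)-(3): C above beta*N~, R between alpha*N~ and beta*N~,
    E otherwise, with reference value N~ = sqrt(N)."""
    n_ref = math.sqrt(n_total)
    if score > beta * n_ref:
        return 'C'   # correct match
    if score > alpha * n_ref:
        return 'R'   # similar or repeated match
    return 'E'       # error match

def build_ims(scored_matches, n_total, alpha=4, beta=11):
    """IMS = set{C} of Eq. (4): keep only the pairs labeled correct."""
    return [m for m, s in scored_matches
            if emc_label(s, n_total, alpha, beta) == 'C']

# With N = 100 features, N~ = 10, so the two thresholds are 40 and 110.
print(emc_label(120, 100))  # 'C'
print(emc_label(60, 100))   # 'R'
print(emc_label(10, 100))   # 'E'
```

The middle band R is exactly where single-threshold GMS is forced to guess; EMC discards it instead of risking repeated-texture mismatches.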

RANSAC initial matching optimization

For image feature point matching, RANSAC fits a model to the matching correspondences and removes outliers [22, 24]. Although the process of obtaining the IMS in this paper also removes wrong correspondences, it does not fit a geometric model as a RANSAC-based estimator does. The IMS therefore provides higher-quality correspondence hypotheses for RANSAC. By applying the RANSAC outlier removal scheme to the IMS, the precision of feature point matching is further improved, yielding the DMS. Since scenes used for 3D reconstruction generally exhibit large parallax, we fit the DMS to a more precise fundamental matrix model F1. An accurately fitted model supports better guided matching: the more correct matches are obtained, the smaller the error of the matching pairs after diffusion.
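The RANSAC scheme used here can be sketched generically as follows (a minimal sketch: the paper fits a fundamental matrix F1, but a 2D line model stands in so the example stays short and runnable; all names are illustrative):

```python
import random

def ransac(data, fit, residual, n_min, thresh, iters=200, seed=0):
    """Generic RANSAC skeleton: repeatedly fit a model to a minimal
    random sample and keep the model with the largest inlier set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = fit(rng.sample(data, n_min))
        inliers = [d for d in data if residual(model, d) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

def fit_line(pts):
    """Line y = a*x + b through two points; None for a degenerate sample."""
    (x1, y1), (x2, y2) = pts
    if x1 == x2:
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def line_residual(model, pt):
    if model is None:
        return float('inf')
    a, b = model
    return abs(pt[1] - (a * pt[0] + b))

# Ten points on y = 2x + 1 plus two gross outliers.
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -5)]
model, inliers = ransac(pts, fit_line, line_residual, 2, 0.5)
print(model, len(inliers))  # (2.0, 1.0) 10
```

For the fundamental matrix, `fit` would become an 8-point (or 7-point) solver and `residual` an epipolar distance; the higher the inlier ratio of the input set (as with the IMS), the fewer iterations RANSAC needs.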

GD-based uniform matching

Although the precision of the DMS is high, on the one hand, enlarging the threshold \(\beta \) removes some correct matches within a small range; on the other hand, due to the influence of EMC and neighborhood division, similar features are usually located in the same neighborhood, so correct matching pairs easily become concentrated in part of the image. To obtain more matches, we combine guided matching with a motion consistency constraint and propose a guided diffusion idea named GD, which consists of two steps: guided matching and a motion consistency constraint.

Fig. 4
figure 4

Schematic diagram of the motion consistency constraint. Any feature point a in the image can, through the epipolar constraint, correspond to several points \(b_1 \cdots b_n\) on the epipolar line l, which causes error matches (green line). In this paper, we apply small-range motion consistency on top of the epipolar constraint (yellow circle in the figure) and obtain the correct matching point \(b_k\)

Guided matching

Applying the idea of guided matching [38], more matching pairs can in theory be recovered from the model F1 fitted on the DMS. We therefore perform guided matching over the entire BFMS, retain all matching pairs that satisfy the condition, and obtain a large set of corresponding points, namely the guided matching set, which is defined as follows:

Definition 2

Guided Matching Set named GuideMS: We judge whether each match in the BFMS conforms to the model F1, i.e., verify whether its distance to the model F1 is within a certain error range; all matching pairs that meet the condition form the GuideMS. The calculation formula is defined as:

$$\begin{aligned} GuideMS=\vert D(BFMS,F1) \vert <t \end{aligned}$$
(5)

where D(x, y) is the distance from x to y, \(\vert \cdot \vert \) is the absolute value operation, and t is the distance parameter; based on experience, we set \(t=10\).
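The distance test of formula (5) can be sketched as follows (an illustrative sketch: for a match (p1, p2), we measure the distance from p2 to the epipolar line induced by p1; the toy fundamental matrix below corresponds to a pure horizontal camera translation and is our own example, not from the paper):

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance from point p2 (image 2) to the epipolar line l' = F @ p1~
    induced by its partner p1, in homogeneous coordinates.
    F is a 3x3 fundamental matrix given as nested lists."""
    x, y = p1
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)

def guide_ms(F, bfms, t=10):
    """GuideMS: matches from BFMS within distance t of their epipolar line."""
    return [(p1, p2) for p1, p2 in bfms if epipolar_distance(F, p1, p2) < t]

# Toy F for a pure horizontal-shift motion: the epipolar line of (x, y)
# is y' = y, so only the match preserving the row survives.
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
bfms = [((10, 20), (30, 20)), ((10, 20), (30, 80))]
print(len(guide_ms(F, bfms)))  # 1
```

Note that every point on the line passes this test, which is exactly the weak point-to-line ambiguity the next subsection resolves with motion consistency.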

Motion consistency constraint

Since the fundamental matrix encodes only a weak point-to-line epipolar constraint (see Fig. 4), the GuideMS contains a large number of false matches (see Fig. 5b). These false matches do not generally satisfy motion consistency; therefore, we apply a small-range motion consistency constraint to the GuideMS and obtain the matching result after diffusion, namely the RMS, which is specified in Definition 3:

Definition 3

Robust Matching Set named RMS: The GuideMS obtained in formula (5) is checked for motion consistency within a small range \(\gamma \), and all matching pairs that meet the condition are retained as the robust matching set RMS, described by (6):

$$\begin{aligned} RMS=GuideMS \leftarrow S_{mn}>\gamma \times {\tilde{N}} \end{aligned}$$
(6)

where \(\leftarrow \) denotes the verification operation, \(S_{mn}\) is the neighborhood score of the matching pair {\(m_j,n_j\)}, \({\tilde{N}}\) is the feature reference value of the neighborhood, and \(\gamma \) is a threshold parameter.

As shown in Fig. 5a, the matching pairs before diffusion have high precision but are usually concentrated in a certain area of the image. After guided matching, we obtain the GuideMS, a guided matching set widely scattered over the image; however, due to the point-to-line epipolar constraint, its error rate is very high, as shown in Fig. 5b. After the small-range motion consistency constraint, we improve the precision of feature matching while preserving the dispersion of the matches, as shown in Fig. 5c (red area).

Fig. 5
figure 5

Comparison chart before and after diffusion

The EMC+GD_C algorithm in this paper is shown in Algorithm 1.

Algorithm 1
figure a

EMC+GD_C: Circle-based Enhanced Motion Consistency and Guided Diffusion Feature Matching for 3D Reconstruction.

Experiments

To verify the effectiveness of our method, we evaluate it on the datasets of Strecha et al. [39] and the 3D reconstruction datasets provided by the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA) [40]. The datasets are introduced in Table 1. We compare against the SIFT-based ratio test (SIFT [9, 10]+ratio test) and the GMS [23, 24] algorithm. In particular, to show the advantage of the circle-based neighborhood partitioning method in this paper, we also compare against the grid-based enhanced motion consistency and guided diffusion feature matching algorithm, named EMC+GD_G; that is, we keep the steps of our algorithm and only replace the circle-based neighborhood division with the grid-based neighborhood division of GMS [23, 24]. Eight experiments are conducted: (1) evaluation of the IMS; (2) evaluation of the DMS; (3) evaluation of the RMS; (4) feature matching precision (P) comparison; (5) feature matching number comparison; (6) feature matching time comparison; (7) quantitative evaluation; (8) integration of our method into the VisualSFM system [41], using Yasutaka Furukawa’s PMVS/CMVS toolchain for dense reconstruction. The experimental environment is a computer with an Intel (R) Core (TM) i9-9900X processor and a Dell GeForce RTX 2080 Ti graphics card.

Table 1 Dataset introduction
Fig. 6
figure 6

A comparison chart of IMS. The number of matches is shown in brackets. a SIFT [9, 10]+ratio test; b GMS [23, 24]; c EMC+GD_G; d EMC+GD_C (ours)

Fig. 7
figure 7

A comparison chart of DMS. The number of matches is shown in brackets. a SIFT [9, 10]+ratio test; b GMS [23, 24]; c EMC+GD_G; d EMC+GD_C (ours)

Fig. 8
figure 8

A comparison chart of RMS. The number of matches is shown in brackets. a EMC+GD_G; b EMC+GD_C (ours)

Evaluate IMS

In this section, we evaluate the IMS. Experimental results on some image pairs are shown in Fig. 6, where the yellow box marks the overlapping area of the image pair and the red box marks mismatches caused by similar or repetitive structures. The IMS obtained by SIFT [9, 10] after nearest-neighbor matching (default threshold 1.5) is shown in Fig. 6a; the matching result is severely unstable. Figure 6b shows the matching result obtained by GMS [23, 24] with its default threshold of 6, which produces many false matches in similar or repeated regions. Figure 6c and d show the IMS results obtained by the EMC+GD_G and EMC+GD_C algorithms with \(\beta \)=11, respectively. As the figure shows, our method reduces the false matches in similar or repeated regions through the EMC idea; although the number of matches decreases somewhat, the accuracy is higher than that of the other tested algorithms. In addition, our method obtains a larger IMS than EMC+GD_G, thanks to the circle-based neighborhood partitioning method in this paper. Our method is therefore more favorable for matching optimization.

Evaluate DMS

In this section, we evaluate the DMS. Matching results on some image pairs are shown in Fig. 7, where the yellow box marks the overlapping area of the image pair, the red box marks mismatches caused by similar or repetitive structures, and the red dots mark outliers filtered out by RANSAC. Figure 7a shows the SIFT-based ratio test (SIFT [9, 10]+ratio test) algorithm, whose DMS contains a large number of false matches. Figure 7b shows GMS [23, 24] optimized by RANSAC, which retains a certain number of mismatches (red box), most of them caused by similar or repetitive structures. Figure 7c and d achieve relatively high-precision matches, albeit with a drop in the number of matches. Because the IMS in this paper has high precision, our method also exhibits high precision on the DMS, and thanks to the circle-based neighborhood division, it obtains more matches than EMC+GD_G.

Evaluate RMS

In this section, we evaluate the RMS. Some image pairs are shown in Fig. 8, where the yellow box marks the overlapping area of the image pair and the red dots mark outliers filtered out by RANSAC. The before-and-after diffusion comparison for the EMC+GD_G algorithm is shown in Fig. 8a, and that for our EMC+GD_C algorithm in Fig. 8b, both with \(\gamma \)=6. Matching pairs before diffusion tend to concentrate in a local area of the image, whereas matching pairs after diffusion are more evenly distributed. In addition, compared with the EMC+GD_G algorithm, our method yields more matching pairs after diffusion and a more uniform distribution.

Feature matching precision (P) comparison

Feature matching precision P is computed using the formula proposed by Mikolajczyk et al. [38]:

$$\begin{aligned} 1-P=\frac{F}{T+F} \end{aligned}$$
(7)

where T is the number of correct matches and F is the number of incorrect matches.
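Equivalently, formula (7) gives P = T/(T+F), which can be computed directly (a trivial sketch for concreteness):

```python
def precision(true_matches, false_matches):
    """Precision per Eq. (7): P = 1 - F/(T+F) = T/(T+F)."""
    return true_matches / (true_matches + false_matches)

# 95 correct and 5 incorrect matches give P = 0.95.
print(precision(95, 5))  # 0.95
```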

Parametric analysis

In this section, we analyze the parameters \(\beta \) and \(\gamma \) used in the experiments on eight different datasets [39, 40].

Fig. 9
figure 9

Comparison of P and quantity of IMS under different parameters \(\beta \)

Fig. 10
figure 10

Comparison of P and quantity under different parameters \(\gamma \)

The comparison of IMS precision P and match count for different values of \(\beta \) is shown in Fig. 9. As \(\beta \) increases, the number of feature matches declines slowly while the P of the IMS rises rapidly. Once \(\beta \) reaches 11, P exceeds 95% and subsequent growth is slow. Therefore, balancing P against the number of feature matches, we select \(\beta = 11\).

When performing GD, a motion consistency constraint within a small range \(\gamma \) is needed to reduce the mismatches caused by the epipolar constraint. The P and match count of our EMC+GD_C algorithm for different values of \(\gamma \) are shown in Fig. 10. As \(\gamma \) increases, the number of feature matches declines slowly while P keeps increasing. When \(\gamma \) reaches 6–7, P rises rapidly, especially on the ‘castle’ dataset. Although P continues to rise as \(\gamma \) grows further, the increase is relatively slow, and the average number of feature matches keeps decreasing. Therefore, balancing P against the number of feature matches, we choose \(\gamma \) in the range 6–7.

The P comparison of different matching methods

In this section, we compare the feature matching P of different matching methods on 8 groups totaling 908 pairs of adjacent images [39, 40]. As shown in Fig. 11, our method (with thresholds \(\beta \)=11 and \(\gamma \)=6) and the EMC+GD_G algorithm achieve a higher P than the SIFT-based ratio test and GMS algorithms, with our method the highest. The average P of the SIFT-based ratio test, GMS, EMC+GD_G, and our method is about 73.75%, 88.64%, 95.88%, and 97.82%, respectively. Our method is thus 24.07% higher than the SIFT-based ratio test, 9.18% higher than GMS, and 1.94% higher than EMC+GD_G.

Fig. 11
figure 11

Comparison chart of P of different matching methods

Fig. 12
figure 12

Comparison chart of number of different matching methods

Feature matching number comparison

In this section, we compare the number of feature matches of different matching methods on the 908 pairs of adjacent images [39, 40]. In Fig. 12a, we compare the SIFT-based ratio test (SIFT [9, 10]+ratio test), GMS [23, 24], EMC+GD_G, and our method (EMC+GD_C). As Fig. 12 shows, although the SIFT-based ratio test and GMS [23, 24] obtain more matching pairs than our method, their matching P is relatively low (see Fig. 11). In addition, our method and the EMC+GD_G algorithm obtain a large number of feature matches while maintaining a high P, and as shown in Fig. 12b, our method obtains more matching pairs than EMC+GD_G.

Computational complexity

In this section, we compare the computational cost of different matching methods on the adjacent image datasets [39, 40] (8 groups, 908 pairs in total). The SIFT-based ratio test (SIFT [9, 10]+ratio test), GMS [23, 24], EMC+GD_G, and our method (EMC+GD_C) are compared in Table 2. GMS [23, 24] takes the shortest time, followed by EMC+GD_G and then the SIFT-based ratio test; our EMC+GD_C algorithm takes the longest.

Table 2 Computational complexity of different matching methods

In summary, our method has the following advantages over the EMC+GD_G algorithm: (1) higher P; (2) more feature matches; (3) a more uniform distribution. However, its computational cost is relatively high, as shown in Table 3.

Table 3 EMC+GD_G and EMC+GD_C algorithm comparison
Fig. 13
figure 13

SP comparison chart

Fig. 14
figure 14

Dense reconstruction of three scenes. a Input image pair; b SIFT [9, 10]+ratio test; c GMS [23, 24]; d EMC+GD_G; e EMC+GD_C (ours)

Quantitative evaluation

In this section, we perform quantitative evaluation on the dataset of Strecha et al. [39], using 91 pairs of sequential images as the test set. The ground truth of this dataset includes the projection matrix \(P'\) and the camera intrinsic matrix K. The rotation matrix \(R_1\) and the translation vector \(t_1\) can be obtained by the decomposition in formula (8):

$$\begin{aligned} P'=K[R_1 \quad t_1] \end{aligned}$$
(8)
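As a minimal numerical sketch of formula (8) (using NumPy, with hypothetical intrinsics and pose values, not taken from the data set), recovering \([R_1 \; t_1]\) amounts to left-multiplying \(P'\) by \(K^{-1}\):

```python
import numpy as np

# Hypothetical intrinsics and pose, from which we build P' = K [R1 | t1].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R1 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
t1 = np.array([[0.5], [0.1], [2.0]])
P = K @ np.hstack([R1, t1])          # 3x4 projection matrix P'

# Formula (8): since P' = K [R1 | t1], we recover [R1 | t1] = K^{-1} P'.
Rt = np.linalg.inv(K) @ P
R1_rec, t1_rec = Rt[:, :3], Rt[:, 3:]
assert np.allclose(R1_rec, R1) and np.allclose(t1_rec, t1)
```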

The essential matrix E can be obtained from the fundamental matrix \(F_2\) by formula (9):

$$\begin{aligned} F_2= K^{-T} E K^{-1},\quad E= K^{T} F_2 K \end{aligned}$$
(9)
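The two identities in formula (9) can be checked numerically; the sketch below (NumPy, with a hypothetical K and an essential matrix built from an illustrative pose) is not part of the paper's pipeline:

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical intrinsics and pose used to build a valid E = [t]_x R.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                 # illustrative rotation
t = np.array([1.0, 0.0, 0.0]) # illustrative translation direction
E = skew(t) @ R

# Formula (9): F_2 = K^{-T} E K^{-1} and, inversely, E = K^T F_2 K.
Kinv = np.linalg.inv(K)
F2 = Kinv.T @ E @ Kinv
E_back = K.T @ F2 @ K
assert np.allclose(E_back, E)
```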

The essential matrix E can be decomposed into the rotation matrix \(R_2\) and the translation vector \(t_2\) (see formula (10)):

$$\begin{aligned} E= [t_2]_{\times } R_2 \end{aligned}$$
(10)
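The decomposition in formula (10) is commonly carried out via SVD, yielding two rotation candidates and a translation direction that is recoverable only up to sign and scale. A sketch assuming NumPy, with illustrative pose values (none of the numbers come from the paper):

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Split E = [t]_x R into two rotation candidates and a unit
    translation direction (sign and scale are not recoverable from E)."""
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations for the orthogonal factors.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R_a = U @ W @ Vt
    R_b = U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E: the translation direction
    return R_a, R_b, t

# Illustrative ground-truth pose.
theta = np.deg2rad(15.0)
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_true = np.array([1.0, 0.5, 0.2])
t_true /= np.linalg.norm(t_true)
E = skew(t_true) @ R_true

R_a, R_b, t_rec = decompose_essential(E)
# The recovered direction matches t_true up to sign.
assert np.isclose(abs(t_rec @ t_true), 1.0)
```

In practice the correct candidate among the four \((R, \pm t)\) combinations is chosen by a cheirality check (triangulated points must lie in front of both cameras).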

Since the translation vector t is only determined up to a scale factor, we do not use absolute quantities but convert it to direction (vector) form; that is, the angular error threshold \(x^o\) of rotation and translation is used as the variable. Pose estimators often give grossly incorrect solutions (or crash) when they fail, so the average error is not meaningful. To avoid this, we use the success percentage SP [22] (see formula (11)):

$$\begin{aligned} SP(x)=\frac{\sum \nolimits _{i=1}^{N}\left( e_i(R/t)\le x^o\right) }{N} \end{aligned}$$
(11)

where N represents the number of image pairs, and \(e_i(R/t)\) represents the rotation error \(e_i(R)\) or translation error \(e_i(t)\) of the ith image pair (i.e., between the ith image and the (i+1)th image); the term \((e_i(R/t)\le x^o)\) evaluates to 1 when the error is within the threshold and 0 otherwise. The two errors are expressed as formula (12):

$$\begin{aligned} e_i(R) =\vert R_{i1}-R_{i2}\vert ,\quad e_i(t) =\vert t_{i1}-t_{i2} \vert \end{aligned}$$
(12)

where \(\vert \cdot \vert \) denotes the absolute value operation and i indexes the image pairs; \(R_{i1}\) and \(t_{i1}\) are the ground-truth values obtained from formula (8), and \(R_{i2}\) and \(t_{i2}\) are the estimated values obtained from formula (10).
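Formula (11) reduces to counting the image pairs whose error falls below the threshold. A minimal NumPy sketch with hypothetical per-pair error values (illustrative only):

```python
import numpy as np

def success_percentage(errors, x):
    """SP(x): fraction of image pairs whose angular error (rotation or
    translation, in degrees) does not exceed the threshold x."""
    errors = np.asarray(errors, dtype=float)
    return np.count_nonzero(errors <= x) / errors.size

# Hypothetical per-pair rotation errors in degrees (not from the paper).
errs = [0.3, 0.8, 1.5, 2.0, 45.0]
sp_at_1 = success_percentage(errs, 1.0)  # 2 of 5 pairs within 1 degree -> 0.4
```

Sweeping x over a range of thresholds produces the non-decreasing SP curves plotted in Fig. 13.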

In Fig. 13a and b, we compare the rotation SP and translation SP of the SIFT-based ratio test (SIFT [9, 10]+ratio test), GMS [23, 24], the EMC+GD_G algorithm, and our EMC+GD_C algorithm. As can be seen from Fig. 13a and b, the curve of SP against the rotation or translation error threshold \(x^o\) is non-decreasing, and the smaller the error value, the higher the precision of the pose estimation. Therefore, the \(1^o\) threshold is the value we are most concerned with. Our method achieves a relatively higher SP in both rotation and translation, especially for image pairs with repeated textures.

Since the rotation SP and translation SP comparison graphs are both non-decreasing curves, to assess the superiority of our method comprehensively, we take the average of the rotation error and the translation error as the threshold of the pose estimation error (see Fig. 13c). As can be seen from Fig. 13c, our method achieves a higher SP in pose estimation than the other test algorithms, especially for image pairs with repeated textures.

Comparisons of 3D reconstruction results

In this section, we discuss the application of our method in 3D reconstruction and verify it on the 908 pairs of adjacent images in the data sets [39, 40]. The reconstruction system used is VisualSFM [41]. Some of the dense reconstruction results are shown in Fig. 14: Fig. 14a shows the input image sequences, and Fig. 14b shows the point clouds reconstructed by the VisualSFM [41] system (which uses SIFT by default), which are sparse and easily interrupted. The reasons are as follows: (1) in a 3D scene with large parallax, there are not enough matching points, resulting in inaccurate model fitting; the same scene is then mistaken for a different one, which interrupts the reconstruction; (2) when the number of correct matching points is too small, the point cloud becomes sparse. Compared with Fig. 14b, GMS [23, 24] in Fig. 14c obtains more points after dense reconstruction, but the P of its matching pairs is not high and the model estimation is inaccurate, resulting in noisy point clouds and model distortion. Compared with Fig. 14b and c, the EMC+GD_G algorithm in Fig. 14d improves the reconstruction quality. In Fig. 14e, our EMC+GD_C algorithm achieves the best point cloud density, reconstruction quality, and reconstruction effect.

Conclusions and limitations

In this paper, we present a circle-based enhanced motion consistency and guided diffusion feature matching algorithm (EMC+GD_C) for 3D reconstruction. We divide the neighborhood by drawing circles to increase the number of feature matches. The proposed EMC idea efficiently improves the precision of the IMS, and a DMS and a more precise model are obtained based on RANSAC. By combining guided matching and motion consistency, the idea of GD is proposed, which expands the distribution range of feature point matching. Experiments demonstrate that our method performs better in the number, precision, and distribution of feature matches, and achieves better reconstruction in terms of point cloud quantity, reconstruction quality, and reconstruction stability.

This paper gradually improves the quality of feature point matching through three steps: initial matching, reliable matching, and robust matching, but this increases the time complexity of the algorithm. In addition, dividing the neighborhood by drawing circles increases the size of the initial matching set, but experiments show that it also adds to the time complexity. Therefore, in the future, we will consider further optimizing the algorithm by introducing parallelization or integration ideas to reduce its complexity.