1 Introduction

Image registration is a crucial preprocessing step for many subsequent image analysis tasks, such as image mosaicing, change detection, digital elevation model generation and map updating.

Existing automatic image registration methods can be broadly divided into two categories: area-based methods and feature-based methods [1]. Area-based (or template matching) methods compare the similarity between image regions using measures such as normalized cross correlation (NCC) [2] and mutual information (MI) [3]. Although area-based methods can attain sub-pixel registration accuracy, they generally cannot be applied directly to remote sensing image pairs with large rotation or scale changes. Compared with area-based methods, feature-based methods, which extract salient image structures for matching, are more robust to image variations. Point features in particular are among the most important types of features. Aanæs et al. [4] investigated the performance of several state-of-the-art point feature detectors and found that the simple Harris detector performs very well under small scale changes, whereas the scale invariant feature transform (SIFT) detector is superior to the others under large scale changes.

For high-resolution remote sensing image registration, a major difficulty is significant local geometric distortion [5], caused by height displacement and/or platform instability. To overcome this problem, Brown [6] recommended extracting a large number of control points (CPs) to apply piecewise mapping between images. For instance, in [7] and [8], the Harris detector was used to generate a dense set of CPs after an initial SIFT-based matching stage. In [9], NCC was applied to obtain uniformly distributed CPs after the same preliminary stage.

The distribution of CPs also has a significant impact on the estimation of the transform model [10]. Several approaches have been reported for obtaining spatially well-distributed CPs, and their strategies fall into two categories. The first strategy constrains the distribution of interest points during feature detection. Brown et al. [11] introduced an adaptive local non-maximal suppression algorithm to select a specified number of Harris corner points. Sedaghat et al. [12] proposed uniform robust SIFT (UR-SIFT), which adds a grid division step to SIFT and selects only the few strongest keypoints in each grid cell, in both scale and image space. However, although the SIFT detector can extract many keypoints in both the input and reference images, most of them cannot be correctly matched, and reducing the number of extracted features may yield even fewer matches [13]. The second strategy selects a well-distributed subset of the matched CPs. Fonseca et al. [14] suggested clustering the aligned CPs and retaining the strongest one in each cluster. Since this clustering-based method is very time-consuming, Fonseca et al. also recommended a faster CP dispersion method that divides the input image into subimages and selects the strongest matches within each subimage. Wang et al. [15] improved this approach by using least squares iterative fitting to eliminate false matches in each subimage.

In this letter, we propose a novel CP dispersion method based on a theoretical analysis of the relationship between the aligning errors of CPs and parameter estimation. The proposed algorithm uses a greedy approach, based on a modified minimum spanning tree algorithm, to select high-precision and well-distributed CPs. In addition, a coarse-to-fine matching strategy is employed to acquire enough CPs to register images with local distortions. Coarse registration with SIFT yields an initial transform model between the input and reference images. Then, in the fine registration step, a large number of aligned CPs is obtained by combining the Harris detector with NCC. Finally, CP dispersion is carried out to select “good” CPs from them.

The remainder of this paper is organized as follows. Section 2 presents the proposed CP dispersion method, and Sect. 3 describes the coarse-to-fine matching approach. Section 4 presents the experiments and discussion, and conclusions are drawn in Sect. 5.

2 Control Point Dispersion

To understand how the aligning errors in the CPs affect the final registration result, we first review a previous theoretical analysis.

Suppose that the transformation between two images is a pure rotation by an angle \( \theta \). Let \( {\mathbf{x}}_{1} \) and \( {\mathbf{x}}_{2} \) be the exact coordinates of two CPs in the input image, and let \( {\mathbf{X}}_{1} \) and \( {\mathbf{X}}_{2} \) be their correspondences in the reference image. Then

$$ \cos \left( \theta \right) = \frac{{\left( {{\mathbf{x}}_{2} - {\mathbf{x}}_{1} } \right)}}{{\left\| {{\mathbf{x}}_{2} - {\mathbf{x}}_{1} } \right\|}} \cdot \frac{{\left( {{\mathbf{X}}_{2} - {\mathbf{X}}_{1} } \right)}}{{\left\| {{\mathbf{X}}_{2} - {\mathbf{X}}_{1} } \right\|}} $$
(1)

where \( \left\| {} \right\| \) denotes the vector norm and \( \cdot \) denotes the inner product. Assuming that the CP pairs \( \left( {{\mathbf{x}}_{1} ,{\mathbf{X}}_{1} } \right) \) and \( \left( {{\mathbf{x}}_{2} ,{\mathbf{X}}_{2} } \right) \) are detected with errors \( \left( {d{\mathbf{x}},d{\mathbf{X}}} \right) \), Fonseca and Kenney [14] proved that, to first order, the perturbation in \( \cos \left( \theta \right) \) varies with the errors divided by the interpoint distances, as shown in (2):

$$ d\left( {\cos \left( \theta \right)} \right) \le 2\left( {\frac{{\left\| {d{\mathbf{x}}} \right\|}}{{\left\| {{\mathbf{x}}_{2} - {\mathbf{x}}_{1} } \right\|}} + \frac{{\left\| {d{\mathbf{X}}} \right\|}}{{\left\| {{\mathbf{X}}_{2} - {\mathbf{X}}_{1} } \right\|}}} \right) $$
(2)

They further showed that a similar result holds for the scaling parameter between images.

It follows that the distribution of CPs has a significant impact on the estimation of the transformation parameters. For two CPs, the estimation error of the transformation parameters grows approximately linearly with the aligning errors of the CPs divided by the distance between them.
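As a rough numerical illustration of Eq. (2), the following Python sketch (our own, not taken from [14]) builds two CP pairs related by a 30° rotation, shifts one reference-image point by 1 pixel perpendicular to the baseline, and compares the resulting change in \( \cos \left( \theta \right) \) with the right-hand side of Eq. (2). The angle, error size and separations are arbitrary illustrative choices, and all function names are hypothetical.

```python
import math


def cos_theta(x1, x2, X1, X2):
    """Eq. (1): inner product of the unit vectors along x2 - x1 and X2 - X1."""
    u = (x2[0] - x1[0], x2[1] - x1[1])
    v = (X2[0] - X1[0], X2[1] - X1[1])
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


def rotate(p, theta):
    """Rotate point p about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def perturbation(sep, err, theta=math.radians(30)):
    """Return (|change in cos(theta)|, right-hand side of Eq. (2)) for two CPs
    `sep` pixels apart when X2 is displaced by `err` pixels and dx = 0."""
    x1, x2 = (0.0, 0.0), (sep, 0.0)
    X1, X2 = rotate(x1, theta), rotate(x2, theta)
    exact = cos_theta(x1, x2, X1, X2)
    # Shift X2 perpendicular to the baseline X2 - X1, which changes the angle most.
    X2_noisy = (X2[0] - err * math.sin(theta), X2[1] + err * math.cos(theta))
    noisy = cos_theta(x1, x2, X1, X2_noisy)
    bound = 2 * (err / sep)  # Eq. (2) with ||dx|| = 0 and ||dX|| = err
    return abs(noisy - exact), bound


near = perturbation(sep=10.0, err=1.0)
far = perturbation(sep=100.0, err=1.0)
```

With the same 1-pixel error, the pair 10 pixels apart perturbs \( \cos \left( \theta \right) \) roughly ten times more than the pair 100 pixels apart, and both stay below the bound, which is the inverse-distance behaviour Eq. (2) describes.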

Consequently, assuming that enough matched CPs have been obtained, the aim of CP dispersion is to select the subset of them with the highest accuracy while keeping the distance between any two selected CPs above a certain threshold. If a constant distance threshold is used to exclude CPs, many high-precision CPs may be discarded, and the selected CPs may become too sparse. Taking into account the relationship in Eq. (2), we propose using the aligning error of each CP as a weight that adjusts a constant interpoint distance threshold.

Suppose that an initial transform model \( {\mathbf{H}} \) is estimated from all the CPs (a quadratic polynomial model is adopted in this study for its simplicity). The aligning error of the ith CP is then computed by (3), and the adaptive distance threshold is defined by (4):

$$ e_{i} = \left\| {{\mathbf{Hx}}_{i} - {\mathbf{X}}_{i} } \right\| $$
(3)
$$ T_{{d_{i} }} = e_{i} T $$
(4)

where \( T_{{d_{i} }} \) is the minimum Euclidean distance that the ith CP must keep from all previously selected CPs, and \( T \) is a constant denoting the ‘base’ interpoint distance. \( T \) is determined empirically according to the expected number of CPs: the larger \( T \) is, the fewer CPs are retained. If \( e_{i} \) is zero, \( T_{{d_{i} }} \) is also zero, so the ith CP is always selected. Conversely, a CP located far from all the others may be selected even if its aligning error is somewhat larger.
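A minimal NumPy sketch of Eqs. (3) and (4) follows. It assumes that the quadratic polynomial model \( {\mathbf{H}} \) maps each input point \( (x, y) \) through the six monomials \( 1, x, y, x^{2}, xy, y^{2} \) and is fitted to all CPs by ordinary least squares. The function names and the array layout ((N, 2) arrays of input and reference coordinates) are illustrative assumptions.

```python
import numpy as np


def quadratic_design(pts):
    """Monomials [1, x, y, x^2, xy, y^2] of each input point (rows of an (N, 2) array)."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def adaptive_thresholds(src, dst, T=20.0):
    """Fit H to all CPs by least squares and return (e, T_d) from Eqs. (3) and (4).

    src: (N, 2) CP coordinates in the input image.
    dst: (N, 2) corresponding coordinates in the reference image.
    T:   base interpoint distance in pixels.
    """
    A = quadratic_design(src)
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)   # (6, 2): one column per output axis
    errors = np.linalg.norm(A @ coeffs - dst, axis=1)   # e_i = ||H x_i - X_i||
    return errors, errors * T                           # T_di = e_i * T
```

A single badly aligned CP receives the largest \( e_{i} \) and therefore the largest exclusion radius, while CPs that fit the model well receive radii close to zero.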

Based on this adaptive distance threshold, we propose a new CP dispersion method. Inspired by the minimum spanning tree (MST) problem in graph theory, we design a simple and fast greedy approach adapted from Prim's algorithm. Throughout the algorithm, we maintain an output tree that represents the group of CPs already selected; the tree is initially empty. At each iteration, the next CP is examined to decide whether to add it to the output tree. The approach is described as follows.

  • Step 1. Compute the aligning errors \( \left\{ {e_{i} } \right\},\,i = 1,2, \ldots ,N \) and the adaptive distance thresholds \( \left\{ {T_{{d_{i} }} } \right\} \) of all the CPs, where N is the total number of CPs. Then sort the CPs in ascending order of their errors, and set j = 1, the index of the first unselected CP.

  • Step 2. Add the jth CP to the output tree. Compute and record the distance from each of the other CPs to the jth CP. Set j = j + 1.

  • Step 3. Move to the jth CP. If its distance to the output tree is less than \( T_{{d_{j} }} \), it is regarded as ‘disconnected’ from the output tree; set j = j + 1 and proceed directly to Step 4. Otherwise, the CP is regarded as ‘connected’ and is added to the output tree. Then compute the distance from each remaining CP to the newly added one: if it is smaller than that CP's recorded distance, replace the recorded distance with it; otherwise, keep the recorded distance unchanged. Set j = j + 1.

  • Step 4. Repeat Step 3 until all the CPs have been traversed.
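The steps above can be sketched in plain Python as follows. This is an illustrative reading of Steps 1 to 4, not the authors' code; the function name `disperse` is hypothetical, and for simplicity the recorded distances of all CPs are updated whenever a CP is added (only the not-yet-visited ones influence later decisions).

```python
import math


def disperse(points, errors, T):
    """Greedy CP dispersion following Steps 1-4.

    points: list of (x, y) CP locations.
    errors: aligning errors e_i from Eq. (3).
    T:      base interpoint distance.
    Returns the indices of the selected CPs in the order they were added.
    """
    n = len(points)
    if n == 0:
        return []
    thresholds = [e * T for e in errors]                # Eq. (4)
    order = sorted(range(n), key=lambda i: errors[i])   # Step 1: ascending error
    dist_to_tree = [math.inf] * n                       # recorded distance to the output tree
    selected = []

    def add_to_tree(k):
        selected.append(k)
        kx, ky = points[k]
        for i, (px, py) in enumerate(points):
            # Keep the smaller of the recorded distance and the distance to the new CP.
            dist_to_tree[i] = min(dist_to_tree[i], math.hypot(px - kx, py - ky))

    add_to_tree(order[0])                               # Step 2: most accurate CP
    for k in order[1:]:                                 # Steps 3 and 4
        if dist_to_tree[k] >= thresholds[k]:            # 'connected': add it
            add_to_tree(k)
        # otherwise 'disconnected': skip this CP
    return selected
```

For example, with points (0, 0), (5, 0), (100, 0), (0, 0.5), errors 0.1, 0.2, 1.0, 0.05 and T = 20, the sketch selects CPs 3, 1 and 2: CP 0 lies only 0.5 pixel from the more accurate CP 3, which is inside its 2-pixel radius. Each added CP triggers one pass over all points, so the cost is O(N·K) distance evaluations for K selected CPs, plus the initial sort.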

3 Registration with Control Point Dispersion

We employ a coarse-to-fine matching strategy to match images with large local distortions. The proposed approach consists of three stages, namely pre-registration, dense CP generation and CP dispersion, which are described in detail below.

  • Stage 1. Pre-Registration

In this study, we adopt the SIFT-based pre-registration process of [7]. First, SIFT feature detection is performed on both the input and reference images to extract candidate keypoints. A 128-dimensional descriptor is then computed for each keypoint from its neighboring subregions. After the SIFT descriptors are generated, the nearest neighbor distance ratio matching strategy is used to find potential correspondences. Finally, RANdom Sample Consensus (RANSAC) is adopted to remove outliers, and the remaining CPs are used to estimate an initial homography matrix. Details of the SIFT algorithm can be found in [16].
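The nearest neighbor distance ratio test used in this stage can be sketched with NumPy as below. Keypoint detection, descriptor extraction and RANSAC are omitted; in practice they could be provided by a library such as OpenCV (for example `cv2.SIFT_create` and `cv2.findHomography` with the `cv2.RANSAC` flag). The ratio of 0.8 is an assumption, being the value commonly used following Lowe's SIFT paper, since the text does not state the threshold; `ratio_match` is a hypothetical name.

```python
import numpy as np


def ratio_match(desc_in, desc_ref, ratio=0.8):
    """Nearest neighbor distance ratio matching.

    desc_in:  (N, D) descriptors from the input image.
    desc_ref: (M, D) descriptors from the reference image, M >= 2.
    Returns (input index, reference index) pairs that pass the ratio test.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(desc_in[:, None, :] - desc_ref[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :2]              # nearest and second-nearest neighbors
    rows = np.arange(len(desc_in))
    d1, d2 = d[rows, nn[:, 0]], d[rows, nn[:, 1]]
    keep = d1 < ratio * d2                         # reject ambiguous matches
    return [(int(i), int(nn[i, 0])) for i in np.flatnonzero(keep)]
```

A descriptor lying halfway between two reference descriptors has a ratio close to 1 and is rejected, which is the purpose of the test. The brute-force distance matrix is only meant for illustration; large keypoint sets would normally use a k-d tree or an approximate nearest neighbor index.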

  • Stage 2. Dense CP Generation

The CPs acquired in the first stage are often insufficient for image registration. To obtain more CPs, Harris corner detection is performed on the input image, and for each corner point a template matching procedure searches for its correspondence in the reference image, with NCC as the similarity measure. A template window centered on each Harris corner point is defined in the input image and mapped, using the initial transform model from Stage 1, to a corresponding window in the reference image. The NCC similarity measure is defined as

$$ \rho \left( {x,y} \right) = \frac{{\sum\limits_{x} {\sum\limits_{y} {\left( {I\left( {x,y} \right) - \bar{I}} \right)\left( {I'\left( {x',y'} \right) - \bar{I}'} \right)} } }}{{\sqrt {\sum\limits_{x} {\sum\limits_{y} {\left( {I\left( {x,y} \right) - \bar{I}} \right)^{2} } } } \sqrt {\sum\limits_{x'} {\sum\limits_{y'} {\left( {I'\left( {x',y'} \right) - \bar{I}'} \right)^{2} } } } }} $$
(5)

where \( I\left( {x,y} \right) \) and \( I'\left( {x',y'} \right) \) are the gray values in the template window and the corresponding window, respectively, and \( \bar{I} \) and \( \bar{I}' \) are the mean gray values of the two windows. The corresponding window is moved within a search radius in the reference image. If the maximum NCC value exceeds \( T_{n} \), the center of the corresponding window at that maximum is taken as the correspondence. In this study, the radii of the template window and the search window are set to 15 and 21 pixels, respectively, and \( T_{n} \) is set to 0.8.
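A brute-force NumPy sketch of Eq. (5) and the search described above follows. Two points are our assumptions: the predicted location `pt_pred` is taken to come from mapping the Harris corner with the Stage 1 homography, and the search window radius of 21 is read as the half-size of the reference region within which the 31 × 31 template slides, which gives shifts of up to ±6 pixels. Coordinates are (row, column), and `ncc` and `match_point` are hypothetical names.

```python
import numpy as np


def ncc(a, b):
    """Eq. (5): normalized cross correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_point(img_in, img_ref, pt_in, pt_pred, r_t=15, r_s=21, t_n=0.8):
    """Find the correspondence of pt_in near pt_pred in img_ref.

    Returns (row, col, rho) of the best match, or None if max NCC <= t_n.
    """
    ri, ci = pt_in
    tmpl = img_in[ri - r_t:ri + r_t + 1, ci - r_t:ci + r_t + 1]
    shift = r_s - r_t                      # template slides inside the search window
    best_rho, best_loc = -1.0, None
    for dr in range(-shift, shift + 1):
        for dc in range(-shift, shift + 1):
            r, c = pt_pred[0] + dr, pt_pred[1] + dc
            if (r - r_t < 0 or c - r_t < 0 or
                    r + r_t >= img_ref.shape[0] or c + r_t >= img_ref.shape[1]):
                continue                   # candidate window falls outside the image
            rho = ncc(tmpl, img_ref[r - r_t:r + r_t + 1, c - r_t:c + r_t + 1])
            if rho > best_rho:
                best_rho, best_loc = rho, (r, c)
    if best_loc is None or best_rho <= t_n:
        return None
    return best_loc[0], best_loc[1], best_rho
```

An exhaustive loop keeps the correspondence with Eq. (5) obvious. An optimized implementation would typically use a library routine such as OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED`.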

To achieve sub-pixel precision with NCC, we adopt the correlation interpolation method, which refines the CP locations by bicubic convolution. A two-dimensional cubic convolution of the correlation coefficients is applied to the 4 × 4 pixel neighborhood of each correspondence, and the location of the interpolated peak is taken as the refined location. The advantage of this method is that it achieves satisfactory results at a relatively low computational cost compared with other sub-pixel approaches [2].
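A sketch of the sub-pixel refinement follows. It assumes the Keys cubic convolution kernel with a = -0.5, a common choice since the text does not give the kernel parameter. It also assumes a simple grid search for the interpolated peak within ±0.5 pixel of the integer NCC maximum. `surf` is the NCC surface over candidate positions; all names and the 0.05-pixel search step are illustrative.

```python
import numpy as np


def keys_kernel(s, a=-0.5):
    """Keys cubic convolution kernel."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0


def cubic_interp(surf, y, x):
    """Bicubic convolution of surf at (y, x) from its 4 x 4 integer neighborhood."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for m in range(y0 - 1, y0 + 3):
        wy = keys_kernel(y - m)
        for n in range(x0 - 1, x0 + 3):
            val += wy * keys_kernel(x - n) * surf[m, n]
    return val


def refine_peak(surf, peak, step=0.05):
    """Grid-search the interpolated surface within +/-0.5 px of the integer peak."""
    py, px = peak
    offsets = np.arange(-0.5, 0.5 + 1e-9, step)
    best = max((cubic_interp(surf, py + dy, px + dx), dy, dx)
               for dy in offsets for dx in offsets)
    return py + best[1], px + best[2]
```

With a = -0.5 the kernel reproduces quadratic surfaces exactly. For a locally parabolic correlation peak, the interpolated maximum therefore coincides with the true one up to the search step. The peak must lie at least two samples inside `surf` so that every 4 × 4 neighborhood exists.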

  • Stage 3. Control Point Dispersion

Once enough CPs have been obtained, their aligning errors are re-examined, and false matches are eliminated by least squares iterative fitting. The proposed CP dispersion is then applied to select well-distributed CPs, and finally the transform model is re-estimated. Some previous studies have suggested non-rigid transformations for registering images with large local distortions [7, 8]; this is beyond the scope of this paper.

4 Experiments and Discussion

In the experiments, we applied the coarse-to-fine matching algorithm to high-resolution optical satellite images with simulated and real local distortions. The performance of the proposed CP dispersion method was compared with that of the correspondence error checking (CEC) method suggested in [2] and the subdivision method [8]. CEC is a traditional and commonly used CP exclusion method that iteratively discards the CPs with the largest aligning errors and re-estimates the transform model until either the errors of all CPs are below a threshold (0.5 pixel in this experiment) or fewer than three CPs remain. The subdivision method is the basis of a group of existing CP dispersion methods; the main difference between it and the proposed method is that the latter uses an adaptive distance threshold. The expected number of subimages was set to 256 for the subdivision method, and the base distance \( T \) was set to 20 pixels for the proposed method.
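For reference, the CEC baseline described above can be sketched as follows. The transform model used for CEC is not stated here. This sketch assumes an affine model fitted by least squares, which is consistent with the three-CP lower limit. It also assumes that one CP, the one with the largest residual, is removed per iteration. `cec` and `affine_residuals` are hypothetical names.

```python
import numpy as np


def affine_residuals(src, dst):
    """Least-squares affine fit of dst from [x, y, 1]; return per-CP residual norms."""
    M = np.column_stack([src, np.ones(len(src))])
    coeffs, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return np.linalg.norm(M @ coeffs - dst, axis=1)


def cec(src, dst, thresh=0.5):
    """Correspondence error checking: drop the worst CP and refit until every
    residual is below `thresh` or fewer than three CPs remain."""
    idx = np.arange(len(src))
    while len(idx) >= 3:
        res = affine_residuals(src[idx], dst[idx])
        if res.max() < thresh:
            break
        idx = np.delete(idx, res.argmax())   # remove the CP with the largest error
    return idx
```

Because CEC looks only at residuals, nothing prevents it from emptying whole regions of the image. This is the behaviour the experiments below contrast with the distance-aware selection.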

To assess performance quantitatively, the RMSE of the test points (TPs) is used to measure registration accuracy. The TPs were either generated uniformly or selected manually so that they spread over the whole image. The number of CPs is also considered an important measure.
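For clarity, the TP RMSE is assumed here to follow the usual definition. In this definition, \( M \) is the number of TPs, \( {\mathbf{H}} \) is the final estimated transform, and \( \left( {{\mathbf{p}}_{k} ,{\mathbf{P}}_{k} } \right) \) is the kth TP pair in the input and reference images:

$$ \mathrm{RMSE} = \sqrt {\frac{1}{M}\sum\limits_{k = 1}^{M} {\left\| {{\mathbf{Hp}}_{k} - {\mathbf{P}}_{k} } \right\|^{2} } } $$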

4.1 Images with Simulated Distortion

Two precisely aligned SPOT-5 panchromatic images (geometrically and radiometrically preprocessed at Level 2B in WGS 84), taken on October 1, 2002 and October 16, 2007, were used as the test dataset. Two pairs of test images of 512 × 512 pixels were clipped from them [see Fig. 1(a), (b)]. To simulate local image distortion, one image of each pair was transformed by the following sinusoidal function:

Fig. 1. Test image pairs. (a), (b) SPOT-5 images taken in 2002 (left) and 2007 (right). (c) SPOT-5 images taken in 2006 (left) and 2007 (right). (d) ZY1-02C images taken in 2013 (left) and 2012 (right).

$$ \begin{aligned} x' &= x - 2\sin \left( \frac{y}{32} \right) \\ y' &= y + 2\sin \left( \frac{x}{32} \right) \end{aligned} $$
(6)

where \( \left( {x,y} \right) \) and \( \left( {x',y'} \right) \) are pixel locations in the original image and the transformed image, respectively. A grid of 16 × 16 evenly spaced TPs was generated in the reference image, and their correspondences in the transformed image were computed by (6).
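A short Python sketch shows how such TPs could be generated. The exact TP positions are not given in the text. Here we assume the centers of a 16 × 16 grid of 32-pixel cells, that is, coordinates 16, 48, 80 and so on up to 496, and map each TP with Eq. (6). `distort` and `make_test_points` are hypothetical names.

```python
import math


def distort(x, y):
    """Eq. (6): sinusoidal local distortion of pixel location (x, y)."""
    return x - 2 * math.sin(y / 32), y + 2 * math.sin(x / 32)


def make_test_points(size=512, n=16):
    """n x n TPs at the cell centers of an evenly spaced grid (assumed layout),
    each paired with its location in the transformed image."""
    step = size / n
    tps = [((i + 0.5) * step, (j + 0.5) * step) for j in range(n) for i in range(n)]
    return [(p, distort(*p)) for p in tps]


pairs = make_test_points()
```

Since both sine terms have amplitude 2, every TP is displaced by at most 2 pixels along each axis. The displacement varies smoothly with a period of 64π ≈ 201 pixels, so a single global transform cannot model it exactly.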

After the dense CP generation stage, 282 pairs of CPs were found in dataset 1 and 198 pairs in dataset 2. The final results are shown in Table 1: the proposed method clearly outperformed its competitors, achieving the lowest RMSEs with a moderate number of CPs. Note that in dataset 2 the RMSE of CEC was extremely large. This is because all the CPs located in the homogeneous farmland area were pruned to minimize the least squares error, which left the remaining CPs severely concentrated.

Table 1. Performance comparison on images with simulated local distortion

4.2 Remote Sensing Images with Local Distortion

To further demonstrate the feasibility of the proposed method in real applications, two representative image pairs with significant terrain relief were chosen: one covers a mountainous region and the other a densely built-up urban area. In dataset 1, two SPOT-5 panchromatic images captured on December 1, 2006 and March 20, 2007, with sizes of 554 × 640 and 710 × 905 pixels respectively, were used [see Fig. 1(c)]. In dataset 2, two ZY1-02C panchromatic images taken on August 18, 2013 and October 24, 2012, with sizes of 684 × 703 and 787 × 865 pixels respectively, were chosen [see Fig. 1(d)]. Twenty TPs were selected manually for each dataset.

After the dense CP generation stage, 367 CPs were found in dataset 1 and 653 in dataset 2. The performance results are listed in Table 2, which shows that the registration accuracy of the proposed method is higher than that of the other methods. The proposed method also provides a sufficient number of CPs for rigid or non-rigid transformation estimation. Figure 2 shows the CP locations in both cases. The CPs selected by CEC are rather spatially concentrated, as shown in Fig. 2(b), because CEC prunes CPs by looking only at their aligning errors. Although it eliminated most of the inferior matches, it also removed some correct CPs located in areas with local distortions. Conversely, the subdivision method over-pruned some high-precision CPs by retaining only the strongest one in each subimage, as shown in Fig. 2(c). Note that its registration accuracy in case 2 is even lower than that of CEC. This may be due to its excessive pursuit of “even distribution” at the cost of admitting some low-precision CPs. In contrast, the proposed method tended to select more high-precision CPs in the residential area with complex texture, while still retaining some CPs in the mountainous regions and cultivated lands.

Table 2. Performance comparison on images with real distortion
Fig. 2. CPs selected by different approaches in case 1 (above) and case 2 (below). (a) The proposed method. (b) CEC method. (c) Subdivision method.

5 Conclusion

This letter proposes an efficient CP dispersion approach for remote sensing image registration. The method balances the effects of interpoint distance and aligning error through adaptive distance thresholds, selecting high-precision and spatially well-distributed CPs. In addition, a coarse-to-fine registration approach is introduced to make registration robust to local image distortion. Experimental results on images with simulated and real local distortions show that the proposed method significantly improves matching accuracy, which contributes to more accurate subsequent image stitching and analysis.