Introduction

Image registration is a complex task in image processing, which refers to match different images of the same scene taken at multiple times, in multiple viewpoints or with multiple sensors [21, 22]. Remote sensing image registration methods proposed in literature consist of two categories: feature-based registration methods and intensity-based registration methods. The feature-based registration method extracts the prominent features of an image, which is considered as control points. In general, the point is the shape contours represented by edge points, or point features represented by the positions of interest points. The registration problem is to find the correct correspondence between two sets of points extracted from the input data and restore the underlying spatial mapping at the same time, which can also be considered a point set registration problem. The widely used solutions, such as scale invariant feature transform (SIFT) [19, 20], speeded up robust features (SURF) [2], and shape context (SC) [3], are still hot spots in image registration. The second is intensity-based registration method [7, 27]. In this method, the intensity of images is used as a measure of similarity between images. Intensity-based method usually contains two steplike components: similarity measurement and algorithm optimization. Similarity measurement is a key step in intensity-based registration method and appropriate similarity measurement directly affects the registration results. Many similarity measurements have been proposed, such as sum of squared differences (SSD) [9], cross correlation (CC) [44], normalization mutual information (NMI) [4].

The evolutionary algorithm is easy to find the best corresponding point or transformation parameter from image registration problem, which derived many other important algorithms such as genetic algorithm (GA) [1], particle swarm optimization (PSO) [11], and differential evolution algorithm (DE) [6]. Evolutional algorithm is able to solve various problems with improved strategies like mixed-variable problem [37] and nonlinear programming [35], and dynamic multi-objective optimization problems [36] with improved Prediction strategies. Several improved algorithms were proposed to get the best multi-modal problems. Qu et al. [25] put forward a neighborhood mutation strategy which is also called niche technology, and proposed CDE [29], SDE [18], NCDE [25], NSDE [25] etc. by combining them with DE to solve multi-modal problems. ADE and SaEPSDE were proposed by Lacca et al. [10] besides NSHDE [25] are a multi-strategy DE, which improves the population diversity by using multiple evolution strategies. Li et al. [13] proposed R2PSO, R3PSO, R2PSOLHC, R3PSOLHC and Ye et al.  [41] proposed MO-Ring-PSO-SCD, and all these algorithms used the PSO algorithm with ring topology to divide subpopulation, maintaining the population diversity. Wang et al. [34] proposed an enhance DE with niching technique and adaptive learning strategy can be used to enhance the exploration ability of the algorithm.

To seek and locate multiple optimal solutions, niche technology [24, 31] is used in the multi-modal optimization problems. Niche technology is to divide the population of each generation into several sub-populations, the crossover and mutation in the sub-populations or among different sub-populations produce a new generation of populations. At the same time, the selection strategies of pre-selection, exclusion, and sharing are used to select individuals that will be retained to the next generation. The evolutionary algorithm combined with niche technology can maintain population diversity and has a high global optimization capability and convergence speed. Hence, it is especially suitable for solving MMOP problems. In the last decades, lost of scholars have proposed many technologies about niches, including crowding [32], speciation [12], fitness sharing [5], valleys [33], and clustering [25, 42] etc. The main process is to divide the entire population into several niches and seek optima in each niche. For the simple and effective of DE, several improved DE combined niche technology were proposed in recent years. Wang et al. [39] proposed a niche DE based on the minimum spanning tree (MSTDE). In this method, the MST in each iteration is constructed based on the distance information between individuals, and the population is divided into several niches based on the M maximal weighted edges of MST. Besides, a dynamic pruning ratio (DPR) strategy is used to determine the size of the niche M to improve the performance of the algorithm. Experiments show that the performance of MSTDE is better than other state-of-art multi-modal optimization algorithms when evaluated on the benchmark test functions from CEC2013. Poláková et al. [23] proposed an adaptive method of population size during population evolution in the same year. This method dynamically reduces or increases the size of the population during the evolution process by detecting changes in the diversity of the population. Experimental results show that DE with this new self-adaptive variant has greatly improved the efficiency of the algorithm and is proved effective in optimizing more complex problems such as multi-modal, mixed, or combined problems. Zhao et al. [45] proposed a DE algorithm based on local binary pattern(LBPADE). LBP can extract relevant information among individuals, and find multiple optima in MMOP. Zhao et al. [45] proposes an adaptive algorithm based on LBP, which uses local binary operators to find multiple optima and divides niches based on these optima. In addition, to improve the exploration and exploitation capabilities of the algorithm, the mutation strategy and parameter strategy of the algorithm are improved, hence, niche and global interaction (NGI) mutation strategy and adaptive parameter strategy (APS) are proposed. The NGI mutation strategy uses the information of the niche and the global space to further explore the current search space, while the APS adjusts the parameters according to the LBP information of individuals to let individuals close to the optima. Results on the MMOP test functions show that the performance of LBPADE is superior to most state-of-art algorithms.

To improve the efficiency, some novel methods are employed for registration. Knowledge fusion, developed from information fusion, is a process where knowledge from different sources interacts to form new knowledge [28, 43]. It is commonly used in many engineering areas [26]. It contains the process of abstracting, summarizing, and classifying real-world information and raising them to the aspect of knowledge. Knowledge fusion is applied to enhanced the differential evolution algorithm to get the best solution [38].

To improve the performance of the DE, a novel algorithm, based on niche technology, three different strategies are designed according to three knowledge of the population, are proposed. Besides, a multi-modal DE algorithm (SGD-DE) with a species gradient descent method is proposed. To evaluate the performance of our proposed algorithm, UCAS-ADO dataset and the remote sensing image of Guangxi University as registration image data. Experimental results show that the algorithm proposed in this paper can accelerate the convergence speed of DE and has better performance than traditional DE and other multi-modal optimization algorithms in terms of solution quality and success rate.

The rest of this paper is organized as follows. In Related work section, image registration and optimization techniques are discussed. In Knowledge fusion based DE section, the proposed Knowledge fusion based DE for image registration is presented. Experimental results are described in Experiment section. Finally, conclusions are drawn in Conclusion section.

Fig. 1
figure 1

Left is Guangxi University in 2011, and right is Guangxi University in 2013

Related work

Image registration problem

Image registration refers to register two or more images collected at different times, in different spaces, and with different equipments into a clear image with objects all focused. which is denoted as

$$\begin{aligned} T=\underset{T}{{\text {argmin}}}(E(R-T \otimes S)), \end{aligned}$$
(1)

where T is the affine transformation parameter of the image, \(T \otimes S\) represents the affine transformation operation performed on the image S, T is the affine transformation matrix. S is the image to be registered, and R is the sample image. \(E(R-T \otimes S)\) represents the similarity measurement function of the two images. According to different functions, it can be divided into two cases: maximum problem and minimum problem. For example, the higher the image similarity, the larger the function value of MI and NMI [8], but DTV [16] is the opposite. The purpose of image registration is to find the transformation parameter that optimizes the similarity measurement function between images. DTV [16] is an optimal similarity measurement method based on the gradient domain [17] of total variation. DTV is designed as a similarity measure to match the edges of two images. When the edge features of the two images correspond, DTV can be expressed by calculating the image gradients. The size and position of the image gradient should be similar, and the gradient of the residual image formed after registration should be more sparse because any registration error may produce ghost images and increase the sparsity of the residual image. The DTV function used for image registration is expressed as:

$$\begin{aligned} \min _{T}(E(T))=\Vert \nabla R-\nabla S(T)\Vert , \end{aligned}$$
(2)

where \(\nabla R=\sqrt{\left( \nabla _{x} R\right) ^{2}+\left( \nabla _{y} R\right) ^{2}}\) represents the image gradient in the two spatial directions,\(\nabla _{x} R\) and \(\nabla _{y} R\) represents the positive finite-difference operator of x and y coordinates. DTV is one of the most advanced methods for processing image registration, it has lower computational complexity and is more accurate and robust than MI and NMI. Figure 2 shows two images at the same location and different time in Fig. 1. After slightly changing the transformed parameters, it can be seen that the fluctuation of DTV is greater than the fluctuation of NMI and MI. That is to say, DTV behaves more sensitive than the other two similarity functions for slight changes in transformed parameters.

Fig. 2
figure 2

The changes of the DTV value and the NMI value according to different rotation angles of the two overlapping pictures in Fig. 1

From Fig. 2, it can be seen that one-dimensional DTV functions have multiple local optima, but the dimension in practical application is greater than or equal to three. Therefore, image registration is a multi-modal problem. The differential evolution algorithm based on niche technology solve the problem efficiently.

Differential evolution algorithm

DE [30, 40] is a population-based algorithm for optimization problems proposed in 1997 by Storn and Price, DE has relatively high search accuracy, robustness, and good convergence speed. It can be used to find the optimal solutions of nonlinear, non-differentiable, and multi-modal continuous space functions with real-valued parameters. It has the same structure and steps as classical evolutional algorithms, such as population initialization, crossover, mutation, and selection, however, when generating new candidate solutions, DE uses differential evolution operators. For each generation, operators continue working until the pre-defined terminating condition is met. DE algorithm needs to define control parameters (i.e., population size NP, scaling factor F, and crossover probability CR, searching space \(\varOmega \) In this paper, we suppose that the object function to be minimized is \(\mathrm {F}\left( X_{i}\right) \). Firstly, a feasible searching space \(\varOmega \) is defined, \(X_{i}=\left\{ x_{i, 1}, x_{i, 2}, \ldots , x_{i, D}\right\} \) is randomly generated in \(\varOmega \), \(X^\mathrm{{U}}\) and \(X^\mathrm{{L}}\) are the upper and lower bounds of searching space.

Population initialization The initial population consists of NP decision vectors(individuals) that are generated by assigning values to each component of all vectors. An optimization problem with D-dimension can be denoted as a vector with D-dimension. DE is based on the differences of individuals and each individual weights can be determined as:

$$\begin{aligned} X_{i, j}=X_{j}^\mathrm{{U}}+{\text {rand}}(0,1) *\left( X_{j}^\mathrm{{U}}-X_{j}^\mathrm{{L}}\right) , \end{aligned}$$
(3)

\(X_{1},X_{2}, \ldots X_{NP}\) are generated individuals, where each \(X_{i}\) consists of D vectors, \(X_{i}=\left\{ x_{i, 1}, x_{i, 2}, \ldots , x_{i, D}\right\} \). Mutation operation: After initialization, DE performs mutation operation for each individual \(X_{i}\) Each individual has a correspond mutation individual \(V_{i}\). In this paper, DE/rand/1 is conducted to generate mutant vectors. Each mutation individual’s weight can be determined as:

$$\begin{aligned} V_{i}=X_{r 1}+F *\left( X_{r 2}-X_{r 3}\right) , \end{aligned}$$
(4)

where \(V_{i}\) is the mutant individual, \(X_{r1}\), \(X_{r2}\) and \(X_{r3}\) are three mutually different individuals which are randomly selected from the whole population, r1, r2 and r3 are random integers in the range of \(\{1,2, \ldots , \mathrm {NP}\}\) and should be different from the running index i. Hence, the number of population or niche should be greater than 3. F is the scaling factor that lies in the range of [0,1] for scaling the difference vectors. If the F value is lower, the convergence speed is faster, while the larger the value, the greater the population diversity.

Crossover operation To increase population diversity, DE utilizes crossover operation to integrate mutant individuals and successful individuals reserved from the last generation, trial vectors are selected from mutant vectors and target vectors according to the following formula:

$$\begin{aligned} u_{i, j}=\left\{ \begin{array}{l}v_{i, j}, \text{ if } {\text {rand}}(0,1) \le C R \\ x_{i, j}, \text{ otherwise } \end{array}\right. \end{aligned}$$
(5)

Where CR is the crossover probability, which is usually within the range of [0,1], and \(u_{i,j}\) can be determined whether to assign the mutant vector \(v_{i, j}\) to the trial vector \(u_{i, j}\) by the comparison result between the CR and a random number j generated from the range of [0,1].

Selection operation: The selection process is the simple competition between offspring and corresponding parents. To confirm whether offspring individuals are reserved in the next generation, the greedy criterion is used to make the comparison, and those who have better fitness values are retained to the next generation. If and only if, the trial vector \(U_{i, G+1}\) yields a fitness value no more than the target vector \(X_{i, G}, U_{i, G+1}\) is set to replace \(X_{i, G+1}\), otherwise \(X_{i, G}\) is kept to the next generation.

$$\begin{aligned} X_{i, G+1}=\left\{ \begin{array}{ll}U_{i, G+1}, &{} \text{ if } f\left( U_{i, G+1}\right) \\ &{}\qquad \le f\left( X_{i, G}\right) \quad i=1,2, \ldots , N P \\ X_{i, G}, &{} \text{ otherwise } \end{array}\right. \nonumber \\ \end{aligned}$$
(6)

Knowledge fusion based DE

In this section, the proposed multi-modal knowledge fusion based algorithm for image registration is described in detail. First, the Knowledge representation based population model is introduced, and then the four main steplike procedures is described. The DTV function is used as the object function to find the best image registration solution. The proposed method achieve a good balance between exploration and exploitation through techniques such as niche technology, Gradient descent, multiple operators selection scheme and self-adaptive population updating strategies.

Knowledge represent population

In order to make full use of the fitness knowledge of DTV function, niche technology is applied to divide the initialized population into several niches. Each niche searches for its own optimal value to avoid the entire population falling into local optima. Firstly, the proposed algorithm uses the clustering framework of species formation in the niche dividing stage after initialization and combines three different types of knowledge to fuse with other information. Based on that, three new strategies are designed. In the mutation stage, traditional DE operators and the transition probability are utilized, and the state of the neighboring niche is regarded as the first knowledge to guide the fusion of information among niches; In the selection stage, the rank of the best individual’s fitness value of every niche in the entire population is regarded as the second knowledge to select the appropriate selection strategy; In the stage of adjusting the parameters of niches, the distances of central value among niches are regarded as third knowledge to adjust the size of each niche.

Fig. 3
figure 3

Division of niches

Initialization with niche technology

After population initialization, the clustering method is used to divide the niches. This paper uses the clustering framework of species formation as the method of dividing the niche. NP is the population size, NS is the initial size of the niche, and each niche constructs NS new solutions in each iteration. The Fig. 3 shows the division of niches.

The first step is to sort the current population according to the fitness value, and the individual with the best fitness value is set as the seed of the first niche; the second step is to select NS-1 individuals closest to the seed to form a new niche; Finally, remove these NS individuals from the population, and repeat the above three steps until no remaining individuals in the population.

Mutation with gradient descent

After initializing the population and dividing the niches, ideally, the entire population can roughly cover the entire solution space, and all the niches are evenly distributed in the entire solution space. In every iteration, each niche evolves independently. However, not every niche has local optima. If the niche is closed to evolve, it is not conducive to the evolution of the entire population. Therefore, a mutation strategy is used to promote communications among niches, which is to say, the state of the neighboring niche is regarded as the first knowledge to guide the fusion of information among niches. Firstly, a static transition probability PC is defined.

Start looping from the first individual, and when a random number generated from the range of [0, 1] is larger than PC, that is rand > PC, the current individual uses a search strategy with gradient descent in the mutation stage

$$\begin{aligned} \mathrm{{dis}}_m= & {} \lambda *\left( X_{\text{ best-nearest }}(t)-X_i\right) \end{aligned}$$
(7)
$$\begin{aligned} V_m= & {} X_i+\mathrm{{dis}}_{m}, \end{aligned}$$
(8)

dis is the step distance, \(\lambda \) is set as 0.1, i is the loop number of gradient descent and \(V_i\) is the intermediate vector produced in the ith descend. \(X_{\text{ best-nearest }}\) is the optimal solution from the nearest niche that has better optimal fitness than the current niche. The replacement strategy is used to update the current individual in every iteration:

$$\begin{aligned} X_{i}=\left\{ \begin{array}{lr} V_{i}, &{} \text{ if } f\left( V_{i}\right) \le f(X_{i}) \\ X_{i}, &{} \text{ otherwise } \end{array}\right. \end{aligned}$$
(9)

When \(f(f(V_{i}) > f(X_{i}))\) , let \(\lambda =\alpha * \lambda \) , \(\alpha \) is a value less than 1. Increase i in every iteration until it reaches the maximum iteration. When the random number is less than PC, that is \(rand(0,1) < PC\), the current individual is denoted as follows:

$$\begin{aligned} v_{i}=x_{r 1, i}+F *\left( x_{r 2, i}-x_{r 3, i}\right) , \end{aligned}$$
(10)

where r1, r2, r3 are random numbers from 1 to NS and F is the scaling factor. An intermediate vector is generated following the same formula as the standard DE algorithm. The selection of the intermediate vector also follows the same process as the standard DE algorithm, and new content is added. Details are presented in the following sections.

Through this mutation strategy, individuals in the niche are more likely to be close to the optima of the nearest niche.

Dual-selection method

After dividing the niches, evolutionary operations are performed in each niche to generate offspring. On the basis of that, selection operations are performed to chose individuals that can be retained to the next generation, and form new niche groups. So far, there are mainly two selection operators, and both have been widely used in multi-modal algorithms. One is combination selection that first combines NP parents with NP offspring (NP is the population size), and then selects the best N individuals from 2NP individuals. Another is the one-by-one selection which compares the fitness value of each offspring with its nearest parent individual, generally using Euclidean distance to measure. If the offspring has better fitness value, then replace the parent with the corresponding individual, which is to say, the rank of the best individual’s fitness value of every niche in the entire population is regarded as the second knowledge to select the appropriate selection strategy;

The two selection operators are respectively beneficial to the evolution of the entire population. The one-by-one selection operator, which selects the parent with the most similar genes to the offspring, and then compares these two individuals and replaces the poorer one with the best one, which can maintain the population diversity and enhance the exploration capabilities; while the combination selection operators, mixing offspring and parent individuals, select the best NP individual to enter the next generation, which can fasten the convergence speed of the population, further improve the accuracy of the optimal solution, and improve the exploitation capabilities of the algorithm. Therefore, the strategy proposed in this paper is to select different selection operators according to the fitness value of the current niche.

Firstly, each niche is divided into two parts according to the fitness value of the optimal individual. The first part, called the superior niche, consists of the niches where the current optimal solutions of the entire population is located, and the second part, called the inferior niche, consists of the rest niches. The goals of the two parts of the niche are different, and the replacement strategy is selected based on this.

Secondly, If the current individual belongs to the superior niche , such as individual a in niche A, it should further explore its neighboring individuals to accelerate convergence speed, and use the combination selection strategy for the offspring of this niche: select N best individuals from 2N individuals (N parent individuals, N offspring individuals) to improve the accuracy of the solution. Conversely, if the current individual belongs to inferior niche, such as individual b in niche B, use a one-by-one selection operator for its evolved offspring: by comparing the fitness value of offspring individual and parent individual with the closest Euclidean distance from the offspring, and replace the parent individual that is inferior to the offspring, which enables the niche to further explore optima in the search space and maintain the population diversity.

The selection strategy is described in Algorithm 1. If the optimal individual in the niche is equal to the global optimal individual, the combination selection operator is used, the parent and the offspring are mixed, and the optimal NS individuals are selected after sorting; otherwise, the one-by-one selection is used, comparing the offspring with the parent individual which is the closest to the offspring according to the Euclidean distance and select the better individual to be retained to the next generation.

figure a

Self-adaptive updation

To maintain the diversity of each niche, the merge operation is conducted between two niches that are too close to each other. For the g-th niche, where the individuals are \(x_1^g,x_2^g,\ldots x_{NS}^g\), firstly calculate the center point \(C^g\) of the NS individuals in each niche:

$$\begin{aligned} c_{j}^{g}=\frac{\sum _{i=1}^{N S} x_{i, j}^{g}}{N S}, \end{aligned}$$
(11)

where \(j=1,2, \ldots ,D, c_j^g\) represents the center point vector of the jth variable.

Then, the Euclidean distances among center points of each niche are denoted as follows:

$$\begin{aligned} d_{g, g^{\prime }}=\sqrt{\sum _{j=1}^{D}\left( c_{j}^{g}-c_{j}^{g \prime }\right) ^{2}} \end{aligned}$$
(12)

When \(d_{g,g' < \tau }\), If the distance between the niches are too close, the two niches will be merged. Because another evolution strategy is used in the mutation step, in this strategy, the niche with poor fitness will move closer to the nearest neighbor with high fitness. Hence, when the distance between the two niches is small enough, merge the two niches, which can ensure the population diversity of each niche. If the current niche falls into a local optimum, the size of the cluster is more likely to increase so that more accurate solutions can be obtained or avoid getting trapped into the local optima.

The overall procedure of SGD-DE algorithm

The steps of the SGD-DE algorithm are as follows, Algorithm 2 is the pseudocode of the algorithm.The first step is to initialize the population randomly; the second step is to divide the niches according to the principle of species formation;the third step is to evolution operation, individuals are randomly selected based on probability, part of individuals are generated from the local niche, and another part of individuals generate offspring of the next generation according to the standard DE algorithm; the fourth step: selection operation.

figure b

Experiment

Parameter settings

In this experiment for the proposed algorithm, we use the fixed parameters NP (population size) = 200, F (scaling factor) = 0.5, CR (crossover rate) = 0.9, and the initial niche size is set to 10, besides, the iterations are no more than 150. The image datasets in this paper are collected from the UCAS-AOD Dataset and the remote sensing images of Guangxi University. The CEC2013 multi-modal benchmark functions [15] which contain 12 multi-modal test functions are used to test the proposed algorithm.

Table 1 The PR values of the 14 algorithms in CEC2013 when \(\varepsilon = \)1.0e\(-\)01
Table 2 PR values of 14 algorithms in CEC2013 when \(\varepsilon \) = 1.0e\(-\)02
Table 3 PR values of 14 algorithms in CEC2013 when \(\varepsilon \) = 1.0e\(-\)05

Evaluation index

Peak ratio (PR), representing the average percentage of global peaks found in multiple runs, is used as a evaluation index in this paper and is denoted as follows:

$$\begin{aligned} \mathrm{{PR}}=\frac{\sum _{i=1}^\mathrm{{N R}} \mathrm{{N P F}}_i}{\mathrm{{T N P}}*\mathrm{{N R}}}, \end{aligned}$$
(13)

where NR is the number of runs, \(\mathrm{{NPF}}_i\) is the number of global peaks found in the ith run, and TNP is the total number of global peaks in the optimization problem.

Comparisons with state-of-the-art multimodal algorithms

Comparison at accuracy

In order to test the multi-modal performance of the two algorithms proposed in this paper, the CEC2013 multi-modal benchmark function is used to evaluate the performance of SGD-DE in solving MMOPs.

In this section, two experiments are conducted. The algorithms involved in the experiment are CDE, SDE, NCDE, NSDE, NSHDE, and the particle swarm optimization algorithm with ring topology: R2PSO, R3PSO, R2PSOLHC, R3PSOLHC, FERPSO [13]. The total number of individuals for all algorithms is set to 200. The other parameters of the algorithm used for comparison are based on the corresponding references.

CEC2013 multi-modal functions are used to test the multi-modal capabilities of the SGD-DE algorithm, DE number multimodal algorithm, PSO with ring topology, and the mainstream multi-strategy adaptive DE algorithm. The experimental results show in Table 1. In the table, if the PR value is higher than other algorithms or equal to the other algorithms, the result will be highlighted in bold, and the total times of the best results got in each algorithm are counted at the endline of the table.

As shown in Table 1, the best results of SGD-DE were obtained on F1–F5 and F11 when \(\varepsilon = \)1.0e\(-\)01, the results on F6, F8 and F10 are not the best but very close to the best; the results on the remaining functions differ from the best results. Tables 2 and 3 represent the search results at the accuracy of 1e\(-\)2 and 1e\(-\)5, and SGD-DE has the best overall optimization results on F1–F5, F11, F13–F20; the results on F6 and F10 are very close to the best, but the results are different in the remaining functions.

When the accuracy is higher, higher development capability of the algorithm is required. The results in Tables 2 and 3 show that the gradient descent based local search strategy proposed in this paper effectively improves the exploitation ability of the algorithm, and the optimal solution found at lower precision can improve the algorithm’s search accuracy (Table 4), which is shown in Table 4 where bolded font indicate better search accuracy.

Effect of the mutation strategy on performance

The niche technology are used in SGD-DE to generate populations, and the local search strategy of gradient descent is utilized to further exploit the solution space between the niches while sub-populations communicate, thus enhancing the searching ability of the algorithm. Without the local search strategy, the algorithm is just a simple species-based DE. Table 1 shows that the performance of the proposed algorithm is significantly superior to that of Species-based DE; Without the niche technology, the algorithm degenerates into a traditional DE algorithm with better exploration capabilities, but it can still get trapped in the local optima. Therefore, those two improved strategies are indispensable for SGD-DE.

Table 4 Influence of local search on algorithm accuracy
Table 5 Experimental results in PR on CEC 2013 problems at accuracy level \(\varepsilon \) = 1.0e\(-\)01
Table 6 Experimental results in PR on CEC 2013 problems at accuracy level \(\varepsilon \) = 1.0e\(-\)05
Table 7 Experimental results of PR on niche size parameter at the accuracy level \(\varepsilon \) = 1.0e\(-\)04
Fig. 4
figure 4

The UCAS-AOD data set

Effect of the selection strategy on performance

In this section, CEC2013 dataset is used to test the effectiveness of our hybrid selection strategy. The two algorithms used for comparison are the SGD-DE with elite strategy, and the SGD-DE with crowding strategy. For convenience, we only put results of the lowest accuracy and the highest accuracy.

Tables 5 and 6 describe the experimental results of SGD-DE using the hybrid selection strategy and the other two single selection strategies, with the bolded font indicating the better experimental results. It can be seen that SGD-DE is better than the other two algorithms overall, especially better than the SGD-DE with single elite strategy, which is because the crowding strategy can better ensure the population diversity so that shows better performance on F6, F7, and F8 when solving the optimal problems, but the results of SGD-DE with a mixed selection strategy on these problems are almost the same as SGD-DE with single crowding strategy. Meanwhile, SGD-DE with single crowding strategy also performs well on the low-dimensional functions.

Fig. 5
figure 5

The registration results for the UCAS-AOD dataset

Effect of the niche size on performance

SGD-DE is mainly affected by the niche size. In this section, we tested the effects of different niche sizes on CEC2013 test results. The results of NS = 20 and NS = 50 in CEC2013 test set are tested. Table 7 shows that different niche sizes have great influence on the results in different functions.

Fig. 6
figure 6

The image of Guangxi University in 2019

Fig. 7
figure 7

The image of Guangxi University in 2018

Application for image registration

Registration results

Firstly, the proposed method is applied to the registration problem with images that have little difference. Dataset of Object Detection in Aerial Images from the University of Chinese Academy of Sciences (UCAS-AOD)are tested, and the deviation of the images is mainly translation and rotation. The UCAS-AOD data set is shown in Fig. 4, and the image registration result of UCAS-AOD registration is shown in Fig. 5.

the above two images overlapped after registration, and the corresponding optimal solutions can be found with our proposed algorithm.

Then the aerial images from Guangxi University are tested to detect the registration performance of our method on images that have large differences. Figures 6 and 7 show the images shot at different times and devices, the registration result is shown in Fig. 8.

The registration results of two images of Guangxi University shot in different years show that the aligned image produces almost no ghosting, and the corresponding optimal solutions can still be found with the algorithm proposed in this paper even if there are some significant differences between two images.

Registration convergence speed

In the matching of remote sensing images of Guangxi University, this algorithm is compared with the traditional optimization algorithm and the current advanced multi-modal optimization algorithm such as DE, and four PSO algorithms using ring topology [14].

Fig. 8
figure 8

The result of remote sensing image registration of Guangxi University

Figure 9 shows that the DE algorithm is easy to fall into the local optimum when registering complex image problems. SGD-DE converges earliest and fastest at the beginning, and when trapped in a local optimum, it takes less iterations to jump out of the local optimum, and SGD-DE is more exploitable and exploratory. Our algorithm is faster and more accurate in image registration than the advanced multi-modal PSO algorithm. Because we not only use niche technique to increase diversity but also add local search strategy to improve accuracy.

Registration accuracy

Compare the DTV function value of SGD-DE with R2PSO, R3PSO, R2PSOLHC, R3PSOLHC, NCDE, FERPSO and NSDE after each image registration experiment. According to the definition of the function, the smaller the DTV value, the smaller the difference between the overlapping parts of the image, and the more accurate the registration result will be.

In this experiment, it mainly compares the search capability of the algorithm and the ability to avoid falling into local optimum. From the image of the DTV function, it can be seen that the function has many local optima, and the optimization algorithm may stagnate. As is shown in Table 8, the bolded font indicate better mean value and standard deviation. It can be seen from the average and standard deviation of the experiment results that SGD-DE is superior to other multi-modal optimization algorithms.

Fig. 9
figure 9

Iterative data of various algorithms

In experiment one and experiment two, we tested the SGD-DE algorithm that optimizes the NMI function and the SGD-DE algorithm that optimizes the DTV function. And the NMI value and DTV value of the two registration results are compared. From the comparison results we can see which image registration function is more accurate.

Table 8 Mean value and standard deviation of DTV function

DTV is the minimization function and NMI is the maximum function. It can be seen from the table that the registration result optimized by DTV function is more accurate than other methods. Therefore, it is concluded that the DTV method is more precise and robust, and the image registration results are described in Table 9.

Table 9 Image registration results of Guangxi University

Among them, DTV is the minimization function, and NMI is the maximization function. It can be seen from the table that the accuracy of the registration result optimized by the DTV function is higher than that of other methods, so the DTV method is more accurate and robust.

Conclusion

This paper proposed an SGD-DE for tackling the remote sensing image registration problem, which enhances the capability of exploration and exploitation. Experiments have shown that our algorithm can achieve a promising performance when finding the optimal solutions to the remote sensing image registration problem, and the registration result is relatively accurate. In terms of some performance indicators, our algorithm is superior to the current advanced ones. At present, the algorithm still needs improvement, such as niche division and parameter adaptation, so we will further improve our algorithm with the clustering method in the future.

In the future research, we would like to do further research on metrics for comparison of rendered and reference images, focusing on topological similarities between the phenotype and a reference image (e.g. number of subbranches and their lengths). Multiple metrics could be used and combined with the use of multi-objective search, and possibly combined with interactive methods for optimization. One of the ideas for future research may also include another encoding aspect of the procedural model using evolution of line segments for vector parameters. In our future work, we aim to apply the proposed algorithm to improve the capability of finding optimal feasible solutions in large scale problems.