1 Introduction

Remote sensing is a comprehensive information science technology that detects and analyzes objects from a long distance: sensors acquire information about an object without directly touching its surface. Remote sensing technology is an important data source for military information and intelligence acquisition [1, 2]. With the continuous enrichment of remote sensing image data resources [3], the efficiency of remote sensing image processing has become an important factor restricting the development of remote sensing technology. With the continuous improvement of aerospace technologies such as data communication and sensors, data acquisition in the field of remote sensing has gradually become multi-source [4]. Therefore, image registration is the first prerequisite for comprehensively utilizing the information of heterogeneous remote sensing images [5].

The Internet of things (IoT) is the Internet that connects "things". Its essence is mainly reflected in three aspects: Internet communication characteristics, that is, people or things are connected to the Internet to achieve interconnection [6]; identification characteristics, that is, the network can automatically identify the people or things connected to it; and intelligent characteristics, that is, the network system is capable of self-feedback and intelligent control [7].

In recent years, with the continuous development of sensor technology, the resolution of remote sensing images has steadily improved, and target-level applications of multi-source remote sensing information have received more and more attention. Melonakos et al. [8] proposed an image segmentation technique based on a conformal active contour framework augmented with directional information. Wei et al. [9] proposed a ship detection method based on the local power spectrum of SAR images. The core idea of this method is to detect ships through the distortion they cause in the power spectrum of a local area of the SAR image. Although the energy of ships is spread over the SAR intensity image, their spectral energy is quite concentrated, which causes the power spectrum of the local area to deviate from that of the sea background. The method analyzes the local power spectrum of moving targets in the SAR image and obtains the detection threshold from the probability density function of the power spectrum. An et al. [10] combined a land masking strategy, an appropriate sea clutter model and a neural network into a recognition scheme for detecting ships in SAR images, using a fully convolutional network to separate the ocean from the land. By analyzing the distribution of sea clutter in the SAR image and weighing sea clutter modeling accuracy against computational complexity, the probability distribution model of the constant false alarm rate detector was selected from among the K distribution, the gamma distribution and the Rayleigh distribution, and the neural network was used to re-check the detections to produce the recognition result. Yue et al. [11] studied the detection of special targets in the visual images of low-pixel surveillance systems. Fan [12] studied the multi-target recognition of fuzzy remote sensing images based on Euclidean feature matching, applying the Euclidean feature matching method to multi-target recognition in remote sensing images. This algorithm has high recognition accuracy, but it cannot adapt to complex and changeable detection environments.
Therefore, this paper studies a target recognition algorithm for multi-source remote sensing images based on the intelligent vision Internet of things (iVIOT). Infrared sensors and SAR radar sensors are applied in the iVIOT to realize accurate target recognition in multi-source remote sensing images. Experimental verification shows that the proposed method recognizes remote sensing image targets effectively and has high applicability: it improves the recognition accuracy and reduces the time consumed, which plays an important role in the development of multi-source remote sensing image target recognition.

2 Multi-source remote sensing image target recognition algorithm based on IoT vision

2.1 The logical design of the iVIOT

The intelligent vision IoT (iVIOT) consists of four layers: the visual perception layer, the network layer, the application layer and the storage layer. The structure model of the iVIOT is shown in Fig. 1.

Fig. 1 The structure model of the iVIOT

The iVIOT uses wireless transmission and cloud storage technology, and deploys infrared sensors and SAR radars in its visual perception layer. The image information collected by the visual perception layer is transmitted to the application layer through the wireless network, and the data processing module of the application layer realizes accurate recognition of multi-source remote sensing image targets in three stages: feature extraction, image registration and target recognition [13]. During data processing, the data is synchronized to the network hard disk, which realizes the cloud storage of the iVIOT. Users can access the stored data anytime and anywhere through various intelligent terminals. This design helps prevent the leakage and loss of remote sensing image target recognition data, and improves the security and convenience of the iVIOT.

The visual perception layer mainly solves the problem of collecting data from the external physical world through various sensor devices. The perception layer of the iVIOT acquires the image information captured by the sensors. A Wi-Fi wireless network camera (IPCAM) is used as the information collection point. It is a device that transmits dynamic remote sensing images through the network [14]: it can transmit local dynamic remote sensing images to the Internet via Wi-Fi, which makes it convenient for users to view them at any time.

IPCAM is a camera that collects and transmits dynamic video through a wireless network. Designed with ease of use in mind, it belongs to a new generation of video recording products that combine traditional cameras with network video technology, integrating a video server, a camera and wireless transmission. It has a built-in server and GUI, supports browsing in IE, and transmits video images over TCP/IP. Users can easily install it at home, in an office, in a factory or anywhere else, and can access, configure, maintain and supervise it through client video management software or by logging in to Web pages. Wherever there is network coverage, users can view the monitored target environment at any time through the local area network or the Internet. Without being restricted by time and space, users can thus view the dynamic remote sensing images of the target environment captured by the infrared sensors and SAR radar.

2.2 Feature extraction of multi-source remote sensing image

Whether it is an optical remote sensing image or a SAR remote sensing image, the image is often deformed by external factors, which interferes with the appearance of the target. When constructing multi-source, multi-feature vectors for target detection, features that are not affected by noise, light, shadow, deformation, etc. should therefore be included [15]. By extracting seven Hu invariant moments and three affine invariant moments, a feature vector of moment invariants is constructed for target detection in remote sensing images. For a discrete digital image \({I}^{^{\prime}}\left(x,y\right)\) of size \(M\times N\), the raw moment and the central moment of order \(p+q\) are defined as follows:

$${m}_{pq}=\sum_{x=1}^{M}\sum_{y=1}^{N}{x}^{p}{y}^{q}{I}^{^{\prime}}\left(x,y\right)$$
(1)
$${\mu }_{pq}=\sum_{x=1}^{M}\sum_{y=1}^{N}{\left(x-\overline{x }\right)}^{p}{\left(y-\overline{y }\right)}^{q}{I}^{^{\prime}}\left(x,y\right)$$
(2)

where \(p,q=0,1,2,\cdots\), and \(\left(\overline{x },\overline{y }\right)=\left({m}_{10}/{m}_{00},{m}_{01}/{m}_{00}\right)\) is the centroid of the image. The raw moment \({m}_{pq}\) changes when the image is translated, scaled or rotated. The central moment \({\mu }_{pq}\) is invariant to translation, but it still changes with the scaling and rotation of the image.

In order to remove the dependence of the moment features on the image scale, the normalized central moment is introduced:

$${y}_{pq}=\frac{{\mu }_{pq}}{{\mu }_{00}^{r}}$$
(3)

where \(r=\frac{p+q+2}{2}\) and \(p+q=2,3,\cdots\).

Seven Hu invariant moments, which are invariant to translation, scaling and rotation, are constructed from the second-order and third-order normalized central moments. They are defined as follows:

$${I}_{1}={y}_{20}+{y}_{02}$$
(4)
$${I}_{2}={\left({y}_{20}-{y}_{02}\right)}^{2}+4{y}_{11}^{2}$$
(5)
$${I}_{3}={\left({y}_{30}-3{y}_{12}\right)}^{2}+{\left(3{y}_{21}-{y}_{03}\right)}^{2}$$
(6)
$${I}_{4}={\left({y}_{30}+{y}_{12}\right)}^{2}+{\left({y}_{21}+{y}_{03}\right)}^{2}$$
(7)
$$\begin{array}{c}{I}_{5}=\left({y}_{30}-3{y}_{12}\right)\left({y}_{30}+{y}_{12}\right)\left[{\left({y}_{30}+{y}_{12}\right)}^{2}-3{\left({y}_{21}+{y}_{03}\right)}^{2}\right]\\ +\left(3{y}_{21}-{y}_{03}\right)\left({y}_{21}+{y}_{03}\right)\left[3{\left({y}_{30}+{y}_{12}\right)}^{2}-{\left({y}_{21}+{y}_{03}\right)}^{2}\right]\end{array}$$
(8)
$${I}_{6}=\left({y}_{20}-{y}_{02}\right)\left[{\left({y}_{30}+{y}_{12}\right)}^{2}-{\left({y}_{21}+{y}_{03}\right)}^{2}\right]+4{y}_{11}\left({y}_{30}+{y}_{12}\right)\left({y}_{21}+{y}_{03}\right)$$
(9)
$$\begin{array}{c}{I}_{7}=\left(3{y}_{21}-{y}_{03}\right)\left({y}_{30}+{y}_{12}\right)\left[{\left({y}_{30}+{y}_{12}\right)}^{2}-3{\left({y}_{21}+{y}_{03}\right)}^{2}\right]\\ -\left({y}_{30}-3{y}_{12}\right)\left({y}_{21}+{y}_{03}\right)\left[3{\left({y}_{30}+{y}_{12}\right)}^{2}-{\left({y}_{21}+{y}_{03}\right)}^{2}\right]\end{array}$$
(10)
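As a minimal sketch of Eqs. (1) to (10), the following NumPy code computes the normalized central moments and the seven Hu invariants of a grayscale array. It follows the standard Hu definitions. The function names `normalized_central_moments` and `hu_moments` are illustrative assumptions and are not taken from the code in Fig. 2.

```python
import numpy as np


def normalized_central_moments(img):
    """Return a function y(p, q) giving the normalized central moment of img."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]  # row index = y, column index = x
    m00 = img.sum()                                   # equals mu_00
    xbar = (xs * img).sum() / m00                     # centroid, m10 / m00
    ybar = (ys * img).sum() / m00                     # centroid, m01 / m00

    def y(p, q):
        mu_pq = (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()
        return mu_pq / m00 ** ((p + q + 2) / 2)       # Eq. (3)

    return y


def hu_moments(img):
    """Seven Hu invariants built from 2nd- and 3rd-order normalized moments."""
    y = normalized_central_moments(img)
    y20, y02, y11 = y(2, 0), y(0, 2), y(1, 1)
    y30, y03, y21, y12 = y(3, 0), y(0, 3), y(2, 1), y(1, 2)
    a, b = y30 + y12, y21 + y03                       # recurring sums
    return np.array([
        y20 + y02,
        (y20 - y02) ** 2 + 4 * y11 ** 2,
        (y30 - 3 * y12) ** 2 + (3 * y21 - y03) ** 2,
        a ** 2 + b ** 2,
        (y30 - 3 * y12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * y21 - y03) * b * (3 * a ** 2 - b ** 2),
        (y20 - y02) * (a ** 2 - b ** 2) + 4 * y11 * a * b,
        (3 * y21 - y03) * a * (a ** 2 - 3 * b ** 2)
        - (y30 - 3 * y12) * b * (3 * a ** 2 - b ** 2),
    ])
```

Padding an image with zeros (a translation) or rotating it by 90 degrees with `np.rot90` should leave the seven values unchanged up to floating-point error, which is a convenient sanity check on any implementation.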

When the image is distorted because of different shooting angles, the translation, scale and rotation invariance of the Hu moments cannot meet practical requirements, and moments that remain invariant under an affine transformation of the target are needed to deal with distortion and other deformations [16].

Three affine invariant moments are defined as follows:

$${A}_{1}=\frac{\left({\mu }_{20}{\mu }_{02}-{\mu }_{11}^{2}\right)}{{\mu }_{00}^{4}}$$
(11)
$${A}_{2}=\frac{\left({\mu }_{30}^{2}{\mu }_{03}^{2}-6{\mu }_{30}{\mu }_{03}{\mu }_{21}{\mu }_{12}+4{\mu }_{30}{\mu }_{12}^{3}+4{\mu }_{03}{\mu }_{21}^{3}-3{\mu }_{21}^{2}{\mu }_{12}^{2}\right)}{{\mu }_{00}^{10}}$$
(12)
$${A}_{3}=\frac{\left({\mu }_{20}\left({\mu }_{21}{\mu }_{03}-{\mu }_{12}^{2}\right)-{\mu }_{11}\left({\mu }_{30}{\mu }_{03}-{\mu }_{21}{\mu }_{12}\right)+{\mu }_{02}\left({\mu }_{12}{\mu }_{30}-{\mu }_{21}^{2}\right)\right)}{{\mu }_{00}^{7}}$$
(13)
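The three affine invariants can be sketched in a few lines. The version below uses the standard Flusser-Suk form of \({A}_{2}\) (five terms in the numerator) and, as an assumption made purely for illustration, evaluates everything in exact rational arithmetic so that invariance can be checked with `==`. The names `central_moments` and `affine_invariants` are hypothetical.

```python
from fractions import Fraction


def central_moments(img):
    """Return mu(p, q) for a 2-D nested list of non-negative integer weights."""
    pts = [(x, y, Fraction(v))
           for y, row in enumerate(img)
           for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xbar = sum(x * v for x, _, v in pts) / m00   # centroid x
    ybar = sum(y * v for _, y, v in pts) / m00   # centroid y

    def mu(p, q):
        return sum((x - xbar) ** p * (y - ybar) ** q * v for x, y, v in pts)

    return mu


def affine_invariants(img):
    """Affine moment invariants A1, A2, A3 of Eqs. (11) to (13)."""
    mu = central_moments(img)
    m00 = mu(0, 0)
    m20, m11, m02 = mu(2, 0), mu(1, 1), mu(0, 2)
    m30, m21, m12, m03 = mu(3, 0), mu(2, 1), mu(1, 2), mu(0, 3)
    a1 = (m20 * m02 - m11 ** 2) / m00 ** 4
    a2 = (m30 ** 2 * m03 ** 2 - 6 * m30 * m21 * m12 * m03
          + 4 * m30 * m12 ** 3 + 4 * m03 * m21 ** 3
          - 3 * m21 ** 2 * m12 ** 2) / m00 ** 10
    a3 = (m20 * (m21 * m03 - m12 ** 2)
          - m11 * (m30 * m03 - m21 * m12)
          + m02 * (m30 * m12 - m21 ** 2)) / m00 ** 7
    return a1, a2, a3
```

A horizontal shear that moves every pixel from column `x` to column `x + y` is an affine map with unit determinant, so the three values of a sheared copy should equal those of the original exactly. Real imagery would normally use floating-point arrays instead, with a tolerance in place of exact equality.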

Through the above process, the feature extraction of the infrared remote sensing image and the SAR remote sensing image is realized. Part of the code is shown in Fig. 2.

Fig. 2 Feature extraction part of the code

2.3 Design of automatic registration method for multi-source remote sensing image features

2.3.1 Multi-scale and multi-directional feature fusion

A two-level Contourlet decomposition is performed on the image after feature extraction, to obtain the multi-scale low-frequency sub-bands \({\sigma }_{1}\), \({\sigma }_{2}\) and the multi-directional high-frequency sub-bands \({d}_{1}\) to \({d}_{12}\). A Gaussian kernel function is added to the moment definition, with \(\sigma\) as the scale factor.

The discrete moments of order \(p+q\) of the image are defined as follows:

$${m}_{pq}=\frac{4}{{\left(k-1\right)}^{2}}\sum_{x=0}^{m-1}\sum_{y=0}^{n-1}{\left(\frac{x}{\sigma }\right)}^{p}{\left(\frac{y}{\sigma }\right)}^{q}\mathrm{exp}\left(-\frac{{x}^{2}+{y}^{2}}{2{\sigma }^{2}}\right)f\left(x,y\right)$$
(14)

The corresponding discrete central moments are defined as follows:

$${\mu }_{pq}=\frac{4}{{\left(k-1\right)}^{2}}\sum_{x=0}^{m-1}\sum_{y=0}^{n-1}{\left(\frac{x-\overline{x}}{\sigma }\right)}^{p}{\left(\frac{y-\overline{y}}{\sigma }\right)}^{q}\mathrm{exp}\left(-\frac{{\left(x-\overline{x }\right)}^{2}+{\left(y-\overline{y }\right)}^{2}}{2{\sigma }^{2}}\right)f\left(x,y\right)$$
(15)

where \(\left(\overline{x },\overline{y }\right)\) is the center coordinate of the \(n\times n\) window.
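To make Eq. (15) concrete, here is a small sketch that evaluates the Gaussian-weighted central moment of a single window. The source does not define \(k\), so it is passed through as a plain parameter. The function name `gaussian_central_moment` and the choice of the geometric window center for \(\left(\overline{x },\overline{y }\right)\) are assumptions.

```python
import numpy as np


def gaussian_central_moment(window, p, q, sigma, k):
    """Gaussian-weighted central moment of order p+q over one window, Eq. (15).

    k only enters through the prefactor 4 / (k - 1)^2 and is not interpreted here.
    """
    f = np.asarray(window, dtype=float)
    n_rows, n_cols = f.shape
    ybar, xbar = (n_rows - 1) / 2, (n_cols - 1) / 2   # window center
    ys, xs = np.mgrid[:n_rows, :n_cols]
    u = (xs - xbar) / sigma                            # scaled x offset
    v = (ys - ybar) / sigma                            # scaled y offset
    gauss = np.exp(-(u ** 2 + v ** 2) / 2)             # Gaussian kernel
    return 4 / (k - 1) ** 2 * (u ** p * v ** q * gauss * f).sum()
```

For a window of constant intensity the odd-order moments vanish by symmetry, which gives a quick check that the centering is right.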

The moment feature vector of the two-level low-frequency sub-bands is \({f}_{L}=\left[{\zeta }_{1}^{1},{\zeta }_{2}^{1},{\zeta }_{3}^{1},{\zeta }_{1}^{2},{\zeta }_{2}^{2},{\zeta }_{3}^{2}\right]\), where \({\zeta }_{1}\), \({\zeta }_{2}\) and \({\zeta }_{3}\) denote the Gaussian combined invariant moments, computed for each of the two low-frequency sub-bands. For the multi-directional high-frequency sub-bands, four structural texture parameters are extracted from the gray-level co-occurrence matrix \(T\left(i,j\right)\): energy \({f}_{ene}\), contrast \({f}_{con}\), correlation \({f}_{cor}\) and entropy \({f}_{ent}\). The gray-level co-occurrence feature vector of the high-frequency sub-band is \({f}_{H}=\left[{f}_{ene},{f}_{con},{f}_{cor},{f}_{ent}\right]\). The weighting coefficients of the four parameters are calculated according to the contrast sensitivity function of the spatial activity degree [17], which gives the weighted high-frequency sub-band gray-level co-occurrence feature vector \({{f}^{^{\prime}}}_{H}=\left[{{f}^{^{\prime}}}_{ene},{{f}^{^{\prime}}}_{con},{{f}^{^{\prime}}}_{cor},{{f}^{^{\prime}}}_{ent}\right]\).
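The four texture parameters are not written out in the text. The sketch below assumes the commonly used Haralick-style definitions computed from a normalized gray-level co-occurrence matrix \(T\left(i,j\right)\) for a single non-negative pixel offset. The function names and the default offset `(dx, dy) = (1, 0)` are illustrative choices, and the contrast-sensitivity weighting of [17] is not included.

```python
import numpy as np


def glcm(img, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the offset (dx, dy), with dx, dy >= 0."""
    img = np.asarray(img, dtype=int)
    T = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            T[img[r, c], img[r + dy, c + dx]] += 1   # count the gray-level pair
    return T / T.sum()


def texture_features(T):
    """Return [energy, contrast, correlation, entropy] of a normalized GLCM."""
    i, j = np.indices(T.shape)
    energy = (T ** 2).sum()
    contrast = ((i - j) ** 2 * T).sum()
    mu_i, mu_j = (i * T).sum(), (j * T).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * T).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * T).sum())
    # Undefined when one marginal has zero variance (for example a flat patch).
    correlation = ((i - mu_i) * (j - mu_j) * T).sum() / (sd_i * sd_j)
    nonzero = T[T > 0]
    entropy = -(nonzero * np.log(nonzero)).sum()
    return np.array([energy, contrast, correlation, entropy])
```

In practice each high-frequency sub-band would first be quantized to a small number of gray levels before `glcm` is called. That quantization step is left out here.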

2.3.2 Primary-fineness feature registration

A two-step primary-fine (coarse-to-fine) method is used to achieve feature matching. First, the 6-dimensional moment feature vector \({f}_{L}=\left[{\zeta }_{1}^{1},{\zeta }_{2}^{1},{\zeta }_{3}^{1},{\zeta }_{1}^{2},{\zeta }_{2}^{2},{\zeta }_{3}^{2}\right]\) of the low-frequency sub-bands is used for the initial matching with the similarity measure of Eq. (16). On this basis, the weighted high-frequency sub-band feature vector \({{f}^{^{\prime}}}_{H}=\left[{{f}^{^{\prime}}}_{ene},{{f}^{^{\prime}}}_{con},{{f}^{^{\prime}}}_{cor},{{f}^{^{\prime}}}_{ent}\right]\) is used for the second, fine matching with the similarity measure of Eq. (17). The matching formulas are as follows:

$${S}_{ij}=\mathrm{exp}\left(-\left|{f}_{L1}\left(i\right)-{f}_{L2}\left(j\right)\right|\right)$$
(16)
$${S}_{ij}=\mathrm{exp}\left(-\left|{{f}^{^{\prime}}}_{H1}\left(i\right)-{{f}^{^{\prime}}}_{H2}\left(j\right)\right|\right)$$
(17)
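One possible reading of the two-step matching is sketched below. It assumes that \(\left|\cdot \right|\) in Eqs. (16) and (17) is the Euclidean norm of the feature difference, that the primary step keeps candidate pairs whose low-frequency similarity reaches a threshold, and that the fine step then picks the candidate with the highest high-frequency similarity. The threshold value and the function names are hypothetical.

```python
import numpy as np


def similarity(F1, F2):
    """S[i, j] = exp(-||F1[i] - F2[j]||) for two sets of feature vectors."""
    F1, F2 = np.asarray(F1, dtype=float), np.asarray(F2, dtype=float)
    dist = np.linalg.norm(F1[:, None, :] - F2[None, :, :], axis=2)
    return np.exp(-dist)


def primary_fine_match(fL1, fL2, fH1, fH2, primary_thresh=0.5):
    """Return (i, j) index pairs that pass the primary test and win the fine test."""
    s_primary = similarity(fL1, fL2)                  # Eq. (16), low-frequency moments
    s_fine = similarity(fH1, fH2)                     # Eq. (17), weighted texture features
    # Candidates rejected by the primary step can never be selected.
    scores = np.where(s_primary >= primary_thresh, s_fine, -np.inf)
    matches = []
    for i in range(scores.shape[0]):
        j = int(np.argmax(scores[i]))
        if np.isfinite(scores[i, j]):
            matches.append((i, j))
    return matches
```

The pairs returned here are only initial matches. Mismatches that survive both steps are expected to be removed by the random sampling consensus step.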

2.3.3 Random sampling consensus algorithm

The random sampling consensus (RANSAC) algorithm is used to eliminate wrong matches, and the remaining pairs are taken as the final correct matching pairs. The algorithm divides the matched feature points into correct matching point pairs and incorrect matching point pairs [18]. It estimates the coordinate transformation between the feature points of the reference image and the corresponding feature points of the image to be matched, that is, the transformation matrix \(H\). First, four pairs of matching points are randomly selected from the initial matching point pairs, and the transformation matrix \(H\) is calculated from them. Then, for each reference image point \({X}_{i}\left(x,y\right)\) in the remaining matching pairs, the value \(H{X}_{i}\) is calculated, together with the distance \({d}_{i}\) between this value and the matched point \({{X}^{^{\prime}}}_{i}\left({x}^{^{\prime}},{y}^{^{\prime}}\right)\) in the image to be matched. If \({d}_{i}\) is less than the preset threshold \(T\), the pair is regarded as a correct match; otherwise it is regarded as a wrong match. Four matching pairs are then re-selected at random and the above steps are repeated. The transformation matrix \(H\) that yields the largest number of correct matching pairs is used as the final transformation matrix, and the point pairs that do not agree with it are eliminated as wrong matches.
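The procedure can be sketched with NumPy alone. In this sketch \(H\) is assumed to be a 3x3 homography estimated from four point pairs by the direct linear transform. The iteration count, the threshold and the function names are illustrative, and the distance is evaluated on all pairs rather than only on the remaining ones, which does not change the result because the four sampled pairs fit their own model.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: H (up to scale) mapping src points to dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)          # null vector of the constraint matrix


def project(H, pts):
    """Apply H to an (N, 2) array of points using homogeneous coordinates."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]


def ransac_homography(src, dst, thresh=1.0, iters=500, seed=0):
    """Return the H with the most inliers and the boolean inlier mask."""
    rng = np.random.default_rng(seed)
    best_H, best_mask = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)   # four random pairs
        H = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.linalg.norm(project(H, src) - dst, axis=1)  # d_i for every pair
        mask = d < thresh                                    # correct matches under H
        if mask.sum() > best_mask.sum():
            best_H, best_mask = H, mask
    return best_H, best_mask
```

Pairs with `best_mask` equal to `False` are the ones discarded as wrong matches. A production implementation would usually normalize the coordinates before the fit and refit \(H\) on all inliers at the end.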

2.4 Design of target recognition algorithm of multi-source remote sensing image

The BVM target detection operator is used to identify multi-source remote sensing image targets. The result image produced by a target detection operator should separate the background and the target information clearly; that is, the probability of the target or abnormal situation should be large, so that the image information tends to be highly certain and the target information is prominent and easy to distinguish [19]. According to Shannon's definition, information is a description of the uncertainty of the state of motion or mode of existence of things, so the self-information of the image reflects its uncertainty. A small self-information \({I}_{i}\) indicates that the resulting image is less uncertain, that is, the detected target is prominent and the rest of the background is suppressed.

Suppose \(L\) operators are used to detect the target in the \(n\) bands of a hyperspectral image, giving \(L\) target detection images. Assuming that the total variance is constant, the normalized variance coefficient of each image is calculated [20]. The self-information \({I}_{i}\) is a monotonically increasing function of the variance \({\sigma }_{i}^{2}\): when the variance \({\sigma }_{i}^{2}\) is the minimum, \({\rho }_{i}\) is the maximum and \({I}_{i}\) is the minimum. The target can therefore be detected based on the smallest variance of the detection result image.

Let \(\left\{{r}_{1},{r}_{2},\cdots ,{r}_{N}\right\}\) be the set of pixel vectors in the remote sensing image, where \(N\) is the total number of pixels in the image. Each pixel \({r}_{i}={\left[{r}_{i1},{r}_{i2},\cdots ,{r}_{iL}\right]}^{T}\) is an \(L\)-dimensional column vector, where \(L\) is the number of bands and \(1\le i\le N\).

Assume that the prior information \(d\) is the spectral signature of the target to be detected, and that the target detection operator with filter vector \(w={\left[{w}_{1},{w}_{2},\cdots ,{w}_{L}\right]}^{T}\) maps the input pixel \({r}_{i}\) to the output \({y}_{i}\), namely:

$${y}_{i}=\sum_{l=1}^{L}{w}_{l}{r}_{il}={w}^{T}{r}_{i}={r}_{i}^{T}w$$
(18)

The original image covariance matrix is expressed as \(\Sigma\), and the resulting image variance is:

$$v\left({w}^{T}\circ {r}_{i}\right)={w}^{T}\Sigma w$$
(19)

The filter vector \(w\) of the BVM operator needs to meet the following conditions:

$$\left\{\begin{array}{c}min\left({w}^{T}\Sigma w\right)\\ {d}^{T}w=1\end{array}\right.$$
(20)

The auxiliary function \(\varphi (w)\) is constructed as follows, where \(\lambda\) is the Lagrangian multiplier:

$$\varphi (w)={w}^{T}\Sigma w+\lambda \left({d}^{T}w-1\right)$$
(21)

The Lagrangian multiplier method sets the partial derivative of \(\varphi (w)\) with respect to \(w\) to zero, which gives the following solution:

$$w=-\frac{1}{2}\lambda {\Sigma }^{-1}d$$
(22)

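Because the covariance matrix \(\Sigma\) is symmetric, the stationarity condition is \(2\Sigma w+\lambda d=0\), which yields Eq. (22). Substituting Eq. (22) into the constraint \({d}^{T}w=1\) gives \(\lambda =-2/\left({d}^{T}{\Sigma }^{-1}d\right)\), and the optimal solution is obtained as follows: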

$${w}^{BVM}=\frac{{\Sigma }^{-1}d}{{d}^{T}{\Sigma }^{-1}d}$$
(23)
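Eq. (23) translates almost directly into NumPy. The sketch below assumes that the pixels are stored as an \(N\times L\) array with one row per pixel and that \(\Sigma\) is estimated with `np.cov`. The function names `bvm_filter` and `bvm_output` are illustrative.

```python
import numpy as np


def bvm_filter(pixels, d):
    """Filter vector w = Sigma^{-1} d / (d^T Sigma^{-1} d) of Eq. (23)."""
    pixels = np.asarray(pixels, dtype=float)          # shape (N, L)
    d = np.asarray(d, dtype=float)                    # target spectrum, shape (L,)
    sigma = np.cov(pixels, rowvar=False)              # L x L covariance matrix
    sigma_inv_d = np.linalg.solve(sigma, d)           # avoids forming Sigma^{-1}
    return sigma_inv_d / (d @ sigma_inv_d)


def bvm_output(pixels, w):
    """Detector output y_i = w^T r_i for every pixel, Eq. (18)."""
    return np.asarray(pixels, dtype=float) @ w
```

The result does not depend on whether the covariance is normalized by \(N\) or \(N-1\), because any constant factor of \(\Sigma\) cancels between the numerator and the denominator of Eq. (23).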

The constraint of the BVM operator forces the inner product of the filter vector \(w\) and the known target spectrum \(d\) to be 1, while the operator minimizes the variance of the output. After the data is processed using the covariance matrix, the small target and the background are easier to separate, and accurate multi-source image target recognition results can be obtained.

3 Experimental results and analysis

In order to verify the effectiveness and practicability of the proposed method, a port is taken as the research object, and an experimental comparative analysis against other methods is carried out [21]. Port target recognition based on the infrared and SAR remote sensing images of the iVIOT is studied first. An infrared sensor and a SAR radar are selected as the sensors of the visual perception layer of the iVIOT to collect remote sensing image information, and the data processing module of the iVIOT performs the image target recognition. Figure 3 shows the infrared remote sensing images collected by the iVIOT.

Fig. 3 Infrared remote sensing image results

The results of SAR remote sensing images collected by the iVIOT are shown in Fig. 4.

Fig. 4 SAR remote sensing image results

The feature points contained in the remote sensing images in Fig. 3 and Fig. 4 are extracted using the method of this paper, the method of Ref. [11] and the method of Ref. [12], and the comparison results are shown in Fig. 5.

Fig. 5 Feature point extraction analysis comparison

According to Fig. 5, compared with the methods of Refs. [11] and [12], the method in this paper extracts more feature points in a very short time. After about 70 iterations, basically all the feature points have been extracted, and the number of feature points extracted is much higher than that of the reference methods.

The collected infrared remote sensing images and SAR remote sensing images are used to implement image recognition. The image result after the recognition is shown in Fig. 6.

Fig. 6 The recognition result of the method in this paper

It can be seen from the experimental results in Fig. 6 that the method in this paper achieves high-efficiency recognition of infrared remote sensing images and SAR remote sensing images, and that using the recognized remote sensing images can improve the accuracy of target detection.

The results of using the method in this paper to identify ship targets in multi-source remote sensing images based on iVIOT are shown in Fig. 7.

Fig. 7 Target recognition results

The target recognition results in Fig. 7 show that the method in this paper achieves effective recognition of targets in remote sensing images. The method uses the iVIOT to process the target recognition information of the remote sensing images effectively and obtains the target recognition results efficiently. Although the targets in the remote sensing image are small, the method identifies them accurately, which verifies the effectiveness of the proposed method for target recognition.

Statistics are collected on the communication overhead of the method in this paper when the iVIOT is used to identify multi-source remote sensing image targets. The iVIOT uses cloud computing technology to realize data transmission, processing and storage. The communication overhead of multi-source remote sensing image target recognition for different remote sensing image sizes and different environmental conditions is compared in Fig. 8.

Fig. 8 Comparison of communication overhead

It can be seen from Fig. 8 that when the detection environment does not change, that is, when the target recognition environment is a normal environment, using the iVIOT for remote sensing image target recognition has lower computational complexity and storage complexity. When multi-source remote sensing image target recognition is performed in a complex environment, the data involved in operations such as updating, modifying and deleting data must be verified and stored, so the computational complexity is higher. When the size of the remote sensing image does not exceed 128 MB, the communication overhead is small in all environments. When the size exceeds 128 MB, the communication overhead increases rapidly, showing a linear growth trend. The iVIOT adopts cloud computing technology to protect user privacy information, has good computing performance, and is of great significance for improving remote sensing image target recognition.

Suppose the imaging interval of multi-source remote sensing images is 0.32 h, and the results of ship motion state obtained by the proposed method at each imaging time are shown in Table 1.

Table 1 Recognition results of ship motion state

From the experimental results in Table 1, it can be seen that the method in this paper can effectively obtain the motion state of the targets in the remote sensing images. In order to verify the effectiveness of the method in identifying remote sensing image targets, the root mean square errors of position and velocity obtained when tracking target ship 1 and target ship 2 with the method in this paper are calculated. The results are shown in Fig. 9.

Fig. 9 Root mean square error results of target position and velocity

The experimental results in Fig. 9 show that the method in this paper makes full use of the target features of the remote sensing images and adopts primary-fine feature registration and the random sampling consensus algorithm to associate the target data. It has strong recognition stability and obtains good target recognition results, which meet the requirements of moving target tracking in multi-source remote sensing images.

The special target detection method (Reference [11]) and the Euclidean feature matching method (Reference [12]) are selected as the comparison methods, and the correct recognition rates of the three methods when registering remote sensing images under different target recognition environments are counted. The statistical results are shown in Fig. 10.

Fig. 10 Comparison results of correct recognition rate

The time required by the three methods to register the remote sensing images under the normal, lighting, noise, moving and rotating environments is also counted. The statistical results are shown in Fig. 11.

Fig. 11 Comparison of registration time

Analyzing the experimental results of Fig. 10 and Fig. 11, we can see that the method in this paper achieves a higher correct matching rate and higher matching accuracy when identifying remote sensing image targets. The total recognition time of the method in this paper is significantly lower than that of the special target detection method and the Euclidean feature matching method. When this method is used to identify multi-source remote sensing image targets, the correct matching rate in the different environments is higher than 99% and the recognition time is less than 10 s. Both are significantly better than those of the other two methods, which verifies that the method in this paper has high recognition performance and can provide a good foundation for target recognition in multi-source remote sensing images.

The method in this paper is then used to identify remote sensing image targets using only infrared remote sensing images, only SAR remote sensing images, and multi-source remote sensing images, and the target recognition accuracy and recall rate are counted for each case. The statistical results are shown in Fig. 12 and Fig. 13.

Fig. 12 Comparison of recognition accuracy

Fig. 13 Comparison of recognition recall rates

Analyzing the detection results of Figs. 12 and 13, it can be seen that, compared with target detection using features extracted from infrared remote sensing images alone or from SAR remote sensing images alone, the multi-source remote sensing image target recognition method proposed in this paper greatly improves the accuracy of target detection and reduces the possibility of false targets being detected, while also improving the recall rate to a certain extent. The method in this paper applies the iVIOT to target recognition in multi-source remote sensing images, uses its efficient computing performance to achieve accurate target detection, and has high applicability.

4 Discussion

Remote sensing images have an extremely wide range of applications, and accurate target detection in remote sensing images can extend this range further. Research on target detection based on optical and SAR remote sensing images is carried out, and the high computing performance of the iVIOT is used to improve the accuracy of remote sensing image target detection. Because of their different imaging principles, the two kinds of sensor have their own advantages in earth observation. SAR sensors have all-day and all-weather detection capabilities, can penetrate clouds and fog, and are not affected by shadow occlusion or the time of illumination, but their texture and ground object radiation information is insufficient, which makes the images difficult to interpret. Optical remote sensing images intuitively present texture, color and shape information to users, and rich spectral information can be extracted from their radiation characteristics, which is more beneficial for classification and interpretation. However, their ability to acquire data is limited by light and weather. Different types of remote sensing data such as optical and SAR data are increasing at a rate of thousands of GB every day, which provides a rich source of data for multi-source processing of remote sensing images. How to interpret specified targets from massive high-resolution remote sensing images and fully exploit the multi-source information will become a key link in the application of remote sensing information.
Carrying out target interpretation based on multi-source remote sensing image fusion is of great significance for the development of the theory of multi-source remote sensing image fusion and target interpretation. It is also conducive to the full mining of massive remote sensing data and to target-level interpretation of multi-source information, providing target information support in military fields such as military strikes and intelligence analysis, and in civilian fields such as urban planning, aviation control and traffic navigation.

5 Conclusion

Remote sensing images contain a large amount of target information. In order to accurately identify targets in remote sensing images, a target recognition algorithm for multi-source remote sensing images based on IoT vision is studied, and the iVIOT structure is used to achieve accurate target recognition in remote sensing images. After feature extraction from the multi-source remote sensing images is completed, a robust and stable multi-source remote sensing image registration method is used to achieve accurate registration. The primary-fine feature registration method is used to register multi-source remote sensing images affected by noise and illumination changes. Experimental results show that this method is robust to noise and illumination changes, and has a good application prospect for tracking moving targets in remote sensing images. The iVIOT is usually used in scenarios with high real-time requirements. Target recognition in remote sensing images usually involves large-scale matrix decomposition, repeated convolution, the solution of large-scale systems of equations and many non-linear optimization problems, which are computationally intensive and time-consuming. The iVIOT can effectively address these problems, improves the computing performance of remote sensing image target detection, and has extremely high applicability. Due to time limitations, the research on image blur in this paper is not in-depth enough. In the future, further research will be carried out on image blur processing, so as to improve target recognition in multi-source remote sensing images.