1 Introduction

Heterogeneous sensor networks consist of multimodal sensors [1, 2] such as radar, optical, infrared, and acoustic. Infrared and visible image fusion in heterogeneous sensor networks is an active research topic in image processing [3–6]. Infrared and visible band imaging have complementary advantages, so the fused image contains the information of both bands and is widely used in visual monitoring, industrial inspection, and many other fields [7, 8]. With the rapid improvement of image sensors, high-resolution cameras produce very clear images, but transmitting and storing such a large amount of image data is a real challenge. This paper therefore proposes an image fusion algorithm in the compressive sensing domain [9, 10], which reduces the amount of data to be stored and transmitted and at the same time lowers the computational complexity, improving the data processing efficiency of the overall system.

2 Image sparse representation

The transforms commonly used for image sparse representation are the wavelet, curvelet, contourlet, and so on [11–13]. The shearlet transform is a newer transform that outperforms the wavelet transform in sparse representation of high-dimensional data, since it expresses the geometric singularities of an image much better [14, 15]. We therefore apply the shearlet transform and the wavelet transform successively for image sparse representation. This combines the strengths of two multi-resolution analysis tools [16, 17] and further increases the sparsity of the signals, so less data are needed to complete the image fusion efficiently.

After the shearlet transform, the original images are decomposed into high-frequency and low-frequency coefficients. Each set of coefficients is then processed separately by the wavelet transform, yielding four frequency components whose sparsity differs considerably. Based on the sparsity of each frequency component, the coefficients are sampled at different rates. The image sparse representation model is shown in Fig. 1.

Fig. 1 The original image sparse representation model
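For concreteness, the cascaded decomposition can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the shearlet stage is replaced by a crude smooth/residual split (a real shearlet library such as PyShearlab would return several directional bands), and only the wavelet stage uses an actual transform (PyWavelets).

```python
import pywt
from scipy.ndimage import gaussian_filter

def shearlet_decompose(img, sigma=2.0):
    """Crude stand-in for a shearlet transform: split the image into a
    smooth approximation and its residual. A real shearlet transform
    would return several directional high-frequency bands instead."""
    low = gaussian_filter(img.astype(float), sigma)
    return low, [img.astype(float) - low]

def sparse_representation(img, wavelet="db4"):
    """Cascade: shearlet split first, then a single-level 2-D wavelet
    transform on every shearlet band, giving the four components
    LL, HL, LH, HH used in the paper."""
    shear_low, shear_highs = shearlet_decompose(img)
    # Wavelet stage on the shearlet low band: LL (approximation)
    # and HL (detail) components.
    ll, hl = pywt.dwt2(shear_low, wavelet)
    # Wavelet stage on each shearlet high band: LH (approximation)
    # and HH (detail) components.
    lh, hh = [], []
    for band in shear_highs:
        low, highs = pywt.dwt2(band, wavelet)
        lh.append(low)
        hh.append(highs)
    return ll, hl, lh, hh
```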

The sparsity of the low-frequency coefficients is relatively low, so there is little benefit in compressing the wavelet low-frequency coefficients of the shearlet low-frequency components (LL). The LL components are therefore not compressively sampled but fused directly. The wavelet low-frequency coefficients of the shearlet high-frequency components (LH) and the wavelet high-frequency coefficients of the shearlet low-frequency components (HL) are sparser than the LL components, so they can be compressively sampled and fused in the compressive sensing domain. The highest frequency components, the wavelet high-frequency coefficients of the shearlet high-frequency components (HH), are the sparsest of all, so they can be compressively sampled at a lower sampling rate than the HL and LH components.
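A sketch of the per-component compressive sampling is given below. The paper does not specify its measurement matrix, so the standard random-Gaussian model, applied along both dimensions of each coefficient block, is assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def cs_sample(coeffs, rate):
    """Compressively sample a 2-D coefficient block along both rows and
    columns with Gaussian measurement matrices, mirroring the
    per-row/per-column rates quoted in Section 4 (e.g. 200/512 = 39.0%)."""
    n_rows, n_cols = coeffs.shape
    m_rows = max(1, int(round(rate * n_rows)))
    m_cols = max(1, int(round(rate * n_cols)))
    phi_r = rng.standard_normal((m_rows, n_rows)) / np.sqrt(m_rows)
    phi_c = rng.standard_normal((m_cols, n_cols)) / np.sqrt(m_cols)
    return phi_r @ coeffs @ phi_c.T   # measurements, m_rows x m_cols

# Per-component strategy from the text: LL is kept intact,
# LH/HL are sampled at a moderate rate, HH at a lower one, e.g.:
# y_lh, y_hl = cs_sample(lh, 0.390), cs_sample(hl, 0.390)
# y_hh = cs_sample(hh, 0.195)
```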

3 Image fusion rules

The image fusion rule determines how much of each original image is retained in the fused image. In this paper, a fusion rule is designed for each of the frequency components after compression.

The low-frequency coefficients (LL components) represent the approximation of the original image, so the brightness and contrast of the fused image are mainly determined by the fusion rule for the LL components. Because their imaging principles differ, infrared and visible cameras produce quite different results. In environments favorable to visible light imaging, the visible image carries a large amount of information and rich texture, while in dark or foggy environments, infrared imaging is clear and stable and indicates thermal information. For these low-frequency coefficients, a weighted fusion rule based on local spatial frequency is suitable: the image with richer local spatial information is given the larger weight in the fusion process. Formula (1) defines the local spatial frequency.

$$ \mathrm{RF}(x,y)=\sqrt{\frac{1}{W\times W}\sum_{m\in W,\, n\in W}\bigl(f(x+m,\,y+n)-f(x+m,\,y+n-1)\bigr)^2} $$
(1)
$$ \mathrm{CF}(x,y)=\sqrt{\frac{1}{W\times W}\sum_{m\in W,\, n\in W}\bigl(f(x+m,\,y+n)-f(x+m-1,\,y+n)\bigr)^2} $$
$$ \mathrm{SF}(x,y)=\sqrt{\mathrm{RF}(x,y)^2+\mathrm{CF}(x,y)^2} $$

In the above formulas, RF(x, y) is the local spatial frequency of the pixel (x, y) in the row direction, computed over a window of size W × W; CF(x, y) is the local spatial frequency in the column direction; and SF(x, y) is the local spatial frequency of the point (x, y).
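A possible implementation of Eq. (1) and of the weighted LL fusion is sketched below. The normalized-SF weighting and the zero-padded border handling are assumptions: the paper only states that the image with richer local spatial information receives the larger weight.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_frequency(img, w=3):
    """Local spatial frequency per Eq. (1): RMS of first differences
    along rows and columns inside a w-by-w window."""
    f = img.astype(float)
    d_row = np.zeros_like(f)
    d_row[:, 1:] = f[:, 1:] - f[:, :-1]      # RF term (border zero-padded)
    d_col = np.zeros_like(f)
    d_col[1:, :] = f[1:, :] - f[:-1, :]      # CF term (border zero-padded)
    rf2 = uniform_filter(d_row ** 2, size=w)  # windowed mean = sum/(W*W)
    cf2 = uniform_filter(d_col ** 2, size=w)
    return np.sqrt(rf2 + cf2)                 # SF = sqrt(RF^2 + CF^2)

def fuse_ll(ll_ir, ll_vis, w=3, eps=1e-12):
    """Weighted LL fusion: the image with richer local spatial
    information receives the larger weight (normalized-SF weighting
    is an assumed instantiation of the rule)."""
    sf_ir = spatial_frequency(ll_ir, w)
    sf_vis = spatial_frequency(ll_vis, w)
    weight = sf_ir / (sf_ir + sf_vis + eps)
    return weight * ll_ir + (1 - weight) * ll_vis
```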

The LH and HL components correspond to the boundaries of the smooth parts of the image. To highlight the features of the target region and ensure that local regions with large energy are clearly reflected in the fused image, the fusion rule is based on the regional energy feature, defined as follows:

$$ E(x,y)=\sum_{m\in M,\, n\in N} f(x+m,\,y+n)^2 $$
(2)

The fusion rule for the LH and HL components aims to retain the coefficients with the larger regional energy and is defined as follows:

$$ F(x,y)=\begin{cases} I(x,y) & \text{if } E_I(x,y)>E_V(x,y)\\ V(x,y) & \text{otherwise} \end{cases} $$
(3)

where I(x, y) and V(x, y) denote the infrared and visible coefficients at (x, y), and E_I and E_V are their regional energies.
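Eqs. (2) and (3) translate directly into code. The sketch below assumes a square M × N neighbourhood and applies the rule element-wise to the (possibly compressively sampled) LH/HL coefficients:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def regional_energy(coeffs, size=3):
    """Regional energy per Eq. (2): sum of squared coefficients over a
    size-by-size neighbourhood (windowed mean times window area)."""
    return uniform_filter(coeffs.astype(float) ** 2, size=size) * size * size

def fuse_energy(c_ir, c_vis, size=3):
    """Eq. (3): keep, at every position, the coefficient whose
    neighbourhood carries the larger energy."""
    e_ir = regional_energy(c_ir, size)
    e_vis = regional_energy(c_vis, size)
    return np.where(e_ir > e_vis, c_ir, c_vis)
```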

The HH components are the highest frequency components after decomposition and correspond to the fine textures of the image. These parts therefore adopt the absolute-maximum fusion rule, so that the fused image keeps rich texture information.
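The absolute-maximum rule itself is one line; ties are resolved in favour of the infrared coefficient here, which is an arbitrary choice:

```python
import numpy as np

def fuse_abs_max(c_ir, c_vis):
    """Absolute-maximum rule for the HH components: keep the
    coefficient with the larger magnitude at each position."""
    return np.where(np.abs(c_ir) >= np.abs(c_vis), c_ir, c_vis)
```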

4 Experiment and data analysis

To verify the fusion performance of our algorithm, two sets of infrared and visible images are used in experiments at different compressive sampling rates. One set shows a cup, taken indoors at close range; the other shows an outdoor scene in a yard. Each set consists of strictly registered infrared and visible images of size 512 × 512 pixels (Fig. 2).

Fig. 2 The original images for fusion. a Infrared image of Cup. b Visible image of Cup. c Infrared image of Yard. d Visible image of Yard

Figures 3 and 4 each contain six images. Figures 3a and 4a show the fusion of the two original images after the shearlet and wavelet transforms without compressive sampling, with the coefficients at every scale fused by the absolute-maximum rule. Figures 3b and 4b show the results of the fusion rule proposed in this paper, also without compressive sampling. The remaining images in Figs. 3 and 4 apply compressive sensing to all the coefficients except the LL components. In Figs. 3c and 4c, the coefficients of each layer are compressed to 300 pixels in each row and each column, a sampling rate of 58.6%; in Figs. 3d and 4d they are compressed to 200 pixels per row and column (39.0%); and in Figs. 3e and 4e to 100 pixels per row and column (19.5%). Figures 3f and 4f use different sampling rates for different components: 39.0% for the HL and LH components and 19.5% for the HH components, the latter matching Figs. 3e and 4e.
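The quoted sampling rates are simply the per-dimension ratios of retained measurements to the 512-pixel image size:

$$ \frac{300}{512}\approx 58.6\%,\qquad \frac{200}{512}\approx 39.0\%,\qquad \frac{100}{512}\approx 19.5\% $$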

Fig. 3 The fusion effects of Cup. a The absolute-maximum fusion rule, sampling rate = 100%. b The proposed fusion rule, sampling rate = 100%. c The proposed fusion rule, sampling rate = 58.6%. d The proposed fusion rule, sampling rate = 39.0%. e The proposed fusion rule, sampling rate = 19.5%. f The proposed fusion rule, different sampling rates for different components

Fig. 4 The fusion effects of Yard. a The absolute-maximum fusion rule, sampling rate = 100%. b The proposed fusion rule, sampling rate = 100%. c The proposed fusion rule, sampling rate = 58.6%. d The proposed fusion rule, sampling rate = 39.0%. e The proposed fusion rule, sampling rate = 19.5%. f The proposed fusion rule, different sampling rates for different components

In terms of subjective human perception, the fusion rule proposed in this paper exploits the advantages of both imaging bands: it retains the rich texture information of the visible image as well as the thermal information of the infrared image, and the brightness of the fused image is uniform.

As the sampling rate decreases, the quality of the fused image also decreases, which manifests mainly as blurring of texture details. In particular, the clarity of Figs. 3e and 4e drops considerably compared with Figs. 3c, d and 4c, d. However, there is little difference between Figs. 3f and 4f and Figs. 3d and 4d, because Figs. 3f and 4f use different sampling rates for different frequency components.

The Piella method is a widely used method for evaluating fused images. Three indices, Q, Q_W, and Q_E, are used to evaluate the fusion performance [18]: Q is the fusion quality index, Q_W is the weighted fusion quality index, and Q_E is the edge-dependent fusion quality index. Each index lies in [−1, 1]; the closer it is to 1, the better the fusion effect. We use the Piella indices to evaluate the performance of our algorithm. The results are shown in Table 1.

Table 1 Image fusion objective evaluation parameters of the two sets of images
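For reference, a minimal sketch of Piella's basic index Q follows, built on the Wang–Bovik universal image quality index with local variance as the saliency weight. Q_W and Q_E add image-level weighting and an edge-image term, which are omitted here; window size and box filtering are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def _local_stats(a, b, size=8):
    """Windowed means, variances, and covariance via box filtering."""
    mu_a = uniform_filter(a, size)
    mu_b = uniform_filter(b, size)
    var_a = uniform_filter(a * a, size) - mu_a ** 2
    var_b = uniform_filter(b * b, size) - mu_b ** 2
    cov = uniform_filter(a * b, size) - mu_a * mu_b
    return mu_a, mu_b, var_a, var_b, cov

def q0_map(a, b, size=8, eps=1e-12):
    """Wang-Bovik universal image quality index, per window."""
    mu_a, mu_b, var_a, var_b, cov = _local_stats(a, b, size)
    return (4 * cov * mu_a * mu_b) / ((var_a + var_b) * (mu_a ** 2 + mu_b ** 2) + eps)

def piella_q(img_a, img_b, fused, size=8, eps=1e-12):
    """Piella's Q: per-window Q0 between each source and the fused
    image, weighted by the local saliency (variance) of the sources."""
    a, b, f = (np.asarray(x, dtype=float) for x in (img_a, img_b, fused))
    _, _, var_a, var_b, _ = _local_stats(a, b, size)
    lam = var_a / (var_a + var_b + eps)      # saliency weight lambda(w)
    q = lam * q0_map(a, f, size) + (1 - lam) * q0_map(b, f, size)
    return float(q.mean())
```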

It can be concluded from the Piella indices that the fusion method proposed in this paper achieves good fusion performance, especially in the edge processing of the fused images. As the compressive sampling rate decreases, the fusion quality decreases significantly, which is consistent with the subjective impression of human eyes. The evaluation indices of Figs. 3e and 4e differ considerably from those of Figs. 3d and 4d, since their sampling rate is quite low. Figures 3f and 4f use different sampling rates for different frequency components, and their evaluation results are very close to those of Figs. 3d and 4d. This shows that using different sampling rates and fusion rules for different frequency components strikes a good balance between fusion quality and compression ratio.

5 Conclusions

An image fusion algorithm for heterogeneous sensor networks in the compressive sensing domain is proposed. The shearlet transform and the wavelet transform are applied successively for sparse representation of the infrared and visible images, which clearly increases the signal sparsity. Experiments show that the proposed algorithm achieves a good balance between fusion quality and the amount of data to be processed, because different sampling rates and fusion rules are applied to the different frequency components of the coefficients.